Warwick Analytics has been featured on Data Science Central discussing how ‘Data Scientists need Designer Labels Too’.
Overview
When we want to understand what people believe or perceive, we do it by analysing their communication, either written or spoken. Let’s say we want to analyse voice-of-customer text data.
The classical way to approach this is text mining based on keywords and rules to drive topic analysis, e.g. using TF-IDF or some other kind of ‘vectorization’, together with sentiment analysis of the opinion terms.
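To make the keyword-weighting idea concrete, here is a minimal sketch of TF-IDF using only the Python standard library. The comments are invented for illustration, and the weighting used (term frequency times log(N / document frequency)) is one common variant among several, not necessarily the one any particular tool applies.

```python
import math
from collections import Counter

# Hypothetical voice-of-customer comments, invented for illustration.
comments = [
    "delivery was late and the box was damaged",
    "great product but delivery was late",
    "the product stopped working after a week",
]

def tfidf(docs):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors

vectors = tfidf(comments)
for comment, vec in zip(comments, vectors):
    # Show the three highest-weighted terms for each comment.
    top = sorted(vec.items(), key=lambda kv: kv[1], reverse=True)[:3]
    print(comment, "->", [term for term, _ in top])
```

Terms that occur in only one comment, such as ‘damaged’, receive a higher weight than terms shared across comments, such as ‘delivery’. The output is still just a list of weighted keywords per comment.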
Thankfully, the latest technologies mean this no longer needs to be the case. Imagine a world where AI-based labelling and machine learning for text is cheap and plentiful, and where data scientists are not required to tune and drive models.
There are issues with the classical approach. Firstly, what are we supposed to do with all the topics? If we build a word cloud, how useful is that? If customers use synonyms which aren’t in a dictionary, do we group these together in advance? We are essentially trying to second-guess and group terms, which might not match the intentions of the customers, or might differ between situations. Sentiment analysis is even more problematic, and that is before we begin to explore the technical challenges of sarcasm, context, comparators and double negatives, all of which such analyses handle very poorly.
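The sentiment problem is easy to reproduce. The sketch below is a deliberately naive keyword scorer. The opinion-term dictionaries and the example sentences are hypothetical, chosen only to show a double negative, sarcasm and a comparator being misread.

```python
# Hypothetical opinion-term dictionaries, invented for illustration.
POSITIVE = {"good", "great", "happy", "love"}
NEGATIVE = {"bad", "poor", "unhappy", "hate"}

def keyword_sentiment(text):
    """Score = number of positive keyword hits minus negative keyword hits."""
    tokens = text.lower().replace(",", " ").replace(".", " ").split()
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)

examples = [
    "I am not unhappy with the service",  # double negative, mildly positive intent
    "Oh great, another late delivery",    # sarcasm, negative intent
    "Not as good as my old provider",     # comparator, negative intent
]
for text in examples:
    print(keyword_sentiment(text), text)
```

The scorer gives the first sentence a negative score and the other two a positive score, the opposite of what a human reader would judge in each case.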
So how else are we meant to analyse text data, apart from painfully compiling dictionaries and constant manual checking? Well, say hello to the wonderful world of labels. The labels referred to here are generated by machine learning, i.e. by replicating human judgment based on a training sample of manually labelled data. The machine doesn’t need to be told keywords; it figures out common patterns, which might be a lot more than single keywords and might include where words sit in the sentence and whether they are nouns or verbs, just as a human might.
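As a rough illustration of learning labels from a manually labelled sample, here is a small naive Bayes classifier written with the standard library only. It is a sketch under stated assumptions, not the method any particular product uses: the training comments and label names are invented, and word pairs (bigrams) stand in for the richer patterns mentioned above, such as word position or part of speech.

```python
import math
from collections import Counter, defaultdict

# Hypothetical manually labelled training sample; text and labels are invented.
training = [
    ("my parcel arrived two weeks late", "delivery"),
    ("courier left the box in the rain", "delivery"),
    ("still waiting for my order to arrive", "delivery"),
    ("i was charged twice for one order", "billing"),
    ("the invoice shows the wrong amount", "billing"),
    ("refund has not reached my account", "billing"),
]

def features(text):
    """Unigrams plus bigrams, so short word patterns count as well as keywords."""
    tokens = text.lower().split()
    return tokens + [" ".join(pair) for pair in zip(tokens, tokens[1:])]

def train(samples):
    """Count labels and per-label features from the labelled sample."""
    label_counts = Counter()
    feature_counts = defaultdict(Counter)
    vocab = set()
    for text, label in samples:
        label_counts[label] += 1
        for f in features(text):
            feature_counts[label][f] += 1
            vocab.add(f)
    return label_counts, feature_counts, vocab

def predict(model, text):
    """Return the label with the highest log-probability (add-one smoothing)."""
    label_counts, feature_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        denom = sum(feature_counts[label].values()) + len(vocab)
        score = math.log(n / total)
        for f in features(text):
            if f in vocab:  # features never seen in training are ignored
                score += math.log((feature_counts[label][f] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train(training)
print(predict(model, "charged the wrong amount on my invoice"))
print(predict(model, "the courier never arrived"))
```

Nobody told the model which keywords signal ‘billing’ or ‘delivery’; it picked up the associations from the labelled examples. A real system would need far more training data and a held-out test set before its labels could be trusted.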