Putting data analysis in human terms

By Alex Walter, Head of Client Services, Infegy

Google was mentioned more than once per second on Twitter during 2015. That’s nearly 43 million mentions in a single year!
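
As a quick back-of-the-envelope check of that rate (a sketch in Python; the figures are approximate):

    # Sanity check on the quoted rate: ~43 million mentions in one year.
    mentions = 43_000_000
    seconds_in_year = 365 * 24 * 3600   # 31,536,000 seconds
    print(mentions / seconds_in_year)   # ~1.36 mentions per second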

Google is just one of the many brands Infegy had the opportunity to examine in our analysis, which was foundational to this year’s Best Global Brands report.

Infegy drew on unrestricted access to the historical Twitter firehose, combing through every tweet that mentioned a BGB-considered brand over the specified period of analysis. This unlocked more than 4.2 billion pieces of brand-related content for analysis, spanning nearly every industry and corner of the globe.

With a hard-earned reputation as one of the industry’s preeminent providers of social media intelligence, Infegy is proud to partner with Interbrand on Best Global Brands 2016, performing analysis and providing expertise in two key areas:

  • Quantitative metrics, including post volume and number of unique authors
  • Linguistic analysis, including sentiment and emotions

While the numbers were immense, counting mentions was the easy portion of the analysis. Determining what those mentions actually say, and mean, was a far greater task.

Rather than using the antiquated approach of counting “like” and “don’t like” statements, we employed our powerful natural language processing algorithms to gain a much more detailed picture of how consumers are sharing their likes (and gripes) about brands across the Twittersphere.

“Sentiment accuracy is difficult!”

Thanks to endless grammatical variations, misspellings, slang, and other challenges, accurate automated analysis of natural language has long been considered one of the harder problems in computer science.

Traditional approaches to sentiment analysis tend to overpromise and underdeliver. These systems falter when they encounter complicated wordplay, or when contextual information is required to correctly assign sentiment to a phrase.

Consider the tweets below:
[Example tweet 1: a customer stating that they like Hilton Hotels]

This first example is simple, and most sentiment engines would correctly address this as a positive statement since the identifier of the “like” is apparent and the semantics are clear: subject (customer) verb (likes) object (brand).

The second example, below, isn’t so straightforward.

[Example tweet 2: a customer craving McDonald’s “so bad”]

Most systems will misinterpret this statement, seeing the word “bad”—or even the phrase “so bad”—and score the sentence as negative. Contextual understanding is critical for a system to approach human-level accuracy.
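
To make the failure mode concrete, here is a minimal sketch of the keyword-list approach described above (the word lists and function are purely illustrative, not any vendor’s actual implementation):

    # Naive keyword-list sentiment scoring: counts positive and negative
    # words with no regard for context. Word lists are illustrative only.
    POSITIVE = {"like", "love", "great"}
    NEGATIVE = {"bad", "hate", "awful"}

    def naive_sentiment(text: str) -> str:
        words = text.lower().split()
        score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
        if score > 0:
            return "positive"
        if score < 0:
            return "negative"
        return "neutral"

    print(naive_sentiment("I like Hilton Hotels"))                     # "positive" -- correct
    print(naive_sentiment("I'm craving McDonald's so bad right now"))  # "negative" -- wrong

Because the scorer sees “bad” in isolation, it labels an enthusiastically positive tweet as negative, which is exactly the mistake described above.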

Infegy’s linguistic sentiment engine operates differently (read: more effectively) than others in the space.

Our approach to sentiment analysis aligns much more closely with typical human interpretation of text. While parsing, the system breaks down each sentence, identifying its subject, any modifiers, and its grammatical structure. Rather than simply scoring a phrase as positive or negative against a static list of identifier words such as “bad” or “don’t like,” the system understands the context of each word, enabling dramatic improvements in performance.

In the first example above, the word “like” is easy to interpret, since it very clearly references the object of the sentence, in this case Hilton Hotels.

With our McDonald’s example, Infegy’s system understands that the phrase “so bad,” in this context, is actually a modifier to the earlier verb “craving.” In turn, we’re able to correctly score this sentence as strongly positive.
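
Here is a minimal sketch of this kind of context-aware scoring, using the open-source spaCy library for dependency parsing. The library choice, toy lexicon, and single rule are assumptions for illustration only; Infegy’s engine is proprietary and far more sophisticated.

    # Dependency-aware sentiment sketch: before trusting a sentiment word,
    # check what it grammatically modifies.
    # Requires: pip install spacy && python -m spacy download en_core_web_sm
    import spacy

    nlp = spacy.load("en_core_web_sm")

    # Toy lexicon (lemma -> polarity); a real system would be far richer.
    LEXICON = {"like": 1.0, "love": 1.0, "crave": 1.0, "bad": -1.0, "hate": -1.0}

    def contextual_sentiment(text: str) -> float:
        total = 0.0
        for token in nlp(text):
            polarity = LEXICON.get(token.lemma_.lower())
            if polarity is None:
                continue
            # If a negative word is an adverbial modifier of another
            # sentiment-bearing word ("craving ... so bad"), treat it as
            # an intensifier of that word rather than as negative itself.
            if (polarity < 0 and token.dep_ == "advmod"
                    and LEXICON.get(token.head.lemma_.lower(), 0) > 0):
                total += abs(polarity)
            else:
                total += polarity
        return total

    print(contextual_sentiment("I like Hilton Hotels"))                     # > 0
    print(contextual_sentiment("I'm craving McDonald's so bad right now"))  # > 0, not negative

Whether “bad” attaches to “craving” as an adverbial modifier depends on the parser and model used; the point is the technique itself: resolve each word’s grammatical role before scoring it.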

Bold claims, backed up

When validating a sentiment analysis system, the testing methodology is crucial.

The data source, cleanliness of language, methods employed, subject matter, and volume of data tested are all significant variables that can dramatically affect results.

For benchmarking, nearly two million rated user reviews from a major online retailer were collected during the validation process. To ensure a wide range of subject matter, the reviews cover 81 categories, from consumer packaged goods to food to technology.

There are three key metrics to consider when evaluating sentiment performance:

  • Precision/Accuracy: A measure of how often a sentiment rating is correct when scored.
  • Recall: A measure of how many documents containing sentiments are rated as sentimental, rather than neutral or unscored.
  • F1 Score: A combination of precision and recall, on a scale of 0.0 to 1.0, where 1.0 is perfect. It is commonly used by researchers in linguistics and natural language processing to quantify the performance of such systems in a single number. The formula is F1 = 2 * (precision * recall) / (precision + recall), as shown in the sketch below.
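
To make the relationship between the three metrics concrete, here is a short worked example using invented counts (the numbers are illustrative only; they are not Infegy’s results):

    # Worked example of precision, recall, and F1 from hypothetical counts.
    true_positives = 900   # sentiment-bearing docs scored with the correct sentiment
    false_positives = 50   # docs scored with the wrong sentiment
    false_negatives = 100  # sentiment-bearing docs left neutral or unscored

    precision = true_positives / (true_positives + false_positives)  # ~0.947
    recall = true_positives / (true_positives + false_negatives)     # 0.900
    f1 = 2 * (precision * recall) / (precision + recall)             # ~0.923

    print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")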

To perform the test, the text from each of 1,912,958 unique consumer reviews was sent through Infegy Linguistics, with the system rating the sentiment of each. Every review had been scored by its author as positive or negative, and the system’s rating was validated against the author’s.

So, how well did Infegy Linguistics perform? It achieved an F1 score of 0.952: nearly perfect!

Humans express their feelings toward particular brands in complex and surprising ways. By challenging the industry standard of deriving sentiment statically from a list of word meanings, Infegy makes it possible for brands to truly hear what people are saying about them, not just what words they’re using. Researchers and engineers at Infegy expand the capabilities, possibilities, and utility of social listening every day. The opportunity and honor of working with Interbrand is not just an occasion to apply our technology, but a chance to prove its reliability in a field that is full of hype.
