Text analysis in practice: Twitter Sentiment in the World Cup football

The football World Cup in Russia offers great opportunities to be creative with data analysis. Nowadays it is unthinkable to disregard data in modern football. Top clubs already employ young ‘laptop trainers’ who try to assemble the ultimate football machine, with statistical analysis and lots of data as their weapons of choice.

All dry statistics aside: football still thrives largely on emotions. Especially during the World Cup, whole nations hold their breath when their national pride battles for eternal glory. I also love to ride this emotional rollercoaster when our Clockwork Orange faces a tough opponent.

This presented a perfect opportunity to measure all these rollercoaster emotions. Sadly, the Dutch national team is conspicuous by its absence. Therefore I’ve chosen to analyse the semi-final between Croatia and England. Everything is done from the English perspective; as a Dutchman, my proficiency in the Croatian language is virtually non-existent.

How do you do this?

Twitter is a great way to find out what large groups of people feel and think about a certain topic. Most tweets contain words with a positive or a negative charge. Do you count more positive words than negative ones? Then the tweet has a positive sentiment overall. The reverse holds as well: more negative than positive words means a negative sentiment.
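
The counting idea can be sketched in a few lines of Python. This is only an illustration, not the code behind this article (that was written in R): the two tiny word lists and the function name `tweet_sentiment` are made up for the example, and a real analysis would use a much larger sentiment lexicon.

```python
import re

# Tiny illustrative word lists (assumptions, NOT the lexicon used for this article).
POSITIVE = {"beauty", "proud", "joy", "congratulations", "best", "good"}
NEGATIVE = {"bad", "shit", "breaks", "sad"}


def tweet_sentiment(text):
    """Return the number of positive words minus the number of negative words."""
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(word in POSITIVE for word in words)
    negatives = sum(word in NEGATIVE for word in words)
    return positives - negatives


# A score above zero means positive overall, below zero negative.
print(tweet_sentiment("Yes TRIPPIER! You beauty! #ENGCRO"))
print(tweet_sentiment("That's bad. That's very bad. #ENGCRO"))
```

A score of zero means the tweet is neutral or contains no words from either list.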

This simple method has a drawback, however: the sentiment of a word can shift when you read it in the context of other words. Irony, sarcasm or sayings, for example, can flip the polarity of the sentiment.
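
Irony and sarcasm are hard to catch with word lists, but one narrow case of context, a negation word directly in front of a sentiment word, is easy to show. The Python sketch below is an illustration under assumptions made for this example only: the word lists, the list of negators and both function names are invented here and are not part of the analysis in this article.

```python
import re

# Illustrative word lists (assumptions for this example only).
POSITIVE = {"happy", "good", "great"}
NEGATIVE = {"bad", "sad", "awful"}
NEGATORS = {"not", "no", "never"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def naive_score(text):
    """Plain word counting: +1 per positive word, -1 per negative word."""
    return sum((w in POSITIVE) - (w in NEGATIVE) for w in tokenize(text))


def score_with_negation(text):
    """Same counting, but flip a word's polarity if a negator precedes it."""
    words = tokenize(text)
    score = 0
    for i, word in enumerate(words):
        polarity = (word in POSITIVE) - (word in NEGATIVE)
        if i > 0 and words[i - 1] in NEGATORS:
            polarity = -polarity
        score += polarity
    return score


tweet = "English fans are not happy at all"
print(naive_score(tweet))          # counts "happy" as a positive word
print(score_with_negation(tweet))  # flips "happy" because "not" precedes it
```

This rule only looks one word back, so it will still misread sarcasm, sayings and longer constructions.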

This drawback becomes relatively smaller when you gather lots of tweets. For this game a total of 268,000 tweets were downloaded. That amount could be enough to plot a reasonably reliable sentiment curve.
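
One way to turn many scored tweets into a curve is to average the scores per time bucket, so that individual misjudged tweets weigh less. The Python sketch below assumes each tweet has already been reduced to a (timestamp, score) pair. The function name `sentiment_curve`, the five-minute bucket size and the sample data are all hypothetical and are not taken from the actual analysis.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def sentiment_curve(scored_tweets, bucket_minutes=5):
    """Average tweet scores per time bucket.

    scored_tweets: iterable of (datetime, score) pairs.
    bucket_minutes: bucket width; should divide 60 evenly.
    Returns a list of (bucket_start, mean_score), sorted by time.
    """
    buckets = defaultdict(list)
    for timestamp, score in scored_tweets:
        # Round the timestamp down to the start of its bucket.
        minute = timestamp.minute - timestamp.minute % bucket_minutes
        bucket_start = timestamp.replace(minute=minute, second=0, microsecond=0)
        buckets[bucket_start].append(score)
    return [(start, mean(scores)) for start, scores in sorted(buckets.items())]


# Made-up sample data, only to show the shape of the input and output.
sample = [
    (datetime(2018, 7, 11, 20, 6, 10), 1),
    (datetime(2018, 7, 11, 20, 8, 45), 3),
    (datetime(2018, 7, 11, 21, 27, 5), -2),
    (datetime(2018, 7, 11, 21, 28, 30), -1),
]
for start, average in sentiment_curve(sample):
    print(start.strftime("%H:%M"), average)
```

Smaller buckets follow the match more closely but become noisier when few tweets fall into each one.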

How did it go? The facts

It turned out to be an epic match. England took the lead with a goal in the fifth minute, but the Croatians equalized in the 68th minute and pushed the game into extra time. There they plunged the English into mourning by scoring again: a 2-1 final score, shattering England’s dream of a first World Cup triumph since 1966.

How did it go? Twitter sentiment

The figure below is crystal clear. You can easily see the flow of events from the English perspective. I have summarized my most striking observations.

Before the match (19:00 – 20:00)

Before the match, English fans felt it was already decided: #Footballiscominghome had dominated Twitter for days. This positive sentiment is clearly visible in this time period.

“Got my red England shirt on not been washed smells of beer but who cares it’s coming home #ENGCRO”

National Anthems and kick-off (19:55 – 20:00)

The national anthems create a small spike of positive emotions. Many tweets reflect this sentiment:

“I half expected the English fans to start singing “FOOTBALLS COMING HOME” during the national anthem and I’m lowkey…#ENGCRO”

0-1 (20:08 – 20:45)

The beautiful free kick by Kieran Trippier causes ecstasy among English fans on Twitter. The big upward spike around 20:08 is clearly visible.

“Yes TRIPPIER! You beauty! #BuryLad #ENGCRO #ThreeLions #ITSCOMINGHOME”

1-1 and end of the regular 90 minutes (21:27 – 22:00)

Ivan Perisic scores the 1-1 equalizer in the 68th minute, and this causes the most negative sentiment of the entire match. English tweeters are not happy at all:

“Shit! Croatia equalise. That was coming, to be fair… #ENGCRO #WorldCup2018 England 1 – Croatia 1.”
“That’s bad. That’s very bad. #ENGCRO”

2-1 after extra time (22:24 – 22:34)

In extra time Mario Mandzukic scores the decisive 2-1 for the Croatian side. Remarkably, the reaction among English Twitter users is much less negative than after the 1-1.

“So, it’s not coming home then?! 😮 #ENGCRO”
“WOW! England finally breaks! 2-1 Croatia! #WorldCup #ENGCRO”

Aftermath (22:34 onwards)

Shortly after the final whistle, the tweets add up to a very positive sentiment. A glance at these tweets shows that the whole nation is proud of what the Three Lions have achieved. Many fans act as true sportsmen and congratulate Croatia on their victory.

“As much as I wanted an English victory, has to be said the best team won…congratulations #Croatia good luck on Sunday. #WorldCup #ENGCRO”
“Proud of the #eng team, we outlasted some of the best international teams and have been a joy to watch.…  #ENGCRO”

Broader application

The data science techniques behind this analysis are suitable for many forms of unstructured text. Think of public information like forums, customer reviews or page content. Private information like e-mail and other correspondence is also suited for the job. Do you want to try this technique for your business? Please feel free to contact me.

Technical details

Behind this analysis are a technical step-by-step plan and programming code in R. Please send me a message if you would like to learn more about this. I’d love to discuss it with you.

Roeland van der Molen

Roeland van der Molen is managing consultant at Leissner & Van der Molen. As a legal and data professional he has developed over the last ten years into an all-round number cruncher. He helps clients get a better grip on their data, facts and figures and build an effective strategy from them. “Innovation starts from your base”.

2 thoughts on “Text analysis in practice: Twitter Sentiment in the World Cup football”

  1. Klaudia

    Dear Roeland,

    I would like to do the sentiment analysis for Premier League matches, but I am wondering where to get the train data from. I downloaded the tweets but have no idea how to add sentiment labels to the train set. Could you tell me where you got the labels for training the model from?

    I would be very grateful for any hint or advice.

    Kind regards,

  2. Marijne Kramer

    Hi Roeland,

    Cool topic, what you did with Twitter looks great. For my thesis at TU Delft I am actually working very qualitatively, with interviews and the analysis of media and publications. I would like to add a somewhat more quantitative element to this, and that made me think of Twitter analysis!

    My thesis is about how policy on Airbnb is formed in different European cities. At first there was nothing, so how did the cities arrive at certain rules, and why does that process differ across all those cities?
    This is how I picture the use of Twitter:
    – Highlighting the important, influential moments (when, for example, was Airbnb a heavily discussed topic on Twitter?)
    – Finding out when the sentiment around Airbnb changed. I know roughly when that happened, but would like to pin it down further with Twitter analyses.

    I am therefore also looking for a way to delimit the region: I want to study only Airbnb in Amsterdam, for example, and then for a specific period (also historical). For the sentiment part I need a definition of which words can be marked as positive and negative. Such ‘dictionaries’ can be found for English, but do they also exist online for Dutch?

    I really hope you can help me get started with good links and tips to carry this out efficiently and effectively. Thank you!

    Kind regards,

    Marijne Kramer
