What Does Twitter Think of Socialism?

Mitch Schiller
Aug 5, 2021


NLP Sentiment Analysis from Hundreds of Thousands of Users

Is socialism when no iPhone? Authoritarian disaster? Or have people started to see through that propaganda and challenge their own worldviews?

The possibilities of a system outside of global capitalism have never felt quite so close or urgent, as we have watched capitalism's handling of the pandemic do exactly what it was intended to do: put profit and the free market above people's lives and wellbeing. I wanted to see just how divided Twitter, typically regarded as a notorious cesspool of barely regulated mouthpieces clamoring for retweets and likes, could be on the subject. Over the Trump years and even further back, Twitter has also become the de facto political podium, so it felt like the appropriate place to turn for social listening.

Using a natural language processing technique called sentiment analysis, we can assign each tweet to one of three buckets: positive, negative, or neutral. From there, with a simple aggregate score, we can assess whether sentiment toward socialism leans more positive or negative at this point in history. With all the misinformation flying around about it, I was particularly interested in the results. If I could access Facebook's data, I'm sure the story would be quite different.

First, it may be good to have a definition in mind before we see what Twitter thinks:

“Socialism is a political and economic theory of social organization which advocates that the means of production, distribution, and exchange should be owned or regulated by the community as a whole.”

Now that we have that out of the way, let’s move on to the actual analysis.

Coding Time: Necessary Packages

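# Install the scrapers (twint, snscrape) and a language detector (whatthelang)
!pip install -qq twint
!pip install -qq whatthelang
!pip install -qq snscrape
import twint
import snscrape.modules.twitter as sntwitter
import pandas as pd
import nest_asyncio
# Let asyncio-based libraries such as Twint run inside Jupyter's already-running event loop
nest_asyncio.apply()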

After importing what we need, I generated a nice big set of tweets. I was aiming for a few million, but you will see a good proportion drop out during the data cleaning process. Still, I believe I achieved a reasonable sample given time and hardware constraints. Scrapers like Twint and snscrape were a great find, since they don't suffer from the same restrictive limits as the official API; the script below uses snscrape.

Script for gathering tweets:

# Creating a list to append tweet data to
tweets_list = []
# Search query: the keyword plus date-range operators
query = 'socialism since:2020-01-01 until:2021-08-03'
# Using TwitterSearchScraper to scrape data and append tweets to the list
for i, tweet in enumerate(sntwitter.TwitterSearchScraper(query).get_items()):
    # Stop once 5 million tweets have been collected
    if i >= 5000000:
        break
    tweets_list.append([tweet.date, tweet.id, tweet.content, tweet.username])
# Creating a dataframe from the tweets list above
tweets_df = pd.DataFrame(tweets_list, columns=['Datetime', 'Tweet Id', 'Text', 'Username'])
tweets_df.head()

As you can see, the since: and until: operators in the query restrict the output to tweets from January 1, 2020 up to August 3, 2021.

Now that we have a working set of tweets, it’s time to do some cleaning and tokenizing. This will allow us to more accurately understand sentiment when it gets to that stage. Some additional packages will be needed here.

import re
import string
#for text analysis
import nltk
from nltk.tokenize import sent_tokenize
from nltk.corpus import words
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer
from nltk.stem import PorterStemmer
from nltk.sentiment.vader import SentimentIntensityAnalyzer
from nltk.sentiment.util import *
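nltk.download('stopwords')
nltk.download('vader_lexicon')
# Tokenizer models, WordNet data and the English word list used by the imports above
nltk.download('punkt')
nltk.download('wordnet')
nltk.download('words')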
# for visuals
from collections import Counter
from matplotlib import pyplot as plt
from matplotlib import ticker
import seaborn as sns
import plotly.express as px

Several cleaning tasks are needed, and we really only need the Username, Datetime, and Text columns. You can find the full code on my GitHub.

In short, we cleaned the dataframe by:

  1. Removing URLs
  2. Removing mentions (@username)
  3. Lowercasing the entire Text column
  4. Removing punctuation
  5. Removing stopwords (both NLTK's stopword list and some custom ones relating to our subject)
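The five cleaning steps can be sketched for a single tweet as below. This is a minimal sketch, not the code from the author's GitHub: the regular expressions, the clean_text name and the tiny STOPWORDS set are assumptions. STOPWORDS is a hypothetical stand-in; in practice you would use set(stopwords.words('english')) plus your own topic-specific words.

```python
import re
import string

# Hypothetical stand-in for NLTK's English stopwords plus custom, topic-specific words
STOPWORDS = {"the", "a", "an", "is", "are", "and", "or", "of", "to", "in",
             "it", "this", "that", "socialism"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")           # http(s) links and bare www. links
MENTION_RE = re.compile(r"@\w+")                         # @username mentions
PUNCT_TABLE = str.maketrans("", "", string.punctuation)  # deletes ASCII punctuation


def clean_text(text: str) -> str:
    """Apply the five cleaning steps, in order, to a single tweet."""
    text = URL_RE.sub(" ", text)        # 1. remove URLs
    text = MENTION_RE.sub(" ", text)    # 2. remove mentions
    text = text.lower()                 # 3. lowercase
    text = text.translate(PUNCT_TABLE)  # 4. remove punctuation
    # 5. split on whitespace and drop stopwords
    tokens = [tok for tok in text.split() if tok not in STOPWORDS]
    return " ".join(tokens)


print(clean_text("Check this out @bob https://t.co/xyz Socialism is GOOD!"))
```

Running it prints `check out good`. URLs and mentions are removed before punctuation on purpose: stripping `:`, `/` and `@` first would leave fragments such as `httpstcoxyz` and `bob` behind as ordinary words. On the full dataframe the function would be applied column-wise, e.g. `df['Text'].apply(clean_text)`.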

After this, we could take a look at the most common words used with the following snippet:

# words_list: a flat list of every token in the cleaned Text column
word_counts = Counter(words_list).most_common(50)
words_df = pd.DataFrame(word_counts, columns=['word', 'freq'])
words_df.head()
px.bar(words_df, x='word', y='freq', title="Most Common Words")
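The snippet relies on a flat words_list of tokens that isn't defined in the excerpt. A minimal sketch of one way to build it, assuming the Text column already holds cleaned, space-separated text; build_words_list and the sample rows are hypothetical:

```python
from collections import Counter
from typing import List

import pandas as pd


def build_words_list(texts: pd.Series) -> List[str]:
    """Flatten a column of cleaned, space-separated tweets into one token list."""
    return [token for text in texts.dropna() for token in text.split()]


# Tiny illustrative frame standing in for the cleaned tweets (hypothetical data)
sample_df = pd.DataFrame({"Text": ["capitalism good democratic", "social good", None]})

words_list = build_words_list(sample_df["Text"])
word_counts = Counter(words_list).most_common(50)
words_df = pd.DataFrame(word_counts, columns=["word", "freq"])
print(words_df)
```

Missing texts are skipped with dropna so a tweet that was emptied or lost during cleaning doesn't raise on .split().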

As we can see from the above, my initial guess would be that the sentiment ends up fairly positive. Words like “good”, “democratic”, and “social” seem rather positive, and it makes complete sense that “capitalism” comes up a lot when discussing socialism.

The final part of our analysis is to assign positive, neutral, and negative scores to each tweet, use a threshold on the overall (compound) score to assign a label, and visualize the counts of these three labels over time.
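For intuition on the score itself: NLTK's SentimentIntensityAnalyzer.polarity_scores returns neg, neu, pos and compound values, and compound is VADER's summed word valences squashed into [-1, 1] by x / sqrt(x^2 + alpha) with alpha = 15. The sketch reproduces only that normalization step plus a threshold labeler; the raw valence sum passed in is an assumed input, since real VADER also applies rules for negation, intensifiers, capitalization and exclamation marks. The ±0.05 neutral band follows the VADER project's documented convention, whereas the article's rule labels only an exact 0 as neutral.

```python
import math


def normalize(raw_score: float, alpha: float = 15.0) -> float:
    """Squash an unbounded valence sum into [-1, 1], as VADER does for `compound`."""
    norm = raw_score / math.sqrt(raw_score * raw_score + alpha)
    return max(-1.0, min(1.0, norm))


def label_from_compound(compound: float, threshold: float = 0.05) -> str:
    """Label a compound score, treating (-threshold, threshold) as neutral."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# A hypothetical raw valence sum of 2.0 for one tweet
score = normalize(2.0)
print(round(score, 4), label_from_compound(score))
```

One practical consequence: if the analyzer runs on the cleaned Text column, it loses cues VADER relies on. ALL-CAPS words and exclamation marks boost intensity, and NLTK's stopword list includes negations such as "not" that VADER uses to flip polarity, so scoring the raw tweet text is often the safer choice.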

Code for the sentiment assignment:

# Assign sentiment with NLTK's VADER analyzer
# (df is the tweets DataFrame after cleaning, with a Text column)
sid = SentimentIntensityAnalyzer()
sentiment_scores = df.Text.apply(sid.polarity_scores)
# Keep df's index so the join below lines up even if rows were dropped during cleaning
sentiment_df = pd.DataFrame(list(sentiment_scores), index=df.index)
# Label as positive, negative or neutral from the compound score
labelize = lambda x: 'neutral' if x == 0 else ('positive' if x > 0 else 'negative')
sentiment_df['label'] = sentiment_df.compound.apply(labelize)
# Join back to the original dataframe
data = df.join(sentiment_df.label)
# Find counts per label (explicit column names work across pandas versions)
counts_df = data['label'].value_counts().rename_axis('label').reset_index(name='count')
counts_df
# Plot the counts
sns.barplot(x='label', y='count', data=counts_df)
Figure: Counts of Socialism Sentiments

As we can see, even with some very loud voices from reactionaries and fascists, socialism is growing in popularity and showing promising trends overall. Perhaps unsurprisingly, there was a huge spike in conversation about political ideologies around the 4th of July! Independence Day is becoming a divisive day as we grapple as a country with the gap between our stated goals (not that these were overly righteous) and the reality citizens experience. It's empowering to see how many people have been discussing socialism over the past year or so.
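Spotting a spike like the one around the 4th of July requires counts per day rather than one total per label. A minimal sketch of that view, assuming the labelled frame has the Datetime column from scraping and the label column from the sentiment step; daily_label_counts and the sample rows are hypothetical:

```python
import pandas as pd


def daily_label_counts(data: pd.DataFrame) -> pd.DataFrame:
    """Count positive/neutral/negative tweets per calendar day.

    Returns one row per day and one column per label, with 0 where a label is absent.
    """
    return (
        data.groupby([pd.Grouper(key="Datetime", freq="D"), "label"])
        .size()
        .unstack(fill_value=0)
    )


# Hypothetical rows standing in for the labelled tweets
sample = pd.DataFrame({
    "Datetime": pd.to_datetime([
        "2021-07-03 10:00", "2021-07-04 09:00", "2021-07-04 18:30", "2021-07-04 21:15",
    ]),
    "label": ["neutral", "positive", "negative", "positive"],
})

daily = daily_label_counts(sample)
print(daily)
# daily.plot() (pandas + matplotlib) would draw one line per label over time
```

Grouping by day and label, then unstacking, gives a tidy wide table where each column is a label, which is the shape both pandas plotting and plotly.express line charts expect.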

Although a revolution may not start on Twitter, it is a very interesting place to peek into the hivemind of the world. I hope you enjoyed this little exercise!
