[ un ] theoretical


Some Insights On Twitter's IRA Botnet

Contents

  1. Wordclouds and Hashtags
  2. Graph Topology

Summary

I’ve been spending some time going through Twitter’s data dump on the Russian Internet Research Agency (IRA). The articles I read about it were mostly bar charts and time series, but I’m a visual guy and was more interested in the botnet from a graph-theoretic point of view. Rather than just looking at the frequency of tweets over time, I thought I’d build some wordclouds from the user profile descriptions and hashtags. For this analysis, I only considered accounts that list English as their language.
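The English-only filter is a simple pass over the accounts file. Here is a minimal sketch, assuming the CSV has `userid` and `account_language` columns as in Twitter's public IRA release (check the real header before relying on it); the function name and the toy rows are made up for illustration.

```python
import csv
import io

# Assumption: the accounts CSV has "userid" and "account_language" columns;
# verify against the actual file header.


def load_english_accounts(csv_file, languages=("en",), language_column="account_language"):
    """Return the rows (as dicts) whose account language is in `languages`."""
    reader = csv.DictReader(csv_file)
    return [
        row for row in reader
        if (row.get(language_column) or "").strip().lower() in languages
    ]


# Toy rows for illustration only; with the real file, pass
# open(path, newline="", encoding="utf-8") instead of a StringIO.
sample = io.StringIO(
    "userid,account_language,user_profile_description\n"
    "a1,en,Daily #news\n"
    "b2,ru,Новости\n"
    "c3,en,News junkie\n"
)
accounts = load_english_accounts(sample)
print([row["userid"] for row in accounts])  # ['a1', 'c3']
```

Passing `languages=("en", "en-gb")` would widen the filter if "listing English" should also cover regional variants.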

Code can be found on GitHub.

Wordclouds and Hashtags

Below are the hashtags included across all tweets: 442,754 hashtags in total.
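Tallying hashtags across tweets comes down to parsing each tweet's hashtag field and feeding a counter. A minimal sketch, assuming the `hashtags` column stores a bracketed, comma-separated list such as `[News, politics]` (an assumption about the export format; adjust the parser if the real column differs) and choosing to lowercase tags so `#News` and `#news` merge. Function names and sample rows are hypothetical.

```python
from collections import Counter


def parse_hashtag_field(field):
    """Split a field like "[News, politics]" into ["news", "politics"].

    Assumption: hashtags are stored as a bracketed, comma-separated list.
    """
    inner = (field or "").strip().strip("[]")
    tags = (tag.strip().lstrip("#").lower() for tag in inner.split(","))
    return [tag for tag in tags if tag]  # drop empties from "" or "[]"


def count_tweet_hashtags(tweets, column="hashtags"):
    """Count every hashtag occurrence across all tweets."""
    counts = Counter()
    for tweet in tweets:
        counts.update(parse_hashtag_field(tweet.get(column, "")))
    return counts


tweets = [
    {"hashtags": "[News, politics]"},
    {"hashtags": "[news]"},
    {"hashtags": ""},
]
counts = count_tweet_hashtags(tweets)
print(counts.most_common(2))  # [('news', 2), ('politics', 1)]
print(sum(counts.values()))   # 3
```

To render the counts, the third-party `wordcloud` package's `WordCloud().generate_from_frequencies(counts)` accepts a mapping like this; the post doesn't say which tool produced its figures.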

Below is a wordcloud of hashtags pulled from Twitter profile descriptions: around 1,000 hashtags from unique profiles.
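Profile hashtags have to be extracted from free text and counted once per profile, since the same account can show up on many rows. A minimal sketch, assuming `userid` and `user_profile_description` columns and treating a hashtag as `#` followed by word characters (a simplification of Twitter's real hashtag rules). The function name and sample bios are hypothetical.

```python
import re
from collections import Counter

# "#" followed by word characters; \w is Unicode-aware for str patterns.
HASHTAG_RE = re.compile(r"#(\w+)")


def profile_hashtags(users, description_column="user_profile_description", id_column="userid"):
    """Count hashtags in profile descriptions, once per unique profile."""
    seen_profiles = set()
    counts = Counter()
    for user in users:
        uid = user.get(id_column)
        if uid in seen_profiles:
            continue  # same profile seen again: skip it
        seen_profiles.add(uid)
        text = user.get(description_column) or ""
        # A set, so a tag repeated within one bio counts only once.
        counts.update({tag.lower() for tag in HASHTAG_RE.findall(text)})
    return counts


users = [
    {"userid": "a1", "user_profile_description": "Daily #News and #politics #news"},
    {"userid": "a1", "user_profile_description": "Daily #News and #politics #news"},
    {"userid": "c3", "user_profile_description": "#news junkie"},
    {"userid": "d4", "user_profile_description": ""},
]
print(profile_hashtags(users).most_common())  # [('news', 2), ('politics', 1)]
```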


I also looked at the frequency of tweets over time from Russian- and English (US)-language accounts. I'm taking journalists at their word that the spikes in Russian-language tweets are related to Crimea.
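A tweet-frequency time series like this can be built by bucketing tweets per month and language. A minimal sketch, assuming a `tweet_time` column formatted like `2014-03-01 12:00` and an `account_language` column with codes such as `en` and `ru`; both the column names and the format are assumptions to check against the data. The function name and toy rows are hypothetical.

```python
from collections import Counter
from datetime import datetime

# Assumption: timestamps look like "2014-03-01 12:00".
TIME_FORMAT = "%Y-%m-%d %H:%M"


def monthly_counts(tweets, languages=("en", "ru"),
                   time_column="tweet_time", lang_column="account_language"):
    """Count tweets per (YYYY-MM, language) for the selected languages."""
    counts = Counter()
    for tweet in tweets:
        lang = tweet.get(lang_column)
        if lang not in languages:
            continue
        month = datetime.strptime(tweet[time_column], TIME_FORMAT).strftime("%Y-%m")
        counts[(month, lang)] += 1
    return counts


# Toy rows for illustration only.
tweets = [
    {"tweet_time": "2014-03-01 12:00", "account_language": "ru"},
    {"tweet_time": "2014-03-15 08:30", "account_language": "ru"},
    {"tweet_time": "2014-03-20 09:00", "account_language": "en"},
    {"tweet_time": "2014-04-02 10:00", "account_language": "de"},
]
for (month, lang), n in sorted(monthly_counts(tweets).items()):
    print(month, lang, n)
```

Monthly buckets keep the sketch short; switching the `strftime` pattern to `"%Y-%m-%d"` gives daily counts, which show sharp spikes more clearly.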


Graph Topology

To follow up, here are some network visualizations I was really interested in making. The images below show a network of Twitter accounts linked by retweets: IRA accounts are shown as green nodes, non-IRA accounts as red. It's interesting to see that many inactive accounts appear as isolated nodes, whereas others are densely connected. Unfortunately, the images don't really do justice to the size of the network; it's much more immersive in an interactive visualization tool. Node size scales with degree: the more edges a node has, the bigger it is.
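The graph behind these images can be sketched with plain adjacency sets. This sketch assumes tweets carry `userid` and `retweet_userid` columns, treats edges as undirected, collapses repeated retweets into one edge (matching the limitation noted at the end of this post), and seeds every IRA account as a node so inactive ones appear isolated. The linear size scale and all names here are hypothetical choices for illustration.

```python
def build_retweet_graph(tweets, accounts=()):
    """Undirected retweet graph as {account: set of neighbouring accounts}.

    `accounts` seeds nodes that may never retweet, so they show up as
    isolated nodes. Repeated retweets between the same pair collapse into
    a single edge, and self-retweets are ignored.
    """
    neighbors = {account: set() for account in accounts}
    for tweet in tweets:
        src = tweet["userid"]
        neighbors.setdefault(src, set())
        dst = tweet.get("retweet_userid")
        if dst and dst != src:
            neighbors[src].add(dst)
            neighbors.setdefault(dst, set()).add(src)
    return neighbors


def node_styles(neighbors, ira_ids, base_size=5.0, size_per_edge=2.0):
    """Colour IRA accounts green and others red; size grows linearly with degree.

    base_size and size_per_edge are arbitrary illustrative values.
    """
    return {
        node: {
            "color": "green" if node in ira_ids else "red",
            "size": base_size + size_per_edge * len(adj),
            "isolated": not adj,
        }
        for node, adj in neighbors.items()
    }


ira_ids = {"ira1", "ira2", "ira3"}
tweets = [
    {"userid": "ira1", "retweet_userid": "x9"},
    {"userid": "ira1", "retweet_userid": "x9"},  # repeat: still one edge
    {"userid": "ira2", "retweet_userid": "ira1"},
    {"userid": "ira2", "retweet_userid": ""},     # an original tweet
]
graph = build_retweet_graph(tweets, accounts=ira_ids)
styles = node_styles(graph, ira_ids)
print(styles["ira1"])  # {'color': 'green', 'size': 9.0, 'isolated': False}
print(styles["ira3"])  # {'color': 'green', 'size': 5.0, 'isolated': True}
print(styles["x9"])    # {'color': 'red', 'size': 7.0, 'isolated': False}
```

For an interactive view, the same adjacency could be loaded into a networkx graph and exported with `networkx.write_gexf` for a tool such as Gephi; the post doesn't say which tool it used.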

Here is an overview of the entire network after some clustering.

Twitter bots with very little activity on the outer ring

One cluster that I found particularly interesting.

One highly connected node.

There is, of course, a whole lot more that could go into this. For one, I didn't account for multiple edges between the same pair of accounts (in other words, one account retweeting another several times). I also think there is a wealth of information left to explore, given the time.
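One way to address the multiple-edge limitation is to weight each edge by how often one account retweeted another. A minimal sketch, again assuming `userid` and `retweet_userid` columns; the edges here are directed (retweeter to retweeted), and "strength" (the sum of edge weights at a node) is one common weighted replacement for plain degree when sizing nodes. Function names and rows are hypothetical.

```python
from collections import Counter


def weighted_retweet_edges(tweets):
    """Directed edge weights: (retweeter, retweeted) -> number of retweets."""
    weights = Counter()
    for tweet in tweets:
        src = tweet["userid"]
        dst = tweet.get("retweet_userid")
        if dst and dst != src:
            weights[(src, dst)] += 1
    return weights


def weighted_degree(weights):
    """Total edge weight touching each account (node strength)."""
    strength = Counter()
    for (src, dst), weight in weights.items():
        strength[src] += weight
        strength[dst] += weight
    return strength


tweets = [
    {"userid": "ira1", "retweet_userid": "x9"},
    {"userid": "ira1", "retweet_userid": "x9"},
    {"userid": "ira2", "retweet_userid": "ira1"},
]
weights = weighted_retweet_edges(tweets)
print(weights[("ira1", "x9")])           # 2
print(weighted_degree(weights)["ira1"])  # 3
```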