TWITTER REPORT

Dataset contains external users as mentions and replies.

Input data: Twitter JSON all_filtered

Start time: Fri Jun 8 11:22:29 2018

Data Description

Import Summary

This is an overview of the import process that was used to create the dataset.

Twitter file(s) imported: /Users/justinlittman/Data/usher/beltway_reporters/tweets_feb_to_march_2018/all_filtered.json
Twitter file format: Twitter JSON
Dynamic meta-network? No, all tweets are in one meta-network

Import Data Statistics

This is an overview of the tweet activity in the dataset. The dataset contains only one meta-network and all tweets are analyzed.

Network: Twitter JSON all_filtered
First tweet date: 2006-04-16 21:19:49-04
Last tweet date: 2018-03-31 23:58:57-04
Number of tweets: 851296
Number of tweets with geotag: 469
Number of tweets with URL: 498399
Number of retweets: 290609
Number of tweeters: 2259
Number of verified tweeters: 1278
Number of news agency tweeters: 50
Number of mentions: 146265
Number of distinct hashtags: 27288
Number of distinct hashtags used more than once: 19859
Number of distinct words: 0
Number of distinct words used more than once: 0
Number of distinct locations: 256
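
As a rough illustration of how counts like those above might be derived, the following Python sketch tallies a few of them from tweet objects. It is not ORA's method. It assumes the Twitter API v1.1 tweet layout (the `user`, `entities`, `coordinates`, and `retweeted_status` fields), and the function name `summarize_tweets`, the output keys, and the counting rules (for example, case-insensitive hashtags and one mention per `user_mentions` entry) are hypothetical choices that may not match the figures in this report.

```python
from collections import Counter


def summarize_tweets(tweets):
    """Tally summary counts from an iterable of tweet dicts.

    Assumes the Twitter API v1.1 tweet layout; the counting rules are
    illustrative assumptions, not the rules used to produce this report.
    """
    stats = Counter()
    tweeters = set()
    verified = set()
    hashtag_counts = Counter()

    for tweet in tweets:
        stats["tweets"] += 1

        user = tweet.get("user") or {}
        name = user.get("screen_name")
        if name:
            tweeters.add(name)
            if user.get("verified"):
                verified.add(name)

        entities = tweet.get("entities") or {}
        if tweet.get("coordinates"):
            stats["geotagged"] += 1
        if entities.get("urls"):
            stats["with_url"] += 1
        if "retweeted_status" in tweet:
            stats["retweets"] += 1
        stats["mentions"] += len(entities.get("user_mentions") or [])

        # Assumption: hashtags are compared case-insensitively.
        for tag in entities.get("hashtags") or []:
            hashtag_counts[tag["text"].lower()] += 1

    return {
        "tweets": stats["tweets"],
        "geotagged": stats["geotagged"],
        "with_url": stats["with_url"],
        "retweets": stats["retweets"],
        "tweeters": len(tweeters),
        "verified_tweeters": len(verified),
        "mentions": stats["mentions"],
        "distinct_hashtags": len(hashtag_counts),
        "hashtags_used_more_than_once": sum(
            1 for count in hashtag_counts.values() if count > 1
        ),
    }
```

If the input file stores one JSON object per line (an assumption about its layout), the function could be fed with a generator such as `(json.loads(line) for line in f)`.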

The following links give more detailed statistics by category.

Tweet Statistics

All Tweeters Statistics

Verified Tweeter Statistics

News Agency Tweeter Statistics

Available Analyses

Click on an analysis below for detailed results.

Analysis of Tweeters - Super Spreaders

Analysis of Tweeters - Super Friends

Analysis of Tweeters - Other Influencers

Analysis of Tweeters - Attributes

Analysis of Hashtags

Analysis of Words

Analysis of Locations

Analysis of Tweets

Analysis of Tweets - Attributes

Research Notes

Paths in retweet networks tend to be short and hierarchical, which makes many traditional social network measures empirically uninteresting.

Closeness is not calculated for Twitter data because: a) most nodes are not reachable from one another, and b) it is expensive to calculate given the size of the data.
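
The reachability point can be illustrated with a toy example. The sketch below is an assumption-laden illustration, not part of ORA: the function name `reachable_fraction` and the edge direction (retweeter to original author) are hypothetical. It runs one breadth-first search per node and reports the fraction of ordered node pairs connected by a directed path. Running a search from every node is also what makes closeness costly on large networks.

```python
from collections import deque


def reachable_fraction(edges):
    """Fraction of ordered node pairs (u, v), u != v, with a directed path u -> v."""
    adjacency = {}
    for source, target in edges:
        adjacency.setdefault(source, set()).add(target)
        adjacency.setdefault(target, set())

    nodes = list(adjacency)
    if len(nodes) < 2:
        return 0.0

    reachable_pairs = 0
    for start in nodes:
        # Breadth-first search from this node.
        seen = {start}
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for neighbor in adjacency[current]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        reachable_pairs += len(seen) - 1  # exclude the start node itself

    return reachable_pairs / (len(nodes) * (len(nodes) - 1))


# Hypothetical star: three accounts each retweet account "a" once.
star = [("b", "a"), ("c", "a"), ("d", "a")]
print(reachable_fraction(star))
```

In this star-shaped example only 3 of the 12 ordered pairs are connected, so distance-based measures such as closeness are undefined for most pairs.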

Traditional betweenness is not calculated because: a) given the short length of chains, most nodes have only a small value, so the measure does not discriminate among nodes, and b) it is expensive to calculate given the size of the data.

Eigenvector centrality is not calculated because: a) mutual retweeting is rare, so the densely interconnected networks that would lead to high values seldom occur, and b) it is expensive to calculate given the size of the data.
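
One way to check how rare mutual retweeting is in a given network is a simple reciprocity ratio. The sketch below is only an illustration under stated assumptions: the function name `reciprocity` and the edge representation (pairs of retweeter and original author) are hypothetical, and this is not a measure the report itself computes.

```python
def reciprocity(edges):
    """Share of distinct directed edges (u, v) whose reverse (v, u) also exists."""
    # Deduplicate edges and ignore self-loops.
    edge_set = {(u, v) for u, v in edges if u != v}
    if not edge_set:
        return 0.0
    mutual = sum(1 for u, v in edge_set if (v, u) in edge_set)
    return mutual / len(edge_set)


# Hypothetical network: three accounts retweet "a", and "a" retweets "b" back.
edges = [("b", "a"), ("c", "a"), ("d", "a"), ("a", "b")]
print(reciprocity(edges))
```

A value near zero would indicate that almost no retweet ties are returned, which is the situation described above.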

Verified actors have a plus sign (+) appended to their names. News sources have an asterisk (*) appended to their names.

Produced by ORA-NetScenes, a joint product of the CASOS Center at Carnegie Mellon University and Netanomics.