TwitterWorldCup2014 dataset

Dataset Information

We collect tweet samples using the Twitter Streaming API over the World Cup 2014 season (June 12–July 13). We filter the tweets by popular/official World Cup hashtags, such as #worldcup, #fifa, and #brazil. From these tweets we construct entity-entity co-mention temporal graphs at a 5-minute sampling rate (8640 time points). The ground truth contains the goals, penalties, and injuries in all matches involving at least one of the renowned teams (Brazil, Germany, Argentina, Netherlands, Spain, or France).
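As a rough illustration of the 5-minute co-mention graphs described above, the sketch below groups (timestamp, entity, entity) records into 5-minute windows and counts how often each entity pair is co-mentioned in each window. It assumes the records are already parsed into Python `datetime` objects, treats each snapshot as an undirected graph with co-mention counts as edge weights, and lets the caller choose the start time of window 0. These are assumptions for illustration, not necessarily how the authors built their graphs, and `build_snapshots` is a hypothetical name.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta

# 5-minute sample rate, as stated in the dataset description.
WINDOW = timedelta(minutes=5)


def build_snapshots(records, start):
    """Group (timestamp, entity_a, entity_b) records into 5-minute snapshots.

    ASSUMPTIONS (illustrative only): snapshots are undirected, edge weight is
    the number of co-mentions in the window, and window 0 begins at `start`.
    Returns {window_index: Counter({(a, b): count})}.
    """
    snapshots = defaultdict(Counter)
    for ts, a, b in records:
        # timedelta // timedelta gives an int: the 5-minute window index.
        idx = (ts - start) // WINDOW
        # Undirected edge: sort endpoints so (a, b) and (b, a) are the same key.
        edge = tuple(sorted((a, b)))
        snapshots[idx][edge] += 1
    return dict(snapshots)
```

For example, with window 0 starting at midnight on June 12, 2014, records at 00:01 and 00:04 fall in window 0 and a record at 00:07 falls in window 1. Sorting the endpoints means a pair mentioned in either order accumulates into a single edge weight.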

Source (citation)

Less is More: Building Selective Anomaly Ensemble with Application to Event Detection in Temporal Graphs. Shebuti Rayana, Leman Akoglu, SIAM SDM, Vancouver, BC, Canada, April 2015

Less is More: Building Selective Anomaly Ensemble. Shebuti Rayana, Leman Akoglu, Transactions on Knowledge Discovery from Data (TKDD), May 2016


Files: TwitterWorldCup2014, GroundTruth

Description: The TwitterWorldCup2014 file contains three columns: the first is the timestamp (date), and the other two are the names of two entities co-mentioned at that timestamp. GroundTruth contains the dates of the ground-truth match events (goals, penalties, and injuries), each with a description.
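A minimal reader for the three-column co-mention file might look like the following. The column delimiter (tab) and the timestamp format (`%Y-%m-%d %H:%M:%S`) are assumptions, since the description does not specify them; adjust both parameters to match the actual file. `parse_comentions` is a hypothetical helper name.

```python
import csv
from datetime import datetime


def parse_comentions(lines, delimiter="\t", ts_format="%Y-%m-%d %H:%M:%S"):
    """Yield (timestamp, entity_a, entity_b) tuples from an iterable of lines.

    ASSUMPTIONS: the delimiter and timestamp format are guesses, not taken
    from the dataset documentation; check them against the real file.
    """
    for row in csv.reader(lines, delimiter=delimiter):
        if not row:
            continue  # skip blank lines
        if len(row) != 3:
            raise ValueError(f"expected 3 columns, got {len(row)}: {row!r}")
        ts, a, b = (field.strip() for field in row)
        yield datetime.strptime(ts, ts_format), a, b
```

For instance, `with open("TwitterWorldCup2014", newline="", encoding="utf-8") as f: rows = list(parse_comentions(f))` would load every record, assuming the file is UTF-8 text. Failing loudly on rows that do not have exactly three columns makes a wrong delimiter guess show up immediately rather than silently producing bad entity names.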