NYTNews dataset

Dataset Information

The New York Times News corpus contains all articles published in the New York Times over 7.5 years (Jan 2000–July 2007) and is available from https://catalog.ldc.upenn.edu/LDC2008T19. The named entities (people, places, organizations) are annotated by human editors. We construct weekly temporal graphs (390 time points) in which each node corresponds to a named entity and each edge denotes a co-mention of two entities in an article. The data contains around 320,000 entities but no ground-truth events.
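
The weekly graph construction described above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the input format (pairs of publication date and entity list), the start date of January 1, 2000, the function names, and the use of co-mention counts as edge weights are all assumptions made for this example.

```python
from collections import Counter, defaultdict
from datetime import date
from itertools import combinations

# Assumed start of the collection period (Jan 2000); the exact day is a guess.
START = date(2000, 1, 1)


def week_index(d, start=START):
    """Map a publication date to a 0-based week number counted from `start`."""
    return (d - start).days // 7


def build_weekly_graphs(articles):
    """Return {week: Counter({(entity_a, entity_b): co_mention_count})}.

    `articles` is an iterable of (publication_date, entity_list) pairs,
    a hypothetical input format chosen for this sketch.
    """
    graphs = defaultdict(Counter)
    for pub_date, entities in articles:
        week = week_index(pub_date)
        # Deduplicate and sort so each undirected edge has one canonical key.
        for a, b in combinations(sorted(set(entities)), 2):
            graphs[week][(a, b)] += 1
    return dict(graphs)


# Toy records, invented purely for illustration.
articles = [
    (date(2000, 1, 3), ["Alice", "Bob", "Acme Corp"]),
    (date(2000, 1, 5), ["Bob", "Alice"]),
    (date(2000, 1, 10), ["Alice", "Paris"]),
]
graphs = build_weekly_graphs(articles)
```

Here each weekly graph is stored as a counter over undirected entity pairs, so the first two toy articles fall in the same week and contribute to the same edge. Whether the released data weights edges by co-mention count or treats them as unweighted is not stated here, so the counts should be read as one possible choice.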

Source (citation)

An Ensemble Approach for Event Detection and Characterization in Dynamic Graphs. Shebuti Rayana, Leman Akoglu. ACM SIGKDD 2nd Workshop on Outlier Detection and Description (ODD^2@KDD 2014), New York, NY, USA, Aug. 2014.

Less is More: Building Selective Anomaly Ensemble with Application to Event Detection in Temporal Graphs. Shebuti Rayana, Leman Akoglu. SIAM SDM, Vancouver, BC, Canada, April 2015.

Less is More: Building Selective Anomaly Ensemble. Shebuti Rayana, Leman Akoglu. Transactions on Knowledge Discovery from Data (TKDD), May 2016.


Files: To access the data, please email srayana@cs.stonybrook.edu.