Complete Public Reddit Comments Corpus (2007-2015)
Complete dataset of public comments posted to Reddit (http://www.reddit.com) from October 2007 to May 2015.
Repository of televised news broadcasts, many with captions and rough content statistics
Unstructured dataset of open-source media articles
Dataset of 8 million annotated YouTube videos, including a variety of audio and visual features.
List of current and historical datasets related to consumer spending and income, including data broken down by various demographic measures and family size
List of all data collections available through the U.S. Department of Justice’s Bureau of Justice Statistics.
Dataset of internal newsletters from the Signals Intelligence Directorate of the U.S. National Security Agency (NSA), published internally from 2003 to 2012. The dataset is being released slowly, in small batches.
Dataset of information derived from and related to one million contemporary songs, with more than 50 variables (including information on track metadata, social networks, and more)
API for current and historical flight information, including flight path, weather, aircraft type, airport details, connections, and more
API for current and (recent) historical flight information, including flight path, speed, aircraft type, airport details, and more