Complete Public Reddit Comments Corpus (2007-2015)
Complete dataset of public comments posted to Reddit (http://www.reddit.com) from October 2007 to May 2015.
Repository of televised news, including (for many) captions and rough statistics for content
Unstructured dataset of open-source media articles
List of datasets used to study opinion mining, sentiment analysis, and opinion spam detection
Data from U.S. presidential speeches (1789-2010), including transcript, audio, and/or video (available modalities vary by speech)
Data from public speeches, including transcript, audio, and/or video (varies by speech)
Dataset of internal newsletters from the Signals Intelligence Directorate of the U.S. National Security Agency (NSA), covering 2003-2012. The dataset is being released gradually in small batches.
Dataset on more than 1,800 U.S. criminal conviction exonerations (beginning in 1989), including information on each exoneree and their case
Dataset examining how European Union member states’ gender equality policies impact a number of areas (e.g., health, economics) from 2005-2015
Dataset with approximately 30 years' worth of information about companies and trusts in 10 offshore countries, including officer information and more