Complete Public Reddit Comments Corpus (2007-2015)
Complete dataset of public comments posted to Reddit (http://www.reddit.com) from October 2007 to May 2015.
Repository of televised news, with captions and rough content statistics available for many broadcasts
Unstructured dataset of open-source media articles
Dataset of 8 million annotated YouTube videos, including a variety of audio and visual features.
Blog post from Stanford’s Computational Journalism Lab that includes a list of freely available data sources
Repository of datasets from Yahoo (e.g., search, image, news)
Curated repository of UK-related digital data
List of datasets from various government agencies and initiatives
Repository of data released by San Francisco city and county
Lightly curated repository of self-stored UK-related digital data