Complete Public Reddit Comments Corpus (2007-2015)
Complete dataset of public comments posted to Reddit (http://www.reddit.com) from October 2007 to May 2015.
Repository of televised news, including (for many broadcasts) captions and rough content statistics
Unstructured dataset of open-source media articles
Dataset of 8 million annotated YouTube videos, including a variety of audio and visual features.
List of datasets and code used for a variety of automatic classification tasks, including team behavior, consumer behavior, and face detection
Dataset of over 13,000 images of faces (labeled with names) taken from the internet, including over 1,600 people with multiple pictures
Datasets for training affect recognition and for perception studies
List of datasets useful for deep learning and categorization, including datasets of faces, speech, text, and other images
Various datasets to assist in machine learning
List of numerous open-government initiatives, including many that make data available