Data Resources: Repositories and Lists

This page lists all of the data resources that we classify as repositories or lists. A repository contains multiple separate datasets in a single interface, while a list is simply a series of links to repositories or individual datasets. (More information on our taxonomy of data resources can be found on our About Us page.)

A subset of useful information about each resource is included on this page, but more information (including links, example publications, and how to gain access to the resource) can be found by clicking on the resource's name.

| Title | Type | Description | Applicable Fields |
| --- | --- | --- | --- |
| The Language Goldmine | List | List of language-related corpora and databases across multiple languages | linguistics, communication, learning, language |
| TV News Archives | Repository | Repository of televised news, including (for many) captions and rough statistics for content | affect, affective contagion, argumentation, attention, behavioral contagion, cognitive psychology, collaboration, communication, competition, controversy, culture, debate, dynamical systems, emotion, expertise, group behavior, group identity, human interaction, individual differences, informatics, interpersonal relationships, language, language use, network analysis, networks, persuasion, political psychology, political science, pragmatics, psychology, public policy, rhetoric, sentiment, social network analysis, social psychology, social sciences, social trends, event recognition, event segmentation, computer vision, gesture, crime, gaze, language production, language use, law, nonverbal communication, object recognition, perception, speech, vision, visual and object recognition, visual attention |
| U.S. City Open Data Census | List | List of freely available datasets with up to 18 public metrics of cities within the U.S. (e.g., crime, zoning, health inspections, transit) | public policy, behavior trends, behavioral contagion, decision making, cultural trends, law, political psychology, economics, search, imitation, network analysis, education, consumer behavior |
| U.S. Department of Labor: Consumer Expenditure Survey | List | List of current and historical datasets related to consumer spending and income, including data broken down by various demographic measures and family size | aging, behavior trends, behavior change, childhood development, consumer behavior, cultural trends, decision making, equality, gender, group behavior, health, political science, race, social trends |
| UCI Machine Learning Repository | Repository | Various datasets to assist in machine learning | classification, decision making, psychology, information science |
| UK Data Archive | Repository | Curated repository of UK-related digital data | humanities, social sciences, economics |
| Wikipedia | Repository | Data and reports released by Wikipedia | |
| Wikipedia: List of online music databases | List | List of music-related databases | tagging, categorization, social trends, expertise, search, imitation, exploration |
| Word Association Lexicons | List | List of lexicons for word-emotion, word-sentiment, and word-color associations derived from a variety of sources (including Amazon, Yelp, Amazon Mechanical Turk, and Twitter) | language use, emotion, sentiment, categorization, tagging, communication, linguistic variation, linguistics, consumer behavior, behavior trends, decision making |
| Wordbank | Repository | Repository of archived data on child language acquisition and vocabulary growth from multiple languages, as measured by MacArthur-Bates Communicative Development Inventory (MB-CDI) | childhood development, language acquisition, language, linguistic variation, cross-cultural analysis, gender |