Santa Barbara Corpus of Spoken American English
Data from audio recordings of human interaction across various regions of the United States, including a variety of speakers and contexts
Repository of archived data on child language acquisition and vocabulary growth in multiple languages, as measured by the MacArthur-Bates Communicative Development Inventory (MB-CDI)
Various Twitter datasets collected for academic studies (largely focusing on news)
Dataset of timestamped tweets and corresponding demographic information about their authors (i.e., gender and location)
List of lexicons for word-emotion, word-sentiment, and word-color associations derived from a variety of sources (including Amazon, Yelp, Amazon Mechanical Turk, and Twitter)
Crowdsourced dataset of associations between words and both emotions and valence (in English and other languages), with some visualization tools
Repository of speech data
Repository of spoken and text corpora in multiple languages (including Arabic, English, German, Japanese, Mandarin, Spanish, and more)
Dataset of English speech (with accompanying demographic data about each speaker) recorded using a standardized elicitation paragraph