Repository of televised news; many entries include captions and rough content statistics
Unstructured dataset of open-source media articles
Dataset of 8 million annotated YouTube videos, including a variety of audio and visual features.
Transcripts of British speeches (1895-2015), categorized by date, speaker, party, and title
Dataset of internal newsletters from the Signals Intelligence Directorate of the U.S. National Security Agency (NSA), covering 2003-2012. The dataset is being released slowly, in small batches.
Repository of speech data
Dataset of English speech (with accompanying demographic data about each speaker) recorded using a standardized elicitation paragraph