The Enron Dataset

In 2000, Enron was one of the largest companies in the world, praised far and wide for its innovations in energy distribution and many other markets. By 2002, it was apparent that many bad apples had been cooking the books, and billions of dollars and thousands of jobs disappeared.

In the aftermath, surprisingly, one of the greatest datasets in all of machine learning was born--the Enron email corpus. Hundreds of thousands of emails amongst top executives were made public; there's no realistic chance any dataset like this will ever be made public again.

But the dataset that was released has gone on to immortality, serving as the basis for a huge variety of advances in machine learning and other fields.

Link: MIT Technology Review: The Immortal Life of the Enron E-mails