Library of Congress drinks from Twitter firehose

When he was in Cambridge to give the second Arcadia Lecture recently, Dan Cohen mentioned that Twitter had agreed to give the Library of Congress its archive of tweets. Here’s the NYT report of that decision.

“Twitter is tens of millions of active users. There is no archive with tens of millions of diaries,” said Daniel J. Cohen, an associate professor of history at George Mason University and co-author of a 2006 book, “Digital History.” What’s more, he said, “Twitter is of the moment; it’s where people are the most honest.”

Last month, Twitter announced that it would donate its archive of public messages to the Library of Congress, and supply it with continuous updates.

Several historians said the bequest had tremendous potential. “My initial reaction was, ‘When you look at it Tweet by Tweet, it looks like junk,’” said Amy Murrell Taylor, an associate professor of history at the State University of New York, Albany. “But it could be really valuable if looked through collectively.”

Ms. Taylor is working on a book about slave runaways during the Civil War; the project involves mountains of paper documents. “I don’t have a search engine to sift through it,” she said.

The Twitter archive, which was “born digital,” as archivists say, will be easily searchable by machine — unlike family letters and diaries gathering dust in attics.

As a written record, Tweets are very close to the originating thoughts. “Most of our sources are written after the fact, mediated by memory — sometimes false memory,” Ms. Taylor said. “And newspapers are mediated by editors. Tweets take you right into the moment in a way that no other sources do. That’s what is so exciting.”

Twitter messages preserve witness accounts of an extraordinary variety of events all over the planet. “In the past, some people were able on site to write about, or sketch, as a witness to an event like the hanging of John Brown,” said William G. Thomas III, a professor of history at the University of Nebraska-Lincoln. “But that’s a very rare, exceptional historical record.”

Ten billion Twitter messages take up little storage space: about five terabytes of data. (A two-terabyte hard drive can be found for less than $150.)
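
The storage figures in that last paragraph are easy to sanity-check. The sketch below only redoes the arithmetic from the numbers quoted above; it assumes a terabyte means 10^12 bytes (the way drive makers count), and the variable names are just for illustration.

```python
import math

# Assumption: decimal terabytes (10**12 bytes), as used on drive labels.
TB = 10**12

messages = 10_000_000_000   # "ten billion Twitter messages"
archive_bytes = 5 * TB      # "about five terabytes of data"
drive_bytes = 2 * TB        # "a two-terabyte hard drive"
drive_price_cap = 150       # "less than $150", so this is an upper bound

# Average space per stored message implied by the quoted figures.
bytes_per_message = archive_bytes / messages

# Whole drives needed to hold the archive, and an upper bound on their cost.
drives_needed = math.ceil(archive_bytes / drive_bytes)
cost_upper_bound = drives_needed * drive_price_cap

print(f"{bytes_per_message:.0f} bytes per message on average")
print(f"{drives_needed} drives, for under ${cost_upper_bound}")
```

On those figures each message averages about 500 bytes, which is more than the text of a 140-character tweet alone would need, so the estimate presumably includes some metadata. The whole archive would fit on three consumer drives.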