Search me: what Mad Men and brave moles can do for historical records

August 15th, 2012 by

Ever since we began helping the National Library of Finland correct mistakes in its old newspaper archive, I have noticed myself developing a slightly antisocial interest in historical texts. I say ‘antisocial’ because of its effect on conversation: while most people claim to be interested in history, the best way to get unwanted guests to leave your house after a dinner party is to start discussing the technical challenges of digitizing historical records.

To overcome this problem in the Digitalkoot project, we sacrificed thousands of cute moles: if volunteers failed to enter the correct words, the poor moles fell to their digital deaths. (While we felt bad about manipulating people’s love for cartoon animals, we can still sleep at night because it was for a good cause.)

How to get people interested in history (without hurting animals)

This experience is the reason I admire historian Ben Schmidt’s recent success in getting people enthusiastic about history and digitization.

What Schmidt has ingeniously done is take two of today’s most popular TV shows – Mad Men and Downton Abbey – and check their dialogue for historical accuracy against millions of texts published over the last few hundred years. (For example, apparently even the phrase “I need to”, so common in Mad Men, is not something that people said in the 1960s.)

This study was made possible by Google’s Ngram Viewer. As we discussed about a year ago, Ngram allows you to chart how often different combinations of words or letters have appeared in Google Books’ huge corpus of 5 million texts published between 1500 and 2008.
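
For the curious, the basic idea behind this kind of chart can be sketched in a few lines of Python. This is only a toy illustration, not how Google’s Ngram Viewer is actually implemented: the function names (`ngrams`, `phrase_frequency_by_year`) and the four sample sentences are made up for the example, and punctuation handling is ignored entirely.

```python
from collections import Counter


def ngrams(text, n):
    """Return the lowercase word n-grams of a text (naive whitespace split)."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def phrase_frequency_by_year(corpus, phrase):
    """Map each year to the share of its n-grams that match the phrase.

    `corpus` is an iterable of (year, text) pairs; this structure is an
    assumption made for the sketch, not a real dataset format.
    """
    n = len(phrase.split())
    target = phrase.lower()
    hits = Counter()
    totals = Counter()
    for year, text in corpus:
        grams = ngrams(text, n)
        totals[year] += len(grams)
        hits[year] += grams.count(target)
    # Skip years with no n-grams at all to avoid dividing by zero.
    return {year: hits[year] / totals[year]
            for year in sorted(totals) if totals[year]}


# Invented sample sentences, purely for illustration.
toy_corpus = [
    (1960, "I ought to call the office"),
    (1960, "we ought to leave soon"),
    (2005, "I need to call the office"),
    (2005, "you need to see this"),
]

frequencies = phrase_frequency_by_year(toy_corpus, "need to")
for year, share in frequencies.items():
    print(year, round(share, 3))
```

Dividing by the total number of n-grams for each year matters here: far more text survives from recent decades than from earlier centuries, so raw counts alone would make almost every phrase look like it is on the rise.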

Along with the many fascinating results, the great thing about this project is that it has managed to get the media interested in Ngram again. My own Ngram-like analysis using just normal old Google Search found that after some initial excitement in 2010 and 2011, almost everyone had forgotten about Ngram until Schmidt found a way to include it in the same sentence as Don Draper.

Don’t forget about the crowd

The reason I bring this up is that I am disappointed with the lack of interest the world’s institutions have shown in the technology we now possess to preserve, analyse and search historical records. (Even Google Books seems to have lost momentum in its efforts to digitize the world’s texts.)

While OCR technology still makes too many mistakes when digitizing texts, we know that people are actually willing to give their time to correct them, assuming we can find a way to include cute animals or handsome Mad Men in the tasks.

As we found with Digitalkoot, the gains from such projects actually go beyond the statistical results: not only have hundreds of thousands of people freely volunteered to help, but the project has also been a great way to get them interested in the historical records themselves. The only downside is that bringing up the digitization of records is no longer a reliable way to make the last dinner guests leave my house.