Google abandons master-plan to archive the world's newspapers

Published May 19 2011, 05:29 PM by Carly Carioli

In an email today to publishers including the Boston Phoenix, Google told partners in its News Archive project that it would cease accepting, scanning, and indexing microfilm and other archival material from newspapers, and was instead focusing its energies on "newer projects that help the industry, such as Google One Pass, a platform that enables publishers to sell content and subscriptions directly from their own sites."

The five-year-old News Archive project was Google's attempt to do for old newspapers what Google Books has been attempting to do for the world's libraries. As part of the project, newspapers opened their morgues to Google, which promised to scan, index, and host the digital files it made from the archives. Google and the newspapers would then share revenue on the pageviews of those archives. Google says it eventually scanned 60 million pages, covering 250 years.

Was this cool? It was kind of cool. For instance, here are 21 articles about the Sex Pistols' final US concert in '78. And here's some fresh-off-the-press news from 1860.

Some newspapers complained that Google, after quickly scanning their archives, was slow to process the scans. The Phoenix sent Google a stash of archives covering several decades; some fraction of those have made their way online.

News Archive was generally a good deal for newspapers -- especially smaller ones like ours, who couldn't afford the tens or hundreds of thousands of dollars it would have cost to digitally scan and index our archives -- and a decent bet for Google. It threaded a loophole for newspapers, who, in putting pre-internet archives online, generally would have had to sort out tricky rights issues with freelancers -- but were thought to have escaped those obligations due to the method by which Google posted the archives. (Instead of posting the articles as pure text, Google posted searchable image files of the actual newspaper pages.) Google reportedly used its Maps technology to decipher the scrawl of ancient newsprint and microfilm; but newspapers are infamously more difficult to index than books, thanks to layout complexities such as columns and jumps, which require humans or intense algorithmic juju to decode. Here are two wild guesses: the process may have turned out to be harder than Google anticipated. Or it may have turned out that the resulting pages drew far fewer eyeballs than anyone expected.

In an email, Google said it would continue to support the existing archives it has scanned and indexed. It added, "We do not, however, plan to introduce any further features or functionality to the digitized news product."

It remains to be seen whether Google will complete the process of indexing the newspapers it has scanned. We'd guess not. Are we mad at that? Ehhh, not really. The deal Google struck with partner newspapers stipulated that, somewhere down the line, a paper could purchase Google's digital scans of its content for a fee. That fee is now being waived, and Google is giving publishers not only free access to the scanned files but also the rights to publish them with other partners. In essence, Google just scanned a huge chunk of the newspaper industry's valuable long-tail content, and then handed it to the publishers. (It's been a couple of rough years. We'll take it.) Are any of us in a position to exploit those resources without Google's help? Jeez, we sure hope so.

BROWSE: a few hundred issues of the Boston Phoenix from the 1980s and 1990s at Google News Archive