A library bigger than any building
By Giles Turnbull
July 31, 2007
An ambitious project to create an online catalogue of every book in every language ever published is underway. Public goodwill is not in doubt, but some libraries remain to be convinced.
A few years ago, the idea of getting random people around the world to write their own encyclopaedia would have been madness – but that didn’t stop the founders of Wikipedia doing just that, and it has turned out to be one of the most successful web projects of recent years.
With that in mind, does it sound mad to want to try and build an online catalogue of every book ever published, anywhere in the world? The Open Library, newly launched in the USA but global in scope, is designed to make that happen. In the words of its creators, the idea is to build a virtual library that stores details of not just “every book on sale, or every important book, or even every book in English; but simply every book.” Which would include The Curious Incident of the Dog in the Night-time, The Koran, the full text of The Adventures of Huckleberry Finn, and of course Harry Potter.
But what’s the Open Library really for? Aaron Swartz, leader of the technical team working on Open Library, suggests that every book ever published needs a single authoritative page on the internet, a bit like a personal homepage. “Right now, if you want to link to a book on the web, the main place people go is Amazon. It’s kind of a bad idea for one commercial site to be the definitive source for book information on the internet, so we want to have a site that brings together information from commercial publishers, reviewers, users, libraries, everywhere. This site will become the place where you can find interesting books and information about them, whether they’re in print, out of print, out of copyright or whatever.”
Such a library has to be virtual. No building would ever be large enough to house all books; no single group or government could afford to build it, or employ the necessary staff. If the Open Library is to succeed, it has to be a virtual space, and open to everyone, Wikipedia-style. “There are tons of books out there and tons of information about those books. There’s no way even a large group of librarians is going to be able to collect it all. We think of it as an analogue to Wikipedia. There are some great encyclopaedias written by small groups of experts, but to get something as wide-ranging and varied as Wikipedia, you need to let everyone in.”
To start things off, the Open Library is calling on other libraries to donate their catalogues. This alone presents huge technical challenges, since the data sets come in different formats and different languages, and each set comes with its own quirks, repetitions and errors. What’s important is keeping the data in a structured form, so that the database working behind the scenes knows the difference between an author, a title and a publisher. “We had to build this new type of wiki software which was an exciting challenge, because you had to set it up so that instead of just having one kind of page people can edit, we have lots of different kinds. People can edit authors, they can edit books, they can edit text pages, and so on. So there’s a lot of new stuff we had to build. And that’s just the infrastructure – there were also lots of things to import, and book data to merge and make searchable.”
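As a rough illustration of what "structured form" and merging might involve, here is a minimal sketch in Python. It is not the Open Library's actual software: the names Author, Book, merge_key and merge_records, the row layout, the source labels and the publisher "Example Press" are all assumptions made up for this example, and real catalogue matching would need far more careful rules than a case-folded title and author.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Author:
    name: str


@dataclass
class Book:
    title: str
    authors: list[Author]
    publisher: str | None = None
    sources: set[str] = field(default_factory=set)


def _fold(text: str) -> str:
    """Lower-case the text and collapse runs of whitespace."""
    return " ".join(text.casefold().split())


def merge_key(title: str, author_names: list[str]) -> tuple[str, tuple[str, ...]]:
    """Hypothetical identity for a record: folded title plus sorted folded authors."""
    return _fold(title), tuple(sorted(_fold(name) for name in author_names))


def merge_records(rows: list[dict]) -> list[Book]:
    """Fold raw catalogue rows into one Book per key, filling gaps without overwriting."""
    merged: dict[tuple, Book] = {}
    for row in rows:
        key = merge_key(row["title"], row["authors"])
        book = merged.get(key)
        if book is None:
            # First sighting: keep this row's spelling of the title and authors.
            book = Book(row["title"].strip(), [Author(n.strip()) for n in row["authors"]])
            merged[key] = book
        if book.publisher is None and row.get("publisher"):
            book.publisher = row["publisher"]
        book.sources.add(row["source"])
    return list(merged.values())


# Two made-up rows describing the same book, with different quirks.
rows = [
    {"title": "The Adventures of Huckleberry Finn", "authors": ["Mark Twain"],
     "publisher": None, "source": "library_a"},
    {"title": "the adventures of  huckleberry finn ", "authors": ["mark twain"],
     "publisher": "Example Press", "source": "library_b"},
]
books = merge_records(rows)
```

Because author, title and publisher are separate fields rather than one block of free text, the two rows collapse into a single record that remembers both contributing catalogues.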
An Open Library page is meant to be as comprehensive as possible. There are data fields for every possible bit of information that could exist about each published work. If copyright allows, there will be a copy of the book to download, or links to copies of it elsewhere (such as Project Gutenberg, which digitises cultural works).
For the time being, funding comes from the Internet Archive, another non-profit project that has the simple aim of keeping copies of the internet for the benefit of generations to come. But in future, the Open Library will depend on donations and taking a cut of any book sales it hands over to the big online booksellers. Income will matter more in the face of commercial competition. The Google Books Library Project, part of the larger Google Book Search service, has broadly similar aims. It sets out “to work with publishers and libraries to create a comprehensive, searchable, virtual card catalogue of all books in all languages that helps users discover new books and publishers discover new readers.” Naturally, Google has its own commercial interests to protect and invest in. The Open Library’s approach is the opposite, committed as it is to the ultimate in freedom of information acts: not only can anyone browse, search, and read the books in its catalogue, they can also rewrite the catalogue itself as they go.
But while the rise of Wikipedia proves there is no shortage of enthusiasm among the public to build informative sites for general consumption, not all libraries are signed up to the Open Library ethos. The British Library is one that is not.
Stephen Bury, head of European and American Collections at the British Library in London, has some reservations about contributing to the Open Library project.
“In the short term, I don’t think we will send them a copy of our catalogue. We only have limited resources and we need them to concentrate their efforts on our own digitisation projects,” he says. “We have always supported digitisation, and the more the merrier. But there’s some scepticism as to whether one day the Open Library might become a commercial site with adverts and so on.”
Mr Bury was not keen on the idea of allowing ordinary people to edit library catalogues themselves. “I think there’s a need for balance and some degree of control. You might get people maliciously changing things.”