
Friday, August 05, 2005

DOI and BibTeX

For the last few hours, for reasons that I will not get into, I have been trying to track down BibTeX entries for papers. Usually, if the paper has an ACM DL entry, there is a BibTeX entry that one can scrape from the page. For many papers (especially IEEE publications) this doesn't work, because IEEE doesn't provide BibTeX entries on its website, and its pages are harder to scrape.

Most of the complication comes from the fact that often I have a title, and need to match it to an actual citation of some kind. Google Scholar is quite helpful in this regard, allowing me to search for a title and more often than not returning the ACM DL link to the paper (and BibTeX entry).

But the ACM doesn't have everything, and this is where DOIs come in. The Digital Object Identifier is a unique identifier that maps to a document entity, analogous to the URL for a web page. As with a web page, the actual location of the document can be hidden from the user and changed easily by the publisher, allowing for both portability and the ability to integrate a variety of sources. There is even a proxy server that you can supply a DOI to; it returns the web page of the publisher that currently maintains that document.
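To make the proxy idea concrete, here is a minimal sketch of turning a DOI into the URL you would hand to the resolver. It assumes the proxy lives at https://doi.org/ (that address is my assumption, not something stated above), and the function name doi_to_url is made up for illustration. It only builds the URL; it does not fetch anything.

```python
from urllib.parse import quote

# Assumption: the DOI proxy server is reachable at this base address.
DOI_PROXY = "https://doi.org/"


def doi_to_url(doi: str) -> str:
    """Build the proxy URL for a DOI (hypothetical helper, no network access)."""
    doi = doi.strip()
    # Accept the common "doi:" prefix and drop it.
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    # A DOI looks like "<prefix>/<suffix>", and every prefix starts with "10."
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode characters that are unsafe in a URL, but keep the slash.
    return DOI_PROXY + quote(doi, safe="/")


print(doi_to_url("doi:10.1000/182"))
```

Following the redirect from that URL is what lands you on the publisher's page, which is exactly where the scraping problem starts.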

What would be very cool would be a DOI to BibTeX converter. Note that a BibTeX entry maps to a single document, like a DOI. DOIs of course address a smaller space, since they govern only published work. If publishers exported some standard format (XML?), then it would be a trivial matter to write such a thing. Right now, all you get is the web page, from which you either have to scrape a BibTeX entry or construct one by hand. Neither option scales or is particularly appealing.
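Just to show how trivial the conversion would be if such a format existed, here is a rough sketch. Everything about the XML is hypothetical: the record element, its doi and type attributes, the child tag names, and the sample DOI are all invented for illustration, and no publisher is known to export this. The citation key is passed in by hand.

```python
import xml.etree.ElementTree as ET

# Hypothetical export format; the schema and the DOI below are made up.
SAMPLE_XML = """\
<record doi="10.9999/example.2005.1" type="article">
  <author>Doe, Jane</author>
  <author>Roe, Richard</author>
  <title>An Example Paper</title>
  <journal>Journal of Examples</journal>
  <year>2005</year>
</record>
"""


def xml_to_bibtex(xml_text: str, key: str) -> str:
    """Render one hypothetical <record> element as a BibTeX entry."""
    root = ET.fromstring(xml_text)
    fields = {}
    # BibTeX joins multiple authors with " and ".
    authors = [a.text.strip() for a in root.findall("author") if a.text]
    if authors:
        fields["author"] = " and ".join(authors)
    # Every other child element becomes a field named after its tag.
    for child in root:
        if child.tag != "author" and child.text:
            fields[child.tag] = child.text.strip()
    if root.get("doi"):
        fields["doi"] = root.get("doi")
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields.items())
    return f"@{root.get('type', 'misc')}{{{key},\n{body}\n}}"


print(xml_to_bibtex(SAMPLE_XML, "doe2005example"))
```

The hard part is not this code but getting publishers to agree on the format that feeds it.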