Saturday, July 24, 2010

An unresolved problem faced by all technology writers is what to do with creative capitalization. When you want to lead off a sentence with a word like "iPad" or "eBook", how do you capitalize it? Do you go with "Ipad" and "Ebook"? Or perhaps "IPad" and "EBook"? Do you stay with "iPad" and "eBook" and consider them to be capitalized versions of "ipad" and "ebook"? Horror of horrors – you could put in a dash. Maybe you just finesse the issue by changing your sentence around to avoid having a camelCased word leading off the sentence. Even then, you have the problem of what to do if the word is in the title of your article, for which you probably use Title Case unless you're a cataloging librarian, in which case you use Sentence case, not that the problem goes away! If you're using an iPhone, you know it has a mind of its own about the first letter of your email address being capitalized.

The practice of capitalizing titles presents issues particularly when the titles are transported into new contexts, for example via an RSS feed or search engine harvest. ALL CAPS MIGHT LOOK OK AS A <TITLE> ON YOUR WEB PAGE, but a search engine might hesitate to scream at people.

This is not by any means a new problem, but it's one that changes from era to era because of the symbiotic relationship between language and printing technology. Here's what Charles Coffin Jewett wrote in 1853 when discussing how libraries should record book titles:

The use of both upper-case and lower-case letters
in a title-page, is for the most part a matter of the printer's taste,
and does not generally indicate the author's purpose. To copy them in a
catalogue with literal exactness would be exceedingly difficult, and of
no practical benefit. In those parts of the title-page which are
printed wholly in capitals, initials are undistinguished. It would be
unsightly and undesirable to distinguish the initials where the
printer had done so, and omit them where he had used a form of letter
which prohibited his distinguishing them. It would teach nothing to
copy from the book the initial capitals in one part of the title, and
allow the cataloguer to supply them in other parts.

The standard practice of libraries in English-speaking countries has been to record book titles in Sentence case, in which the first word of the title is capitalized and the rest of the words are capitalized only if the language demands it (unless the first word is an article like "A", in which case the second word is also capitalized). The argument for this is that this capitalization style allows the most meaning to be transmitted; a reader can tell which words of a title are proper names or otherwise capitalized words. Which raises two questions: Why are libraries alone in presenting titles this way? And why do libraries persist in this practice when no one in recorded history has ever asked for sentence case titles?
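
The rule is mechanical enough to sketch in code. Here's a minimal illustration, not any library's actual algorithm; the `protected` list of words that keep their capitals stands in for the proper-name judgment a real cataloguer would apply:

```python
def sentence_case(title, protected=()):
    """Convert a Title Case string to library-style Sentence case.

    Words in `protected` (proper names, etc.) keep their capitals;
    everything else is lowercased except the first word -- and, when
    the first word is an article like "A" or "The", the second word
    stays capitalized as well, per the cataloguing convention.
    """
    words = title.split()
    out = []
    for i, w in enumerate(words):
        keep = (i == 0
                or (i == 1 and words[0].lower() in ("a", "an", "the"))
                or w in protected)
        out.append(w if keep else w.lower())
    return " ".join(out)

print(sentence_case("A Tale Of Two Cities"))   # A Tale of two cities
print(sentence_case("The History Of England", protected=("England",)))
```

Note how "Tale" stays capitalized only because the first word is an article; without a protected-word list, "England" would be lowercased too, which is exactly the proper-name problem Jewett was worrying about.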

In German and other languages, nouns are capitalized; this used to be true of English (take a look at the US Constitution). In German, it's easy to tell nouns from verbs, which might be very useful if we still had it in English. Still, I enjoy being able to write that something is A Good Thing. It gives me a way to intone my text with an extra bit of information.

The rules for how English should be capitalized have become quite complicated. Here and here are two web pages I found devoted to collecting capitalization rules. Some of them are pretty arcane.

It's fun to speculate on the future of capitalization. In the late 19th century, there was a fashion to simplify spelling, grammar and capitalization, led by people like Melvil Dewey. I'm guessing part of the reason was the annoyance of needing to press a shift key on those newfangled typewriters. But spelling and capitalization reform didn't get very far. Perhaps they tried to publish articles and got stopped in their tracks by a unified front of copy editors.

If anything, the current trend is in the direction of making capitalization even more idiosyncratic. In addition to a proliferation of Product names like iPod and eBay that have crossed over into the language mainstream, the shift from print to electronic distribution of text does a better job of preserving the capitalization chosen by the author, thus allowing it to better transmit additional meaning.

The ability to increase the information density in text is useful in a wide range of situations, for example, when you have only 140 characters to work with, or when you want a meaningful function name, like toUpperCase(). If your family name is McDonald, you probably have strong feelings on the issue.

My guess is that life will become increasingly case sensitive. You may already be aware that it takes 8 seconds, not one, to transmit a 1 GB file over a 1 Gb/s link. And that the SI unit Mg is a billion times the mass of a mg. If you are a Java programmer who doesn't know the difference between an int and an Integer, you'll quickly learn about NullPointerExceptions.
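
The arithmetic behind that 8 seconds is nothing more than the factor of 8 hiding in the capitalization (this sketch ignores protocol overhead, which makes real transfers slower still):

```python
# Capital B means bytes, lower-case b means bits: the case carries a factor of 8.
file_size_GB = 1                 # one gigabyte
link_speed_Gbps = 1              # one gigabit per second

file_size_Gb = file_size_GB * 8              # 1 GB = 8 Gb
seconds = file_size_Gb / link_speed_Gbps
print(seconds)                               # 8.0, not 1

# Same story for SI mass prefixes: Mg (megagram, 10**6 g) vs mg (10**-3 g).
Mg_over_mg = 10**6 * 10**3                   # a factor of a billion
print(Mg_over_mg)                            # 1000000000
```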

The shift from ASCII to Unicode has made it much easier to cling to language-specific capitalization rules. Did you know that there is a small number of characters that are different in upper case than in title case? They are LETTER DZ, LETTER DZ WITH CARON, LETTER LJ, and LETTER NJ. The lower case versions are ǳ, ǆ, ǉ, and ǌ; the upper case versions are Ǳ, Ǆ, Ǉ, and Ǌ; and the title case versions are ǲ, ǅ, ǈ, and ǋ. Then there are ligatures like ﬁ, ﬂ, ﬃ, ﬄ, ﬅ, and ﬆ, which expand into multiple letters when capitalized. And don't forget your Armenian ligatures, ﬓ, ﬔ, ﬕ, ﬖ, ﬗ. For this reason, being "case insensitive" is poorly defined: two strings that are equal when you've changed both to upper case are not necessarily equal after you've changed them to lower case!
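
A couple of lines of Python make the pitfall concrete (using the titlecase digraph ǅ and the German ß, whose uppercase form is the two-letter "SS"):

```python
# The digraph ǅ (U+01C5) is a *titlecase* letter with three distinct forms.
dz_title = "\u01C5"                  # ǅ
print(dz_title.upper())              # Ǆ (U+01C4)
print(dz_title.lower())              # ǆ (U+01C6)

# "Case insensitive" is direction-dependent: German ß uppercases to "SS".
a, b = "stra\u00DFe", "STRASSE"      # "straße" vs "STRASSE"
print(a.upper() == b.upper())        # True  -- equal after uppercasing
print(a.lower() == b.lower())        # False -- unequal after lowercasing

# Unicode's answer is case folding, designed for caseless comparison:
print(a.casefold() == b.casefold())  # True
```

This is why Unicode defines case folding as a separate operation from lowercasing: folding is the only mapping designed to make "case insensitive" well defined.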

So what do I do when I write about ebooks? I don't use a dash. When the word appears in a title, I capitalize the "B". I can't wait till they translate this rule into Armenian.

Wednesday, July 21, 2010

The cost of a book is more than what you pay for it. If you intend to keep it, as libraries do, you need to pay for a shelf to put it on, some space in a building to put the shelf in, air-conditioning for the building so the book doesn't rot away, and some labor and assorted administration expenses to make sure you know where it is.

Similarly, the indirect cost of an electronic resource, whether it's a piece of software, an ebook, or a database, can also be significant. You don't need a shelf or a building, which can save you a lot of money, but you may need some computing hardware or some space on a hard drive. Usually, there's a license, and it's easy to overlook the cost of managing it. The cost of license management can be substantial in institutions, even if it's negligible for individuals. Institutions need to minimize liability exposure and monitor compliance. At the very least they need to be aware of what licenses they've agreed to.

When my startup began providing services to libraries, I was very grateful to discover a resource called the CLIR/DLF Model license. This had been produced through discussions between university libraries and academic publishers, and by copying and adapting the model license, I saved money on legal fees. I'm sure that the libraries I dealt with also saved, because my license agreement looked just like many others they had seen. Lawyers are expensive, so a new or unusual license can rapidly generate legal bills comparable to the license fee itself.

I am amazed how many publishers that deal with institutions fail to understand the expense burdens (and thus, barriers to sale) that license agreements can put on potential customers. To pick a trivial example, state universities often have to abide by state laws that prevent them from agreeing to choice of law provisions that lawyers put in contracts by default. The time spent going back and forth on things like this benefits no one. Tacked-on provisions almost never add value and almost always cause expense.

Using a standard license is of particular importance when the license seeks to accomplish something creative. Such is the case with the licenses drafted by the Free Software Foundation (FSF). These licenses, the GNU General Public Licenses, seek to use copyright law to enforce "copyleft". When you release software with copyleft, the recipient is free to redistribute the software so long as they use the same copyleft license to do so. In the ideology of the FSF, this allows for freedom for users of the software. It's ok for you to charge for the software, or for using the software, but it's not ok to prevent users of your code from improving it and passing along those improvements for others to improve again. Non-copyleft licenses are less restrictive. The "Apache" license, for example, only requires the licensee to release the licensor from any liability or warranty, respect the trademark of the licensor and to retain the attribution and license in any redistribution.

There are currently three flavors of GPL licenses, each with its own set of restrictions. The least restrictive of these is the "GNU Lesser General Public License" (LGPL). This license allows the licensee to redistribute the software as a component in larger works, including proprietary software, as long as any alterations inside the component are released under LGPL. The GNU General Public License, or GPL, is more restrictive; you're not allowed to use a GPL work as a component in proprietary software, as any software linked to the work must also use GPL. Both the LGPL and GPL have been around long enough that any lawyer who works with software will be familiar with their benefits, drawbacks and limitations.

The Affero License

A newer FSF license is even more restrictive than GPL. The GNU Affero General Public License, or AGPL, requires licensees to release the source code for any improvements if they are used to deliver networked services. The AGPL is designed to facilitate the release of software that works "in the cloud", for example, in a "Software as a Service" configuration.

To understand the benefits of AGPL, suppose that you have developed software that does optical character recognition (OCR). Your business deploys the software over the network- images come in and text goes out. If you released the software under GPL, your competitors would be entitled to use your code, add their improvements and use them to deliver a competing service, without having to release those improvements. Since they don't redistribute the software, GPL's copyleft provisions don't kick in. Under AGPL, they could still run a competing service, but would have to release their improvements. You could use those improvements in your own service. Both you and your competitor can benefit from the resulting virtuous cycle of improvement.

These benefits do not come without cost, however. Compared to GPL, the Affero license is relatively new and untested; this can result in increased costs for licensees, and the license may prove to be unenforceable in practice. The added restrictions of AGPL put additional burdens on licensees who use the software to deliver network services. This is to your benefit if you want to pursue a dual licensing strategy such as the one pursued with such success by MySQL. The developers of MongoDB, for example, have chosen to use AGPL as part of this strategy.

Using AGPL may not be the best choice, however, if you hope to soon attract a diverse community of deployer-developers. Few corporate development processes have been designed with the requirement of releasing code simultaneously with deployment in mind. When I released software as open source, there would always be a bit of "sanitization" before final release. My pre-sanitized source code comments were always too rude and too meager to show in public. An AGPL-aware process would have resulted in better code, but perhaps less sleep. Familiarity with AGPL and better support in integrated development environments are likely to improve corporate acceptance of AGPL over the next few years.

Another application where AGPL will be inappropriate is software that needs to integrate proprietary components. Enterprise "portal" software will often need to hook into a variety of systems, many of which have non-public APIs. GPL software can be used with care in this situation, because copyleft provisions are only triggered if derivative works are distributed.

The recent controversy surrounding the "Thesis" theme for WordPress is illustrative of the boundaries of GPL copyleft. Usually, when you use some GPL software as a platform, you don't expect that works using the platform would, by themselves, trigger GPL's copyleft. For example, if you change a presentation template and HTML code in a database-driven website system, you don't expect that your templates would have to be governed by GPL as well. Thesis does more than just change HTML; it also changes some PHP code used by WordPress. While the controversy is yet to be resolved, it seems to me that WordPress's developers have a strong case. Update (7/23): the Thesis developers have decided to GPL the part of Thesis that is clearly derivative of WordPress.

Would anything have been different if WordPress had used AGPL instead of GPL? If we assume that the distribution of Thesis violates GPL, then under AGPL any similar modification of WordPress, even one you merely run on your own site, would trigger copyleft and you'd be forced to make the source of your modifications available.

Koha's License

When the developers of Koha, an Open-Source Library Management System, first released their code, they chose GPL as their license. The copyleft provisions appealed to them because they believed that the library community should share development just as they share books. Since at least 2002, the license distributed with the software has specified GNU GPL version 2 (or later).

Last year, many of the original Koha developers were upset when it emerged that the largest Koha support company, LibLime, had decided to pull out of the public development process for their Software-as-a-Service offering, LibLime Enterprise Koha (LLEK), and delay the release of their improvements and modifications. I wrote three articles about this situation in the early part of this year. For the current discussion, the important thing to note is that LibLime was perfectly entitled to do what they've done. Since they've not redistributed the software they are running, they have not triggered the copyleft provisions of GPL.

Since my last article, the acquisition of LibLime by PTFS was completed. In May, PTFS released (using the GPL license) a package of its improvements to Koha. The distribution, labeled "Harley", can be downloaded and run by anyone, and the changes are being absorbed by the community; most of the enhancements and bugfixes are scheduled for merger with the 3.4 community version of Koha. According to PTFS, a much larger set of enhancements, developed as part of development contracts such as one with WALDO, is in the pipeline for release.

Nonetheless, there remains considerable friction and animus between PTFS-LibLime and the rest of the Koha Community. This friction has culminated in a call to change the license for Koha to AGPL. Participants in an IRC chat meeting decided to call a vote on the possible license change, to take place sometime this summer. The choices on the ballot are to be:

Stay with GPL version 2.1 or later.

Change to GPL version 3 or later.

Change to AGPL version 3 or later.

The electorate is to consist of any individual who self-identifies as a Koha stakeholder. While that may seem to be a simple choice, it turns out that changing a GPL license is not as easy as having a vote. To do so legally would require the affirmative assent of each and every contributor to the software. If poor records have been kept as to the contributors, it is virtually impossible to legally change the license. In the case of Koha, the source code record is reasonably complete. For example, the opac-main.pl script was touched by users Chris Nighswonger, Henri-Damien LAURENT, Nahuel ANGELINETTI, savitra.sirohi, Chris Cormack, Galen Charlton, Joshua Ferraro, Frederic Demians, Paul POULAIN, hdl, tipaul, toins, bob_lyon, kados, rangi, acli, finlayt, and tonnesen. It seems unlikely that the contributors associated with LibLime would agree to a license change that would force them to change their preferred development process.

GPL version 3 is not compatible with version 2, but since most of the Koha code appears to allow version 2.1 or later, it should not be a major problem to change to version 3. However, should enforcement of version 3 provisions ever be required, questions about the license status could complicate enforcement. The significant new additions in version 3 concern the use of patents and hardware locks to subvert license terms.

AGPL is another license entirely, not a new version of GPL. However, it is allowed to include GPL works as components in an AGPL licensed work. It is NOT allowed, however, to include AGPL components in a GPL work, just as it is not allowed to include GPL code in an LGPL work. I assume this is the mechanism that AGPL-Koha proponents advocate for changing the Koha license. Section 13 of the AGPL states:

Notwithstanding any other provision of this License, you have permission to link or combine any covered work with a work licensed under version 3 of the GNU Affero General Public License into a single combined work, and to convey the resulting work. The terms of this License will continue to apply to the part which is the covered work, but the special requirements of the GNU Affero General Public License, section 13, concerning interaction through a network will apply to the combination as such.

For this to work, a new work (not a copy) to contain the Koha work (licensed under GPL) would have to be created and licensed under AGPL. In my view, this would be too far a stretch to be allowed under the GPL License. If the license used for Community Koha or any of its components were changed to AGPL, I would strongly advise anyone considering use of Koha to first seek legal advice to determine whether such use is a license violation. (I am not a lawyer!) It would be a shame, in my opinion, if libraries were forced to incur this expense. Nicolas Morin, who led the French Koha support company BibLibre until his recent move to PRES de Toulouse, gave me a very practical view of this issue:

I think focusing on the legal aspects of Koha, the copyright, TM and license, kind of misses the point. The problems Koha faces are much more mundane: lack of clarity and focus. Project management issues are real, legal issues are pretty theoretical at this point. I feel it's a bit of a distraction.

I'll add more information here about the IRC-called vote when it becomes available.

Wednesday, July 14, 2010

One of my secret pleasures at American Library Association meetings is going to Standards sessions. Now before you think I have a completely hopeless case of nerdiness, let me explain myself.

There's never just one Standards session at ALA, there are at least two and often three or more. I'm not sure why, but I think it's because librarians feel that standards are Important, and because there are so many Standards in the library world that people forget which ones were the subject of a Standards session at the last meeting. It's not that librarians are interested in Standards, it's just that they have lots of data problems that might magically go away, if only there were a Standard. Or not.

Because there are so many sessions on Standards, each one tends to be sparsely attended. That's why I like them. You can go and sit in a room with some really smart and influential people (the panelists), ask them bizarre Standards questions, have some other really smart audience member join in the discussion, and feel like you're a member of some hidden clique of powerful numerologists.

One of the things that has the Standards people concerned this year is the way ISBNs are being applied to ebooks. At the session I went to, Brian Green, the Executive Director of the International ISBN Agency, was giving his standard ISBN Standards update. Brian has been doing this long enough that he expects and parries my pestering questions with aplomb.

So here's this year's burning question: How many ISBNs should be issued when ebooks are published in different formats? Should the ebook have the same ISBN as the print book? If a different ISBN, should different file formats get separate ISBNs?

And here's the burning answer from ISBN International: each ebook file format for a book should get its own ISBN:

Do different formats of an electronic or digital publication (e.g., .pdf, .html) need separate ISBNs?

Different formats of an electronic or digital publication are regarded as different editions and therefore need different ISBNs in each instance when they are made separately available.

And here's the language of the Standard itself, (ISO 2108:2005) adopted through the international standards process in 2005:

Each different format of an electronic publication (e.g. ".lit", ".pdf", ".html", ".pdb") that is published and made separately available shall be given a separate ISBN.

So forgive me for having been confused in March, when I read that the “E-book ISBN Mess Needs Sorting Out,” Say UK Publishers. Why are the publishers still talking about this, more than ten years after the question was raised and thoroughly discussed? Why are we having panels at ALA to learn about this? Has the numeracy of the world's book industry been entirely depleted during ISBN's switch to 13 digits???

ISBN stands alone in the world of identifiers because of its widespread pre-internet adoption and success. Even the Internet Engineering Task Force set aside some URI space for it back in the days before "HTTP" became a religious invocation. But most people outside the book industry have had no idea what it really identifies; they usually think it identifies a book or perhaps a book version.

If the book industry had a Facebook profile, it would list its relationship with ISBN as "it's complicated." Consider ISBN 978-1593967574. It is a "Year 5 Harry Potter Bust" manufactured by Diamond Comics. It has no author, pages or even words; it is not a book in any sense. Yet it is well-behaved in the ISBN world, because it is (or was) an item distributed by the world's book supply chain to bookstores and ultimately consumers.
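
Incidentally, the bust's identifier is a perfectly well-formed ISBN-13: the last digit is a checksum over the first twelve, computed with alternating weights of 1 and 3. A quick sketch:

```python
def isbn13_check_digit(first12):
    """Check digit for an ISBN-13, given its first 12 digits as a string.

    Digits in even positions (0-based) get weight 1, odd positions get
    weight 3; the check digit brings the weighted sum to a multiple of 10.
    """
    total = sum(int(d) * (1 if i % 2 == 0 else 3)
                for i, d in enumerate(first12))
    return (10 - total % 10) % 10

# The Harry Potter bust's identifier validates like any other ISBN-13:
print(isbn13_check_digit("978159396757"))   # 4, matching 978-1593967574
```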

In the print world, it is more or less understood that a paperback has a different ISBN from the hardcover, which has a different ISBN from the library-bound version, and may have a different set of ISBNs when issued in a different country. At the deepest level, the ISBN is just a solution to a problem: "How does an item get tracked through the book supply chain?"

If you see a book on the shelves of a bookstore, you can be pretty sure that it got there through the "supply chain". Book publishers don't sell books to book stores, they mostly sell to distributors such as Ingram and Baker & Taylor. Bookstores use ISBNs to order books, and the distributors use the ISBN to report sales back to the publishers. When books don't sell, they get shipped back to warehouses, which track them using...ISBN.

When there's a question about whether a different ISBN should or should not be issued, the overriding principle is "a product needs a separate identifier if the supply chain needs to separately identify it." This clarity about the function of an ISBN is what has resulted in its overwhelming success. When people try to use the ISBN for other things, it's less successful. Supplemental services such as xISBN (which I helped put into production at OCLC), thingISBN, and emerging identifiers such as ISTC are useful for filling in the gaps between what ISBN really is and what people would like it to be.

Let's look at ebooks through the prism of the supply chain. If an ebook is issued in print, PDF and EPUB formats, it's important to the publisher to know how many of each are sold, thus the separate ISBNs. Similarly, if different DRM wrapping is used by two different channels, in many cases the publisher will need to track sales or manage the product separately. Although in many cases the DRM could be tracked by retailer, and thus wouldn't need a separate ISBN, the ISBN Standard says to give it a different ISBN. As Green has written previously,

Where publishers are selling e-books exclusively from their own websites or through another single channel and do not wish to have them listed in books in print databases then [...] publishers may not wish to bother with ISBNs. However, publishers should beware of taking a short-term view that makes them reliant on a single channel.

Unfortunately some publishers have obstinately refused to give separate ISBNs to ebooks in different formats. The US division of Random House is perhaps the most prominent example. There are excellent arguments for the "single ISBN" approach, but the worst possible situation for the emerging supply chain is for each publisher to use their own inconsistent rules for applying ISBN to ebooks. However strong the argument is for "single ISBN", its inconsistent application negates the advantages and threatens the ISBN system as a whole.

The ultimate problem with ISBN and ebooks is that ebooks are sufficiently adaptable that they expose ambiguities and limitations of the ISBN identification architecture. For example, suppose you're in the business of selling customized digital coursebooks. You allow professors to choose 10 chapters from 100 available. That means there are exactly 17,310,309,456,440 different ebooks that you could sell. That's about 9,000 times more books than can be identified by all the ISBNs in the galaxy. But you don't need to give them ISBNs, because you sell direct and the ebooks never touch the supply chain. The chapters themselves may need to be tracked so you can pay author royalties, but you need only 100 ISBNs to do that.
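
The arithmetic checks out; here's a quick sanity check in Python (the 2-billion figure for total ISBN capacity is my back-of-the-envelope count of the 978 and 979 prefix spaces):

```python
import math

combinations = math.comb(100, 10)    # ways to choose 10 chapters from 100
print(combinations)                  # 17310309456440

# The 13-digit ISBN space: two prefixes (978 and 979), each followed by
# nine assignable digits and a check digit -- about 2 billion numbers.
isbn_capacity = 2 * 10**9
print(combinations // isbn_capacity) # 8655 -- "about 9,000 times" more
```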

How about if a retailer changes (or eliminates) the DRM wrapping an ebook? Do the ISBNs of the ebooks on a consumer's ebook reader magically change? (Transubstantiation is one of my favorite words!) The answer is no, and that's because the supply chain is not involved.

Are there enough ISBNs for the ebooks that could be sold? The EPUB format is actually an archive file format that uses a dialect of XHTML for its insides, so you might imagine that any website or portion thereof could be packaged as an ebook. In fact, BookGlutton has a tool that (sort of) does this. As of May 2009, there were over 100 million websites in operation, so you can easily imagine that ebooks could use up all available ISBNs almost overnight.

The "supply chain" for ebooks is rapidly mutating. The adoption of an "agency model" is an example of a change that has put new demands on ISBN; "agency" requires a retailer to identify an item's publisher before the moment of sale so that the correct sales tax can be applied. The agency model shift won't be the last or biggest change to the ebook supply chain, either. As one example, I've previously written about ebook pay-per-view and demand-driven acquisition. Another huge change would occur if a substantial advertising revenue stream for ebooks, such as Apple's iAd system, emerges. Advertising would put new demands on reporting systems and thus on the ISBNs that enable them.

What is an ebook anyway? Ten years ago, a committee of the American Association of Publishers came up with this not-so-useful definition:

An ebook is a literary work in the form of a digital object consisting of one or more standard unique identifiers, metadata, and a monographic body of content, intended to be published and accessed electronically.

I'll bet you never realized that blog posts were really ebooks!

The truth is that we really have no idea what an ebook is or what it will become. There are certainly e-things that correspond to print books, and these are easy to recognize as ebooks. But don't be surprised if there comes a flood of things to read on our connected devices that are too long to be called "articles" or "posts". For these, "eBook" may be the best label we can come up with.

Truth be told, the internet and especially ebooks are causing change, and libraries struggle to adjust. That's why Library Journal and School Library Journal are putting on a "Virtual Summit" called ebooks: Libraries at the Tipping Point. It's a fascinating idea, reasonably priced, and the list of speakers is impressive.

To help provide framing for the issues to be discussed at the conference, the organizers have asked me to write a series of articles for Library Journal about ebooks and the changes they will bring to libraries. I'm really excited to be doing this. If you have suggestions for topics, I'd love to hear them.

Yesterday, I went to the Metropolitan Museum of Art in New York City to see the installation at the roof garden, Big Bambú, by Doug and Mike Starn. The Met's roof garden has a spectacular view of Manhattan and Central Park, and the installations it hosts are always visitor-friendly. If you don't like the art, you can always get a cold drink (or coffee, in cold weather) and enjoy the beauty of the setting.

I really liked Big Bambú. It's a structure built from bamboo that allows the visitor to stroll inside it. If you have appropriate shoes and a free ticket, you can climb up on it like you were Han Solo visiting the Ewoks on Endor. What I loved about it was the way it related to people. Big Bambú isn't about the sticks, though they are strong and graceful. It isn't about the way the sticks are tied together with colorful nylon line. Rather, it's about cutting up space and putting it together in a way that forces people to navigate that space differently. You wander through and encounter objects and people that you didn't expect to see, viewpoints that open and close. It creates conversations by framing them, in the same way that a museum demands attention for art by putting a frame around a painting.

Friday, July 2, 2010

Here's the most important thing I've learned about intellectual property law: the lawyers who say "yes" when you ask if you can do something are much, much more expensive than the lawyers who say "no".

Brewster Kahle, the founder, and through a foundation, the funder, of the Internet Archive, can afford a very expensive lawyer. He sold Alexa to Amazon for about $250 million of Amazon stock. He'll need that expensive lawyer; on Tuesday, the Internet Archive announced that its Open Library had started to facilitate the lending of out-of-print (but in-copyright) digitized books, a move that seems designed to spur a legal reaction from publishers.

To some extent, this isn't really news. Kahle has been advocating digital lending of books for some years now. In 2001, he published an article in D-Lib Magazine advocating the use of Inter-Library Loan (ILL) for digital materials. In October, the Internet Archive unveiled its Bookserver software, whose goal was to enable the lending of digital materials over the internet, and the University of Toronto was one of the original partners in its development.

The lending library announcement was also modest in scope. The big numbers were associated with programs unlikely to stir any controversy. Over a million out-of-copyright works are available, and Open Library has integrated access to the 70,000 ebooks licensed to subscriber libraries through Overdrive. Only 187 of the available books fall into the category of un-licensed out-of-print but in-copyright books.

In another sense though, the announcement, which was fed directly to the Wall Street Journal, was a declaration of war on barriers to fair use of digitized books. From the Journal's article:

The effort could face legal challenges from authors or publishers. Paul Aiken, the executive director of the Authors Guild [...] said "it is not clear what the legal basis of distributing these authors' work would be." He added: "I am not clear why it should be any different because a book is out of print. The authors' copyright doesn't diminish when a work is out of print."

Mr. Kahle said, "We're just trying to do what libraries have always done."

Having to receive prior permission from a copyright owner in order to scan a book is onerous, said Mr. Blake, of the Boston Public Library. "If you own a physical copy of something, you should be able to loan it out. We don't think we're going to be disturbing the market value of these items."

Stewart Brand, author of the 1988 book "The Media Lab"—one of the scanned books that will now be available for loan—said he didn't mind seeing his title made available this way. Mr. Kahle at the Internet Archive asked his permission, he said, and he gave it because he thinks digitizing books has the potential to improve knowledge.

The fact that at least one author was asked for permission suggests that the Archive is being very careful about what it chooses to make available through the lending program. A look at the 187 items in the lending library supports this view; they include works by well-known copyright reform advocates such as Brand and Lawrence Lessig.

In short, if you wanted to take legal action to stop the digital lending library, each of the books included in the lending library would pose some sort of problem for you.

It does not appear that the Internet Archive is attempting to rely on the statutory exceptions for libraries in US copyright law. These exceptions have rather technical requirements, and the lending library program does not appear to have been crafted to take advantage of these exceptions. Rather, it appears to be staking out fair use grounds. As James Grimmelmann writes:

The argument here would likely center on the Archive’s nonprofit purpose, the negligible harm to the market for some long-out-of-print books (quite possibly including some orphan works), and the nearby public policies of first sale and library exceptions. The natural counter-argument, however, is that distributing complete copies of books for readers to consume is so close to the core of copyright’s rights and goals that fair use simply cannot stretch that far. These are non-transformative, substitutive, complete copies of expressive works—so while the Archive would have an argument, the fair use factors arguably tip 4-0 against it. Should it win, it would be a revolution in fair use caselaw. A good revolution, for some, but a revolution nonetheless.

Consider the case of Digital Systems by Ronald Tocci, published by Prentice Hall in 1977. The lending library offers only the first edition; a tenth edition was published in 2006. It would be hard for Prentice Hall to argue that offering a single copy of the 1977 version would reduce its economic value, while it would be easy to argue that sales of the 2006 edition would be enhanced. Beyond the probable difficulty in proving damages, Prentice Hall might well have difficulty proving that it even holds electronic publication rights for the 1977 edition, since author contracts of that era could not have anticipated today's internet distribution channels.

The Internet Archive's legal strategy would appear to be one of fair use creep, a sort of adverse possession by the public domain. If no one steps forward to tell the Internet Archive to stop lending these works, then the public gains a sort of right-of-way to use them. Even if a lawsuit occurs, it's quite possible that a jury would consider single-copy lending use of any of the 187 to be fair, even if the majority of copyright lawyers might disagree. If enough works become available in this way, then a political constituency for library lending of ebooks could develop and strengthen.

It looks to me as though the Archive is setting a trap, hoping that someone will take the bait and file a lawsuit despite the problematic subject matter. The publicity exemplified by the Wall Street Journal article then looks like chum in the water, meant to lure the legal sharks of publishing into striking at poisoned bait.