Thursday, August 9th, 2007

LibraryThing has a small but dedicated cadre of author-picture adders. The most active, alibrarian, has uploaded more than 3,000 of them.* Today’s subject, DromJohn, has entered fewer—183 at last count—but almost all of them required permission. That is, he wrote to the author, agent or publisher and got permission to post the images on LibraryThing. I am awed by this.

DromJohn wrote to the McIlhenny Company, the people who make Tabasco. They’ve also published a few books under their company name, so they have an author page. Wouldn’t it be cool to have the Tabasco logo on that page? Here’s their reply:

Please be advised that McIlhenny Company hereby grants you permission to use the forthcoming “Brand Products” logo on the LibraryThing author page … for six months from the date of this email.

Any changes in your intended use of our Intellectual Property must be submitted to us for prior approval.

We will follow up with you at the end of the six month period to see if the logo is still being used.

Further, we note that “Tobasco” is misspelled on the author page. Please make the necessary revisions to the pages in which it is misspelled. It should read “TABASCO(r)” with an “A” and it should be in all caps with a superscript registered symbol.

I will review the page in a few days to ensure the necessary revisions have been made.

…

Should you have any questions, please do not hesitate to contact me.

Thank you and have a great day.

[NAME OMITTED]
Trademark and Licensing
McIlhenny Company

A couple of points:

The owner of one of their (not-so-popular) cookbooks is volunteering to promote them. This should be a call for celebration.

The “tobasco” spelling error came about because some loyal customer of theirs couldn’t find the book they published in any source, but was so insistent on including it on their virtual shelf that they cataloged it by hand. These are problems you want!

Neither Amazon nor the Library of Congress nor any other source I can find put the registration marks in. Personally, I’m glad.

Good luck getting the Tabasco tag pages on Del.icio.us and Flickr to use the ® symbol.

“TABASCO®”? HASN’T ANYONE TOLD THEM THAT ALL CAPS IS SHOUTING?

I’ll bet you that, on today’s web, half the time you fire off an asinine letter like this to someone with a blog you get a post like this, and another 10, 100 or 1,000 people out there who think you’re clueless.

Wednesday, August 8th, 2007

Hat tip to the (recently news-worthy) Fake Steve for juxtaposing that blog title with this photo. Juxtaposing someone’s fuddy-duddy opinion with a photo of them in a Donald Duck suit is an unfair, but totally effective, way to cut the opinion down.

Tuesday, August 7th, 2007

AquaBrowser, which makes one of the few really interesting online library catalogs, has teamed up with us to offer LibraryThing tags and recommendations within AquaBrowser.

The product is called My Discoveries. Basically, it gives AquaBrowser a series of desirable social features, like tagging, list-making, ratings and reviews, and not in some half-assed way either. LibraryThing comes in as a way to kick off the tag data (a 21-million-tags kick) and to add recommendations to it. My Discoveries customers who choose to go with LibraryThing data will be able to see both LibraryThing’s data and their own patrons’ efforts.

Putting tags and recommendations in AquaBrowser is a natural step. LibraryThing for Libraries is showing what LibraryThing can do to a library catalog and, more generally, the importance of having large amounts of data to help “social” features reach their full potential. But some sort of LibraryThing-AquaBrowser project has been written in the stars for a while now. Writing up this blog post, I did some blog searching around LibraryThing and AquaBrowser. Apparently we should have hooked up long ago: the idea is positively rampant on the biblio-blogosphere. As NeoArch puts it:

“What would happen if we put traditional cataloging data, LibraryThing, and a highly visual OPAC in a blender?* Probably something special. It’s just my opinion, but I think if both types of data could be incorporated and added to an OPAC with a powerful interactive visual interface, like AquaBrowser, we would see a fopac [folksonomic OPAC] that every patron could fall in love with.”

We finally met up at ALA in Washington, DC. The core team is whip-smart, and as a relatively small company they have a development culture not unlike our own.** High on my list of virtues, they have a larger sense of what they’re doing. The co-founder and the Marketing director put it in a book, Risen: Why Libraries are Here to Stay. I don’t agree with all of it, but the basic point is dead-right, that innovative and user-centered technology from libraries can avert everyone’s worst-case scenario, the “fading out” of the library. We think projects like this might play some small role here—and that would be something. Also, I’m dying to take a “business trip” to their offices in Amsterdam.***

Lastly, we should be sure to say that LibraryThing for Libraries is still very much in play. LTFL is designed for all library catalogs, not just one. We have a number of planned improvements, and a frankly absurd number of customers waiting to try it out. (We’re hiring someone to take it on full time in August.) But working directly with AquaBrowser is going to give their customers what’s good about LTFL with perfect back-end integration and much more baked into the software from the start.

We’d be only too glad to partner with or work more closely with other vendors. This is clearly the future, and everybody’s going to get there eventually.

*We definitely need a LibraryThing edition of Will it blend?

**I suspect they do test, however. AquaBrowser is headquartered in Amsterdam. It’s something of a happy coincidence that yesterday was LibraryThing’s big push into Dutch-language books. The effort was a coincidence, but two of their top people have generously offered to scout out some potential sources.

***I’ve been there four or five times on the way to Turkey (KLM has great layovers). And my brother, best friend and I stopped there on the way to my bachelor party in, um, Lithuania (desperately random on the part of my brother). But I’ve basically only done the Rijksmuseum, the Anne Frank House and walked around till I was lost. Now that we’re tying in to all this Dutch data, and we have work to do with AquaBrowser, a longer visit is surely necessary! Now, what accounting category does hash fall under: “office supplies”?

Thursday, July 26th, 2007

In the spirit of fraternal concern, I post that the Internet Archive is looking for a systems engineer with PHP experience for their book-scanning project. (Also they promised to send us their discards. We need one too.)

If I had the skills, I’d be tempted to take it. The Internet Archive is a great institution. The people are great, and they have the best office space ever. LibraryThing’s second-story apartment steps from the Portland waterside pales in comparison. They have this adorable jewel-box in San Francisco’s Presidio, with the Golden Gate Bridge right outside the window.

Anyway, I flogged the ARC for so long that, when it came out, LibraryThing bought a small box of hardcovers direct from the publisher–to give out at conferences, to thank people for inviting me to talk, and so forth. I still have half a box. So I’m going to open it up to the whole LibraryThing community.

We’re going to give out ten copies. We like contests*—we have a Harry Potter book photo and another review contest going—so we’re going to make a contest of it.

Tuesday, July 24th, 2007

Short version: I’ve just gone live with a new feature called “tagmash,” pages for the intersections of tags. This is a fairly obvious thing to do, but it isn’t trivial in context. In getting past words or short phrases, tagmash closes some of the gap between tagging and professional subject classifications.

For example, there is no good tag for “France during WWII.” Most people just don’t tag that verbosely. Tagmash allows for a page combining the two: France, wwii. If you want to skip the novels, you can do france, wwii, -fiction. The results are remarkably good.
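To make the idea concrete, here is a minimal sketch of a tag intersection in Python. Everything in it is an assumption for illustration: the `BOOK_TAGS` table, the tag assignments and the `tagmash` function are made up, and this is not LibraryThing’s actual code. It also treats a minus as a hard exclusion, which is a simplification.

```python
# Hypothetical data: book title -> set of tags (invented for this example).
BOOK_TAGS = {
    "Suite Française": {"france", "wwii", "fiction"},
    "Vichy France": {"france", "wwii", "history"},
    "The Guns of August": {"wwi", "history"},
    "A Moveable Feast": {"france", "memoir"},
}


def tagmash(include, exclude=()):
    """Return titles that carry every tag in `include` and none in `exclude`."""
    include, exclude = set(include), set(exclude)
    return sorted(
        title
        for title, tags in BOOK_TAGS.items()
        if include <= tags and not (exclude & tags)
    )


france_wwii = tagmash(["france", "wwii"])
france_wwii_no_fiction = tagmash(["france", "wwii"], exclude=["fiction"])
```

With this toy data, `france_wwii` holds the two books tagged with both terms, and `france_wwii_no_fiction` drops the novel.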

Tagmash pages are created when a user asks for the combination, but unlike a “search” they persist, and show up elsewhere. For example, the tagmash for France, Germany shows France, wwii as a partial overlap, alongside others. Related tagmashes now also show up on select tag and library subject pages, as a third system for browsing the limitless world of books.

That’s the short version. But stop here and you’ll never know what Zombie Listmania is!

Long version. LibraryThing has shown some of the things that book tags are good for, such as plain language, genre fiction, capturing identity and perspective, academic schools, staying current and changing over time. (Details and examples in footnote.*)

As I’ve argued elsewhere and in my Library of Congress talk, problems 1, 2 and 3 are mitigated by having LOTS of tags. Idiocy, malice and personal junk fall out statistically. A tag here or there can’t be trusted, but a large body of tags in agreement is different.

Some day (when I become a better programmer?) I’m going to try this on LibraryThing data. It will help with ambiguity: the secondary tags on the various meanings of “leather” are surely wildly divergent! But I suspect it separates better than it clarifies. Flickr supposes that tags fall into discrete clusters, but subjects interact with books in extremely complex ways. On a more basic level, I am suspicious of the too-quick resort to algorithms against user data.*** After all, if computers are so good at figuring out meaning, why were users necessary in the first place? It smacks of technological revanchism.

So, where Flickr’s clusters are automated, tagmash is a semi-automated process. LibraryThing does the statistics, but users decide what the meaningful clusters are. Some mashes are interesting and useful. Some aren’t. By and large, uninteresting clusters won’t last.****

This certainly helps with ambiguity. Take the problematic tag leather, which divides easily into tagmashes like:

Now let’s take the “focusing” power of hierarchy. As mentioned above, there is no good way to get at “france during wwii.” The tag Vichy covers some of the ground, but not enough. Tagmash provides an answer.

The book list is good, and a simple union gets around an imposed hierarchy. Looking at the related LCSHs, for example, one is left in doubt whether France is part of World War II, or World War II part of France—or what:

Of course, both trees are equally artificial. David Weinberger writes how, in the real world, a leaf can be on many branches. But it’s equally true that what’s trunk and what’s branch are largely about where you start–dirt or pinecone. Either way, branching happens. The order of the branches isn’t necessarily important.

Even as it borrows some of the virtues of subject classification, tagmash keeps the strengths of tagging. Subject systems are pre-built things. Now and then they get larger, but it takes deliberation and effort. What gets “blessed” is often surprising. I would never have predicted the unusually staid LCSH would have embraced:

Of course, tagmash only narrows the gap. It doesn’t eliminate it. Tagmash: poetry, San Francisco still can’t distinguish between poetry about and poetry from San Francisco–it involves whatever is tagged “San Francisco” and that’s probably a mixed bag.***** Well-planned and carefully executed subject systems have strengths that no ad hoc, regular-person system can match.

Lastly—let there be no doubt—tagmash needs a very large quantity of tags to work. For tagmash after tagmash, the data is simply insufficient.

You’ve made it to Zombie Listmania! There are some obvious directions this can go:

The syntax can improve, for example to allow alternates (e.g., humor, cats/dogs)

The syntax can include non-tag factors, such as formal subject headings (Tag: zombies, LCSH: love stories), languages, dates, authors and so forth.

The syntax can include weights (e.g., Zombies 50%, vampires 50%, love stories 90%). Abby and I experimented with just such a system, creating algorithmic proxies for BISAC (bookstore) headings. It isn’t that hard to do.

Complex mashes could acquire titles and other metadata.

Users could follow a tagmash, and be alerted whenever new material enters the list.
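As a rough illustration of the first direction, the “alternates” syntax could be parsed along these lines. This is only a sketch under assumptions: the slash notation comes from the example above, but `parse_mash` and `matches` are hypothetical names, and nothing like this is claimed to exist on the site.

```python
def parse_mash(query):
    """Split 'humor, cats/dogs' into [{'humor'}, {'cats', 'dogs'}]."""
    groups = []
    for part in query.split(","):
        # Each comma-separated part may list alternates separated by slashes.
        alternates = {t.strip().lower() for t in part.split("/") if t.strip()}
        if alternates:
            groups.append(alternates)
    return groups


def matches(book_tags, groups):
    """A book matches when it has at least one tag from every group."""
    return all(group & book_tags for group in groups)


groups = parse_mash("humor, cats/dogs")
```

A book tagged humor and dogs would match these groups; one tagged humor and birds would not.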

Amazon calls its static, or dead, lists “Listmania.” All these tend to create a “Zombie Listmania,” lists of books that “won’t stay dead.” Instead, they change over time, as the underlying social and non-social data change. There’s no reason you couldn’t create “Zombie” versions of formal subject headings—a series of tags and other markers which approximated the content of a professionally-assigned subject heading.

Pretty cool idea, I think. We’ll see what we can do about it.

Details.

Tagmashes can be made from any tagmash or tag page. Just search for a tag, or for two or more tags with a comma between them. The URLs are the same: /tag/ plus a tag, or several tags separated by commas.

The weighting of tags is wiggly. We’re trying to get at both raw numbers of tags on an item and the relative salience (number divided by total number of tags), and then cross this data tag-by-tag. There is no obvious answer. In an ideal world, some tags would be about salience (e.g., humor) and others would be thresholds (e.g., fiction). That is, when you’re looking for humor, fiction you want the funniest fiction, not the most fictional humor.
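Here is one way the “ideal world” version might look, as a hedged sketch rather than the weighting the site actually uses. Salience is computed as described above (uses of a tag divided by all tag uses on the item); the split into a salience tag and a threshold tag, the `min_count` parameter and the sample counts are all assumptions.

```python
from collections import Counter

# Hypothetical tag counts per book: how many users applied each tag.
ITEMS = {
    "Book A": Counter({"humor": 30, "fiction": 10, "cats": 10}),
    "Book B": Counter({"humor": 40, "fiction": 100, "satire": 60}),
    "Book C": Counter({"humor": 50, "essays": 50}),
}


def salience(tag_counts, tag):
    """Uses of `tag` divided by the total number of tag uses on the item."""
    total = sum(tag_counts.values())
    return tag_counts[tag] / total if total else 0.0


def rank(items, salience_tag, threshold_tag, min_count=1):
    """Keep items that clear the threshold tag, then order by salience."""
    kept = [
        (salience(counts, salience_tag), title)
        for title, counts in items.items()
        if counts[threshold_tag] >= min_count
    ]
    return [title for _, title in sorted(kept, reverse=True)]


funniest_fiction = rank(ITEMS, salience_tag="humor", threshold_tag="fiction")
```

In this toy data Book B has more raw humor tags than Book A, but humor is a smaller share of its tags, so Book A ranks first. Book C never clears the fiction threshold.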

You can enter the tags in any order, but it will reformat your URL in alphabetical order, with the minuses at the end, such that “wwii, france” is the same as “france, wwii.”
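The reordering rule can be sketched as follows. The `/tag/` prefix and the comma separator come from the description above; lowercasing, dropping the spaces and the function name `canonical_path` are assumptions made for the example.

```python
def canonical_path(query):
    """Return one canonical path for a tagmash, whatever order it was typed in."""
    terms = [t.strip().lower() for t in query.split(",") if t.strip()]
    # Plain tags go first, in alphabetical order.
    plain = sorted(t for t in terms if not t.startswith("-"))
    # Minus terms go last, ordered by the tag name without its minus signs.
    minus = sorted(
        (t for t in terms if t.startswith("-")),
        key=lambda t: t.lstrip("-"),
    )
    return "/tag/" + ",".join(plain + minus)


same_a = canonical_path("wwii, france")
same_b = canonical_path("france, wwii")
```

Both calls yield the same path, so a tagmash gets one page no matter how it was entered.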

A single minus (-fiction) “discriminates” against items tagged “fiction.” A double minus (--fiction) disqualifies all books with the fiction tag.
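The difference between the two minuses might be modeled like this. It is a sketch only: the multiplicative `penalty` of 0.5 and the `score` function are invented for illustration, and how strongly the real system “discriminates” is not stated here.

```python
def score(book_tags, query, penalty=0.5):
    """Score one book against a tagmash; None means the book is left out."""
    result = 1.0
    for term in (t.strip().lower() for t in query.split(",")):
        if not term:
            continue
        if term.startswith("--"):
            # Double minus: any book carrying the tag is disqualified.
            if term[2:] in book_tags:
                return None
        elif term.startswith("-"):
            # Single minus: the book stays in but is pushed down the list.
            if term[1:] in book_tags:
                result *= penalty
        elif term not in book_tags:
            # A plain tag is required.
            return None
    return result


novel = {"france", "wwii", "fiction"}
soft = score(novel, "france, wwii, -fiction")
hard = score(novel, "france, wwii, --fiction")
```

Here the novel survives the single minus with a reduced score and is dropped entirely by the double minus.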

Tagmashes don’t get built until someone builds them. The first time can take a while to generate. There is currently no system to expire older or underused tagmashes.
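Build-on-first-request with no expiry can be sketched as a simple store. The class name `TagmashStore`, the in-memory dictionary and the `build` callback are assumptions; the real pages presumably live in a database.

```python
class TagmashStore:
    """Builds a tagmash page the first time it is requested, then keeps it."""

    def __init__(self, build):
        self._build = build  # hypothetical function: canonical key -> page
        self._pages = {}     # nothing is ever expired, as described above
        self.builds = 0      # counts how often the slow path ran

    def get(self, tags):
        key = ",".join(sorted(t.strip().lower() for t in tags))
        if key not in self._pages:
            # Slow path: only taken the first time a combination is requested.
            self._pages[key] = self._build(key)
            self.builds += 1
        return self._pages[key]


store = TagmashStore(build=lambda key: key.split(","))
first = store.get(["wwii", "France"])
second = store.get(["france", "wwii"])
```

The second request reuses the stored page, so `store.builds` stays at 1.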

UPDATE: I’m seeing a lot of part/whole tagmashes. These rarely work. When you search for “Einstein, science” or “Manet, art” you’re not doing much more than putting a statistical cramp on the smaller of the two tags—a few Manet books won’t have an art tag, and that will be the end of them. Tagmashes work with different things, not a thing and its category.

Footnotes!

*What’s good about tagging:

Tags use everyday terms (the tag cooking vs. the subject cookery)

Tags are great for genre fiction that subject systems can’t keep up with as fast or as well as their readers (chick lit, cyberpunk, paranormal romance)

Tags are good for schools of thought (intelligent design, austrian economics)

Tags respond quickly to change (hurricane katrina)

Tags “keep happening” in a way that systems like LCSH do not, getting added to books where LCSH misses the “first wave” of anything new (memetics, sociobiology)

**I’ve left out one problem, not covered at the LC: how “democratic” weighting can put Angela’s Ashes at the top of the Ireland tag. I want to write a blog post on the topic sometime. I think there are ways around it, and algorithmic solutions that nobody has really tried.

Aside: Much LIS anti-tagging polemic focuses on the most trivial of problems: spelling mistakes and “incorrect” tags. The former underestimates technology, the latter insults our intelligence. LibraryThing has dealt with the spelling problem, and has seen very few “wrong” tags. In fact, there are some serious problems with tagging. But you have to understand tags before you can see the problems, and many refuse to get past the idea that people will spell “white” wrong, or tag white horses as black.

***This is half formed. I have a problem with the reflexive “turn” from people-centered data to algorithms. I see this pattern again and again in software. Something transformative happens–something human. But it’s imperfect, so programmers conclude that programs will fix humans. In a way, it’s a reassertion of importance. More often, humans fix humans. To adapt David Weinberger, the answer to user-generated data is MORE user-generated data.

****Probably there’s got to be some system to expire unused clusters.

*****UPDATE: After turning the feature loose I watched what new tagmashes would be created. One was children, cooking. Should I call the police?

Saturday, July 21st, 2007

I cover the basics of LibraryThing and some of what LibraryThing “means” to libraries, including a long section on tagging. It has a short section—a sermon, really—on open data, in anticipation of the launch of Open Library, and another on the upcoming Everything is Miscellaneous.*

To my regret, it ends abruptly. They didn’t include the 20+ minute Q&A**, which went a lot deeper on some of the interesting issues (particularly tagging), and with the nation’s top library talent!

Being asked to talk in front of the LC was a great honor. There aren’t many institutions I hold in higher regard. And it was fun. I got to be myself—PowerPoint-less, off-the-cuff and passionate–and was greeted warmly and given the benefit of the doubt when I pushed the limits. Also, I got to have lunch with some of their top people. It was a blast.

*The subtext of that section is that I just had a lunch conversation about open data, and heard more about the whys, wherefores and finances involved.

**Apparently they felt that they needed permission from everyone who appears on tape, and that the questions were not well miked.