On Libraries and Media

Access 2012 notes

Access has always been one of my favourite library IT conferences. In terms of pure bang for the buck, you just can’t beat its mixture of good talks, interesting people, and stimulating conversations. Plus, this year’s Montreal version featured a conference first: a Sunday-morning bagel delivery service for attendees. Next year, Access will be held in St. John’s, Newfoundland. Don’t miss it.

What follows are some notes I jotted down in various sessions. My editorial comments are in italics to differentiate them from the speaker’s words and thoughts.

Access 2012 Montreal, Quebec October 18-21, 2012

Opening Keynote – We were Otaku before it was cool
Aaron Straup Cope, Cooper-Hewitt National Design Museum, Smithsonian Institution

Otaku – Japanese term with light pejorative overtones used to describe collectors of lowbrow objects.

Asserts that the boundaries between museums, archives, and libraries are collapsing. Holds the speed of delivery (e.g.- Amazon’s next day delivery) to be an unmitigated good, worth ignoring other impacts and issues. His wording was that if this stuff works for what he calls the trivial things, when will we start expecting it for the important things.

His worldview is that of a person who expects everything to be online. Not interested in millions of objects sitting ‘hidden’ in archives. Makes sense given his background at Flickr.

Pointed out that everyone will not experience explosive growth with what they create; we need to have confidence in what we’re doing even if it doesn’t immediately show a huge impact. “History has always been lossy.” Not sure if that’s an original line or not, but it’s a good one to remember.

How do you preserve the content on Flickr (his question)? Answer: you buy it. It’s the only way to maintain the trust.

Ignite Talk – Social Feed Manager
Daniel Chudnov, George Washington University

Application being created to allow students and faculty to collect data from social media. Question: how/why do people study social media? If you can’t hack, it’s manual labour.

Three main Twitter-licensed data providers. Gnip best for the full “firehose.” Not cheap.

But researchers don’t need the firehose; want specific users/keywords, time periods, basic values, 1000s not 10000000s, delimited files to import. Can use the free public API to do this stuff. Selectively one will have to buy data (esp. for historical queries). One way to get data is to start now and collect it moving forward. Going backward means paying some significant fees.
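To make the “specific users/keywords, time periods, delimited files” idea concrete, here is a minimal Python sketch. It is not Social Feed Manager code: the sample records, the field names, and the functions select_posts and to_delimited are all invented for illustration, and a real collector would pull records from an API rather than from a hard-coded list.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample records; a real collector would receive these from an API.
SAMPLE_POSTS = [
    {"user": "gwlibrary", "created": "2012-10-18T09:15:00", "text": "Heading to a #libraries conference"},
    {"user": "someone", "created": "2012-10-19T14:02:00", "text": "Bagels in Montreal"},
    {"user": "gwlibrary", "created": "2012-11-02T08:00:00", "text": "Back at work"},
]


def select_posts(posts, users=None, keywords=None, start=None, end=None):
    """Keep only posts that match the given users, keywords, and time window."""
    selected = []
    for post in posts:
        created = datetime.fromisoformat(post["created"])
        if users and post["user"] not in users:
            continue
        if keywords and not any(k.lower() in post["text"].lower() for k in keywords):
            continue
        if start and created < start:
            continue
        if end and created > end:
            continue
        selected.append(post)
    return selected


def to_delimited(posts, delimiter="\t"):
    """Serialize posts as delimited text with a header row, ready for import."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["user", "created", "text"],
        delimiter=delimiter,
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(posts)
    return buffer.getvalue()
```

Calling select_posts with a set of users and a start and end date, then passing the result to to_delimited, yields a small tab-separated file of the kind a researcher could open in a spreadsheet or statistics package.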

Locked in the cloud: What lies beyond the peak of inflated expectations?
John Durno & Corey Davis, University of Victoria

Vendor-managed is perhaps a better word for cloud when discussing having your applications offsite. What does “technology managed by vendor, not library” actually mean? Where does their work stop and yours start?

Lock-in has two periods. In the first, there are price wars and competition, but in the second, switching costs are so high and you’re so buried in the existing system that change becomes impossible. Lock-in is the antithesis of innovation. Very simple truth about the software used in libraries.

Durno pretty much exploded the myth of managed technology. It’s not, as he said, all that hard. Information management is really hard; by comparison technology is fairly straightforward.

The irony of all the forward-looking ideas articulated in the 2007 CiL article on the future of the ILS (Tennant, Pace, et al.) is that many of the people who articulated them work for vendors who are not meeting the goals they set out.

If you have enough IT resources and skill, SaaS may not be the best option, according to Durno. Can only agree with that statement.

Adventures in Linked Data: Building a Connected Research Environment
Lisa Goddard, Memorial University

Scale matters – scale is a new area for intellectual inquiry; there are human phenomena that appear only at scale (she was paraphrasing Liu). In a linked data/semantic Web world, every entity/object (not just text documents) requires a unique URI: people, orgs, places, things, etc. These URIs should be minted with care, and should be clear and sensible. That means human interpretable, no file extensions, no query string markers, etc. Use mod_rewrite and other tools to achieve this, regardless of what the app requires or creates. It actually requires multiple URIs per object: one for the object, one for its RDF data, and so on. Current Web links are just dumb, i.e.- they connect entities but say nothing about the relationship.
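A rough Python sketch of what “minting with care” might look like. Nothing here comes from the talk itself: the example.org domain, the /id/, /page/, and /data/ path layout, and the functions slugify and mint_uris are my own assumptions, chosen only to show clean, extension-free URIs with separate addresses for the thing, its Web page, and its RDF description.

```python
import re
import unicodedata

BASE = "http://example.org"  # hypothetical domain, for illustration only


def slugify(label):
    """Turn a label into a lowercase, hyphenated, ASCII-only path segment."""
    ascii_label = (
        unicodedata.normalize("NFKD", label)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    return re.sub(r"[^a-z0-9]+", "-", ascii_label.lower()).strip("-")


def mint_uris(entity_type, label):
    """Return one URI for the thing itself and one each for its page and data.

    The /id/, /page/, /data/ layout is an assumed convention, not a standard.
    """
    slug = slugify(label)
    return {
        "thing": f"{BASE}/id/{entity_type}/{slug}",
        "html": f"{BASE}/page/{entity_type}/{slug}",
        "rdf": f"{BASE}/data/{entity_type}/{slug}",
    }
```

With this layout, mint_uris("place", "Montréal") gives three readable URIs ending in /place/montreal, with no file extension or query string; a rewrite rule on the server would then map each one to whatever the underlying application actually needs.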

“Linked data is basically an accessibility initiative for machines.” Hope that’s original to her, because it’s a great line.

Reuse ontologies whenever possible. This helps with processing and reasoning. Find them via an ontology search engine such as Linked Open Vocabularies (LOV). If one must create an ontology, remember it needs to live beyond the scope of a given project. As she put it, ontologies are for life.

The major change in big data is that the cost of having fast, big, and varied data all at once in your analysis is approaching zero and dropping rapidly. When we do a Google search, we’re using big data, and that’s a demonstration that this big/fast/varied nexus is with us.

Paradox: the more efficient we become, the more we consume, not the less (used the coal example). This is the Jevons paradox.

Big data is about cheap and abundant access to information. We don’t as people ask questions to which we really want answers; we seek confirmation of our beliefs. Big data is good at predicting things, and prediction can come close to prejudice. There are good and dark sides to it.

His project builds on the PirateBox created by David Darts, which is a standalone (no power, no network) file server with wifi. Also builds on LibraryBox, which was an adaptation of the PirateBox by Jason Griffey, a US librarian.

Used OpenWRT – Linux-based router software, made for embedded systems. Installed it on a commodity router (TP-LINK). Open Publication Distribution System – specialized version of Atom for ebooks. Good example of taking open tools to create something simple and elegant. He went into the details of his Web programming framework and his templating tool.

Question Answering, Serendipity, and the Research Process of Scholars in the Humanities
Kim Martin, Western University

Talk was built around the notion of serendipity in humanities research. Open minds pursuing threads that were not anticipated. One central idea: we don’t need a simulacrum of the physical library on the Web. That form, both of the library and the book as object, no longer has relevance in an ebook environment.

Quip: take stuff out of the basement of the library, do research on it, and turn it into a compelling visualization (Australia example).

Open Source OCR for Large Collections of Scanned Documents
Art Rhyno, University of Windsor

The way newspapers were filmed makes scanning and OCRing challenging – too many pages in one image, stacked pages, etc. Scanning is cheap, but OCR is the “enabling technology.” ABBYY is the industry leader from the commercial side, as most would agree and know. ABBYY can produce coordinates for every character. Even though he’s a big fan of OS, he admits that ABBYY is very good and has its place as a solution.

Tesseract is OS, but has fewer languages. Also no UI, but can be embedded in other apps or used via command line. Open source since 2005 and Google is involved, although he said it’s hard to know what their involvement is.

Accuracy of OCR is relative for taking papers off of film. Even 50% can be acceptable since it creates some access (better than none). To improve the accuracy, image preprocessing is key. Removes dots and imperfections. Image processing with GIMP and ImageMagick. IM is the go-to tool for things like batch rotation, and GIMP does cleanup.

They spin out Ubuntu virtual machines to their library lab computers so that they can do image processing and OCR using their capacity during downtime. Brilliant reuse of resources. Coordination of this via Hadoop streaming, of which I understood little other than that it’s useful for streaming other languages.
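For what it’s worth, the basic contract of Hadoop streaming is simple: any program that reads lines on standard input and writes tab-separated key/value lines to standard output can act as a mapper. Below is a minimal Python sketch of that shape. It is not Art’s code; run_ocr is a hypothetical placeholder for the real OCR step, and the idea of feeding one image path per input line is my assumption.

```python
import sys


def run_ocr(image_path):
    """Hypothetical placeholder for the real OCR call; returns recognized text."""
    return f"text recognized from {image_path}"


def map_lines(lines, ocr=run_ocr):
    """Yield one 'key<TAB>value' record per non-blank input line."""
    for line in lines:
        image_path = line.strip()
        if not image_path:
            continue  # skip blank input lines
        text = ocr(image_path)
        # Collapse whitespace so tabs and newlines in the text cannot
        # break the one-record-per-line format.
        cleaned = " ".join(text.split())
        yield f"{image_path}\t{cleaned}"


def main():
    """Entry point a streaming job would call: stdin in, stdout out."""
    for record in map_lines(sys.stdin):
        print(record)
```

Keeping the mapping logic in map_lines, separate from main, means it can be tried on a plain list of file names before being handed to a cluster of lab machines.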

Talked about using cloud services. The processing is cheap and effective, but moving the tonnage of images across networks makes it less than ideal (noted O’Reilly’s notion that data locality is the critical component of big data, i.e.- cloud processing only makes good sense when the data is collected and stored in the cloud as well).

ABBYY good when one has varied images, needs a one-off, or is tied to a Windows environment and has some money at hand. Tesseract fine if image quality is good and work is being done in a Unix environment. Good for large volume (no licensing fees). Good if you need it to work with another application framework. ABBYY has a trial version for testing purposes, including an online processing version.

Tesseract has an HTML output, tries to reflect italics and bold in this version. Nice features in general. Art has published his Tesseract mods via GitHub.

Answered the “why bother” question (why bother if Google will do it all) by pointing out how their newspaper project was abandoned. Leaving this newspaper work to the private sector seems fragile.

For a visualization – pick your question and work with it as a guide. What, where, how. Avoid why, since that’s a bit tricky. Source trinity: designer + audience + data. Remember the interactions here. For audience – remember attention spans, e.g. what the viewing time will be. Know your data: what can be done with it, how much there might be, what facets/aspects might be interesting.

Books are, to some degree, excluded from the Web world because their business model doesn’t allow easy sharing (paraphrasing a ton). As he kept asking, what’s the business model that will facilitate the transition to making them available?

“Information wants to be used.” This goes beyond the information wants to be free idea.

He posits a merger between books and the Web, but he framed the whole talk in terms of business models, and that’s exactly the rub. When will it tip?

Made the clear distinction between an ebook and a webbook. These are pretty clear. The webbook has analytics that are easily accessible. Faster spread, easier to find, can optimize interactions (tie-ins). Last, but not least, they can have different business models (because no middleware—Amazon, Kobo, etc.—is involved).

Making ebooks is easy. Lots of tools (Booktype, Vook, Atavist, et al.) already available and free, and more on the way. The book avalanche that results from this will be considerable.

Metcalf had a very clear insight around the difference between Amazon recommendations, which he views as indicative of what one buys to have a bookshelf that fits their sense of a collection, and recommendations based on usage patterns. The latter shows, as he put it, how the books are being used together, in context, for real work. What emerges are connections between, say, a graphic novel, and academic works on feminist theory, slavery, etc. Showed a good example of this using Y The Last Man.

Have to counter the Harry Potter problem, i.e.- HP books end up in every recommendation because it correlates to everything because everyone bought it.

Their database structure uses caching, so that if recommendations for an ISBN have already been generated, they don’t have to be created on the fly, which is processing intensive.
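A small Python sketch of how usage-based recommendations, the Harry Potter problem, and caching might fit together. This is my own illustration, not the project’s implementation: the class UsageRecommender is invented, the short labels stand in for ISBNs, and dividing by the square root of the two items’ usage counts (a cosine-style normalization) is simply one common way to damp titles that co-occur with everything.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


class UsageRecommender:
    """Recommend items that are used together, damping ubiquitous titles."""

    def __init__(self, usage_groups):
        self.item_counts = Counter()  # how many groups each item appears in
        self.pair_counts = Counter()  # how many groups each pair shares
        self._cache = {}              # (isbn, limit) -> list of recommendations
        for group in usage_groups:
            items = sorted(set(group))
            self.item_counts.update(items)
            self.pair_counts.update(combinations(items, 2))

    def _score(self, a, b):
        pair = (a, b) if a < b else (b, a)
        together = self.pair_counts[pair]
        if together == 0:
            return 0.0
        # Assumed normalization: popular items need proportionally more
        # co-occurrences to earn the same score.
        return together / sqrt(self.item_counts[a] * self.item_counts[b])

    def recommend(self, isbn, limit=3):
        key = (isbn, limit)
        if key not in self._cache:  # compute once, then serve from the cache
            scored = [(self._score(isbn, other), other)
                      for other in self.item_counts if other != isbn]
            ranked = sorted(((s, o) for s, o in scored if s > 0),
                            key=lambda pair: (-pair[0], pair[1]))
            self._cache[key] = [other for _, other in ranked[:limit]]
        return self._cache[key]
```

In a toy data set where a graphic novel is used twice alongside a feminist theory title and twice alongside a bestseller that appears in nearly every group, the raw co-occurrence counts tie, but the normalized score ranks the theory title first. A second request for the same item is answered from the cache rather than recomputed.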

They currently capture institution, but are discussing dropping it, for security reasons among others.

This is a project that resulted from multiple hackfests at various conferences. Good fusion of that phenomenon and our organizations.

At the end of the talk, they made a clear call for help and participation, and actually delineated what help they need and why one might want to be involved. Great idea and it puts the substance into calls for collaboration and openness.

Sharing the Unshareable – Dental Clinic Images in a University Image Repository
Janet Rothney, University of Manitoba

Dental school wanted to digitize their slide collection of clinical images. Two major problems. First: privacy (patient information). They had rights to use the images for testing and teaching within the university. Second: organization. Who knows what is going on in the images (requires dental expertise)?

Printer makes it easier to bring an idea into physical reality. The scanner “allows the capture of objects for preservation and/or digital manipulation.”

They didn’t do a lot of planning, just moved forward. They did discover that it has some health and safety issues. Means that the printer had to be in a place where the fumes could be safely handled. It also gets hot, and the machine is fragile. Had to more or less find their own answers from various hackerspaces. They got suggestions, but did their own analysis and worked with Dal’s health and safety unit.

While it seems a bit far-fetched, having these devices attracts interest and inspires students to start thinking about the possibilities. Wasn’t all that expensive, either. Establishes the library as a place where new technology has a home and as a place where there are people out on the bleeding edge and interested in making it accessible and usable by people who otherwise would have no access to such tools.

We Can Do Better! Integrating APIs to improve the user experience
Sonya Betz and Robert Zylstra, MacEwan University

Widgets (e.g.- search widgets) are “like doors into all of these different rooms” that contain library treasures and resources. Simply spawning new windows and tabs is flummoxing. Most Websites do not perform this way. They had metrics, LibQual and others, that showed student dissatisfaction with their Web offerings, and decided to act.

They created an iOS app as a partial response. As he put it, it’s really easy to find articles with a phone, and then you just toss them in Dropbox or some other service and read them on a more suitable device. Interesting observation that reflects my own “two-device” habits, where I often pick up my phone while sitting at my desk to do something fast that I take further on the large screen. Seems to be an increasing trend.

As with all tools, it relies on a vendor API, and they have some complaints about the response time. Generally satisfied, but the response time and documentation issues are worth considering.

Have to admit that it’s a pretty slick mobile app. He showed simulations on the slides, which were quite good at highlighting how it works. Good stuff.

This talk was a great demonstration of how smaller libraries can take the lead in technical areas since they are more flexible and perhaps willing to change than larger shops.

About 33% of their catalogue access now comes from mobile devices. Fantastic result that answers the question as to whether it’s ‘worth’ having a mobile app. Apparently, if you do it right for the right audience, the answer is yes.

They built their mobile app before their CMS. That alone is a pretty radical thing. Development time from proposal to release was one year. They need one more year for the CMS. Alas, they can’t share the code because it was built by a designer they paid. It was cheaper for them that way, but they note it was a hard decision.

Browsing appears to address some human need, so it’s something she wants to provide.

Mentioned Jill Taylor’s book about her stroke, about how her right brain became ascendant when her left brain was damaged. The words she used to describe the feeling mirrored what people say about browsing physical collections.

Have now heard Daniel Kahneman’s book Thinking, Fast and Slow mentioned at two different events in the space of ten days. Apparently it’s a good read.

Where are we improving: shelf reading in digital search tools. Better details about the book, but also items that would be to its left and right on a physical shelf. This shelf is better than any physical shelf, because it can span multiple physical collections and make them into one whole.
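The “one shelf spanning multiple collections” idea can be sketched in a few lines of Python. This is purely illustrative and assumes a lot: the records are invented, the functions shelf_key, build_shelf, and neighbours are hypothetical, and the sort key is a deliberately crude reading of a call number (leading letters, then a number, then the rest), nowhere near a full call number sorting routine.

```python
import re
from bisect import bisect_left


def shelf_key(call_number):
    """Very simplified sort key: class letters, class number, remainder."""
    normalized = call_number.strip().upper()
    match = re.match(r"([A-Z]+)\s*(\d+(?:\.\d+)?)\s*(.*)", normalized)
    if not match:
        return (normalized, 0.0, "")
    letters, number, rest = match.groups()
    return (letters, float(number), rest)


def build_shelf(*collections):
    """Merge any number of collections into one virtual shelf order."""
    items = [item for collection in collections for item in collection]
    return sorted(items, key=lambda item: shelf_key(item["call_number"]))


def neighbours(shelf, call_number, width=2):
    """Return the items that would sit to the left and right of a call number."""
    keys = [shelf_key(item["call_number"]) for item in shelf]
    target = shelf_key(call_number)
    position = bisect_left(keys, target)
    left = shelf[max(0, position - width):position]
    # Skip the item itself if it is actually on the shelf.
    on_shelf = position < len(shelf) and keys[position] == target
    right_start = position + 1 if on_shelf else position
    right = shelf[right_start:right_start + width]
    return left, right
```

Given two separate collections, build_shelf interleaves them in a single order, so the left and right neighbours of a book can come from different buildings, which is exactly what a physical shelf cannot do.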

She asks, riffing on science fiction: if we can generate a stellar shelflist, why not go further and render it as a 3D gaming environment? “How might a library digital catalogue make a user feel embodied?”

Great talk. I made a comment afterward that when we run focus groups on future library services, we really ought to invite eight-year-old children rather than current students and faculty. Dan Chudnov riffed on that and said that next year should be bring your kids to Access year. Perhaps not my kids, but it might be time to let a teenager give a keynote.

Metacommentary

As the conference closed, I had a few thoughts pop into my head, and thought I’d record them here before they evaporated.

Access is an excellent conference, particularly in two regards. For one, there are talks that present concrete solutions: this is what we did, how we did it, and how it went. That’s incredibly useful when it falls close to work one is doing or considering. The other kind of talk falls roughly into the category that outlines a problem, makes some sage commentary on it, and spurs abstract thinking about the larger philosophical problems behind some of what we do.

The category of talk that is perhaps missing here, and in general in our profession, would be talks that address the management issues around some of our decisions. Perhaps this is better defined by describing the audience at an Access conference. It is primarily, but not entirely, made up of people working directly with technical projects, at some level. There are not a lot of decision makers here, i.e.- the people who hire, sign contracts, set strategic agendas, etc. It’s not that they aren’t here, but it’s the same ones who self-define as interested in such technical work, such as myself.

What I would particularly enjoy, but have yet to find at least in a distilled form, would be a meeting of library administrators where the gritty details of our technology decisions could be raised, discussed, and examined closely. We know our relationships with vendors are a constant source of frustration for our staff (and by extension, for our users), but when or where are we going to speak about these issues in a substantive fashion? When are we going to come together to talk about creating robust mechanisms and reward structures that enhance our staffs’ ability to collaborate on open source and other projects? When or where will we come together to talk seriously about the credentials we need graduates of library programs to have so that they can come into our organizations and work successfully on increasingly complex projects: digital preservation, linked data, software creation, etc.?

I know that many, including myself, could answer these semi-rhetorical questions by saying, well, that happens at this or that conference or via this channel. However, I would still assert that we’re not doing this with enough consistency or deliberate intent. There are controversial initiatives such as the Taiga Forum, but it is better at making provocative statements that spur thought than dealing with concrete questions of IT practice and management. Do we need to form a professional organization for IT administrators in libraries?

Who I am

Dale Askey - Librarian normally located in Hamilton, Ontario in an academic library, but often found in Germany.