Thoughts on digitization & libraries while working on Hardin MD

I wrote recently on the kinship of Google and libraries. I got the idea for that especially from a long portrait of Google co-founder and new CEO Larry Page, which brings out several qualities of Google and Page that I think show commonality with libraries and librarians. In that portrait, Farhad Manjoo contrasts the Google/Page style with the Apple/Steve-Jobs style, and says it’s unlikely that Google will “tap its inner Apple” under Page’s leadership. …

That term “Tap its inner Apple” kept bouncing around in my mind — Larry Page may not help Google find its Inner Apple, I think, but how about adding another twist? — Combining the idea of Google-Librarian temperamental connections, from my previous article, with Google Books, which resonates strongly with librarianship, and was actually conceived by Page — How about Larry Page as Google’s Inner Librarian? …

At first this idea of Larry Page as Google’s “inner librarian” seemed almost too playful to suggest. It was only when I was able to substantiate Page’s central role in creating Google Books and his conception of it in library terms that the idea seemed more credible. The general idea of his involvement in the early years of the project is commonly mentioned, but Google co-founder Sergey Brin is the one who’s gotten more attention talking about it. So it took some digging to find details of Page’s role in the creation of Google Books, which did turn up some bits of solid evidence, discussed below.

The first is the story of Page telling Google CEO Eric Schmidt about his idea for Google Books. This is from Ken Auletta’s book on Google, ironically enough, right from Google Books — Surprisingly, as interesting as the story is, especially from a library point-of-view, googling the quote turns up only a handful of fairly obscure places where it’s cited. The telling here is notable for Page’s strong emphasis of the project’s library-librarian connections:

[boldface added] Schmidt remembers the day in 2002 he walked into Page’s office and Page surprised him by showing off a book scanner he had built. It had been inspired by the great library of Alexandria … “We’re going to scan all the books in the world,” Page said. For search to be truly comprehensive, he explained, it must include every book ever published. He wanted Google to “understand everything in the world and give it back to you.” Sort of “a super librarian,” he said.

The second telling of the story is also little-cited, probably because it’s buried in the middle of a recent multi-paged Wired article. Written by master tech storyteller Steven Levy, it’s notable for the clear statement that the project was Page’s idea:

[boldface added] It was Page who dreamed of digitizing the world’s books. Many assumed the task was impossible, but Page refused to accept that. It might be expensive, but of course it was possible. To figure out just how much time it would take, Page and Marissa Mayer jury-rigged a book scanner in his office, coordinating Mayer’s page-turning to a metronome. Then he filled up spreadsheets with calculations … Eventually, he became convinced that the costs and timing were reasonable. What astounded him was that even his spreadsheets didn’t dissolve the skepticism of those with whom he shared his scheme. “I’d run through the numbers with people and they wouldn’t believe them,” he later said. “So eventually I just did it.” Page was disappointed when critics … launched a series of legal challenges … “Do you really want the whole world not to have access to human knowledge as contained in books?” Page asks. “You’ve just got to think about that from a societal point of view.”

It’s ironic that Page is taking over as Google CEO just after the rejection of the Google Books Settlement. But I suspect the Google Books project will be seen by librarians of the future as a necessary first step in the evolution of a universal digital library — An idea that might still seem impossible if it hadn’t been for Google. In fact, this process of looking back on Google Books as “history” has already started — Harvard Library director (and historian) Robert Darnton, writing in a NY Times op-ed soon after the Settlement rejection, proposes the creation of A Digital Library better than Google’s. He concludes his piece by giving credit to Google for getting the idea started:

Through technological wizardry and sheer audacity, Google has shown how we can transform the intellectual riches of our libraries, books lying inert and underused on shelves. But only a digital public library will provide readers with what they require to face the challenges of the 21st century.

And it might not have happened if Larry Page hadn’t had the audacious dream of digitizing the world’s books and scanned the first one in his office with Marissa Mayer.

When I wrote last Fall about iPad interest in different areas, libraries were far behind, and they still are, as shown in the chart at left. The blue columns are from Sept and red columns are from now, March 2011. The red numbers above the red columns are for March; for Sept numbers see the previous article.

The notable jump for “medical” in the chart since Sept is not surprising to anyone who has been following news and commentary — The iPad is proving to be very popular for doctors, hospitals and medical education.

The decline for “magazines” and “newspapers” is also not surprising — The highly-anticipated iPad boost for those media has not happened, and interest has sagged.

Whither libraries? — As I said in the Sept article, it continues to be surprising that libraries have not caught the iPad interest, since books and eBooks are so popular. With the great iPad interest in medicine, maybe medical libraries are just the ones to lead the pack in generating iPad interest in the library world.

The new data (red columns) is the average of counts done in Twitter searches on Feb 25 and March 31. The launch of the iPad 2 on March 2 had a notable effect on these counts — The number of tweets was significantly higher on March 31 for most areas, except “libraries” and “newspapers,” for which it actually declined.

SEO (Search Engine Optimization) has been in bad repute recently, with Google’s SEO spamming problems in the news. Actually SEO has never been given much respect in the library world, and this is unfortunate because on a basic level SEO is closely related to the library-centric concept of discoverability — Making it easy for users to find good things on your website.

I’ve been thinking for some time that librarians’ apparent lack of interest in SEO was surprising. But recently I’ve been realizing that my perceptions are colored by my experience in crafting Hardin MD pages to be found by Google, beginning about 2001, before most anyone had heard of “SEO.”

I can understand why SEO has bad connotations for library people who know it only as a bag of tricks used by the dotcom-Adwords world to trick Google into giving a high ranking to their clients’ pages. But I hope the examples of my pre-SEO-Adwords experience that I’ll present here will show why I think optimizing pages so they can be found in Google is very much in the library tradition of bringing together the users and the pages.

Even in pre-Google days, standard wisdom about getting pages found by search engines emphasized the importance of a strong page title that gives a concise description of the page’s contents (advice that still holds true). Much of my early work on Hardin MD that I now think of as using SEO techniques centered on this importance of the title. I was an early booster of Google, so I noticed soon after it was launched, in 2000, that many of the pages in Hardin MD were getting high rankings in searches for title words of its pages. I also noticed that most of the pages that were highly ranked got more traffic. But not all of them. Why was this, I wondered? Finally, with the help of WordTracker (this was long before Google Analytics), I figured out that a high Google ranking goes only halfway — The other half of the high-traffic equation is people searching the term that gets the ranking. Getting a high ranking for a term that no one is searching is useless, like providing a supply of something for which there’s no demand! This simple, basic supply and demand principle is still at the heart of SEO.

The case that opened my eyes about the supply and demand principle was a Hardin MD page with the title “Respiration Medicine” — It got high rankings in Google searches but very little traffic. With WordTracker, I saw the reason why — Hardly anyone was searching for “respiration medicine” — So I used WordTracker to determine the equivalent terms that people WERE searching for, and when I put those words in the title (which is now Respiratory System & Lung Diseases), the traffic increased.
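The supply and demand principle can be sketched in a few lines of Python. This is only an illustration: the click-through rates and search counts below are made-up assumptions (not WordTracker or Google data), and the names `ASSUMED_CTR_BY_RANK` and `estimated_visits` are hypothetical. The point it shows is that traffic depends on both the ranking and the number of people searching the term.

```python
# Hypothetical click-through rates by Google rank; illustrative values only,
# not measured data.
ASSUMED_CTR_BY_RANK = {1: 0.30, 2: 0.15, 3: 0.10, 4: 0.07, 5: 0.05}


def estimated_visits(monthly_searches: int, rank: int) -> float:
    """Estimate monthly visits as demand (searches) times the share of clicks at this rank."""
    ctr = ASSUMED_CTR_BY_RANK.get(rank, 0.01)  # assume 1% for ranks below the top five
    return monthly_searches * ctr


# Made-up search volumes: a top ranking for a rarely searched term
# versus a lower ranking for a popular one.
rare_term = estimated_visits(monthly_searches=50, rank=1)
popular_term = estimated_visits(monthly_searches=20_000, rank=5)

print(f"'respiration medicine' at #1: about {rare_term:.0f} visits")
print(f"'lung diseases' at #5: about {popular_term:.0f} visits")
```

Under these assumed numbers, the popular term at a lower rank draws far more visits than the rare term at the top, which is the pattern described for the “Respiration Medicine” page.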

Having discovered the value of using title words that people were searching for, I adjusted Hardin MD pages accordingly. This often meant changing from medical specialty terms to terms that are more easily-understood and widely-used by the public — Ophthalmology was changed to Eye Diseases, Cardiology became Heart Disease … Pediatrics >> Childrens Diseases, Otolaryngology >> Ear, Nose, Throat.
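The choice between a specialty term and a lay term can be sketched as picking whichever synonym is searched most. This is a minimal sketch, assuming a keyword tool has reported monthly search counts; the counts here are invented for illustration, and `pick_title_term` is a hypothetical helper, not part of any real tool.

```python
def pick_title_term(candidates, monthly_searches):
    """Return the candidate term with the highest search count (0 if unknown)."""
    return max(candidates, key=lambda term: monthly_searches.get(term, 0))


# Hypothetical search counts, standing in for a keyword tool's report.
monthly_searches = {
    "ophthalmology": 900,
    "eye diseases": 4000,
    "cardiology": 1500,
    "heart disease": 12000,
}

# Each group holds terms that describe the same page.
synonym_groups = [
    ["ophthalmology", "eye diseases"],
    ["cardiology", "heart disease"],
]

chosen = [pick_title_term(group, monthly_searches) for group in synonym_groups]
for group, term in zip(synonym_groups, chosen):
    print(f"{group} -> {term}")
```

With real search data in place of the invented counts, the same comparison would show which wording the public actually uses.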

After learning the value of choosing the best words to draw traffic, I applied this optimization lesson to creating the tags that are used at the bottom of Hardin MD pages. The same technique also showed that “pictures” should be used instead of “images” for Hardin MD pages relating to pictures.

I find basic SEO principles especially interesting from a library point-of-view because they have similarities with some of the long-standing principles of librarianship. I’ve written about tagging in Hardin MD that hearkens back to the subject headings used on library catalog cards. And, having had a bit of experience as a library cataloger, I see a similar parallel between the web page title, which I’ve discussed in this article, and the title page of a book, which was established as the basis for cataloging books several hundred years ago — Principles of information management endure!

In recent interviews about his new book The Googlization of Everything (And Why We Should Worry), I’ve been struck by Siva Vaidhyanathan’s deep ambivalence about Google — How profoundly he realizes, even with all his doubts about its motives, how much Google has become indispensable, for himself, for the world, and for librarians. I discussed this in a previous article, based on an interview with Vaidhyanathan in Publishers Weekly.

I recently came across another interview of Vaidhyanathan in Inside Higher Ed, where his conflicted Google-sense comes out maybe even more. In the introduction, the author/interviewer, Steve Kolowich, I think does a good job of catching this sense:

As is often the case with cousins, the genetic differences between higher education and Google are more striking than their similarities. Beneath the interdependence and shared hereditary traits, tensions creep.

So, yes, the emphasis here is on “genetic differences” and “tensions.” But note the underlying context of these differences and tensions — That Google and academia are interdependent and closely related (“cousins” with “shared hereditary traits”). I want to repeat that the quote is not directly from Vaidhyanathan. But, as I said, I think it’s a good representation of his mixed views that come out in the interview.

Taking off from the idea of Google and academia being in a “cousin relationship,” in this article I’ll narrow the “cousin” idea from academia in general to libraries specifically. There are several things that bring this idea to mind — For one thing, Vaidhyanathan in the interview does make one notable mention of a library-Google connection, suggesting that colleges should consider hiring a librarian to be “Chief Google Officer,” to help faculty keep up with the stream of new Google tools. I’ll discuss a couple of other Google-Library connections in the conclusion, but the immediate thing that brought the idea to mind was reading an article on Larry Page, who will become the Google CEO on April 4, just after reading the Vaidhyanathan Inside Higher Ed interview.

7 Ways Larry Page is defining Google’s future, by Farhad Manjoo, is a long and penetrating portrait. As the title says, it does indeed center on Page. But with him being a Google co-founder, observations about the man and the company naturally intertwine. When I came across this article soon after reading the Vaidhyanathan Inside Higher Ed interview, the affinity between Google and libraries seemed natural.

The article is worth a read for many insightful passages, but here I’ll be looking at the parts of it that especially suggest to me the Google-Librarian relationship, mostly in a section called “Talk is Cheap” — The Google character discussed here, that I think fits librarians well also, is an understated modesty — Feeling uncomfortable shouting to the admiring bog about how great they (we) are:

Persuasion offends Google’s — and Page’s — meritocratic beliefs. The company became the biggest search engine in the world because it built a better product, not because it created better TV ads than Yahoo.

Google’s attitude (and librarians’ I think) is “We’ve got the good stuff, so why do we need to advertise it”:

Google’s build-it-and-they-will-come naïveté seems almost cute in the age of Apple. Many of Google’s advances go unnoticed by the public because nobody hears about them.

(An interesting aside to this quote is that Vaidhyanathan, in the Inside Higher Ed interview, suggests, as mentioned above, that librarians might be just the ones to help Google’s advances get noticed on college campuses.)

Manjoo mentions that Google PageRank is named for Larry Page, which brings up another little Page-Google-Library connection — As I’ve blogged before, PageRank has its origins in the mind of librarian Eugene Garfield, dubbed “Grandfather of Google” in my article — So, if Google’s grandfather is a librarian, doesn’t that make all of us librarians at least cousins? 😉

On a personal level, Manjoo’s description of Page sounds like the stereotypical librarian: “reserved, unabashedly geeky, and said to be introverted.” He contrasts Page’s Google with Apple and Steve Jobs (who would certainly never be mistaken for a librarian), suggesting that the Page style may be a good fit:

With its new CEO an introvert, perhaps Google will never tap its inner Apple. But maybe, in the bigger picture, that’s a trade-off worth making. According to some surprising forthcoming research … introverts can be more successful leaders – particularly in dynamic, uncertain, and fast-changing environments like the tech industry.

The comments here on Google and Apple segue into another Google-Library commonality that I see, which is that they both stand on the side of the Open Web — Google certainly differs from libraries in being a commercial company that needs to make money. But for its basic function — Search — to work, it depends upon the Web being an open, free environment, as libraries strive to be for their users. Apple (and Facebook), on the other hand, occupies a more closed, “walled garden” environment, with tightly controlled access to information. So, for the good of the open model of the Web and libraries, it will be a good thing if Google under Larry Page does indeed not “tap its inner Apple.”

In conclusion, circling back to an apt Google-Library remark by Vaidhyanathan — In the “many-virtues-of-Google” part of the Inside Higher Ed interview above, he says “Google made the Web usable” — A user-friendly place where people can actually find what they’re looking for — Just like libraries do for their users.

It was only after a day or two that I realized that it was a great chance to laugh at myself! …

As the lightbulb joke pokes fun at SEO specialists who are obsessed with thinking of every possible word that people might search for, I remembered the Hardin MD Chicken Pox / Chickenpox page that I made several years ago – The only page in Hardin MD for which I used a double-element title, because WordTracker indicated that people search for chickenpox as two words and as one word. Traffic data has shown, in fact, that the page does indeed get significant traffic for both terms.
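The reasoning behind a double-element title can be sketched as a simple rule: keep every spelling variant that accounts for a meaningful share of searches. This is a rough illustration, not the method actually used for the page. The search counts are invented, and both `build_title` and the 20% threshold `min_share` are assumptions made for the example.

```python
def build_title(variants, monthly_searches, min_share=0.2):
    """Join every spelling variant whose share of total searches meets min_share."""
    total = sum(monthly_searches.get(v, 0) for v in variants)
    if total == 0:
        return variants[0]  # no demand data: fall back to the first variant
    kept = [v for v in variants if monthly_searches.get(v, 0) / total >= min_share]
    return " / ".join(kept)


# Hypothetical counts for the two spellings.
counts = {"Chicken Pox": 6000, "Chickenpox": 9000}
title = build_title(["Chicken Pox", "Chickenpox"], counts)
print(title)
```

When one spelling dominates, the same function returns a single-element title, which matches the usual practice on the other pages.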

I’ve been blogging on the SEO theme recently, and I’m realizing that my interest in it comes a lot from my experience with Hardin MD, much of it long before I heard the term “SEO.” As the little lightbulb example here shows, though, I guess I have a foot in that world.

It’s a little hard sometimes to explain how Twitter and blogging can be used together, to complement each other. I experienced a nice little example of this recently, as part of a discussion I’ve been involved in.

The specifics of the subject being discussed in the tweets at left are a bit obscure to anyone outside the medical library community (clarification below) — So, disregarding the subject, the point I’m making is simply that the wording I used in the first tweet served as the basis for the title of a blog article by another medical librarian in the discussion, Alisha Miles (@alisha764), in the bottom tweet.

More specifically, in my tweet, about the announcement of the new NLM PubMed Health site, I comment “PubMed Health finally has a Face!” Alisha then responds with a blog article whose title, that’s in her tweet, builds nicely from my tweet — PubMed Health has a face but does it have a place?

For more on the specific issues involved in the PubMed Health discussion, see my earlier articles.

In February, Google began giving prominent placement to articles in NLM’s PubMed Health. As I discussed in a previous article, NLM and Google have been strangely silent about announcing this new feature, with no discussion of it anywhere that I can find.

It’s especially surprising that Google hasn’t said anything about this because — coincidentally with the NLM boost — Google’s ranking system has been under attack recently, with charges that dotcom sites (most notably JC Penney) have used Search Engine Optimization (SEO) tricks that cause Google to give high rankings to their product pages.

As I’ve discussed before, although using SEO techniques to get high rankings in Google is widely discussed in the dotcom world, it’s an almost unknown subject to most librarians. This is unfortunate, because without a background understanding of SEO, the next step in the “NLM-CIA conspiracy” story seems completely bizarre …

As I discussed in my previous article, soon after Google began giving prominent ranking to NLM, Jeff Hamilton, who blogs about ADHD, raised questions in his short article PubMed Health Who?:

Where the heck did these guys come from? Try and Google “ADHD” and these guys are the #1 search result!! PubMed Health is a new online resource under development at [NLM-NCBI] … Hmmmm, CIA? Secret Government agency? How does an organization go from not being on the radar to the #1 search engine result for ADHD overnight? … You’ve heard stories about how much control and power the Government has over the Internet…….I wonder if this secret SEO organization would be interested in doing some site optimization for me …

As a member of the medical library community, I find it laughable that anyone would suggest underhanded dealings between NLM and Google, and in my previous article I described Hamilton’s idea as “hare-brained.” But thinking it over I realize that to a non-librarian who’s been reading about the recent Google-SEO controversy, Hamilton’s speculations seem more reasonable. As he says, it does indeed seem surprising that PubMed Health pages suddenly began appearing at the top of Google’s rankings, with no explanation from Google or NLM about why this is happening.

The story gets more meta-interesting because Google’s ranking of Hamilton’s SEO story itself becomes part of the story — If the article had stayed on his blog, it probably wouldn’t have gotten much attention from Google and hence the eyeballs of the world. But instead it got copied on the Psychology Today blog, and that brought it a high ranking (generally between #1 and #6 in the last two weeks) in a Google search for PubMed Health — So, let’s say you’re a health-information-seeking consumer who comes across a PubMed Health page in a Google search — You like the page, so you do some googling to find out more about PubMed Health — And what do you find? Hamilton’s NLM-CIA conspiracy article.

So what’s wrong here? Why is a standard resource from a large government site like NLM not able to outrank a blogger’s speculations about its validity in a Google search? Normally, Google does a good job finding “the real thing,” the site itself. The problem, I think, is that there has been nothing for Google to link to for “PubMed Health” — It didn’t even have a home page until last week, when it was announced by NLM/NCBI in Twitter. And there still hasn’t been a press release or longer announcement by NLM or Google. If these sources existed, they and medical library bloggers’ discussions of them would soon dominate Google’s top ten, and leave wild NLM-CIA conspiracy speculations in the dust. I’d guess that sooner or later, NLM and/or Google will make some sort of announcement. But I’d predict that the longer they wait, the harder it will be to displace Hamilton’s article from its high ranking — In my experience, Google has a persistent memory, and it often holds on to links after they have been made obsolete by events.

I think this is an excellent example of why librarians should learn more about SEO — If people at NLM and in the wider medical library community were paying more attention to SEO, it would have been clear that the sudden appearance of a new resource from NLM at the top of Google searches needs to be explained.

Learning more about SEO — If you google for SEO be ready for a fire-hose of sites offering to help you get a Google ranking. You might want to start out with Wikipedia’s lengthy SEO article or a book on SEO in the Dummies guide series.

Until recently, the term “googling symptoms” has generally had strong negative connotations among health professionals, bringing to mind visions of patients carrying stacks of mostly-useless articles that they’ve found online to their doctor’s visits. This seems to be changing, however. As often seems to happen, a term like “googling symptoms” that starts out being pejorative and negative changes its sense, and becomes more positive.

As reflected in the screenshot for the googling symptoms search at left, a large part of the reason for the changing view of “googling symptoms” among healthcare professionals has been an article in Time Magazine by physician Zachary Meisel — Googling Symptoms: How it can Help Patients and Doctors — which takes a positive view of the matter (see excerpts below).

The search screenshot gives an interesting perspective on how the recent positive view of “googling symptoms” is nudging its way up Google’s list, “in hot pursuit” of the older, negative articles above it. The other positive article on the list, in support of Meisel’s Time article — Patients Google their symptoms, doctors need to deal with it — is by prominent physician blogger Kevin Pho (kevinmd.com). The results for the googling symptoms search, of course, will change constantly, and so won’t necessarily be exactly the same on any particular day. But it will be interesting to watch over time to see if the positive view of the term continues to move up and multiply.

A few excerpts from the Meisel Time article:

There is no question that patients routinely benefit from going online before visiting the doctor. To debate whether patients should or should not Google their symptoms is an absurd exercise. Patients already are doing it, it is now a fact of normal patient behavior, and it will only increase as Internet technology becomes ever more ubiquitous. Doctors and nurses are going to have to shed the presumption that the Internet makes patient care harder. It’s a problem if doctors continue to walk into the exam room with the belief that patients always need to be disabused of the wrong and sensationalistic information they picked up while trolling the Net.

Googling Symptoms & Patient Empowerment: A Watershed Moment

Noted activist patient-advocate Dave deBronkart (@ePatientDave) has also seen the Time article as “a sign of shifting winds … a watershed moment (boldface added).” He suggests that health information providers capitalize on the moment by “developing tools to teach smart info-shopping” to help the empowered patient find the best online medical sources – Medical Librarians take Note!

I stumbled yesterday upon a couple of passing mentions in publishing circles of a new-found appreciation for the value of metadata, and the librarians who work with it — Certainly not a new theme, but coming across these two bits on the same day struck me. The first is in Kassia Krozser’s write-up of the recent Tools of Change in Publishing conference:

[Boldface & color added] … Which leads to my final theme: metadata. Metadata is the sexy of publishing conferences. This would embarrass metadata, metadata being the type who prefers to remain in the background. It also reveals too much about publishing conferences. Metadata is useful, efficient, precise. Metadata doesn’t grace the cover of Vogue. It’s the girl next door. The really smart girl next door. The really smart, really successful girl next door.

Metadata is data that describes data. That’s meta, I know. It is the information that feeds search. Enables discovery. The better your metadata, the better your chances of discovery. Consider your book’s metadata: title, ISBN, author, editor, year of publication, format, index, table of contents, keywords, tags, reviews, so much more. The more you can describe your (collective your) book, the greater the chances of discovery.

The second mention is Hannah Johnson’s article in the Publishing Perspectives blog last month, with this concluding paragraph:

So even though metadata has a less-than-cool reputation (think solitary librarians checking ISBN numbers in their card catalogues), digitization is making it very cool.

Although a couple of comments on this article by library people express displeasure with the “solitary librarians” theme, I certainly see this as a positive appreciation by publishers of the new role of metadata and librarians, especially in light of Kassia Krozser’s laudatory sentiments above.

The kind words here from publishers about metadata and librarians continue a thread that’s been developing for a while — I’ve blogged before about how digital publishing, especially, is bringing librarians and publishers together — See my post on this, in which I discuss an article (with a cute cover picture) in Library Journal. Also see my article on the growing importance of metadata for publishers, as discussed eloquently by Dominique Raccah.

The tweet shown here, by Dr. Ves Dimov (@DrVes), is interesting on different levels. The tweet is about Huffington Post, but it gives good advice on how to write a blog article in general — Find a juicy nugget in a news or blog article that’s unnoticed by most readers and feature it in your own article, quoting it prominently and adding your own spin to it.

But beyond its application to writing blog articles, Dimov’s tweet applies at least as much to writing tweets. Even more than a blog article, a tweet needs to strip a subject to its essence, and put it into a 140-character message that combines the arts of narrative writing and headline writing.

A twist of Meta …

Another layer of interestingness here is that Dimov’s tweet itself applies exactly the stripping to the essence technique that’s featured in the tweet — The words in the tweet are taken from far down in a NY Times story, where few human eyeballs (or the GoogleBot) are likely to see them, and brought to the attention of the Twitterverse and Google by @DrVes — Here’s the NY Times quote, with words in the tweet in boldface:

Huffington Post is a master of finding stories across the Web, stripping them to their essence and placing well-created headlines on them that rise to the top of search engine results, guaranteeing a strong audience.

A great example of combining the simple elegance of Twitter and the power of human judgment to search out an interesting nugget in a long page of text, and bring it to the attention of the Web’s eyes. With Google’s spam troubles recently, there’s been much discussion of the renewed importance of human curation, with Twitter being seen as a prime vehicle, and I think this is a nice example of that.