A blog about search, search skills, teaching search, learning how to search, learning how to use Google effectively, learning how to do research. It also covers a good deal of sensemaking and information foraging.

Friday, October 29, 2010

When Alexis said this was a hard problem, I didn’t realize how hard it was. It seems there are many answers, and which you believe depends on your reading of the scholarly octopus literature. (Yes, there is such a thing.)

[ octopus evolution ] in Google Scholar is tough.

There ARE lots of papers on the evolution of octopoid eyes (which are strikingly similar to human eyes, an example of convergent evolution), octopus sperm evolution and the development of mimicry as a survival strategy.

As a consequence, I quickly gave up on Scholar as a resource. In this case, I don’t know enough evolutionary biology to be able to sort out the wheat from the chaff. There’s probably a lot of good stuff there, but it’s too highly encoded for me to work with. (Even after trying [ octopus lineage ] and [ octopus divergence ] which was fascinating, but not helping me out.)

So I backed up and tried:

[ octopus diverged ]

...on the open web. One of the first things I saw was a result from Elsevier. This is good, as they're a respected scientific publisher, but it's also really annoying, because while they seem to have a few things, they cost $27 to read each article… and this is a fishing expedition! (I'm not sure who can afford that kind of price. I doubt that many marine biologists have that kind of money.)

And after a while I ended up on the Wikipedia article on Mollusca –

“There is good evidence for the appearance of gastropods, cephalopods and bivalves in the Cambrian period 542 to 488.3 million years ago.” (As differentiated from their non-swimming, monoplacophoran-like ancestors. Like half-clams, living on the sea-floor.) (from Lemche, H; Wingstrand, K.G. (1959). "The anatomy of Neopilina galatheae Lemche, 1957 (Mollusca, Tryblidiacea)." Galathea Rep. 3: 9–73)

This is good information, but doesn’t really answer our question: When did octopus as an independent species originate?

In another article linked from the Wikipedia page we find that “Nautilus diverged from octopus around 415M years ago ± 24 million years.” This, according to: Bergmann, S.; Lieb, B.; Ruth, P.; Markl, J. (2006). "The hemocyanin from a living fossil, the cephalopod Nautilus pompilius: protein structure, gene organization, and evolution." Journal of Molecular Evolution 62 (3): 362–374.

Given this as a start, I got to be curious about the vampyromorphs (apparently the octopus’s closest relatives) and my query became:

[ octopus vampyromorph divergence ]

Big point: When doing complex research tasks like this, you often learn a great deal about the domain. This is an important aspect of research-search, one that we need to pay attention to (and we will, in a later post--stay tuned).

Which led me to all of the following resources. There was so much of it, and it was SO confusing, that I started taking notes from the resources I found most interesting.

Part of the confusion stems from the many different senses of “evolve” or “diverge” or “speciate.” In most cases you have to read really carefully to figure out which species is being discussed, and which speciation event they’re talking about.

If ever there was a time and place for careful reading, THIS is it. I found it very easy to be reading along, and only later realize that I was reading about the divergence between subspecies of octopi. Careful!

Octopuses diverged from the vampyromorphs during the Late Jurassic (about 140 million years ago) as far as we can tell—but the fossil record is too patchy to make this estimate anything more than provisional. In some ways octopuses can be thought of as vampyromorphs that have lost their shells more or less completely. Comparisons of the mitochondrial DNA of various types of living cephalopod also seems to support a closer relationship between the octopuses and vampyromorphs than either has with the squids, spirulas or cuttlefishes. Of the two octopods groups—the Cirrata and the Incirrata—it isn't at all obvious which gave rise to which, and why the octopuses lost their shells so completely is obscure.

{ This is the cephalopod web site of James Wood, who has a PhD in biology, with extensive experience in marine biology research. Wood is also associated with CephBase, a database of cephalopod data. }

The most likely scenario seems to be that during the Late Devonian (480 – 360 Mya) the first octopods stemmed from the primitive vampyromorphs almost losing any trace of an internal shell by the time of our first fossil, Pohlsepia.

{ An octopus hobbyist online magazine, but with a staff of people who actually work in marine biology, with affiliations like UC Berkeley and the Monterey Bay Aquarium.}

The data support a Paleozoic origin of the Orders Vampyromorpha, Octopoda and the majority of the extant higher level decapodiform taxa. These estimated divergence times are considerably older than paleontological estimates. The major lineages within the Order Octopoda were estimated to have diverged in the Mesozoic, with a radiation of many taxa around the Cretaceous/Cenozoic boundary.

SUMMARY:

Strugnell – Paleozoic: 541 – 251 Mya

Tonmo – Late Devonian: 480 – 360 Mya

Cephalopod Page (Wood) – Jurassic: 140 Mya

Bergmann – 415 Mya

Lemche - 542 - 488.3 Mya

So... four reputable sources roughly agree--the Octopi seem to have emerged as a separate, identifiable lineage around 400 Mya. As often happens with these studies, it's rare to get a clear, neat date for such events. We gather the data as best we can, and then see what the consensus is. In any case, it's worth noting that our own species, Homo sapiens, didn't appear until around 0.2 Mya... just to keep things in perspective.
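One rough way to see where "around 400 Mya" comes from is to take the midpoint of each range in the summary and look at the median. This is only a back-of-the-envelope sketch: treating a range by its midpoint is my assumption, and the sources are not all dating exactly the same event.

```python
from statistics import median

# (source, oldest Mya, youngest Mya), copied from the SUMMARY list.
# Single-date estimates are entered as a zero-width range.
estimates = [
    ("Strugnell", 541.0, 251.0),
    ("Tonmo", 480.0, 360.0),
    ("Cephalopod Page (Wood)", 140.0, 140.0),
    ("Bergmann", 415.0, 415.0),
    ("Lemche", 542.0, 488.3),
]

def midpoint(oldest, youngest):
    """Crude stand-in for a range: the halfway point between its ends."""
    return (oldest + youngest) / 2

midpoints = {name: midpoint(oldest, youngest) for name, oldest, youngest in estimates}

# The median is less sensitive than the mean to a single outlier
# such as the 140 Mya Jurassic estimate.
consensus = median(midpoints.values())

for name, value in midpoints.items():
    print(f"{name:25s} {value:7.2f} Mya")
print("median of midpoints:", consensus)  # 415.0
```

The median lands on the Bergmann figure, which is in the same neighborhood as the 400 Mya reading of the summary.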

Tuesday, October 26, 2010

In a strange twist of fate, it turns out that the AROUND operator in Google search has been operational for... oh... the past 5 or 6 years. Turns out that nobody ever bothered to write much about it.

What's odd about that is that nearly every librarian I've ever talked to about the clever uses of Google search has asked me about it...

And so today, I'm bringing it out of the closet.

AROUND is a real, working and useful search operator!

Examples:

[ "Jerry Brown" AROUND(9) "tea party" ] will find you a bunch of hits illustrating the relationship between Jerry Brown (running for governor of California) and the Tea Party. (It's strained, at best.)

The AROUND operator is a handy trick to use when you're looking for a combination of search terms when one dominates the results, but you're interested in the relationship between two query terms.

NOTE: the AROUND() operator MUST BE IN CAPS. The number sets the max distance between the two terms.

Note also that if Google can't find anything within the limit, it will just do regular ranking of the terms without the AROUND coming into play.
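To make the idea of a "max distance" concrete, here is a minimal sketch of a proximity test in Python. It is NOT how Google implements AROUND (I have no visibility into that); it assumes distance means the number of words sitting between the two phrases, and the sample sentence is made up for illustration.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def phrase_positions(tokens, phrase):
    """Return every token index at which the phrase starts."""
    words = tokenize(phrase)
    n = len(words)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == words]

def within(text, left, right, max_gap):
    """True if the two phrases occur with at most max_gap words between them,
    in either order."""
    tokens = tokenize(text)
    left_len = len(tokenize(left))
    right_len = len(tokenize(right))
    for i in phrase_positions(tokens, left):
        for j in phrase_positions(tokens, right):
            if i < j:
                gap = j - (i + left_len)   # words between left ... right
            else:
                gap = i - (j + right_len)  # words between right ... left
            if 0 <= gap <= max_gap:
                return True
    return False

# Hypothetical sentence, invented for this example.
sample = "Jerry Brown said little about the Tea Party this week"
print(within(sample, "Jerry Brown", "tea party", 9))  # True: 4 words in between
print(within(sample, "Jerry Brown", "tea party", 3))  # False: gap is too large
```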

Using AROUND is especially useful when the documents are rather long (think book-length articles). So try this operator in Google Books.... [ slavery AROUND(4) indigo ]

Who knew?

Do you have favorite examples where the AROUND operator helps on a difficult query?

Friday, October 22, 2010

By looking at the labels, we know that the importer is probably “ASI”… something… so let’s start there.

[ ASI ]

And we see that this is the landing page of the “Advertising Speciality Institute.” That makes sense, since this is pretty clearly a schwag item—probably made by one of their members.

So, using their built-in search box (on their page www.asicentral.com) we can search for 62960. (I’m guessing here that this is the code number for the company that makes the dog; probably a member of the ASI.) If you click through on that page you land on a products.asicentral.com page.

If you click through to their page, you immediately see that they make small plush stuffed animals as marketing promotions.

And, voila, that’s what it is. There are three companies (“Adva|lite,” “Toppers,” and “it’s all Greek to me”) that are collectively owned by The Corvest Family. The relevant one here is the company named “it’s all Greek to me” (yes, they have funny upper/lower case), which apparently made the dog.

BUT... searching on that site is hard/hard/hard! For some reason, they don’t seem to carry this particular dog on that site. Hmm.

So I backed up to Google and did a search for

[ ASI 62960 ]

Went to the first result there. Now that I know 62960 is the code for “it’s all Greek to me,” I thought that might be useful.

So I checked that site, searching within http://www.iaspromotes.com/ for:

[ dog ] (on the site)

That also was slow going, so I used their option to show more results on the page, expanding it to 60 results per page and clicked until I managed to find an item that looked right. (This is a common option on catalog pages. It often makes searching much faster.)

After scanning the page for a bit, I found my dog!

I thought it was a generic dog, turns out they call it a “pug.” Shows you how much I know about dogs.

One more note. We have the number 8819 from the IASPromotes site, and pulling up product 8819 on It's All Greek To Me ("Chubby Wubbeez", http://www.iagtm.com/product.jsp?id=983), we see that it's a close-out. (Note also that the last 2 digits of the product code have been dropped. This is often a trick that manufacturers use—they append digits for their own internal use.)

Which is why you can’t find them on the IAGTM.com site…

And THIS leads me to my last search trick.

If you’re using Google Instant (where it searches as you type), you can notice something very interesting here.

Do the search [ ASI 62960 8819 ]

now… as you’re typing that last code “8819” (and you don’t KNOW that the last two digits are optional), you can notice that the Pug result shows up in position #1 before you finish typing…

Thursday, October 21, 2010

In Bruce Sterling's book Shaping Things a SPIME is "an historical entity with an accessible, precise trajectory through space and time." Spimes don't really exist yet, but you can see signs we're getting there.

A case in point... Earlier this week I got a small stuffed dog from my employer. It's about 4 inches high, soft and plushy--a literal warm fuzzy. But I got curious about it.

So, here's the question: Who made it? And, how could I get about 20 of them myself? In other words, how "spimey" is the information on the dog?

Today's search challenge is pretty straightforward, but there's a fun twist I'll let you discover. Here's all the information you need. Search on!

Tuesday, October 19, 2010

Paradoxically, setting a limit is sometimes the most important thing you can do on the way to understanding something big and difficult.

The idea of setting a “scope” (a la programming languages, or even in the more common vernacular sense of the “scope” of an investigation) is often key to being able to see through the clutter to the essentials.

This is perhaps most easily seen in the way experts search on Google. If you think about it, every term in a query is really “setting a scope” by saying what term should be counted. If you have three terms in your query, say [ independence hall Philadelphia ], each word is implicitly focusing the results on documents that have those terms. That makes sense.

But in a larger way, setting your scope is a really important part of understanding what you’re really trying to do. And tools that implicitly define a scope help out a great deal.

For instance, www.blackwebportal.com is yet another search engine—but this one is for the African-American community. (And interestingly, it’s not for the black community of any other country, but is strongly limited to the US.) There are many specialty search engines that are defined by the population group they serve (e.g., Middle Eastern—www.mymena.com), language (e.g., Latvian—www.search.lv), the market they serve (retail, travel, etc.) or interest areas (windsurfing, knitting, robot construction, etc.). What’s so surprising to me is that almost any interest-area / population group / language you can think of has a search engine to serve its members. My favorite is the Lolcats search engine (http://rollyo.com/rhianda/lolcats/ -- you thought I was kidding, didn’t you? Here's an example search for fuzzy lolcatz.).

You see scoping at work when you decide to use a particular kind of resource—say, when you use Amazon to look for a book, Pubmed to look up medical information, or Youtube to find a video.

“Scoping” is the choice you make to limit the range of possibilities you’re working from: that’s a good thing—it’s often the key to being able to see the signal in the noise. It can be as simple as choosing the language of search (like searching German only sites when you’d like to find a German-language article), or as sophisticated as knowing how to search only within a specific *kind* of site when you have a strong suspicion that the answer will be there.

Of course, this goes hand-in-glove with knowing that such kinds of resources exist. When looking up your family crest, it would be immensely useful to know that there are web sites (and books!) dedicated JUST to describing heraldic devices, some with lovely language that’s hyperspecialized and otherwise archaic (think of the phrase, “lion rampant within a double tressure flory counterflory gules”—that’s not a phrase that would leap to mind in daily conversation).

The biggest challenge is when the target of your search is so generic (or so little known by you) that it’s hard to figure out how to describe it, let alone choose a scope for your research.

For instance, yesterday I saw a pretty yellow flower. Try a search for that! It’s hopeless as it is.

Good searchers know that they need to add in as much contextual information as possible to limit the range of possibilities. Where did I see the flower? In California… in the summer… on the roadside… All that scoping information helps to limit the range of possibilities. Pretty soon, you’re onto a page with a table of images to sort through.

To scope effectively, you need to know what’s possible and available. You need to know that Intellius.com or Spock.com can deliver a huge amount of information about a person. In the same way, you can find out the assessed value of a house in Santa Clara valley by going to the county Assessor’s website, but you can’t find out who actually owns the property. (For that, you have to physically visit the assessor’s office in downtown San Jose. Why? Because there’s a state law that prohibits them from posting the address of any elected official… and keeping a master list of all officials that should be excluded from the list is just too painful.)

Sorry about the seeming contradiction here, but you need to know what you need to know.

It goes on and on. The more you know about what’s out there, the more you can constrain your searches. This has been true since books became cheap enough to proliferate like textual bunnies.

Interestingly, the research problem used to be “how to search sufficient resources to make sure you checked all the relevant places.” Now, the research problem is often “how can you search just the good resources to be sure you haven’t missed the signal in the noise.”

While the examples here are all about search, I think this holds more generally for all of our research as well. Defining your problem, being clear about what you’re trying to accomplish—these seem like obvious steps. But it’s an ongoing problem in all research—we need to keep reminding ourselves of what the goal is (goals are?).

Monday, October 18, 2010

Okay... THIS was a tough search problem. The good news is that two folks (fellow Googler Benj Azose and SRS-reader Fred Leventhal) both solved the "cliche cipher" problem (from last week) with a very clever insight. Better yet, both solved the problem in basically the same way, suggesting an interesting new strategy.

Both went to Google Books and did a search for the single word "cliche" in the book, "The Codebreakers" by David Kahn. (Among cryptography fans, this is a monumental (at 1181 pages) and very well-known book.)

It's an extremely clever move and suggests a new version of the Search in a Scope strategy. By searching just within the scope of a book about codebreaking, Benj and Fred both limited the scope of the search in a meaningful way.

Normally, a search term like [ cliche ] is too common (or, in cryptographer-speak, too "high frequency") to be a useful discriminator term. "Cliche" occurs in too many contexts to help find something like "newspaper cliche" used in "writing code."

But by looking only in the text of the book, the word becomes meaningful.
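Here is a small sketch of why the scope matters. The corpus below is a toy I made up (only the fragment about the newspaper cliche comes from the Kahn snippet), and the `search` helper is a hypothetical substring match, not anything Google Books actually exposes.

```python
# Toy corpus of (source title, passage) pairs. All passages are invented
# for illustration except the newspaper-cliche fragment from the Kahn snippet.
corpus = [
    ("Style guide", "Avoid every cliche you can in news copy."),
    ("Film review", "The plot is one cliche after another."),
    ("The Codebreakers", "the first half of a newspaper cliche as the codeword"),
    ("The Codebreakers", "Frequency analysis breaks simple substitution ciphers."),
    ("Cooking blog", "It is a cliche, but fresh ingredients matter."),
]

def search(corpus, term, scope=None):
    """Return passages containing the term, optionally limited to one source."""
    term = term.lower()
    return [
        (source, passage)
        for source, passage in corpus
        if (scope is None or source == scope) and term in passage.lower()
    ]

everywhere = search(corpus, "cliche")
in_book = search(corpus, "cliche", scope="The Codebreakers")

print(len(everywhere))  # 4 hits: the term alone is a weak discriminator
print(len(in_book))     # 1 hit: inside the book, the same term is decisive
```

The query never changed; only the collection it ran against did.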

As Benj writes:

Search in "The Codebreakers" (within Google Books) for the term [cliche]. It gives a result on page 799 that is clearly the story in question.

".. past the censor's eye by using the first half of a newspaper cliche as the codeword for the second half, which forms the plaintext..."

However, the full preview is not available, so you'll have to find/get the book elsewhere. Then I tried searching on that text, as in searching on the phrase:

["first half of a newspaper cliche"]

This gives a snippet which contains the name Calloway.

Searching then on:

[newspaper cliche +calloway]

(use the + sign to get the correct spelling for the name without any spell-correction or synonymization) gives you articles on newspaper cliches and the hint that this is an O. Henry story named Calloway's Code.

Fred got there via a slightly different path: a social search with a reference librarian, who led him to the Kahn book.

Note that if you do this same kind of search on Amazon.com, the "search within a book" doesn't find the term "cliche" as it's a reduced version of the book (not all pages are available for searching or viewing).

But on Google Books, the full-text IS available, even if not all the pages are viewable.

However, it's very clear from the small excerpt that this text has the answer.

The full relevant excerpt (which I got by physically going to the library this weekend) from page 799:

"O. Henry penned a sardonically amusing story about "Calloway's Code." Calloway, a newspaper correspondent, gets a scoop past the censor's eye by using the first half of a newspaper cliche as the codeword for the second half, which forms the plaintext. Thus, FOREGONE meant conclusion; DARK meant horse; BRUTE meant force; BEGGARS, description. And--sad to say--the journalists in New York understand it."

My copy of Codebreakers is buried somewhere in a box deep in the garage--it's much faster for me to find it at the library. Which makes me wonder why I have all these boxes of books I never refer to. Maybe I should donate them all to my local library... Just thinking aloud here..
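For the curious, the scheme described in the Kahn excerpt is easy to sketch in Python. The codebook below holds only the four pairs given in the excerpt; the `decode` and `encode` helpers and the "?" placeholder for unknown words are my own choices, not anything from the story.

```python
# Codeword -> plaintext: the first half of a newspaper cliche stands in
# for the second half. These four pairs are the ones quoted from page 799.
CLICHE_CODE = {
    "FOREGONE": "conclusion",
    "DARK": "horse",
    "BRUTE": "force",
    "BEGGARS": "description",
}

def decode(message, codebook=CLICHE_CODE):
    """Turn a string of codewords into plaintext words ('?' if unknown)."""
    return [codebook.get(word.upper(), "?") for word in message.split()]

def encode(plaintext_words, codebook=CLICHE_CODE):
    """Turn plaintext words back into codewords ('?' if unknown)."""
    reverse = {plain: code for code, plain in codebook.items()}
    return [reverse.get(word.lower(), "?") for word in plaintext_words]

print(decode("FOREGONE DARK BRUTE"))  # ['conclusion', 'horse', 'force']
print(encode(["description"]))        # ['BEGGARS']
```

The catch, of course, is that the receiving end needs the same stock of cliches in their heads, which is exactly what the New York office had.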

Wednesday, October 13, 2010

Here's a search problem from my friend Erin McKean, the CEO of the most excellent site Wordnik.com

This is a puzzle she sent to me a few weeks ago. I worked on it for a while but didn't have any luck.

Maybe you can do better?

(Fair warning. There may NOT be a web-only answer to this. Still, ANY answers would be very much appreciated. Please let us know how you solve it!)

Search on!

From Erin:

There is a famous (well, it was famous when I was a boy) story of a
journalist who needed to file in difficult circumstances. He couldn't
use a cipher, because it would be blocked; he couldn't file the truth,
because it would be censored. Eventually, he sent a message (telegram,
I assume) which was obviously using English words and phrases, but not
making any sense. Presumably, either the censors skimmed over it and let
it go, or were not willing to admit that it beat their knowledge of the
language. His office had great difficulty with it, too, until some lowly
type (the office boy?) suggested that he had written the whole thing in
the most hackneyed of newspaper clichés, and then only sent part of each
phrase; the story was conveyed in the missing parts. [Cf. rhyming
slang.] The only words of it that I can remember are the last few: "ye
angels incorrigible"; meaning 'mercy' and 'liar'.

Do you know the story, and where I can find the rest of it?

I would guess that I read it in Arthur Mee's 'Children's Encyclopedia'. I know the idea was used in one of the stories in 'Eagle': my favourite comic during the 1950s.

Friday, October 8, 2010

One of the key skills of an expert searcher is the ability to read through a SERP (or a landing page) and spot the word(s) they don't know, but that still might be relevant to their task.

I call this skill anti-reading because you're not really reading the word, you're reading right up to that point, then recognizing that it's not a known word, but it still looks like it might be important.

Here's an example to show what I mean. Suppose you're looking up possible causes for white spots that appear on your skin in the summertime. Your very reasonable search might be something like:

[ white skin spots summer ]

Just scanning this page and anti-reading (scanning for unknown words) lets you find "leucoderma" and "vitiligo." A great heuristic to use when learning about a topic for the first time is to spot those words that are unfamiliar, and then go figure out what they are. If you go to a landing page (such as the one entitled "White patches of Skin - Dr Greene.com"), you'll still find unfamiliar words: pityriasis alba, for instance, and a few others that might pique your interest.
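If you wanted to mimic anti-reading mechanically, a minimal sketch might look like this. The snippet text and the `known` vocabulary are both invented for the example; a real reader's vocabulary is obviously far larger, and this only flags words, it doesn't judge whether they matter.

```python
import re

def unfamiliar_words(text, known_words):
    """Return words from the text that are not in the reader's vocabulary,
    in order of first appearance and without repeats."""
    seen = []
    for word in re.findall(r"[a-z]+", text.lower()):
        if word not in known_words and word not in seen:
            seen.append(word)
    return seen

# Hypothetical reader vocabulary and a made-up result snippet.
known = {"white", "spots", "on", "the", "skin", "in", "summer",
         "may", "be", "a", "sign", "of", "or"}
snippet = "White spots on the skin in summer may be a sign of vitiligo or leucoderma"

print(unfamiliar_words(snippet, known))  # ['vitiligo', 'leucoderma']
```

Those leftover words are exactly the ones worth feeding back into the next query.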

Of course, your individual mileage may vary--but everyone is a newcomer at one topic or another. And it's a great trick to have in your repertoire as you go searching.

This is often an especially handy trick to have when looking for new words or terms to help your search. An easy example to use for this is to discover what the technical term is for a fresh water pond near the ocean in Hawai'i. Given this clue (now that you know they exist), can you figure out what kind of pond this is?

Thursday, October 7, 2010

As you can see from the comments, it's not insoluble, but you DO have to start the search with some version of a search that describes the sound! There are really two approaches here...

Stephen started with the Wikipedia article on wind chimes even though I said it WASN'T a wind chime. I wasn't trying to misdirect anyone, but it was a great move. One thing to remember about Wikipedia articles is that they often include a large amount of disambiguation information, that is, ways to discriminate between different variations on a theme. So Stephen did a smart thing by going to the Wikipedia wind chimes article and reading it to find related-but-not-the-same-thing information. Nice move! He's right--at the end of the Wikipedia article about wind chimes is a reference to "Mark trees"... which is what this sound is.

By contrast, David and Fred both did what *I* did -- I started with a simple description of what I was hearing and then looked around for samples that sounded very much like what I was searching for.

The heuristic we all used was to find the closest thing by following a description, then hunting around in the space near by.

Bravo, folks! These are both excellent methods to solve difficult-to-describe search problems.

Let me summarize them as follows:

Use disambiguation information: As we've seen on more than one occasion, Wikipedia often offers a "disambiguation page" for similar terms or concepts (see, for example: discrete disambiguation).
In this example, that meant going to Wikipedia for the closest term you can think of that's related to the object in question ("wind chime"). Even if you don't get exactly a perfect match (as Stephen did), you'll often learn a good deal in the process.

Browse around in a collection of clearly-related items. A good strategy to find something that you find difficult to describe is to look in a collection of items that are clearly related, picking up terms and concepts that might prove useful to you. In essence, you're looking for an album of samples that make a collection of near-hits. The more skill you have in describing your target, the better (and smaller) collection you'll have to search.

Wednesday, October 6, 2010

Here's a real challenge -- I'd rank this one as hard... but maybe you'll surprise me.

Situation: You're a beginning music composer, and you're listening to a recording of a piece of music. In the middle, there's the sound of a wonderful wind-chime thing... that's precisely what you want -- but what's it called?

You already know what most instruments are -- but you can't figure out this one.

What's the name of the instrument that makes this sound? What's the proper name for it?