A blog about search, search skills, teaching search, learning how to search, learning how to use Google effectively, learning how to do research. It also covers a good deal of sensemaking and information foraging.

Thursday, January 27, 2011

While there are many ways to solve this problem, I realized that the most likely authoritative source for this kind of information would be--guess what?--web pages on Russian sites. (Which is the method Tasha used in her solution.)

So I used the "Translated foreign pages" under the "More search tools" link in the left hand panel.

With the "Translated foreign pages" option selected, you can search in English, and the best matches (in whatever other languages) will be shown in the search results, translated.

In this case, since we're interested in the Vladivostok fish tonnage through the port, nearly all of the pages will be in Russian.

With just a few clicks, you can get multiple pages that list the total fish tonnage.

However... if you want to be clever, you could use the pattern-match feature and do:

[ vladivostok port " * tons of fish" 2010 ]

(Note that I used the asterisk as a pattern match in the quoted phrase. I figure that somewhere in an article about the fishing capacity of Vladivostok it would say something like "and the capacity of the port is XX tons of fish.")

The remarkable thing is that this pattern will match in English against Russian documents, and show one possible answer in the snippets...

Still, getting answers from snippets can be dangerous, so I clicked through to the top three to see what kind of page each was and to double-check that the value was correctly in context.

The first result is from Novostivl.ru, a newspaper with fairly extensive business coverage for Vladivostok and the region; they give the 2010 catch as 1.5M tons. The second result is from VestiRegion.ru (a Vladivostok newspaper) and confirms the port tonnage as 1.5M tons (1.6M for the entire region).

So my best guess is that it's 1.5M tons; several local news sources agree (and a quick extrapolation from the size of the Vladivostok harbor and the number of fishing vessels shows that this is a pretty reasonable number... for vessels fishing in the north Pacific).

Search lesson: When looking for information from another country, often your best source is local news from the country of origin. And the fastest way to get that information is via Translated foreign pages. Don't forget this easy access to pages from other languages / other cultures.

Friday, January 21, 2011

As written, this isn't that difficult a question. You can just look up Grand Hyatt Tokyo on Google Maps, then switch into StreetView to get a pretty good look at it. When you look at the map, it's easy to tell that the Hyatt is next to the Mori Towers, in the region of Tokyo known as Roppongi Hills.

You can check the Photos layer at that location, but I didn't spot any good images of the sculpture there. (That's often a great resource for questions like this because people will take pictures, geo-locate them and label the things in the picture. Not this time.)

So we found the first answer to my question...

What is it? It's a sculpture of a mini-landscape. By zooming in on the StreetView image, you can pretty easily see that it's supposed to be some kind of mountains, and you can (in some angles) see that there's a little waterfall in the sculpture.

But you know me--now that I've got that little bit, how could I figure out what this sculpture is called and who the sculptor was?

Getting this last little bit of information was tricky, but my solution leads to a great insight for real searchers and real search tasks.

After about 45 minutes of plugging away like this I realized that I was going to need a different strategy.

I sat back in my chair and thought "Well... who would know what this thing is called?" and the answer hit me like a World Book encyclopedia dropped from 30,000 feet as the jet of thought passed overhead: the concierge at the hotel would know.

Now you might think this is cheating, but I don't think so. I'd tried about 20 searches, and when you've done that many searches, it's a strong indicator that something is wrong. You really need a different approach. And people are great information resources.

Besides, Google helped out here too. I just went to my GMail and used the "Call phone" function. I did the obvious search for [ Grand Hyatt Tokyo telephone ] and got the direct number in a couple of seconds. I asked for the concierge and had the answer about 15 seconds later.

If only I'd thought about this 44 minutes ago!

For the record, the concierge at the Grand Hyatt Tokyo (who was a lovely person and incredibly helpful, I might add) told me that the sculpture is entitled "High Mountain, Flowing Water" by Chinese artist Cai Guo-Qiang.

Armed with that information, it's an easy search:

[ high mountain flowing water cai guo-qiang ]

The first hit is the artist's web site, which tells us that the full name of the sculpture is High Mountain Flowing Water: 3-D Landscape Painting, and that it was commissioned by Mori Arts Center, Tokyo and put into place in Oct. 2003. And his website has this much better photo... from the parking lot, not from the street.

I wondered--is that even possible? Or is this a completely Photoshopped image?

First, how big is the ISS? Here's a picture from NASA to give a sense of scale:

In other words, it's about one flying-football-field.

How big would a football field look if it were flying at the altitude of the ISS?

Well... here's a bit of math I did to do a quick check on it.

I did a few obvious searches to figure out that the ISS is 109m maximum width and flies at an average distance of 350km. With these two bits of info and a little geometry, you can work out what the subtended linear angle of the ISS would have to be. The geometry is easy...

(54.5m is half the ISS width, which you need as the base of the triangle. 350.000004 km is the length of the hypotenuse according to Pythagoras.)

To work out the subtended angle, you just compute the arcsin ( 54.5m / 350.000004 km ) -- that's about 0.0089 degrees.

Luckily, this is really easy with Google.

How did I do that? Once I'd worked out the hypotenuse length (350.000004 km) I just turned to Google Calculator. Here's my query:

[ arcsin ( 54.5m / 350.000004 km ) in degrees ]

That is, this bit of trig computes "what is the subtended angle of the ISS?"

( NOTE: I added "in degrees" at the end because the Google Calculator gives back sin/cos/tan/arcsin... (etc) measurements in radians. But I wanted degrees, because I happen to know (from another query) that the sun is 0.53 degrees wide. )

This Google Calculator expression tells us that half the ISS width is 0.008 degrees wide (remember, we divided the image in half in order to do the right-angle trig up above). So the subtended linear angle of the ISS at 350 km overhead is 0.016 degrees.
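For anyone who wants to redo the trig away from the Calculator, here is a minimal Python sketch of the same right-triangle computation. It assumes the figures quoted in this post (109m width, 350 km distance, sun 0.53 degrees wide); the function and constant names are mine, purely for illustration.

```python
import math


def subtended_angle_deg(width_m: float, distance_km: float) -> float:
    """Full angular width, in degrees, of an object seen face-on at a distance."""
    half_width_m = width_m / 2.0
    distance_m = distance_km * 1000.0
    # Right triangle: the distance is one leg, half the width is the other.
    hypotenuse_m = math.hypot(distance_m, half_width_m)
    half_angle_rad = math.asin(half_width_m / hypotenuse_m)
    # Double the half angle to get the whole width, then convert to degrees.
    return math.degrees(2.0 * half_angle_rad)


# Values taken from the post; treat them as assumptions, not measurements.
ISS_WIDTH_M = 109.0
ISS_DISTANCE_KM = 350.0
SUN_WIDTH_DEG = 0.53

iss_deg = subtended_angle_deg(ISS_WIDTH_M, ISS_DISTANCE_KM)
print(f"half angle: {iss_deg / 2:.4f} degrees")
print(f"full angle: {iss_deg:.4f} degrees")
print(f"sun width / ISS width: {SUN_WIDTH_DEG / iss_deg:.1f}")
```

Changing ISS_DISTANCE_KM is a quick way to see how sensitive the size ratio is to the line-of-sight distance, which is longer than the altitude whenever the ISS is not directly overhead.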

So...

... if you measure the photographed width of the ISS in the image, the width of the ISS image should be about 1/50th the width of the sun image. (How do I know that? The subtended linear angle of the sun is 0.53 degrees. Divide 0.53 by 0.01 and you get 53.)

I measured it quickly by doing a little copy/pasting on the image (see below), and got it at 1/52nd... close enough. (That's well within measurement error on my part.) Here's the diagram I drew to measure it. I took the image and drew a light blue line below the ISS that's exactly the width of the ISS. I then drew a bunch of dark blue bars that same width across the radius of the sun.

Each of the dark blue bars is 1 ISS width. Each of those light blue lines is 10 ISS widths across... so it looks to me like the sun is 52 ISS widths wide.

I also looked around a bit, and found multiple OTHER images made by other amateur astronomers, all showing the ISS at more-or-less the same size with respect to the sun.

Overall, the picture checks out. It's internally consistent (that's why I was measuring it) and it's been replicated by other astronomers. So yeah... I believe it.

Thursday, January 13, 2011

As I mentioned, this isn't that hard, but this is one of those cases when it helps to know about the use of double quotes.

Generally speaking, double quotes, as in the query:

[ "Sea of Gazelles" ]

find that exact phrase "Sea of Gazelles" (note that the capitalization doesn't make any difference).

Contrast this with the query:

[ Sea of Gazelles ]

...which is dominated by ships with the name Gazelle and gazelles, the antelope. When you see this kind of thing (that is, the exact phrase not being found), that's your cue to try using double quotes to find the phrase.

Once you do it this way, you can find that the Bahr al Ghazal (translates as the "Sea of Gazelles") is a river in southern Sudan (in Arabic, بحر الغزال‎) that connects to the White Nile in the southeastern part of Sudan. The Bahr al Ghazal drains a basin larger in area than France, and although the drainage area is large, most of the water evaporates from the slow moving stream, and the discharge of the Bahr al Ghazal into the White Nile is minimal.

Pinning down the exact location of the "Sea of Gazelles" is tough--but we can find the White Nile and the location of Lake No, which is the place where the wide and broadly flowing Sea of Gazelles enters the collection point.

To find Lake No, I just went to Google Maps and did a search for [ Sudan ], then [ Lake No ]. I did it in that sequence to make sure I got the "Lake No" in Sudan, rather than some other Lake No elsewhere in the world.

Now, to get the lat-long, I just activated one of the Lat-Long tools in Google Maps Labs.

To get this, click on the "New" button in upper toolbar, which will show you this set of options:

I activated the LatLng Tooltip, and then just shift+clicked on the map at Lake No.

So... the lat-long is pretty clear: 9.4883, 30.454.

It's handy to know that you can just type (or paste) a Lat-Long into Maps. (Ever wonder where 0,0 is? Try it and find out. Or... for a more interesting location: -10.4838, 105.6356 )
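If you'd rather hand someone a link than a pair of numbers, a few lines of Python can turn a Lat-Long into a Maps search URL. This is a sketch under an assumption: it uses the `api=1` and `query` parameters of the Google Maps URLs scheme, which is not something the steps above rely on, and the function name is made up for the example.

```python
from urllib.parse import urlencode


def maps_url(lat: float, lng: float) -> str:
    """Build a Maps search link for a latitude/longitude pair."""
    # Reject values that cannot be a real position on the globe.
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    if not -180.0 <= lng <= 180.0:
        raise ValueError(f"longitude out of range: {lng}")
    # Assumed URL scheme: /maps/search/ with api=1 and query=lat,lng.
    query = urlencode({"api": 1, "query": f"{lat},{lng}"})
    return f"https://www.google.com/maps/search/?{query}"


# The Lake No position found above.
print(maps_url(9.4883, 30.454))
```

The range check is there because a swapped or mistyped pair (say, latitude 105) is the most common way a pasted Lat-Long goes wrong.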

Take aways:

(1) Consider using double quotes to search for an exact phrase when your results are for other interpretations of your words.

(2) When searching Maps for features in other countries, it's handy to search for the country first, THEN the specific feature within that country (to shortcut the problem of figuring out which of many alternates you really want).

(3) Determine the Lat-Long of a point by activating the Lat-Long tool from Maps Labs.

Saturday, January 8, 2011

I knew this day would come, and it's finally arrived. We've been stumped.

Lots of people worked on this, and several regular readers (Mark, gnetiq, Fred, Ahniwa) all left good comments and leads. But nothing's worked out.

To tell you what I've been doing... I posted this problem on a few social answering sites (Yahoo Answers, Answerland) and on a listserv that's devoted to answering exactly this kind of problem--that is, identifying stories and books from the merest of clues. As it turns out, there's an entire community of librarians that face this problem every day--someone walks into the library and asks for "Poor and Wease" -- you have to know that what they meant is "War and Peace."

So I signed up for the Fiction_L listserv and posted my question there. (Side note: It's been really interesting to see the kinds of questions people ask librarians. "Can you find me novels set in Montana?" I recommend it as a source of ideas about how people think about their difficult search problems.)

SUSPENDED! But nothing has worked so far. So I'm going to declare this question SUSPENDED pending additional information or sudden discovery. If (when!) we finally figure this one out, I'll be sure to do an analysis of HOW we could have found it (assuming we could).

At this point, though, if I were a librarian, I'd be suspicious of the recollection. As we've discussed before, recall of movies / books / short stories is very error prone. It's not that individuals just have weak memories; it turns out we all recode what we experience, and then those recollections evolve over time. Memory for an incident (or book or ...) isn't fixed into memory like an etching into glass, but it's lightly written and subject to weathering--more like water colors on rice paper that's left in the sun and elements of subsequent experience.

As my friend Peter Pirolli points out in his work on Information Foraging Theory, you work on a problem until the expected cost of the next action (more searching, in our case) exceeds the expected value of what it's likely to turn up. That's when you give up--when it just doesn't seem as though more work is going to yield anything.

Pete's work was informed by some great studies of hummingbirds feeding in Alpine environments, where they have to choose when to move to another patch of flowers. They move when the next search isn't likely to be successful.

That's the situation we're in at the moment: this information hummingbird needs to move on to the next food source, the next item on the dinner menu.

Wednesday, January 5, 2011

This might well be a toughie. I've spent about 30 minutes so far without any luck. Can you do better?

My friend Josh writes:

I'm searching for a short story.

The plot: A king lives in a castle at the edge of a bottomless cliff. The taxes are too high, so there's a revolution and they kick him out. The king hints that there's a reason for those taxes, and the last scene has him riding away from the cliff as fast as he can go.

Tuesday, January 4, 2011

As I said, I didn't think it would be that hard. While my post on the Answer to last week's Search Challenge is fairly long, you really should see my notes! In the process of doing the research, I ended up with about 3X what I put into the post (including some truly wonderful things that the margin couldn't contain--such as the factoid that Charles Mentry died of an untreated insect bite... go figure).

In any case, I wanted to spend this post reflecting on what I did to answer the question, and what it tells us about doing this kind of historical research.

1. There's a lot of copying going on. I was dismayed to find that key "facts" and "concepts" keep recurring in the exact same language. I only gave a hint about it in my post. Truth is, copying is rampant. And it suggests a strategy for researchers: Read the text carefully enough to recognize it if you see it again.

Copied texts almost always indicate a lack of careful scholarship, and while you can sometimes figure out where the original text came from, it's also sometimes so tangled that you really can't tell. Every time I see the same text claiming the same point, I lose a little confidence.

This was especially true when I first saw the claim that oil was discovered in Humboldt County "...which is in the Central Valley..." I knew that was wrong, so whenever I saw it repeated, I knew that the repeaters hadn't bothered to look at a map to validate their copying. (Note to researchers: When you see an easily verifiable claim like that, spend the 10 seconds to check it out. Suppose the phrase had been "... the Sudd, which is near the Mau Escarpment..."--would you have any idea that the Sudd is nowhere *near* the Mau Escarpment? You could look it up...)

2. Beware of articles that don't cite anything. I found lots of history enthusiasts that make all kinds of fantastic claims... except you can't track down any of what they claim. I did find some marvelous claims, but they were written in a gush of civic pride and boosterism. If the claim sounds plausible, I might be moved to contact the author and try to run it down--but hard experience tells me that often they can't support their claims. They believe them, they truly do, but that's not the same as having the smoking gun.

3. Be happy when you start to see lots of slightly different versions of the story that corroborate each other, often with minor details added from version to version. In addition to all the repetitions about the Pico No. 4 oil strike, there was plenty of variety in the stories. Some talked about Mentry and the city that grew up around Pico No. 4 (Mentryville), while others talked about the development of the oil refining industry that came out of Pico No. 4. Point is, there were LOTS of supporting stories--plenty with pictures that had dates written on them at the time.

4. Don't give up too soon--try variations on the theme. I didn't initially believe the Union Mattole Company story because my first reference had the name of the company misspelled as "Union Mattolle," and I couldn't find ANY other references to that company. But once I tried searching for just Mattole, I was able to find lots of hits.

Finally, a note about my notetaking.

There are a gazillion ways to take notes. For clear questions like this--"what date was the first..."--it's often simplest to keep your notes organized by time.

I just opened a text-editor and copy/pasted claims into section headings by years. I had one section for 1865, another for 1879, and so on. Then I'd copy the claim (along with any notes about the claim that I'd picked up along the way, such as questions I had about its credibility, or a note that it might be copied from elsewhere) into the section by year. I'd ALWAYS include the link to the source document. (Key idea: Don't count on being able to find the document again. I lost at least one good reference because I couldn't re-find it!)
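I did all of this by hand in a text editor, but the same layout is easy to sketch in Python if you like your notes structured. Everything here is illustrative: the `Claim` fields, the `group_by_year` helper and the example.com links are placeholders I made up, not my actual notes or sources.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Claim:
    year: int
    text: str
    source_url: str  # always keep the link; re-finding a page later may fail
    note: str = ""   # credibility questions, "looks copied from...", etc.


def group_by_year(claims):
    """Return {year: [claims]} with the years in chronological order."""
    sections = defaultdict(list)
    for claim in claims:
        if not claim.source_url:
            raise ValueError(f"claim has no source link: {claim.text!r}")
        sections[claim.year].append(claim)
    return dict(sorted(sections.items()))


# Placeholder entries; the URLs are hypothetical.
claims = [
    Claim(1876, "Mentry strikes oil at Pico No. 4",
          "https://example.com/pico-no-4", "company history; vested interest?"),
    Claim(1865, "Union Mattole Company ships first oil to San Francisco",
          "https://example.com/mattole"),
    Claim(1865, "Refined burning oil sold for $1.40 per gallon",
          "https://example.com/stalder", "price detail adds credibility"),
]

notes = group_by_year(claims)
for year, entries in notes.items():
    print(year)
    for entry in entries:
        print(f"  - {entry.text} <{entry.source_url}> {entry.note}".rstrip())
```

Refusing a claim that has no link is the code version of the key idea above: the source goes in at the moment you copy the claim, not later.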

Then, once organized like that, putting together the blog post was straightforward--just list the top most-plausible claims and give a bit of background on each. (While bearing in mind that the final product has to be salient and interesting.) That's why the final product is so much shorter than the notes. All those wonderful side lights and tidbits belong somewhere else, not in your masterful summary.

But now we're into the realm of writing, which is another topic for another time.

Sunday, January 2, 2011

This was way more complicated than I thought it was going to be. After all, shouldn’t it be fairly clear WHICH oil well was drilled first, and WHERE the oil was processed?

Answer: No. It’s complicated for two key reasons.

A. There’s a lot of not-so-great scholarship out there, even in books and what seem like primary sources. There’s a lot of repetition of errors that were introduced early on and somehow never corrected.

B. There are various interests at work, each arguing that THEIR particular oil strike was first.

When I first started working on this, I did the obvious search query:

[ California oil discovery ]

And found lots of resources, each willing to tell me where first oil was found, and a few willing to tell me where it was first refined (or “distilled” or “rectified,” depending on what kind of person was writing the text).

One of the first places I went was "Oil and Gas Production in California" in The Redlands Fortnightly, a "paper reading" club that has been active in topics social, agricultural and historic since 1895--I thought they would have a useful, historical perspective.

“In 1865, only 6 years after "Colonel" Edwin Drake's monumental discovery in Pennsylvania, California's first productive well was drilled by the Union Matolle Company in California's Central Valley. This area, east of San Francisco, became the scene of much of the drilling activity through the rest of the 1800's. While none of these wells were considered major strikes, they did provide enough oil for the nearby market of San Francisco, by far the largest population center in California in the late 1800's.”

This would seem conclusive. The only problem is (as I learned from other sources), the Union Matolle Company was active on the Matolle River area, in Humboldt County, which is nowhere NEAR the Central Valley. This isn’t a minor error; it’s off by a few hundred miles. (For the curious, the Union Matolle Company operated out of the town of Petrolia, previously known as Petrolea, a fact that complicates search.) So this entire reference is called into question. Yes, the Central Valley IS a major oil area, but not in the 1860s.

Take note of that phrase: “...California's first productive well was drilled by the Union Matolle Company in California's Central Valley...” If you do an exact phrase search on this, you’ll find it’s used in 25 different documents. Hmmm.

Another claim that’s made is that oil was first discovered in Southern California near Santa Clarita at the Pico No 4 site. The problem here is that most of THESE articles use the phrase “Many people may be surprised to learn that one of Southern California’s chief exports over the last 100 years, besides motion pictures, has been oil.” This is used in 37 different documents.

So something very fishy is going on here. I spent several hours looking through all kinds of articles, websites, original news archives and books to find out what was going on. One thing became strikingly clear—there is a HUGE amount of copying and repetition on this topic. This makes it difficult to figure out what actually counts.
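I found the copying with exact-phrase searches, but if you've saved the text of a few suspect pages you can also check for shared wording offline. The sketch below uses a different technique from what I did, word n-gram "shingles", and the two sample passages are invented for the example; only the repeated phrase comes from the sources above.

```python
import re


def shingles(text: str, n: int = 8) -> set:
    """All runs of n consecutive words in the text, lowercased."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def shared_phrases(a: str, b: str, n: int = 8) -> set:
    """Word runs of length n that appear verbatim in both texts."""
    return shingles(a, n) & shingles(b, n)


# Invented sample passages built around the phrase quoted above.
doc_a = ("In 1865 California's first productive well was drilled by the "
         "Union Matolle Company in California's Central Valley.")
doc_b = ("Historians note that California's first productive well was drilled "
         "by the Union Matolle Company in California's Central Valley, east of "
         "San Francisco.")

for phrase in sorted(shared_phrases(doc_a, doc_b)):
    print(phrase)
```

Eight words is an arbitrary window: shorter runs flag common turns of phrase, longer ones miss lightly edited copies, so it's worth trying a couple of values.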

So… after lots of searching, what did I discover, and how?

There seem to be four major claims in all of the web pages I found.

1. Charles Alexander Mentry, 1876: Oil driller Mentry (aka Charles Alexander Mentrier) struck oil at Pico No. 4 in Pico Canyon in 1875, but a gusher drilled in 1876 was the first truly productive well. It continued to produce until 1990.

2. The Union Matolle Company, 1865: struck oil in Humboldt County and shipped several barrels to San Francisco for refining.

3. Edward Doheny, 1892: struck oil near present-day Dodger Stadium in downtown LA by using a sharpened eucalyptus tree trunk to drill down 460 feet.

4. Andreas Pico, 1855: distilled small amounts of oil from Pico Canyon, but had a very limited market.

After finding each of these claims, I did a search to determine the origins (and repetitions) of each. As mentioned, the repetitions are numerous; getting to an original claim is surprisingly difficult.

The evidence for each:

1. Mentry-1876: From Chevron's corporate history: "In September 1876, driller Alex Mentry succeeded in striking oil in Pico No. 4, despite rattlesnakes, wasps, mud and underbrush. The first successful oil well in California, Pico No. 4 launched California as an oil-producing state." The oil from Pico No. 4 was successfully refined in that year, according to Chevron (who ought to know, as Pico No. 4 was their well, inherited from Standard Oil of California, their predecessor company), and the well continued to be productive for many years.

However, as a major oil company, Chevron has a vested interest in having "the first" or "the biggest," so while this claim is true enough, it turns on what "first successful" means.

From the Wikipedia article on "Pico No. 4": Well No. 4, the Pico Canyon Oilfield, located about seven miles (11 km) west of Newhall, California in the Santa Susana Mountains, was the first commercially successful oil well in the Western United States, [1][2] and is considered the birthplace of California's oil industry. Drilled in 1876, it turned nearby Newhall into a boomtown and also spawned a smaller boomtown called Mentryville adjacent to the drilling site.

[1] Nicholas Grudin (2003-08-03). "Ghosts of an Era: Mentryville Is a Monument to Both the Start and Decline of the Area's Oil Drilling Industry". Daily News (Los Angeles). ("Scofield formed California Star Oil Works, and with skilled oil man Alex Mentry, tapped the first commercial oil well in California - Pico No. 4.")

[2] Jonathan Gaw (1993-02-21). "Oil in a Day's Work The Boom May Be Over, but a Few Wells Pump On". Los Angeles Times. ("Oil men had been groping around the canyons of the area since 1876, when the first commercially successful oil well west of Pennsylvania was built several miles south of Lechler's ranch in Pico Canyon.")

2. Union Matolle Company—1865: Wikipedia goes with the Union Matolle company find in 1865, but interestingly completely ignores Mentry and Pico No. 4. It then goes on to say that the first productive oil in SoCal was the Brea-Olinda oil field in the 1880s. The reference for this claim is to W.A. Ver Wiebe (1950) North American and Middle Eastern Oil Fields, Wichita, Kans.: W.A. Ver Wiebe, p.198.

Alas, this text isn’t easily searchable, even though it's in Google Books (and the book is hard to find).

From: http://www.beachcalifornia.com/humboldt_county.html "California's first drilled oil wells that produced crude to be refined and sold commercially were located on the North Fork of the Mattole River approximately three miles east of here. The old Union Mattole Oil Company made its first shipment of oil from here, to a San Francisco refinery, in June 1865. Many old well heads remain today."

From: California Division of Mines Bulletin (v. 170, 1950) "[California] first oil by Union Mattole Company from a well near the Mattole River in Humboldt County" (p. 21). However, note also: a "prospect well was drilled in 1861 on the Davis Ranch in Humboldt County." Apparently the prospect didn’t work out.

Another book: Early California Oil: a photographic history, 1865-1940, by Kenny Franks and Paul Lambert (1985, Texas A&M University Press): "the birth of the California petroleum industry proper may be dated to March 25, 1865, with the first commercial sale of oil refined in the state ..." "...the Union Mattole Oil company was incorporated... June 7, 1865, the first shipment was sent to San Francisco where it was distilled by the Stanford brothers...."

FWIW, Walter Stalder records that the Stanford Brothers refined and sold the first shipment of oil from the Mattole well, the first oil produced and refined from a California well. Reportedly, the refined “burning oil” sold for $1.40 per gallon. (Stalder, Walter A., November 12, 1941, Contribution to California oil and gas history: California Oil World. Reference found in: "History in California" from the Conservation Department of California.)

Finally... a really useful finding. Not only does the 1865 claim have multiple sources of support, but finding a price for the refined product lends a good deal of credibility.

1861: Discovery of oil in the Valley first publicized.
1864: All but a dozen or two of the least troublesome Natives killed or captured. Indian troubles considered over. In 1868 measles kills most survivors.
1865: First oil shipped out by Union Mattole Co. Principal town established and named “Petrolia.” Oil boom short-lived, though experimental drilling and subsequent oil excitement recur periodically.

Just to muddy things up a bit, an article in the Humboldt Times (Jan 23, 1907, by Leslie Gould) claims "several large companies began extensive oil operations in 1898." She goes on to say that oil will cause the Mattole valley to flourish, opening up ever larger markets for the export of tan bark (which she clearly thought would be the salvation of the region; with the benefit of 20/20 hindsight, we can see how wrong that was).

3. Doheny-1892: The Doheny claim is made in the book Petroleum in California: a concise and reliable history of the oil industry by Lionel V. Redpath (1900). But this seems a bit unlikely to me. It has all the hallmarks of a somewhat tall tale (a well drilled 460 feet deep with a eucalyptus trunk?); this is especially odd when later reports claim that the oil-bearing strata were less than 100 feet deep in this region. Despite the title of the book, I find it difficult to believe this story, although it does get repeated (often--again--in exactly the same copy-and-paste language).

4. Pico-1855: Interestingly, this claim of early oil refining by Andreas Pico is ALSO made in that book (Petroleum in California: a concise and reliable history of the oil industry), but is also included in Hanks, Henry G., 1884, Minerals of California, in Fourth annual report of the State Mineralogist: San Francisco, California. State Mining Bureau. But all the articles agree—this was small time distillation, just for the kerosene lights of the mission at San Fernando to use as illumination.

And so....

My conclusion: The Union Matolle Company had the first commercial sales of refined oil from a California oil well in Humboldt County on the Matolle River in 1865. Alas, they didn't last terribly long (a year or two at most) and so the award for best long-term continued success goes to Pico No. 4, which began operations in 1876, with continuous production for over a century.