Friday, October 29, 2004

Xeni Jardin posted a "rough transcript" of a talk by Eric Schmidt (CEO of Google) today. An excerpt:

The average person does not want to debug their computer. We prefer instead the idea of a person typing something in and Google -- or someone else -- figuring things out for you. But very few things are organized around that principle of simplicity; we love and appreciate the complexity in technology but people using the internet really don't want that. When you see an ease of use breakthrough, it's such a wonderful thing.

Things should be simple. It should just work.

As obvious as this sounds, simplicity is not the focus of Google's competitors. For example, My Yahoo is focused on customization, an approach that requires a lot of work from its users. MSN Search leaked a design that seems headed toward more knobs to twiddle and more complexity. And even though Google knocked off AltaVista with a simple keyword search that just worked better, smaller competitors these days such as A9 and Snap.com seem to be focusing on complicated interfaces with lots of little buttons to push.

But Google gets it. The computer should help you, not require work from you.

Update: The Economist is running an article, "Keep it simple", that claims, "The next big thing in technology... [is] the conquest of complexity."

Wednesday, October 27, 2004

Niall Kennedy posts about Delicious Library, a clever and gorgeous Mac OS X application that keeps track of the books, music, and video titles you own. It's a fantastic use of the Amazon Web Services API.

Another interesting talk at University of Washington, this time by Dennis Lee from Amazon.com.

Amazon.com has grown from a small bookseller to a place where you can literally find, discover, and buy almost anything. As we have grown, we have learned key lessons by leveraging the data that our customers provide to us: where they click, what they search for, and what they buy. I will go over the evolution of our software platform from the early days to today. I will then highlight several areas across the company where we use data in interesting ways that reveal very rich yet unintuitive behavior.

Update: The archive for this talk is finally available. I watched the first part of it live, but then got booted off because of heavy traffic. Guess it was a popular talk. I'm looking forward to watching the rest of it.

Update: The talk was surprisingly light, more of an introduction to Amazon and its features than a technical talk about data mining. The section on A/B testing (the tests used to determine the impact of design changes and new features) was good and might be particularly interesting to many. There also was a quick overview of some personalization features and supply chain optimizations.

Google's metashopping search, Froogle, adds merchant reviews. The reviews seem to be spidered off other sites. For example, merchant reviews for Buy.com were from Bizrate, Shopping.com, and PriceGrabber.

There doesn't appear to be any way to enter your own review. That's too bad. Building a catalog of authoritative and trusted merchant and product reviews would make Froogle more compelling and would be a great differentiator. One of eBay's strengths is in being able to see the reputation of sellers, a type of merchant review. One of Amazon's advantages over other online merchants is the quality and depth of its product reviews. If Froogle is serious about helping people shop, it needs to help people find authoritative and reliable reviews.

Speaking of merchant reviews, there are still no merchant reviews on Google Local. Yahoo Local does have merchant reviews.

Merchant reviews will be a key feature for local search. It's hard to get authoritative, reliable reviews of local merchants. Services like Citysearch, ePinions, or Zagat attempt to provide reviews, but coverage is spotty and reliability is uncertain. A well-implemented local search with merchant reviews could change this space forever.

Tuesday, October 26, 2004

This Google Cheat Sheet in the Google Help is actually kind of interesting. I didn't know about the synonym search (e.g. "~auto loan" will allow auto to match car, truck, etc.) and the wildcards (e.g. "red * blue" will match occurrences of red and blue with one word in between).
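To make the wildcard behavior concrete, here is a small Python sketch that emulates the semantics as described above, where each `*` stands for exactly one word. The function name is hypothetical, and this is only an illustration over a single string, not how Google implements the operator.

```python
import re

def wildcard_phrase_matches(pattern: str, text: str) -> bool:
    """Hypothetical helper: True if text contains the phrase, with each '*' filling exactly one word."""
    parts = []
    for token in pattern.split():
        if token == "*":
            parts.append(r"\w+")  # the wildcard consumes exactly one word
        else:
            parts.append(re.escape(token))
    # Phrase words are separated by whitespace and anchored on word boundaries.
    regex = r"\b" + r"\s+".join(parts) + r"\b"
    return re.search(regex, text, flags=re.IGNORECASE) is not None

print(wildcard_phrase_matches("red * blue", "a red and blue flag"))  # True
print(wildcard_phrase_matches("red * blue", "red blue"))             # False: no word in between
print(wildcard_phrase_matches("red * blue", "red white and blue"))   # False: two words in between
```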

Here's an interesting technical problem. You work at a major search engine crawling billions of pages. How do you identify that two sites are copies of each other?

This problem turns out to be quite important to relevance rank. If two sites are copies of each other, they'll tend to distort PageRank (and similar link-based relevance ranks). The obvious solution, computing pairwise similarity of all documents, is completely impractical on a dataset of this size.
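A back-of-the-envelope calculation shows why. Assuming, purely for illustration, an index of n = 10^9 pages and an optimistic rate of 10^9 document comparisons per second:

```latex
\binom{n}{2} = \frac{n(n-1)}{2} \approx \frac{(10^{9})^{2}}{2} = 5 \times 10^{17} \text{ comparisons}

\frac{5 \times 10^{17}}{10^{9}\ \text{comparisons/s}} = 5 \times 10^{8}\ \text{s} \approx 16\ \text{years}
```

A real comparison of two documents costs far more than a nanosecond, so the true figure would be worse.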

This question also seems to be a popular one in interviews at Google. When I talked to them over a year ago, three different people asked me how I'd solve this problem. At the time, I said I'd use a content-based technique that, briefly, would compute several signatures for samples of the text of the documents and then look for matches for those signatures.

Turns out our good friend Jeff Dean co-authored a paper (PDF) on this very topic. The paper analyzes the performance of several techniques for detecting mirrors, from simple approaches like comparing IP addresses or hostnames to more complicated and quite clever analysis of the link structure of sites. The paper concludes that a content-based approach (called "shingles" in the paper) works well but that a combination of several approaches works best.
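For the curious, here is a minimal Python sketch of the general shingling idea: break each page's text into overlapping word sequences ("shingles"), measure overlap with Jaccard resemblance, and compress each page into a small min-hash signature whose agreement estimates that resemblance. This is a generic textbook version, not the paper's exact method; the shingle length k=4, the 64 hash functions, and all function names are assumptions for illustration.

```python
import hashlib
import random

def shingles(text: str, k: int = 4) -> set:
    """Return the set of k-word shingles (overlapping word sequences) in a text."""
    words = text.lower().split()
    if len(words) <= k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def resemblance(a: set, b: set) -> float:
    """Exact resemblance (Jaccard similarity) of two shingle sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def minhash_signature(shingle_set: set, num_hashes: int = 64, seed: int = 42) -> list:
    """Compact signature: the minimum salted hash over all shingles, once per hash function."""
    rng = random.Random(seed)  # same seed => same hash functions for every document
    salts = [rng.getrandbits(64).to_bytes(8, "big") for _ in range(num_hashes)]
    return [
        min(
            int.from_bytes(hashlib.blake2b(s.encode(), digest_size=8, salt=salt).digest(), "big")
            for s in shingle_set
        )
        for salt in salts
    ]

def estimated_resemblance(sig_a: list, sig_b: list) -> float:
    """The fraction of matching signature positions estimates the resemblance."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)

page_a = "the quick brown fox jumps over the lazy dog near the river bank today"
page_b = "the quick brown fox jumps over the lazy dog near the river bank yesterday"
page_c = "completely different text about search engines and mirror detection on the web"

sh_a, sh_b, sh_c = shingles(page_a), shingles(page_b), shingles(page_c)
print(resemblance(sh_a, sh_b))  # 10 shared shingles out of 12 distinct -> 0.8333...
sig_a, sig_b, sig_c = (minhash_signature(s) for s in (sh_a, sh_b, sh_c))
print(estimated_resemblance(sig_a, sig_b), estimated_resemblance(sig_a, sig_c))
```

Because each signature is small and fixed-size, candidate copies can be found by grouping pages that share signature values rather than comparing every pair of pages directly.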

The paper is accessible. Worth a peek if you have any interest in the topic.

With a little sleuthing, Gary Price discovered what appear to be experimental prototypes for the new MSN Search.

Most curious is the use of sliders to modify the search results. Google Personalized Search has a similar slider feature, but I thought it was widely viewed as ineffective. Interesting to see MSN experimenting with it.

Update: Andrew Goodman thinks sliders could be big, giving users the ability to customize relevance rank as they wish.

I suspect sliders would be of interest only to power users, the kind of people who already use advanced search all the time. Most users want search to "just work". This kind of control is just a novelty, not something most would actually bother to use.

This discussion reminds me a bit of the debate between Udi Manber and Louis Monier at Web 2.0 on whether users want more powerful tools or just want the right thing to happen.

Monday, October 25, 2004

Jon Udell points to a talk by Malcolm Gladwell on the design of the Aeron chair and the "instability of preferences":

[Gladwell] draws several conclusions. One is that preferences are highly unstable. Another is that, when you ask people to explain what they want, their preferences tend to shift toward the conservative, familiar, and easy to explain.

This is a major problem with customization. When you rely on people telling you what they want, they often don't really know.

And it's even worse than that. If you ask people to tell you what they want, you're throwing up a hurdle. You're requiring work. The vast majority of people won't even bother to tell you. For example, on a website like My Yahoo, most people just use the default, uncustomized My Yahoo page.

In the few cases where they do bother, they'll often provide an incomplete or inaccurate description of their preferences. If you rely solely on this data, you'll get their preferences wrong.

And in the very few cases where they provide a complete and accurate description of their preferences, they'll often fail to maintain it. The preferences will become increasingly stale and inaccurate over time.

Had the Aeron designers listened to what people said their preferences were, they never would have produced this chair. But at the same time that people were saying the Aeron chair was ugly, sales were skyrocketing.

How do you know what people really want? Watch what they do and learn from their behavior.

Saturday, October 23, 2004

I think Google is orthogonal to Microsoft. Microsoft will be chasing their tail lights ... Google is saying the Internet is the computer -- it's the Internet computer in the sky. And Microsoft's not getting it.

But one of the mistakes Google is making is applications like Gmail are great on the Google platform. But if Google was really paying attention, they'd say we have to have outside developers writing applications for Google. There should be 27 different e-mail systems using the Google infrastructure. And then they can become Microsoft. It will be interesting to see if Google will wake up and open themselves to developers. The (current) Google API is incredibly narrow and not open.

Great point on the Google API. If Google doesn't expand its API soon, there's an opportunity here for Yahoo. But Amazon seems to get it. They've recently opened their API further and released an Alexa API.

Joel also speculates a bit on a Google browser:

I quoted a Microsoft guy (and Longhorn Avalon team member) named Joe Beda. I quoted him saying "Microsoft is making a big bet on the rich client." And now he works at Google with Adam Bosworth. I'm sure what they're doing is a new browser. It's the IE (Internet Explorer) team reconstructed inside Google.

Thursday, October 21, 2004

In Jeff Dean's talk today at University of Washington, he officially announced that Google is opening an office in Kirkland, WA. The office is right outside of Seattle and, more importantly, right near Microsoft.

Clearly, this move is intended to steal top talent from the Redmond giant. Especially after recently cutting benefits, Microsoft may have difficulty retaining key people in MS Research and MSN Search.

Amazon, located in Seattle, should also be concerned. Amusing, in some ways, since Amazon just opened an office in Google's backyard. Turnabout is fair play.

The search war is heating up!

Update: Let me clarify. What's new here is that this is a software development office in Kirkland. Yes, Google has had a sales team up in the Seattle area for a while. But Jeff said Google is opening a branch office for software engineers here in the Seattle area.

Jeff spent some time talking about MapReduce -- a custom programming model used at Google for rapid development of robust parallel applications -- and pointed to his upcoming MapReduce paper at OSDI.
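For readers who haven't seen the model, here is a toy, single-process Python sketch of the MapReduce idea using the classic word-count example: a map function emits key/value pairs, the framework groups values by key, and a reduce function combines each group. The function names are hypothetical, and this ignores everything that makes the real system hard (distribution across machines, fault tolerance, partitioning).

```python
from collections import defaultdict
from typing import Iterable, Iterator, Tuple

def map_words(doc_id: str, text: str) -> Iterator[Tuple[str, int]]:
    """Map phase: emit (word, 1) for every word in one document."""
    for word in text.lower().split():
        yield word, 1

def reduce_counts(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    """Reduce phase: sum all partial counts for one word."""
    return word, sum(counts)

def run_mapreduce(inputs, mapper, reducer) -> dict:
    """Single-process stand-in for the framework: map, group by key (shuffle), then reduce."""
    groups = defaultdict(list)
    for key, value in inputs:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)
    return dict(reducer(k, vs) for k, vs in sorted(groups.items()))

docs = [("d1", "the cat sat"), ("d2", "the cat ran"), ("d3", "a dog ran")]
print(run_mapreduce(docs, map_words, reduce_counts))
# {'a': 1, 'cat': 2, 'dog': 1, 'ran': 2, 'sat': 1, 'the': 2}
```

The appeal of the model is that the programmer writes only the two small functions; in a real deployment the grouping step and the parallelism are the framework's job.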

He also demoed Google's new word clustering work, which can find related words given a word or phrase. For example, for "rolling hash", it found words related to pot, but for "rolling hash function", it came up with "MD5" and other one-way hash functions. It was also able to find synonyms like "cuisine" for "cooking", though related words that are not synonyms, like "food", also showed up high in the clusters. Jeff said Google will use this data to improve relevance rank and help find search results that are clearly relevant to your search query but don't exactly match your search terms. This demo looked similar to what I heard of Peter Norvig's talk at Web 2.0.

This word clustering is a great example of what you can do with massive amounts of data and processing power. We did a lot of similar things at Amazon.com.
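Jeff didn't say how the clustering works. One common, much simpler way to find related words is distributional similarity: words that appear in similar contexts tend to be related. The Python sketch below applies that technique to a toy corpus; it is an assumption for illustration, not Google's method, and the function names and window size are hypothetical. Note that it would also rank related non-synonyms highly whenever they share contexts, much like the "food" effect in the demo.

```python
import math
from collections import Counter, defaultdict

def context_vectors(sentences, window=2):
    """For each word, count the words that appear within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for i, word in enumerate(words):
            for j in range(max(0, i - window), min(len(words), i + window + 1)):
                if j != i:
                    vectors[word][words[j]] += 1
    return vectors

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def related_words(target, vectors, top_n=3):
    """Words whose contexts look most like the target's contexts."""
    target_vec = vectors.get(target, Counter())
    scores = [(cosine(target_vec, vec), word) for word, vec in vectors.items() if word != target]
    return [word for _, word in sorted(scores, reverse=True)[:top_n]]

sentences = [
    "she loves french cuisine and fine wine",
    "she loves french cooking and fine wine",
    "he studies hash functions like md5",
]
vectors = context_vectors(sentences)
print(related_words("cuisine", vectors))  # 'cooking' ranks first: identical contexts
```

At web scale, the counting step is the expensive part, which is exactly where lots of data and lots of machines come in.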

These numbers are meaningless, and are simply blows in the search engine war, far too reminiscent of the mid-90s war. Google won the war by proving it was relevancy that mattered, not numbers. Yahoo is doing a good job fighting Google, but this fight is just on the wrong turf.

Exactly right. It's all about relevance.

You want to help people find what they need. The only way increasing index size would do that is if there are many cases where the index is missing an item that should be at the top of the search results. More likely is that most useful items are already in the index but just aren't being surfaced. In that case, the focus should be on relevance rank.

Update: Gary Price notes that "publicly announced pages totals can be useful but are often nothing more than a marketing tool" and that "relevant results are what counts."

Update: Jeremy taunts Google for having an out-of-date image index and boasts about Yahoo's 1B images.

Tuesday, October 19, 2004

Mark Glaser at OJR has a great article on deep linking, diversity, and automation on news sites.

There have been some tremendous changes recently. BBC is linking to related articles on other news sites. The New York Times is opening more of its archives. And even the Wall Street Journal is experimenting with providing access to more of its content.

"I think Google News has been a shot across the bow of all news originators, making us say 'hold on, there's a different way of doing this.' It's very easy to flip between different sources of news. We either try to reverse that trend, which is likely futile, or we facilitate it, and I'm keen that we take the latter route."

Deverell later talks about the advantages of diversity:

"People do not trust individual sources, no brand is trusted completely -- those days are over," he said. "And people value a range of perspectives. So, for instance, with a political story, we try to give a very impartial account of it, but then the left-wing press will give their perspective and the right-wing press will give a different perspective."

And the advantages of automation:

The one striking difference between News.com Extra and BBC Newstracker is that the former is created by human editors and the latter is largely a weighted algorithm similar to Google News. Deverell says having editor picks is a nice idea but is just too labor intensive, while the augmented Moreover feeds are "fantastically automated and therefore cheap."

"Simply type in what you are looking for and Google Life Search will quickly locate that item. For example if I enter 'car keys' Life Search responds with the result 'In your pocket', and there they are right where it said! There is also a cached version to show you where your keys were."

Several regular contributors to the tech site Slashdot posted reports of suspected problems in the new search algorithm saying they entered the term "life" and were presented with a "Result Not Found" error. They were presented, however, with several ads for internet dating services and online gaming sites.

"This is the end of the personal privacy. Google now knows everything you read on the web, they search your e-mail and now they know about the dope hiding in your sock drawer."

Monday, October 18, 2004

In a BBC article on Google Desktop Search and personalization, some interesting quotes from Marissa Mayer (Director at Google):

"We think of this as the photographic memory of your computer."

"If there's anything you once saw on your computer screen, we think you should be able to find it again quickly."

This is more than just desktop search. This is clearly inspired by Memex, the "memory extender" described half a century ago. It also sounds similar to the goals of "Stuff I've Seen", a research project at Microsoft. Everything you've ever seen on your computer is easily available again.

Others (A9, Seruku, Ask, My Yahoo Search, Microsoft) have a similar vision, but this is the first time I've seen it from Google. Because of privacy concerns, Google has been resistant to maintaining this type of data in the past, but those concerns seem to be gone now.

Friday, October 15, 2004

Netflix announced a plan to cut prices to $18/month from $22/month, but that's going to be hard on Netflix if they want to maintain profitability and service quality. After all, they just raised their price from $20/month to $22/month back in June 2004, citing the need to improve service quality as the reason for the increase. It doesn't help that Blockbuster immediately cut their price on their Netflix-look-alike service to $17.49.

Netflix is very good at what they do, but going up against Amazon is going to be brutal for them. In The Long Tail, for example, Netflix and Amazon were talked about in the same breath, both satisfying customers with mass customization instead of mass market. Now they'll be in direct competition.

Update: The NYT E-commerce Report column today is "Amazon Rumor Ruffles DVD Rivals". A great overview of the upcoming entry of Amazon into DVD rentals and what it means for Netflix.

Don Park doubts that Google Desktop Search, or any desktop search for that matter, can ever be compelling because page rank doesn't work for files on your desktop:

The problem with desktop search is that, while the file system, email archives, and browser cache offers extra metadata, there are no hyperlinks among desktop documents. Without hyperlinks, you can't do page ranking Google is famous for.

The core problem here is that search engines like Google throws everything into one pot. For web search, all the web pages on the Net gets thrown into that pot. Thankfully, hyperlink-based pageranking pulls the good stuff to surface with minimal hassle. With desktop search, all of your documents gets thrown into the pot without an equivalent of page ranking to measure relevance. IMHO, there aren't enough metadata on the desktop to achieve the same level of utility Google web search offers.

Page rank is great, but I think this overstates its importance. All is not lost without page rank.

Approximate understanding of the content and context of document text can determine importance and relevance. Full natural language understanding isn't necessary; statistical analysis of the text and structure of documents can be sufficient.
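As a rough illustration of ranking without any links, here is a Python sketch that scores desktop documents using only text statistics (TF-IDF) plus one structural signal, a bonus when the query term appears in the document's title. The scoring formula, the title weight, and the file names are all assumptions for illustration, not how any particular desktop search product works.

```python
import math
import re
from collections import Counter

TITLE_WEIGHT = 2.0  # hypothetical boost for a query term that appears in the title

def tokenize(text: str) -> list:
    return re.findall(r"[a-z0-9]+", text.lower())

def rank_desktop(query: str, docs: dict) -> list:
    """Rank (title, body) documents by TF-IDF, with a bonus for title matches.

    Uses only text statistics and document structure; there are no links to analyze.
    """
    n = len(docs)
    body_counts = {name: Counter(tokenize(body)) for name, (title, body) in docs.items()}
    title_terms = {name: set(tokenize(title)) for name, (title, body) in docs.items()}
    # Document frequency: in how many documents (title or body) each term appears.
    df = Counter()
    for name in docs:
        df.update(set(body_counts[name]) | title_terms[name])

    scores = {}
    for name in docs:
        counts = body_counts[name]
        length = sum(counts.values()) or 1
        score = 0.0
        for term in tokenize(query):
            if df[term] == 0:
                continue
            idf = math.log(1 + n / df[term])  # rarer terms carry more weight
            tf = counts[term] / length
            if term in title_terms[name]:
                tf += TITLE_WEIGHT
            score += tf * idf
        if score > 0:
            scores[name] = score
    return sorted(scores, key=scores.get, reverse=True)

docs = {
    "budget.xls": ("Q3 budget", "travel budget and hardware budget for q3"),
    "notes.txt": ("Meeting notes", "discussed the budget briefly then lunch"),
    "recipe.doc": ("Chili recipe", "beans tomatoes chili powder"),
}
print(rank_desktop("budget", docs))  # ['budget.xls', 'notes.txt']
```

Richer desktop signals (file type, recency, how often a file was opened) could be folded into the same score in the same way.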

Curious, though, that these concerns are only coming up now. Google is not the first to do desktop search. There's Hotbot/Lycos, Blinkx, X1, Copernic, Enfish, dtSearch, MSN (Lookout), Ask Jeeves (Tukaroo), and many others. Like the Gmail privacy flap, Google seems to be held to a higher standard.

Thursday, October 14, 2004

Google Desktop Search has launched. Sounds remarkable. Not only does it index all your files and update continuously as files change, but also it indexes your browser cache, allowing you to search all the web pages you've seen before. It's integrated into your Google web searches as well, so any web searches you do at Google.com will also search your local drive and show up in the familiar Google web interface. Clever.

The Google Desktop Search works by installing a lightweight webserver on your local system, running on port 4664. Your indexed system is stored in encrypted files that you can (and should) make only accessible via SSL. The application runs on the localhost, and cannot be reached from remote systems. The end result is that searching your system feels just like using Google, and as previously mentioned, it hooks into regular Google, where you'll now see a "Desktop" tab.

Update: Wow. I've installed it, and I'm impressed. Easy and fast searches of your files, e-mail, and (IE only) web history, all with the familiar Google UI. Clearly, they need to support advanced query syntax, Mozilla/Firefox web history (the main browser I use), other e-mail programs, and MacOS/Linux, but this is a great start.

Update: Jon Udell has a cute proxy hack that allows Google Desktop Search to index the Firefox browsing history. But this kind of hack really shouldn't be necessary. Google needs to get to it and support more than just IE/Outlook/WinXP.

Robin Good wrote up a long summary of the recent MSN Search Champs, an advisory session for which Microsoft pulled together some of the top people in search. The most interesting part is the "key recommendations". Here's an excerpt:

Relevancy is key.

Allow a high degree personalization of results is critical.

Improve on what we have come to expect from Amazon search results, recommendations and ratings is key to the future search.

Provide the ability to refine search results is highly desirable.

History of search activity, results, access and preferences is very valuable.

Friday, October 08, 2004

Daniel Steinberg has a good summary of all the goings on at Web 2.0. Don't miss the section on search personalization:

... personalization and improvements to the user interface are key to the future of search ... Under one percent of the public use any of the advanced features that many search engines offer. Louis Monier, director of eBay's Advanced Technology Group, said that enhancements to search cannot depend on training users to do more. Instead, he suggested, the metaphor is that you bring them the dish that they want but you also bring other dishes that they may be interested in.

The key is "understanding the intention of the user and enabling them to complete a task," added Jeff Weiner, a senior vice president at Yahoo.

Emergic.org has a great quote from a Fast Company article on execution:

Never is execution more important than when innovation is at the heart of a strategy.

That is because innovation always involves treading into uncertain waters. And as uncertainty rises, the value of a well-thought-out strategy drops. In fact, when pursuing entirely new business models, no amount of research can resolve the critical unknowns. All that strategy can do is give you a plausible starting point. From there, you must experiment, learn, and adapt.

Jerry Yang (co-founder of Yahoo) was asked at Web 2.0 about the cluttered Yahoo home page. His response:

Our biggest challenge is finding ways to let users know what we have. The home page needs to offer users more control. This isn't about broadcasting what everyone wants. But we still need a way to inform users about what is new at the same time. And we're trying to figure out how to unclutter. Everyone uses Yahoo differently. The UI may change in this broadband world.

There's another approach: show the content that matters. In an earlier post, I suggested that Yahoo should show each user a combination of favorites -- the Yahoo features they use frequently or that are very popular -- and a rotating selection of features they haven't seen, to help them discover useful new functionality. I don't need to see everything every time I come to yahoo.com; I just need to see what I need to see.
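Here is a minimal Python sketch of that suggestion: fill most of the page with the user's own favorites (backfilled with globally popular features), then add a small slice of never-seen features that rotates from visit to visit. The feature names, counts, slot sizes, and rotation rule are all hypothetical.

```python
from collections import Counter

def home_page(user_usage: Counter, global_popularity: Counter, all_features: list,
              seen: set, visit: int, n_fav: int = 3, n_new: int = 2) -> list:
    """Pick modules for one visit: the user's favorites plus a rotating slice of unseen features."""
    # Favorites: what this user uses most, backfilled with globally popular features.
    ranked = [f for f, _ in user_usage.most_common()]
    ranked += [f for f, _ in global_popularity.most_common() if f not in ranked]
    favorites = ranked[:n_fav]
    # Discovery: features this user has never seen, rotated so each visit shows a different slice.
    unseen = [f for f in all_features if f not in seen and f not in favorites]
    if not unseen:
        return favorites
    start = (visit * n_new) % len(unseen)
    rotated = unseen[start:] + unseen[:start]
    return favorites + rotated[:n_new]

usage = Counter({"mail": 40, "news": 12})                              # hypothetical per-user clicks
popularity = Counter({"weather": 900, "mail": 800, "finance": 500})    # hypothetical site-wide usage
all_features = ["mail", "news", "weather", "finance", "maps", "games", "photos", "groups"]
seen = {"mail", "news", "weather", "finance"}

print(home_page(usage, popularity, all_features, seen, visit=0))
# ['mail', 'news', 'weather', 'maps', 'games']
print(home_page(usage, popularity, all_features, seen, visit=1))
# ['mail', 'news', 'weather', 'photos', 'groups']
```

Nothing here asks the user to configure anything; both lists come from observed behavior.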

Update: John Battelle, program chair of Web 2.0, posted his view of the Norvig talk and what it means for Google. Battelle says, "It's only a matter of time before Google ... [starts] employing your search history, your personal data, clustering, and other tricks [to] deliver more filtered and intentional results." Battelle also notes that John Doerr, board member of Google, said that Google will become "the Google that knows you."

Thursday, October 07, 2004

Everyone seems to think the holy grail of the next generation of search is personalization. That may be true. But not proactive personalization. People will not work to improve their search, nor should they. Search is not that fun. It's not your private files. And it's not the kind of thing you're going to spend hours ranking, rating and sharing. I would focus on search that personalizes without the user having to do anything. That sounds too obvious to even post. Apparently, it's not.

Exactly! It should be obvious, but people make this mistake over and over again. Personalization is not supposed to require work from users. The interruptions from the paperclip in MS Office. The effort required to set up My Yahoo. All the checkboxes to set up Google's personalized search. It's not supposed to be this way.

Personalization should be about making my life easier. Help me do what I want. Help me focus on what's important. But, please, please, please, stay out of my way.

Wednesday, October 06, 2004

Jeremy Zawodny wrote up notes on the "Search as a Platform, Where is it Going?" panel at Web 2.0. The panel looks well stocked: Steve Berkowitz (CEO of Ask.com), Udi Manber (CEO of A9), Louis Monier (head of R&D and Search at eBay), Christopher Payne (VP at MSN), and Jeff Weiner (SVP at Yahoo Search). Quite a group.

They seem to have talked quite a bit about personalized search, inferring user intent, and search as a dialogue. There were several comments that personalization is hard to implement.

Update: An MP3 recording of this talk is available. It was interesting to hear (guessing from the voices) Udi Manber and Louis Monier argue about whether users want more powerful tools (Udi) or just want the right thing to happen (Louis). If it's not clear from this weblog, I come down pretty hard on Louis' side.

Tuesday, October 05, 2004

Evan Williams posts about Snap, a new web search company unveiled at the Web 2.0 conference. It's unusual and worth trying. Lots of Alexa-like metrics provided for every search result and a few other tidbits.

Like many search engines, it suggests related queries when you do a search. Oddly, it recommended "greg linden is absolutely correct" as a related search when I did a vanity search. Obviously, this thing ain't working quite right.

Update: Rob's Blog reviews Snap. He wonders whether all the metrics and complicated UI really have broad appeal or if they're little more than a novelty. And, he dings it for a small index and long load times.

In "The Long Tail", Wired reporter Chris Anderson talks about how personalization is changing retailing. He starts with the example of the book "Touching the Void", a poorly selling book that suddenly was in high demand when "Into Thin Air" became a bestseller:

What happened? In short, Amazon.com recommendations. The online bookseller's software noted patterns in buying behavior and suggested that readers who liked Into Thin Air would also like Touching the Void. People took the suggestion, agreed wholeheartedly, wrote rhapsodic reviews. More sales, more algorithm-fueled recommendations, and the positive feedback loop kicked in.

A few years ago, readers of Krakauer would never even have learned about Simpson's book - and if they had, they wouldn't have been able to find it. Amazon changed that. It created the Touching the Void phenomenon by combining infinite shelf space with real-time information about buying trends and public opinion.

This is not just a virtue of online booksellers; it is an example of an entirely new economic model for the media and entertainment industries, one that is just beginning to show its power. Unlimited selection is revealing truths about what consumers want and how they want to get it.

Chris then talks about the value of massive selection. For example, half of Amazon's book sales come from the back catalog (outside of the most popular 130k titles), the "long tail." But it's not just having massive selection. You need some way to help customers find and discover interesting titles in your massive catalog.

Netflix, where 60 percent of rentals come from recommendations, and Amazon do this with collaborative filtering, which uses the browsing and purchasing patterns of users to guide those who follow them ("Customers who bought this also bought ..."). In each, the aim is the same: Use recommendations to drive demand down the Long Tail.

This is the difference between push and pull, between broadcast and personalized taste. Long Tail business can treat consumers as individuals, offering mass customization as an alternative to mass-market fare.
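The simplest version of "Customers who bought this also bought ..." just counts how often two items end up with the same customer. The Python sketch below does exactly that on made-up purchase data; it is one elementary form of collaborative filtering, not Amazon's or Netflix's actual algorithm, and a production system would need to normalize for very popular items, among other things.

```python
from collections import Counter, defaultdict
from itertools import combinations

def co_purchase_counts(baskets: dict) -> dict:
    """For every pair of items, count the customers who bought both."""
    co = defaultdict(Counter)
    for items in baskets.values():
        for a, b in combinations(sorted(set(items)), 2):
            co[a][b] += 1
            co[b][a] += 1
    return co

def also_bought(item: str, co: dict, top_n: int = 3) -> list:
    """'Customers who bought this also bought ...': the most frequent co-purchases."""
    return [other for other, _ in co.get(item, Counter()).most_common(top_n)]

# Hypothetical purchase histories.
baskets = {
    "alice": ["Into Thin Air", "Touching the Void"],
    "bob": ["Into Thin Air", "Touching the Void", "K2"],
    "carol": ["Into Thin Air", "Cookbook"],
    "dave": ["Touching the Void", "K2"],
}
co = co_purchase_counts(baskets)
print(also_bought("Into Thin Air", co, top_n=1))  # ['Touching the Void']
```

This is the feedback loop in the excerpt in miniature: every new co-purchase strengthens the link that produced the recommendation.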

So far, personalization has most successfully been applied in retail, helping customers find what they want in a massive catalog of items. Shouldn't it work elsewhere? There's a massive catalog of news articles I never normally see. Can't personalization help me find interesting news? There are interesting web sites I never find using Google. Shouldn't personalization be able to help me?

In fact, personalization can help anywhere there is a glut of information. It's an implicit search, built from your behavior, and an excellent way to supplement explicit search, where you already know what you want. Personalization provides focus, helping you find and discover what you need.

Monday, October 04, 2004

After playing with it a bit, I find it pretty interesting; it seems competitive with offerings from A9 and Ask Jeeves.

I was annoyed when it immediately asked me for a password on my first search (despite being logged in to My Yahoo). There's some simple Furl-like abilities like saving URLs combined with the nifty ability to search limited to your saved URLs. It does keep track of all your clickthroughs on search results, though you have to turn that feature on explicitly. And it supposedly retains all your previous searches and makes them accessible, like A9 and Ask, but I didn't see that when I used it unless I actually clicked through on one of the search result pages. With all the clicking, settings, and required logins, I thought it required more effort than it should to use (as does the new My Yahoo), but still found it useful.

One interesting and novel feature is the ability to block sites. For example, you can remove sites that are just search spam. Presumably, Yahoo can come along later and use the aggregate data about what people are blocking to improve search results for everyone. Clever.
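Here is one way that aggregate blocking data could hypothetically be used: count how many users blocked each domain and push results from widely blocked domains down the list for everyone. This is a Python sketch of the speculation above, not anything Yahoo has described; the threshold, penalty, and names are arbitrary assumptions.

```python
from collections import Counter
from urllib.parse import urlparse

def rerank_with_blocks(results: list, block_lists: list,
                       min_blockers: int = 2, penalty: int = 10) -> list:
    """Demote results whose domain was blocked by at least `min_blockers` users.

    results: URLs in original rank order; block_lists: one set of blocked domains per user.
    """
    blockers = Counter(d for blocked in block_lists for d in set(blocked))

    def adjusted(pair):
        rank, url = pair
        domain = urlparse(url).hostname or ""
        return rank + (penalty if blockers[domain] >= min_blockers else 0)

    return [url for _, url in sorted(enumerate(results), key=adjusted)]

results = ["http://spam.example/a", "http://good.example/b", "http://ok.example/c"]
block_lists = [{"spam.example"}, {"spam.example", "ok.example"}, set()]
print(rerank_with_blocks(results, block_lists))
# ['http://good.example/b', 'http://ok.example/c', 'http://spam.example/a']
```

Requiring several independent blockers keeps one user's idiosyncratic blocks from affecting everyone else's results.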

So, we've got three folks -- Yahoo, Ask, and A9 -- with similar offerings. All keep track of your searches and clickthroughs with a few extras. It's a good start. But, it's not truly personalized search yet. They aren't modifying search results based on your personal history. There's still room for the bigger play.
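As a sketch of what "modifying search results based on your personal history" might look like at its simplest, the Python below boosts results from sites the user has clicked on before, with no settings or input required from the user. The domain-affinity signal and the boost factor are assumptions; real personalized search could use many other signals.

```python
from collections import Counter
from urllib.parse import urlparse

def personalize(results: list, click_history: list, boost: float = 0.5) -> list:
    """Re-rank results using the user's own past clicks.

    results: list of (url, base_score); click_history: URLs this user clicked before.
    """
    clicks_by_domain = Counter(urlparse(u).hostname for u in click_history)
    total = sum(clicks_by_domain.values()) or 1

    def score(item):
        url, base = item
        affinity = clicks_by_domain[urlparse(url).hostname] / total  # share of past clicks
        return base * (1 + boost * affinity)

    return [url for url, _ in sorted(results, key=score, reverse=True)]

results = [("http://a.example/1", 1.0), ("http://b.example/2", 0.9), ("http://c.example/3", 0.8)]
history = ["http://b.example/x", "http://b.example/y", "http://c.example/z"]
print(personalize(results, history))
# ['http://b.example/2', 'http://a.example/1', 'http://c.example/3']
```

With an empty history the original ranking comes back unchanged, so a new user sees ordinary results.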

Update: Chris Sherman has a great writeup on My Yahoo Search. We seem to have reached similar conclusions:

In all, the new My Yahoo Search is well implemented and easy to use, but doesn't offer compelling reasons to use it unless you're looking for what amounts to an enhanced bookmark utility that's tied to Yahoo search results. It's great to see companies like Yahoo and Ask Jeeves taking baby steps toward true personalization of search results. And I fully expect to see more robust features and enhancements to personal search from both search engines, probably in the very near future.

Update: Of the articles in the mainstream press, Michael Bazeley's article in the SJ Mercury News is particularly worth reading. A brief overview of Yahoo's new product and the current state and future of personalized search.

Alexa Web Information service allows for retrieval of site information such as popularity, related sites, detailed usage/traffic stats, supported character-set/locales, site contact information, meta data, and a list of links in and out of the site.

Rob Andrews noticed that BBC News now "adds links to similar stories on rival websites" to most of their news articles.

BBC says that the service "uses web search technology to identify content from other news websites that relates to a particular BBC story" and compares the technology to Google News. Sounds like basic keyword and category matching but, nevertheless, it's an interesting development.
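To illustrate the guess above, here is a Python sketch of basic keyword and category matching: keep only candidate stories in the same category, then rank them by headline keyword overlap (Jaccard similarity). The stopword list, threshold, and story data are hypothetical; this is not a description of the BBC's actual system.

```python
import re

STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "for", "is", "at"}

def keywords(text: str) -> set:
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

def related_stories(story: dict, candidates: list, min_overlap: float = 0.2) -> list:
    """Rank other sites' stories by keyword overlap (Jaccard) with the story, same category only."""
    kw = keywords(story["headline"])
    scored = []
    for cand in candidates:
        if cand["category"] != story["category"]:
            continue  # category match is a hard filter
        ckw = keywords(cand["headline"])
        union = kw | ckw
        score = len(kw & ckw) / len(union) if union else 0.0
        if score >= min_overlap:
            scored.append((score, cand["url"]))
    return [url for _, url in sorted(scored, reverse=True)]

story = {"headline": "Google opens office in Kirkland", "category": "tech"}
candidates = [
    {"url": "http://news-a.example/1", "headline": "Google to open Kirkland office near Microsoft", "category": "tech"},
    {"url": "http://news-b.example/2", "headline": "Kirkland council approves park budget", "category": "local"},
    {"url": "http://news-c.example/3", "headline": "Microsoft cuts employee benefits", "category": "tech"},
]
print(related_stories(story, candidates))  # ['http://news-a.example/1']
```

Exact keyword matching misses variants like "opens" versus "open"; stemming or the kind of weighting used by services like Google News would help.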

From Gnomedex, Jeremy Zawodny reports that Google sees the same click-through rate (CTR) on their text and graphic ads. Explaining their high CTR on text ads, folks at Google say, "It's all about the targeting."

As I said back in July when Microsoft cut benefits, exceptional benefits are often valued well above their monetary value. Especially when the benefits are viewed as outside of the norm, they can be seen as a gift exchange, creating more of a friendship than a purely economic relationship between the company and the employees. Put simply, people care about a company that cares about them. It's especially important when much of your workforce could go just about anywhere they want to.

Friday, October 01, 2004

Kevin Maney at USA Today writes of the "next big thing" for the World Wide Web:

A big part of the promise is that it will turn the Web around: Instead of having to find information or entertainment, it will find you -- and be exactly what you want or need at that moment. The network becomes a butler.