TECH TALK: Constructing the Memex: From Yahoo

Let us begin by taking a look at how information management has evolved in the past decade thanks to the Internet.

In the Yahoo days, the directory was at the centre of the world. Websites were sorted by human editors into appropriate categories, and this taxonomy was at the heart of finding pages. We had to drill down through multiple levels of categories to reach the one that seemed to match what we were looking for. Then, we clicked through to the website and began our search for information there. When we came across good sites, we bookmarked them in our browser, so that the next time we did not have to go through the directory all over again. Hard to believe, but this was how we navigated the Web perhaps five or six years ago.
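The drill-down described above can be pictured as a walk through a tree of categories. What follows is only a minimal sketch, assuming a toy directory held in nested dictionaries. The category names, the `.example` site names and the helper functions `drill_down` and `subcategories` are all hypothetical illustrations, not a description of how Yahoo actually stored its directory.

```python
# Hypothetical miniature directory: nested dicts are categories,
# lists are the sites an editor has filed under a category.
DIRECTORY = {
    "Recreation": {
        "Sports": {
            "Sumo": ["sumo-results.example", "sumo-fans.example"],
            "Cricket": ["cricket-scores.example"],
        },
    },
    "Computers": {
        "Internet": ["web-guides.example"],
    },
}


def drill_down(directory, path):
    """Follow category names from the top level and return what sits there."""
    node = directory
    for category in path:
        if not isinstance(node, dict) or category not in node:
            raise KeyError(f"no category {category!r} at this level")
        node = node[category]
    return node


def subcategories(directory, path):
    """List the choices a visitor sees at one level of the hierarchy."""
    node = drill_down(directory, path)
    return sorted(node) if isinstance(node, dict) else []


# A visitor clicks through three levels, then bookmarks what they find
# so the walk does not have to be repeated next time.
bookmarks = list(drill_down(DIRECTORY, ["Recreation", "Sports", "Sumo"]))
print(subcategories(DIRECTORY, ["Recreation", "Sports"]))
print(bookmarks)
```

Every lookup costs one step per level of the hierarchy, which is why bookmarks mattered: they let us skip the walk entirely on the next visit.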

Red Herring’s October 1994 issue had this to say about Yahoo: “Yahoo!’s value is obvious to anyone who’s surfed the Web, because it categorizes and creates paths to all the pages that are fit to read. As a vital directory, it’s virtually the operating system of the Internet.” It is interesting to read what Yahoo’s founders, Jerry Yang and David Filo, said in an interview then:

The volume of information on the Internet is for practical purposes an infinite problem, because not only is the content itself exploding, but the existing content is changing all the time. If you don’t have a committed way of doing it, you can throw any amount of money at it and not solve the problem. Therefore, nobody can be the final solution, and we are just one alternative. The goal, which is fairly modest, is to make the Internet intuitive for the user and to act as a starting point, not an end. It’s kind of a discovery experience. Our vision is to provide different ways of viewing that content, whether it’s through hierarchy or through a search or through customization.

Ultimately, the best tool is the human brain. Obviously, leveraging our users will be the best form of artificial intelligence. The search part of it, whether visible or invisible, will be a big part of our operations. Our goal is to create an artificial intelligence library and list sites with different degrees of relevance, instead of just alphabetically. So sites that are definitely relevant are listed first, whereas others that may not be as relevant come after. But that’s going to be a manual editorial process over time, because I think that no amount of artificial intelligence can establish the inference needed. The more context you have, the better it is over time, but we’re not building context for context’s sake. If it’s one of those categories no one ever visits, why build context for it? The context-sensitive retrieval is very powerful if you can get it to work, but you have to manage people’s expectations.

Conceived by co-founders Jerry Yang and David Filo in a Stanford trailer in 1994, much of Yahoo’s popularity was built on the directory’s ability to give order and organization to the unruly Web. As legend has it, Yahoo was developed by Yang and Filo as a way to categorize their favorite sumo wrestling Web sites. Even the company name (originally the acronym “Yet Another Hierarchical Officious Oracle”) highlighted its directory roots.

Unlike the other search competitors that emerged in the mid-1990s, such as Excite, Lycos, Infoseek and AltaVista, Yahoo did not develop its technology to crawl through millions of Web sites. Instead, it hired humans to manually search the Web to find, organize and review sites about thousands of topics. Yahoo’s editorial team became an emblem of the Internet’s rise, with legions of college graduates doing the heavy lifting to help Web newbies find what they wanted.