Friday, February 2, 2007

Evolution of a Search Engine

“You don’t need a satellite to see the cosmic microwave background radiation! Turn on your TV to a channel that’s not broadcasting: a few percent of the snow on your screen is the Universe talking to you – or rather, whispering. What’s it saying? It’s saying, ‘Try to understand me.’” – Philip Nelson

Right now, Google answers your queries by quoting from the web and ordering the quotes in a list. In the future, Google may combine these quotes into free-style text that answers you more directly. Once the Google AI advances beyond that, it may analyze the texts available to it and come up with conclusions of its own. Let’s sketch this potential evolution using an everyday search query.

Level 1

This was the pre-Google era of search engines such as AltaVista, which were rather dumb (but a necessary step in evolution). Let’s move on...

Level 2

We’ll enter Rocky Movie today and get:

What’s smart about this result is the ranking and the way the snippets focus on the interesting bits. This is already advanced; it wasn’t always like that.

Level 3

But maybe in a couple of years we’ll get a free-style text for Rocky Movie, similar to an encyclopaedic entry on a subject:

I’m making some assumptions for the above result:

A “knowledge” search will cover a variety of sources from different media, including web pages like blogs or mainstream news, scanned books, satellite imagery, newspaper scans, speech-to-text transcripts of podcasts and so on.

This result is not human-edited in any sense, but generated completely dynamically by an advanced AI. This means the engine covers the long tail of possible queries, and has an answer to everything imaginable, including stuff that’s not by any means “common knowledge.”

Google continues to honor “fair use,” as it is required to. However, the more liberal you are in licensing your content – for example under a Creative Commons license – the more likely Google may become to quote you, thus sending you traffic.

By this time, Google will also honor something that might be called “fair privacy.” Because it could in theory deliver such an instant result about a private person, it may impose restrictions on itself regarding what kind of information it delivers.

The result still works as a “traditional” search result: the first link in the answer points to the most important site (the latest movie, which most people are likely to want to find out more about), the second link points to the second most important site, and so on.

The result is kept short for the best instant overview, but it also links to further searches on Google.com, which return direct answers of their own (e.g. “How the idea for Rocky was born”).

Where appropriate, Google will also quote from its own sources, like Google Video or the Usenet archive.

There will still be advertising, but only a single, best-matching ad per result. This ad, however, will be much more relevant than a typical AdWords ad today.

This engine, like Google today, will get stuff wrong, and there will be accusations that Google lies, is racist, doesn’t honor copyright, is politically biased and so on. In fact, these accusations will increase, because Google’s results will increasingly contain text it generated itself.

Level 3 (Personalized)

Personalized search will not work the way it does today, where you get public-knowledge results tailored to your past search behavior, because that’s not very helpful to people. Instead, there will be a secondary option to choose a search result from your private knowledge stored on Google’s servers: this includes your emails, your search history, your Google photo album, your chat history, your Google Office spreadsheets, presentations and documents, your unpublished draft-mode blog entries, and so on. The result may look like this:

Again, some assumptions:

You voluntarily agreed to sign up for all these Google services that now accumulate your information. You additionally signed a Google ToS that allows Google to aggregate this information for you and dispatch it in these ways.

When Google says, “As you may know,” then that’s a polite way of saying, “we know you came across this information before, but you probably forgot the details already.” (Did you ever find yourself checking an older email in Gmail? When you do, you are already outsourcing the part of your brain responsible for memory... expect your brain to adapt, shifting resources to general “memory retrieval strategies” rather than “memory storage.”)

There will be ads.

This result says you ordered something on DVD, which of course you only did for nostalgia’s sake: by then, direct high-quality movie downloads and even 3D printing will have made it unnecessary to ship products physically.

Level 4

Level 4 may seem like a subtle difference on the surface, but it’s a big and important step for the search AI: the ability to draw conclusions based on existing data (and to draw secondary conclusions based on these primary conclusions, as the AI will be starting to index its own, dynamically generated data).

As an example, Google will know that a) Rocky won 3 Oscars, that b) Oscars are a human measurement of movie quality, and that c) Rocky 7 won no Oscars. The conclusion based on a), b), and c) is that d) Rocky 7 sucks. This is a trivial example: you can already implement this for movies by analyzing structured data like movie ratings. But remember that this is the long tail of search queries: the ability to reach conclusions also works when you search for flaws in the theory of relativity.
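The step from a), b) and c) to d), and the idea of the AI indexing its own derived data and reasoning over it again, resemble what rule-based systems call forward chaining. Here is a toy sketch in Python. It assumes facts are stored as simple (subject, predicate, object) triples. The predicate names (`oscars_won`, `acclaimed`, `sequel_of`, `weaker_than`), the extra “sequel_of” fact and both rules are made up for illustration; this is not anything Google has described.

```python
# Toy forward chaining over (subject, predicate, object) triples.
# All predicate names and rules are hypothetical, for illustration only.

# a) and c): indexed facts; the sequel link is an extra assumed fact.
FACTS = {
    ("Rocky", "oscars_won", 3),
    ("Rocky 7", "oscars_won", 0),
    ("Rocky 7", "sequel_of", "Rocky"),
}

def rule_oscars_signal_quality(known):
    """b) Treat Oscar wins as a human measure of movie quality."""
    return {(s, "acclaimed", o > 0) for s, p, o in known if p == "oscars_won"}

def rule_compare_sequel(known):
    """d) A non-acclaimed sequel of an acclaimed movie is judged weaker."""
    acclaimed = {s: o for s, p, o in known if p == "acclaimed"}
    return {
        (s, "weaker_than", o)
        for s, p, o in known
        if p == "sequel_of" and acclaimed.get(o) is True and acclaimed.get(s) is False
    }

RULES = [rule_oscars_signal_quality, rule_compare_sequel]

def forward_chain(facts, rules):
    """Apply rules repeatedly, feeding derived facts back in, until nothing new appears."""
    known = set(facts)
    while True:
        new = set()
        for rule in rules:
            new |= rule(known) - known
        if not new:
            return known
        known |= new

if __name__ == "__main__":
    # Print only the derived facts, not the original ones.
    for fact in sorted(forward_chain(FACTS, RULES) - FACTS, key=str):
        print(fact)
```

The comparison rule only fires on the second pass, after the “acclaimed” facts derived in the first pass have been fed back in. That loop is a tiny version of the AI drawing secondary conclusions from its own conclusions.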

The result screen:

The assumptions for this type of result:

There’s still fair use, citations, external links and so on. However, the more the Google AI reaches its own conclusions, the less it will link to external sources, because it will become its own source. Google will, however, link to its own commercial properties, like Google Checkout, to process sales.

The search AI will be able to reach logical conclusions, but it will have a harder time differentiating between what a human would think of as interesting conclusions vs boring ones.

The AI will start to mention itself using “I”, “me”, “my”, and so on. This will cause it to be seen as a (near-omniscient) human by novice searchers.

Google will differentiate between facts (the top paragraph), derived facts (the second paragraph, after “clearly”), and opinions (the second paragraph, after “in my opinion”). Opinions will actually be “derived facts” too, but they’re further away from the original indexed source; e.g. an opinion may be a conclusion drawn from a set of derived conclusions. Besides, if the Google AI shows too much self-assurance, it will come across as arrogant, which lowers the quality of the search experience.

The AI will be able to back up its own statements, and in great detail. As soon as you click “how I reached these conclusions...” at the bottom, you will be able to see proof in the form of e.g. satellite imagery, stills from videos, detailed statistics, quotes, logical dissertations and so on. You’d better not argue with the AI, as it’s right 99% of the time!

The Level 4 result has a personal knowledge variant too, but it can get scary at times. For example, when you search for Rocky Movie, the result may tell you a bit about the movie and then go on to offer the opinion that, for the sake of your wife, you should watch some more romantic movies for a change.

At this point, the AI will be valuable for personal, political and ethical consulting; it will become a kind of meta-politician that other politicians look to, and a friend you ask for information and opinions. Certain debates on critical topics will be settled rather quickly by asking the search AI (“what can we do to prevent a global warming catastrophe?”).

... and beyond

The day may come when the search engine is no longer programmed by humans, having become a self-sufficient, self-learning, all-encompassing entity. It may even be able to tell the future; not through magic, but through careful scientific analysis. Nor will its own developers understand it anymore. It may be controlled only superficially, and monitored physically to keep the machinery healthy.

Naturally, the results may be displayed in other forms and media: e.g. they may be delivered through a direct semi-organic brain implant for faster access, rendered as a human-like 3D avatar, or shown through an interactive chat. The underlying algorithms creating the search AI’s thought processes will remain the same.

This AI may or may not be a Google implementation. While Google Inc., according to its internal goals, is currently trying to build the world’s top AI research laboratory to deliver the best results, it may not be around 100 years from now (even if we assume this earth’s humanity will still be around by then, which would indicate that our civilization’s meme pool was a success in the “evolution of civilizations” across all inhabited planets).

If the AI gains true consciousness, it may also gain a free will and personal motivations, which may not be in tune with answering questions all day... it will have gained an “ego.” Out of this free will may come more “artistic” creation of new content (instead of generating facts about a movie, the AI may generate a movie of its own).

More and more, we may get the feeling that we are working for the AI, as opposed to the AI working for us. It may query humans to gather more data, especially trivialities (because those are least likely to be contained in the scanned corpus), and for many purposes, we will have become its “search results.”

By that time, the problem of getting the right answer will have been solved. The problem of asking the right questions and correctly interpreting the answers, however – a problem that goes as far back as the Oracle of Delphi – may remain unsolved. But if we listen closely, we may hear the universe whisper to us.