The Evolution of Search

Knowing where we’re going often means knowing where we’ve come from. The history of search engines is a short one, but one of constant change.

In today’s Whiteboard Friday, Danny Sullivan takes a look at how search has evolved into the complicated engine it’s become, and what that means for its neon-lit, rocket-car future.

Whiteboard Friday – Evolution of Search – Danny Sullivan – 20130610

For reference, here is a still image of this week’s whiteboard.

Video Transcription

Hey Moz fans. Welcome to Whiteboard Friday. I’m not Rand. I’m Danny Sullivan, the founding editor of SearchEngineLand.com and MarketingLand.com. Because it’s 8,000 degrees here in Seattle, Rand has decided not to be around, and I am here sweating like a pig, because I walked over here. So I’m very excited to be doing a Whiteboard Friday. This is my first solo one, and I’m told I have to do it in 11 minutes, and in 1.5 takes. No, just one take. The topic today will be the evolution of search, trademark Google. No, they don’t own search.

There was a time when they didn’t own search, which brings us to Search 1.0. Did you know, kids, that search engines used to be multiple, that we didn’t talk about Googling things? We actually used things like AltaVista, Lycos, and WebCrawler. Do you remember those names? There were things like OpenText, and what was that other one, Magellan. Well, these were search engines that existed before Google, and they went out onto the web and they crawled up all the pages, about a dozen pages that existed at the time, and then we would do our searches and try to find how to rank them all up.

That was all determined by just the words that were out on the page. So if you wanted to rank well for, I don’t know, something like movies, you would put movies on your page 100 times in a row. Then if somebody else wanted to outrank you, they’d put movies on their page 150 times in a row, because a search engine said, “Hey, we think relevancy is all about the number of words on the page, and a little bit about the location of those words.” The words at the top of the page would count for a little bit more than if they were further on down below.
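As a rough illustration of the Search 1.0 ranking described above, the sketch below counts how often a term appears and adds a small bonus for occurrences near the top of the page. The function name `keyword_score`, the 20-word "top of the page" window, and the 0.5 bonus are assumptions made up for this example, not values any real engine published.

```python
def keyword_score(page_text, term, top_window=20, top_bonus=0.5):
    """Score a page by counting a term, with a bonus for early occurrences.

    top_window and top_bonus are hypothetical tuning values.
    """
    words = page_text.lower().split()
    term = term.lower()
    score = 0.0
    for position, word in enumerate(words):
        # Strip simple punctuation so "movies," still matches "movies".
        if word.strip('.,!?"\'') == term:
            score += 1.0
            if position < top_window:
                score += top_bonus  # words near the top count a little more
    return score


honest_page = "Reviews of new movies and where to watch them"
stuffed_page = " ".join(["movies"] * 150)

pages = {"honest": honest_page, "stuffed": stuffed_page}
ranked = sorted(pages, key=lambda name: keyword_score(pages[name], "movies"), reverse=True)
print(ranked)
```

Under this kind of scoring the page that repeats the word 150 times wins easily, which is why the approach was so simple to game.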

Bottom line is this was pretty easy to spam. The search engines didn’t really want you to be doing better for movies because you said the word “movies” 150 times over somebody who said it 100 times. They needed to come up with a better signal. That signal, they took their time getting around to.

Long story short, they weren’t making a lot of money off of search so they really didn’t pay attention to it. But Google, they were sitting over there thinking, “You know what? If we create a search engine, someday someone might make a movie with Owen Wilson and Vince Vaughn. So let’s go out there and come up with a better system,” and that brought us into Search 2.0.

We are now here. Search 2.0 started looking at things that we refer to as off-the-page ranking factors, because all of the on-the-page stuff was in the complete control of the publisher. The publisher could change it all around. There was even a time, when you used Infoseek, where you could submit a web page, and it was instantly added to the index, and you could see how well you ranked. If you didn’t like it, you’d instantly make a change and put it back out again. Then you could move up that way. So off-the-page kind of said, “Let’s go out there and get some recommendations from beyond the publisher and decide what other people think about these web pages, because maybe that’s less spammable and would give us better quality search results.”

By the way, I said not Yahoo over here, because I’m talking about search engines in terms of crawler-based search engines, the ones that use automation to go out there and find web pages. Yahoo for the longest time – well it feels that way to me – was a directory, or a human-based search engine where they listed stuff because some human being actually went to a website, wrote up a review, and added it.

Now back to Search 2.0, Google came along and started making much more use of something called link analysis. So the other search engines kind of played with it, but hadn’t really gotten the formula right and didn’t really depend on it so much.

But Google, new kid on the block, said, “We’re going to do this a lot. We’re going to consider links to be like votes, and people with a lot of links pointing at them, maybe they got a lot of votes, and we should count them a little bit higher and rank them better.” It wasn’t just in sheer amount of numbers, however. Google also then wanted to know who has the best votes, who is the real authority out there. So they tried to look at the quality of those links as well.
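The "links as votes" idea, where votes from well-linked pages count for more, can be sketched with a simplified PageRank-style iteration. This is an illustrative toy, not Google's production algorithm: the function name `link_votes`, the page names, and the fixed number of rounds are assumptions, and 0.85 is simply the damping value commonly cited for PageRank.

```python
from typing import Dict, List


def link_votes(links: Dict[str, List[str]], damping: float = 0.85, rounds: int = 50) -> Dict[str, float]:
    """Iteratively pass each page's score along its outgoing links."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    score = {page: 1.0 / n for page in pages}
    for _ in range(rounds):
        # Every page starts each round with a small baseline score.
        new_score = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its vote evenly among the pages it links to.
                share = damping * score[page] / len(targets)
                for target in targets:
                    new_score[target] += share
            else:
                # A page with no outgoing links spreads its score to everyone.
                for other in pages:
                    new_score[other] += damping * score[page] / n
        score = new_score
    return score


# Hypothetical web: three pages link to "hub"; "hub" links to "d"; "e" links to "f".
web = {"a": ["hub"], "b": ["hub"], "c": ["hub"], "hub": ["d"], "e": ["f"]}
scores = link_votes(web)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

Both "d" and "f" have exactly one inbound link, but "d" scores higher because its single vote comes from a page that itself collected many votes. That is the "who has the best votes" part of the idea.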

You’ve got other people who were doing some off-the-page stuff. One of them, you might recall, was by the name of Direct Hit. They actually looked at things like click through. They would look and they’d say, “Well, we’ve looked at 10 search results, and we can see that people are clicking on the third search result completely out of proportion to the normal way that we would expect. Rather than it getting say 20% of the clicks, it’s pulling 80% of the clicks.” That might tell them that they should move it up to number one, and then they could move things that were down a bit further.
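The click-through idea can be sketched by comparing the share of clicks each result actually gets with the share expected for its position. The transcript does not say how Direct Hit computed this, so everything here is an assumption for illustration: the `EXPECTED_CTR` table, the function name `rerank_by_clicks`, and the click counts.

```python
# Hypothetical share of clicks expected at positions 1-4.
EXPECTED_CTR = [0.40, 0.25, 0.20, 0.15]


def rerank_by_clicks(results, clicks, expected_ctr=EXPECTED_CTR):
    """Reorder results by observed click share relative to the expected share.

    Assumes expected_ctr has at least as many entries as results.
    """
    total = sum(clicks.get(result, 0) for result in results)
    if total == 0:
        return list(results)  # no click data, keep the original order

    def lift(item):
        position, result = item
        observed = clicks.get(result, 0) / total
        return observed / expected_ctr[position]

    ordered = sorted(enumerate(results), key=lift, reverse=True)
    return [result for _, result in ordered]


results = ["page_a", "page_b", "page_c", "page_d"]
clicks = {"page_a": 12, "page_b": 6, "page_c": 80, "page_d": 2}
print(rerank_by_clicks(results, clicks))
```

Here the third result pulls 80% of the clicks against an expected 20%, so it moves to the top and the others slide down.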

These are some of the things that we started doing, but it was really links that carried us along for about a decade. Now links, off-the-page stuff, that’s been powering and still to this day kind of powers the web search results and how they start ranking better, but we have a little bit of an intermission, which we would call or I call Search 3.0. By the way, I made all this stuff up, so you can disagree with it or you can figure out however you want to kind of go with it. But a few years ago I was trying to explain how I had seen the evolution of search and some of these changes that were coming along.

What happened in this Search 3.0 era is that, even though we were using these links and we were getting better quality results, it was also so much information that was coming in that the signals alone weren’t enough. You needed another way to get more relevancy, and the way the search engines started doing that was saying, “Let’s take, instead of having you search through 100 billion pages, let you search through a smaller collection of pages of just focused content.” That’s called vertical search.

Now in horizontal search, you’d do a search for things like news, sports, entertainment, shopping, and you just throw it all into one big search box. It goes out there, and it tries to come back with all the pages from across the web that it thinks is relevant to whatever you searched for. In vertical search, it’s like a vertical slice, and that vertical slice of the web is just only the news content. Then when you do a search for something like NSA, it’s only going to look through the news content to find the answers about news that is relating to the NSA at the moment. Not trying to go over there and see if maybe there is some sports information or shopping information that may match up with that as well.

That’s important right now, by the way. You have all this talk about something like PRISM that is happening. It’s a spy program or an eavesdropping program or a data mining program, depending on who you want to talk to, that the US government is running. Prism is also something that you use just to filter light, and so if you are doing a search and you are just trying to get information about filtering light, you probably don’t want to turn to a news search engine because right now the news stuff is full of the PRISM stuff. On the other hand, if you want the latest stuff that is happening just within this whole Prism area, then turning to the news search engine is important, because you won’t get all of the other stuff that is not necessarily related.
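A minimal sketch of the horizontal versus vertical distinction, assuming each document in the index has already been labeled with a vertical. The `Doc` type, the `search` function, and the sample titles are all made up for illustration.

```python
from typing import List, NamedTuple, Optional


class Doc(NamedTuple):
    title: str
    vertical: str  # e.g. "news", "science", "shopping"


# Hypothetical index entries.
INDEX = [
    Doc("PRISM surveillance program revealed", "news"),
    Doc("How a glass prism splits light", "science"),
    Doc("Prism binoculars on sale", "shopping"),
    Doc("Lawmakers respond to PRISM reports", "news"),
]


def search(query: str, vertical: Optional[str] = None) -> List[str]:
    """Match the query against titles, optionally within one vertical only."""
    needle = query.lower()
    return [
        doc.title
        for doc in INDEX
        if needle in doc.title.lower()
        and (vertical is None or doc.vertical == vertical)
    ]


print(search("prism"))             # horizontal: everything that matches
print(search("prism", "news"))     # vertical: only the news slice
print(search("prism", "science"))  # vertical: only the light-filtering kind
```

The same query returns very different results depending on which slice it runs against, which is the point of picking the right vertical for the intent.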

So we have this Search 3.0 thing, vertical search, and Google, in particular, referred to it as universal search. Trying to solve that problem that, if someone types into a box “pictures of flowers,” they should actually show you pictures of flowers, rather than 10 links that lead you to maybe pictures of flowers. Now we’re pretty solid on this right now. Bing does these sorts of things as well. They have their own blending that goes on there.

Then it’s Search 4.0. Now we are here, or right here just because I feel compelled to write something on that board. Search 4.0 is kind of a return to what Yahoo over here was using, which was human beings. By the way, I don’t write very much anymore because the typing thing.

To refer to using human beings, one of the biggest things that has happened with search engines is that they, in a very short period, completely changed how we sought out information. For thousands of years, if you needed to know something, you talked to a human being. Even when we had libraries and people had all that kind of data, typically you would go into a library and you would talk to a librarian and say, “Hey, I’m trying to find some information about such and such.” Or you would need a plumber, you would ask somebody, “Hey, you know a good plumber?” Babysitter, doctor, or is this a good product? Does anybody know this TV? Does this work well? Should I buy that? You would tend to turn to human beings or things that were written by human beings.

Then all of a sudden we had these search engines come along, and they just took all these pages out there, and they really weren’t using a huge amount of human data. Yeah, the links were put in there by human beings. Yeah, some human being had to write the content as well, but we kind of lost another aspect of the human element that was out there, the recommendations that were out there en masse.

That is kind of what has been going on with Search 4.0. The first thing that is going on with Search 4.0 is that they started looking at the things that we had searched for over time. If they can tell that you constantly go back to say a computing site, like The Verge or CNET, then they might say, “Well, the next time you search for something, let me give the weight of those sites a little bit higher bump, because you really seem to like the stuff that’s there. So let’s kind of reward them in that regard.” Or “I can see that you’re searching for travel right now, and I can see that you just searched for New York. Rather than me pretend that these things are disconnected, let me put them together on your subsequent searches because you are probably looking for information about New York travel, even though you didn’t put in all those words. So I’ll take use of your history that’s going there.”
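The history-based bump can be sketched as a small boost added to a site's base relevance score for every past visit. The function name `personalize`, the 0.1 boost per visit, the base scores, and the `example-news.test` domain are assumptions for illustration only.

```python
from collections import Counter


def personalize(results, visit_history, boost_per_visit=0.1):
    """Re-rank (site, base_score) pairs, bumping sites the user keeps visiting."""
    visits = Counter(visit_history)  # sites never visited count as 0
    rescored = [
        (base_score + boost_per_visit * visits[site], site)
        for site, base_score in results
    ]
    rescored.sort(key=lambda pair: pair[0], reverse=True)
    return [site for _, site in rescored]


# Hypothetical base ranking and browsing history.
base_results = [("example-news.test", 1.0), ("theverge.com", 0.9), ("cnet.com", 0.8)]
history = ["theverge.com", "theverge.com", "theverge.com", "cnet.com"]

print(personalize(base_results, []))       # no history: base order
print(personalize(base_results, history))  # frequent favorite moves up
```

A real system would presumably cap or decay the boost so one favorite site cannot dominate every query, but the transcript does not go into that.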

The other thing that they have been doing, and some of this mixes across in the earlier times, but they are looking at your location. You do a search for football in the UK, you really don’t want to get information about the NFL for the most part. You want information about what Americans would call soccer. So looking and knowing that you’re in the UK when you do a search for football, it helps the search engine say, “We should go through and we should just come up with information that is relevant to the UK, or relevant to the US, based on where you’re at.” That greatly changed though, and these days it goes down even to your metropolitan area. You do a search for zoos, you’re in Seattle, you’re going to get information about zoos that are in Seattle rather than the Washington Zoo, or zoos that are in Detroit or so on.

The last thing, the really, really exciting thing is the use of social, which the search engines are still trying to get their head around. I talked earlier about the idea of links as being like votes, and I always like to use this analogy that, if links are like votes and links are somehow the democracy of the web, which is how Google still will describe them on some of their pages, then the democracy of the web is how the democracy in the United States started, when to vote you had to be 25 years and older, white, and own property. That wasn’t really representative of everybody that was out there.

In order for you to vote in this kind of system, you really have to say, “Wow, that was a great restaurant I went to. I want to go through now and I want to write a blog post about that restaurant, and I’m going to link to the restaurant, and I’m going to make sure that when I link to it, I’m going to use a platform that doesn’t automatically put things like nofollow on top of the link so that the link doesn’t pass credit. Oh, and because it’s a great restaurant, I’m going to remember to make sure that the anchor text, or the words near the anchor text, say things like great restaurant because I need to make sure that the link is relevant and passing along that kind of context. Now when I’ve done all that, I’ve cast my vote.”

Probably the 99 other people that went to the restaurant are not going to do that. But what those people are likely to do is like it on Facebook, plus it on Google+, make a recommendation on Yelp, use any one of the number of social systems that effectively enable people to vote much more easily. So I think a lot of the future where we are going to be going is in this social direction. These social signals are very, very important in the future as to how the search engines will determine what are the best pages that are out there.

Unfortunately, they’ve put so much into this whole link system and figuring out that this is a good link, this is a bad link, this is a link that we are going to disavow, this is a link that you disavowed, and so on and so on and so on, that they still need to work on making all this social stuff better. That’s going to become important as well. Not saying the links are going to go away, but I think the social stuff is going to be coming up much more heavily as we go forward into the future.

Now on the way up here I was thinking, because I was asked, “Will you talk about the evolution of search?” I’m like, “Yeah, no problem because I’ve done this whole Search 1 through 4 thing before.” There’s a whole blog post if you search for Search 4.0. Search for Search 4.0 and you’ll find it.

I was thinking, “What is coming after that?” On the way up, as I was sweating coming up the staircase, not the staircase here. There’s a staircase, because I was at sea level and I had to apparently climb up to 300 feet here, where we are located in the Moz building. If there was a swear jar, I would put a dollar into it.

Search 5.0, and this is really about search where it’s no page at all. Remember on-the-page factors, off-the-page factors, which are really off this page but on some other page, this stuff is I don’t even care that it’s a page. I did a blog post, and I can’t remember the title of it. But if you search for “Google conversational search,” you’ll find it. If you don’t find it, clearly Google is a very bad search engine.

In the conversational search thing that I was demonstrating, if you have Chrome and you click on the microphone, you can talk to Google now on your desktop, kind of like how you can do it on the phone. You can say, “Barack Obama,” and Google will come along and it will show you results for Barack Obama, and it will talk back to you and say, “Barack Obama is President of the United States,” blah blah blah blah. It gives you a little box for him, and he appears and there is a little description they pull from Wikipedia.

Then you can say to it, “How old is he,” or something very similar to that. Then the search engine will come back, Google will come back and will say, “Barack Obama is . . .” I can’t remember how old he is. But you should Google it and use that voice search thing. It will come back and say Barack Obama is this age. You can go further and say, “Well, how tall is he?” It will say, “Barack Obama is . . .” I think he is 6 foot 1. And you say, “Who is he married to?” Then it comes back and it says, “Barack Obama is married to Michelle Obama.” And you say, “How old is she?” Then Google will come back and say, “It’s really an impolite thing to ask a woman, but she’s a certain age.” I believe 39. Yeah, you’re usually safe with that.

To do all of that it has to understand that Barack Obama, when you searched for him, wasn’t just these letters on a web page. It had to understand that he is a person, that he is an entity, if you will, a person, place, or thing, a noun, but an entity, that there is a thing out there called Barack Obama that it can link up to and know about. When you ask for its age, and you said, “How old is he,” it had to understand that “he” wasn’t just words, but that actually “he” refers to an entity that you had specified before, the entity being Barack Obama. When you said, “his age,” that age wasn’t just a bunch of letters that match on a web page, but age is equal to a value that it knows of because Barack Obama has an age value over here, and it’s connecting it there.

When you said, “How tall is he,” same thing. That tall wasn’t just letters, but tall is actually a height element that it knows. That says height, trust me. When you said, “Who’s his wife,” that wife, with an f, kids, not a v, later we’ll do potatoes without an e, that his wife is a person that is equal to spouse, which is a thing that it understands, an entity. It’s not just words again. It’s like a thing that it actually understands, and that actually that that is Michelle and that she has all of these things about her, and [inaudible 15:38]. All those sorts of things along there.
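The entity idea in this conversation can be sketched with a tiny lookup table and a "current entity" that pronouns resolve to. This is a toy under stated assumptions, not how Google's conversational search works internally: the `Conversation` class, the `ALIASES` table, and the graph layout are hypothetical, the height is the figure as recalled in the talk, and pronoun gender is ignored for simplicity.

```python
PRONOUNS = {"he", "she", "him", "her", "his"}

# Map the words people say to the attribute the system actually stores.
ALIASES = {"tall": "height", "wife": "spouse", "married to": "spouse"}

# Hypothetical mini knowledge graph holding only facts mentioned in the talk.
GRAPH = {
    "Barack Obama": {"height": "6 ft 1 in", "spouse": "Michelle Obama"},
    "Michelle Obama": {"spouse": "Barack Obama"},
}


class Conversation:
    def __init__(self, graph):
        self.graph = graph
        self.current = None  # the entity that "he" or "she" refers to

    def ask(self, subject, attribute=None):
        if subject.lower() in PRONOUNS:
            if self.current is None:
                raise ValueError("no earlier entity for the pronoun to refer to")
            subject = self.current  # resolve the pronoun to the last entity
        if subject not in self.graph:
            raise KeyError("unknown entity: " + subject)
        self.current = subject
        if attribute is None:
            return subject
        value = self.graph[subject][ALIASES.get(attribute, attribute)]
        if value in self.graph:
            # The answer is itself an entity, so later pronouns point to it.
            self.current = value
        return value


chat = Conversation(GRAPH)
print(chat.ask("Barack Obama"))
print(chat.ask("he", "tall"))
print(chat.ask("he", "married to"))
print(chat.ask("she", "married to"))
```

Nothing here matches letters on a page. "Tall" and "married to" are translated into stored attributes, and "he" and "she" are resolved to whichever entity the conversation last landed on.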

That is much different than Search 1.0 where, when we were searching, we were really just looking for letters on a page. When you typed in “movies,” it’s going, “How many pages out there do I have that have these six letters in this order? Start counting them up and putting it together.”

We are looking for entities, and the Google Knowledge Graph is that kind of demonstration of where things are going to be going forward. That’s all very exciting as well, because, for one thing as a marketer, it’s always exciting when your space changes because if you’re staying on top of things and you’re seeing where it’s going, there are always new opportunities that come along. It’s also exciting because some of these things are broken and they don’t work as well, so this has the opportunity to better reward things that are coming along.

It’s a little scary though because as Google learns about entities and it learns about things like facts, it also decides that, “You know what, you’re looking for movies in a place. I have a database of all those movies. I no longer need to point at a web page that has that sort of stuff.” The big takeaway from that is, if your job is just creating web pages that are all about known facts that are out there, it’s going to get harder, because people are no longer going to get pointed to your facts that are off of Google. People are going to get pointed to facts that Google can answer directly. Your job is to make sure that you always have the information that Google doesn’t have, the facts that aren’t easily found that are out there.

As for Search 6.0, it involved this PRISM system, but we can’t talk about that anymore, so that’s sort of gone away, and we’ll leave that off. A few years from now, it won’t make any sense. Right now, hopefully, it’s still very timely.

I think that’s probably it. So I thank you for your indulgence with my first solo Whiteboard Friday. I hope I didn’t go too fast. I hope that all makes sense, and thank you very much.
