Take the lead out…


Google Suggest is a handy feature that displays a list of likely search terms as you type. This feature is useful for several reasons:

It often saves you from typing out your entire search phrase

It’s a quick way to check the spelling of a word

It gives you an idea of good search phrases related to what you’re after

This feature has been around in Google Labs for a while. A form of it is also used in the search bar of Firefox (although the search bar only displays phrases, not the number of results each will return). So the feature itself isn’t new, but making it the default on the homepage for all users (even guests) is a big deal.

I think it is a good move. Of course they want their homepage light and responsive, but I believe the functionality is worth the extra page weight. Besides, an Ajax search like this really doesn’t take much JavaScript when done right. In fact, I recently implemented a similar feature for searching tags on able2know.

I implemented it without the help of JavaScript libraries, although I do have a JavaScript file I’ve written that contains a small set of useful tools, such as targeting elements, sending Ajax requests, and validating JSON data. Using these helper functions, my search code becomes quite simple.
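The post’s actual code sample isn’t reproduced here, but a minimal sketch of what such helper-based tag search code might look like follows. The helper names (`parseJson`, `renderSuggestions`), the endpoint `/tags/search`, and the response shape are all my own assumptions for illustration, not the author’s actual file:

```javascript
// Helper: safely parse a JSON response, returning null on invalid data
// (hypothetical stand-in for the post's JSON validation helper).
function parseJson(text) {
  try {
    return JSON.parse(text);
  } catch (e) {
    return null;
  }
}

// Helper: turn a list of tag suggestions into list-item markup.
function renderSuggestions(tags) {
  return tags.map(function (t) {
    return '<li>' + t.name + ' (' + t.count + ')</li>';
  }).join('');
}

// The search handler itself stays short: fetch matching tags as the
// user types and hand the rendered markup to a display callback.
// `requester` abstracts the Ajax request helper; `show` targets the
// suggestion list element.
function tagSearch(query, requester, show) {
  requester('/tags/search?q=' + encodeURIComponent(query), function (text) {
    var data = parseJson(text);
    if (data && data.tags) show(renderSuggestions(data.tags));
  });
}
```

Because the Ajax and DOM work is pushed into the helpers, the feature-specific code is only a few lines, which is the point the post is making.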

Research firms Nielsen Online and Hitwise have released traffic information that shows strong Google growth in the US market. Google’s audience across its variety of web properties grew by a million from June to July, to 129 million (Google has the largest audience in the US through properties like Google.com and YouTube.com). But even more damning for its rivals (namely Microsoft and Yahoo), and arguably for the webmaster community, is the information Hitwise released today about the most coveted traffic of all: search traffic. Hitwise announced that Google has crossed the 70% mark (they cite 70.77%), which is up 10% from July of last year and 2% from June of this year. Here’s the rest of the search landscape that Google dominates:

Google’s search algorithms get a lot of play, and not enough people are paying attention to the fact that Google’s contextual ad network still enjoys a technical superiority over its peers that its search relevancy algorithm lost long ago. Simply put, some search competitors are doing a decent job with search relevancy but still seem to be nowhere when it comes to serving relevant ads.

I’d like to share some of my thoughts on the AdSense algorithm, which I will revisit in detail in the future. Given the secrecy of the sauce, I will not try to prove what is and what is not the Google AdSense algorithm; instead I will take the approach that any SEO worth his salt should and speculate about what the algorithm should be, and what current technology is capable of.

At its simplest level I believe the algorithm should, and likely does, work like this:

Use the context of the page to serve contextually relevant ads; this is the first priority.

Use clickstream data to determine what the user might be interested in and serve an ad that may not be contextually relevant to the page.

Use basic demographic data (e.g. geolocation) to attempt to target ad relevance to the user.

The premise is simple: the context of the page is a strong indication of what the user will click on, and it is the first priority of the algorithm. You may know that the user was interested in other, potentially more profitable, subjects, but the fact that the user is on that page now is a fairly good indication of what they are interested in at that particular moment.

But that isn’t always the case, and clickstream data can help identify what the user is really interested in. For example, the user’s previous searches can indicate what is really meant by the query “apple”, but even more immediately relevant is that Google often knows where you were right before you got to the page. And with increasing frequency, that was Google itself.

This is the single biggest reason that clickstream data must be a part of the Google algorithm. It’s much easier to determine context from a user-input query than from a page, which is why other search engines are starting to compete with Google in relevance on most queries. If Google knows what the user searched for before clicking on this page, they have a variable that rivals page context in relevance to the user. If they know the user searched for “buy a green widget in San Diego” and landed on a general page about green widgets, they would be foolish not to use the additional context they know about the user (the location-specific subset they are looking for) in their attempt to serve the ad the user is most likely to click.

The “session context” in the clickstream, as I just now like to call it, would be weighed heavily alongside page context in my algorithm, with historic clickstream data following at a distance. If you know the user is always looking for certain widgets, and you don’t have a great ad for the page or session context, then an ad about what the user has expressed past interest in is the next best thing.

Google has a lot of clickstream data from their own web properties, as well as from sites running AdSense and from the free log analytics service they provide to webmasters in exchange for the data. For example, they could know what you searched for on Yahoo when you land on a page with their ads or log tracking, and it’s precisely such examples that they can use to their benefit. Search history presents the easiest contextualization opportunities because the user has given the context in a string; other clickstream data requires a lot more guesswork, and for these reasons I think that Google should, and does, focus mainly on search-related clickstream data. Given my read on their corporate culture, I’m not sure if they are doing this outside of their own web properties, as in my Yahoo example, but they should, and I can’t imagine that they don’t for their own search engine.

Lastly, you can throw in anything else you know about the user. You have the IP address and can map it to geodata in a simple example, like showing the user an ad for a local restaurant. You can even get fancy and use aggregate trends (e.g. people searching in a certain area might be in town for a certain reason; come up with your own specifics) and other logical deductions (i.e. “wild guesses”, like searching in English from Mexico might mean you are interested in a hotel). I think focusing your efforts is a big part of the end result of this kind of work, and I believe that if Google uses any of this fallback data, they do it simply. Why spend time on a complicated algorithm to generate poor guesses when you can spend more time nailing the real priorities like page context?
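The priority ordering speculated about above can be sketched as a simple weighted scorer. To be clear, this is my illustration of the post’s speculation, not Google’s actual algorithm: the weights, the `overlap` keyword-match helper, and the ad/signal shapes are all invented for the example.

```javascript
// Fraction of an ad's keywords that appear in a signal's keyword list.
function overlap(adKeywords, signalKeywords) {
  if (!signalKeywords || signalKeywords.length === 0) return 0;
  var hits = adKeywords.filter(function (k) {
    return signalKeywords.indexOf(k) !== -1;
  }).length;
  return hits / adKeywords.length;
}

// Score a candidate ad against the signals in the priority order the
// post describes: page context first, session context close behind,
// historic clickstream at a distance, then a simple geo fallback.
function scoreAd(ad, signals) {
  var score = 0;
  score += 1.0 * overlap(ad.keywords, signals.pageContext);    // first priority
  score += 0.9 * overlap(ad.keywords, signals.sessionContext); // recent searches
  score += 0.3 * overlap(ad.keywords, signals.history);        // past interests
  if (ad.geo && ad.geo === signals.geo) score += 0.2;          // IP-derived location
  return score;
}

// Serve the highest-scoring ad.
function pickAd(ads, signals) {
  return ads.slice().sort(function (a, b) {
    return scoreAd(b, signals) - scoreAd(a, signals);
  })[0];
}
```

The design point matches the post: session context rivals page context in weight, while historic data and demographic fallbacks only break ties, keeping the fallback logic deliberately simple.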

In another post, I’ll break down the on-page context algorithm possibilities but I’m out of time for today.

Google announced a number of new options for advertisers in the content network that will have a big impact on AdWords advertisers and AdSense publishers as Google begins to integrate its DoubleClick acquisition into its existing ad networks.

Google completed the acquisition of the display advertising giant on March 11th, 2008, with the aim of bolstering its display advertising presence on the web. With the overwhelming majority of Google’s revenue coming from text advertising, DoubleClick’s multimedia strengths were deemed a good fit, to the tune of a $3.1 billion cash acquisition offer on April 14th, 2007. Earlier this year the regulatory hurdles were cleared, and Google’s advertisers are beginning to see the end result of the merging of these ad platforms. The additional options may compel companies who were wary of the Google content network, with its legion of AdSense publishers, to give it a new try. Here are the options Google announced:

Frequency Capping: Enables advertisers to control the number of times a user sees an ad. Users will have a better experience on Google content network sites because they will no longer see the same ad over and over again.

Frequency Reporting: Provides insight into the number of people who have seen an ad campaign, and how many times, on average, people are seeing these ads.

View-Through Conversions: Enables advertisers to gain insights on how many users visited their sites after seeing an ad. This helps advertisers determine the best places to advertise so users will see more relevant ads.
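Frequency capping, the first option above, is conceptually simple to sketch. The following is an illustrative toy, not DoubleClick’s implementation: it counts impressions per user per campaign in memory and stops serving an ad once the cap is hit, with an average-frequency readout of the kind frequency reporting provides.

```javascript
// Toy frequency cap: at most `maxImpressions` views of a campaign per user.
function FrequencyCap(maxImpressions) {
  this.max = maxImpressions;
  this.counts = {}; // "userId:campaignId" -> impressions served so far
}

// Returns true and records an impression if the user is under the cap;
// returns false once the cap is reached, so another ad can be chosen.
FrequencyCap.prototype.shouldServe = function (userId, campaignId) {
  var key = userId + ':' + campaignId;
  var seen = this.counts[key] || 0;
  if (seen >= this.max) return false;
  this.counts[key] = seen + 1;
  return true;
};

// Average impressions per reached user/campaign pair, as in frequency
// reporting ("how many times, on average, people are seeing these ads").
FrequencyCap.prototype.averageFrequency = function () {
  var counts = this.counts;
  var keys = Object.keys(counts);
  if (keys.length === 0) return 0;
  var total = keys.reduce(function (sum, k) { return sum + counts[k]; }, 0);
  return total / keys.length;
};
```

A real ad server would keep these counts in a cookie or server-side store keyed by an anonymous user ID rather than in process memory, but the cap check itself is this small.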

Today Google launched Insights for Search, a tool developed for AdWords advertisers to better understand trends in search terms. You can use it to compare the traffic for a keyword or phrase and filter by vertical (Category) and region. This is useful for search marketing professionals both in PPC and in natural results marketing. In both cases, search volume is one of the most important strategic variables; after all, why spend time and money on terms with less traffic than others you could target instead?

In the past, Overture was the most reliable way to get free query volume information from one of the major search engines, but they have discontinued their tool. Google, meanwhile, has been releasing more search volume data around its AdWords PPC product and now offers several of the most important keyword research tools in the webmaster arsenal.

Google has been collecting searchers’ browsing and searching habits for years. They have their search logs, clickstream information for every site serving their AdSense ads and every site using their free web analytics program, the browsing history of toolbar users who do not opt out, and that of all the users of their Web Accelerator proxy. And unlike many other companies that collect user data, Google actually uses their data in fundamental ways. So it comes as no surprise that they’d want to find ways to employ user information in their search algorithms. Clickstream data and folksonomy are some of the big areas that search algorithms are expected to use. Right now all major search engines use ranking algorithms that are primarily based on the PageRank concept Google introduced and became famous for. They all use links on the web to establish authority, and no fundamental change has taken place in the evolving search algorithms in many years. They get better at filtering malicious manipulation and at tweaks that eke out relevancy, but nothing groundbreaking.

So authority based on clickstream analysis and social indexing seemed like good ways to use data to further diversify the effort to allocate authority to web pages. What Google learned early is that they needed scale, and their initial data efforts (things like site ratings by their toolbar users) didn’t end up in their search algorithm. Folksonomy and social indexing don’t yet have enough scale to rely on and have potential for abuse, but the clickstream has scale and is harder to game, given that traffic is essentially the authority and the people gaming the authority want traffic. If they need traffic to rank well in order to get traffic, then those manipulating rankings face a significant challenge: they need the end for their means.

But Google is cautious with their core search product and has tweaked their algorithm very conservatively. It has been hard to tell just how much of a role clickstream data plays in their search results, and it will continue to be as long as it’s such a minor supplement to their algorithm. Today, Google posted a bit of information about this on their official blog in their effort to shine more light on how they use private data. You can read it in full there, but the basics are no big surprise:

Location – Using geolocation or your input settings, they customize results slightly to your location. My estimate is that they are mainly targeting searches with more local relevance. An example of such a search would be “pizza”. “Pizza” is more local than “hosting” and can benefit greatly from localization. Hosting, not so much.

Recent Searches – Because many users learn to refine their searches when they don’t find what they are looking for, this session state is very relevant data. A user who’s been searching for “laptops” for the last few minutes is probably not looking for fruit when they type in “apple” next. Google reveals that this information is stored client side and is gone when you close your browser, but since that means cookies, anyone who’s been seriously looking under the hood already knows this.

Web History – If you have allowed them to track your web history through a Google account, they use it to personalize your results. They don’t say much about what they are really doing, but this is where the most can be done, and there are far too many good ideas to list here. One example is knowing what sites you prefer. Do you always click the Wikipedia link when it’s near the top of the search results, even if there are higher-ranked pages? Then they know you like Wikipedia and may promote its pages in your personalized results. Do you always search for home decor? Then maybe they’ll take that into consideration when you search for “design” and not give you so many results about web design. There are a lot of ways they can use this data, and this is probably an area they will explore further.
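The “Recent Searches” mechanism above can be illustrated with a small sketch. This is my own guess at the shape of such a feature, not Google’s actual mechanics: recent queries are kept in session state (a cookie, in the browser) and used to pick the most plausible sense of an ambiguous query like “apple”.

```javascript
// Session-state store of recent queries, capped at `maxEntries`
// (in a browser this would be backed by a session cookie).
function RecentSearches(maxEntries) {
  this.max = maxEntries;
  this.queries = [];
}

RecentSearches.prototype.record = function (query) {
  this.queries.push(query.toLowerCase());
  if (this.queries.length > this.max) this.queries.shift();
};

// Pick the sense of an ambiguous term whose keywords best match the
// recent queries; falls back to the first (default) sense on no match.
RecentSearches.prototype.disambiguate = function (senses) {
  var recent = this.queries.join(' ');
  var best = senses[0], bestHits = -1;
  senses.forEach(function (sense) {
    var hits = sense.keywords.filter(function (k) {
      return recent.indexOf(k) !== -1;
    }).length;
    if (hits > bestHits) { best = sense; bestHits = hits; }
  });
  return best;
};
```

So a user who just searched for “laptops” gets the computer sense of “apple” boosted, and closing the browser (clearing the session) resets the bias, exactly the behavior the post describes.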

In summary, right now I’d say they are mainly going with simple personalizations and not really employing aggregate data and aggregate personalization to make aggressive differences. They are careful with their brand and will use user history with caution. After all, if personalization makes your results less relevant, they have failed, and because personalization can be unpredictable (there must be some seriously weird browsing histories out there), they are going to be cautious and subtle with this.

Search has been the sexy web app for years, with only social networking threatening its status as the cool king of web applications. And anyone gunning for Google seems to get a lot of attention. A husband and wife team of former Google employees launched a search engine called Cuil (pronounced “cool”) this week and the web was abuzz with the drama.

“Ex-Googlers build Google killer” was the salacious angle behind the buzz, but when the search engine actually launched, the poor quality of its results and the unreliability of the service generated a counter-buzz backlash. That Cuil representatives seemed snippy in response to the criticisms didn’t help, and I’ll go ahead and predict that this search engine won’t go anywhere.

They differentiate themselves from the current search engines with a different user interface that is simply a lot less usable, and while they claimed to have the largest index upon launch, they don’t, and their relevance is behind all the major search engines (even Microsoft’s).

There is speculation that they launched this search engine in the hopes of being bought by a search player like Microsoft, who has shown a willingness to spend big money this year (the Yahoo takeover attempt, the Powerset acquisition) to get search IP, market share, and talent.

That may be all that Cuil brings to the table. They went cheap on the backend and can’t realistically challenge at Google, Yahoo, or even MSN scale. Their best hope is that Microsoft sees enough value in them to purchase them.

Google announced a new milestone of 1 trillion URLs, which is impressive enough that we might as well forgive them for bringing us back to the index-measuring wars of yesteryear. In the past, search engine bragging rights were about how much of the web their index contained. Then Google stopped publishing their index total on their home page and said it was quality (of search relevance) and not quantity that mattered.

But a trillion’s a bit much to keep mum about so there you go. It doesn’t mean much but it’s interesting that it comes right before a stealth competitor launches a search engine they will claim to be the biggest (I think it’s a coincidence).