
There is a widespread myth that search engines have taken profits away from news websites. A few months ago, Rupert Murdoch said: “Google has devised a brilliant business model that avoids paying for news gathering yet profits off the search ads sold around that content.”

The reality is that news is a lousy business. Period. Even Google doesn’t make money on it. For example, here are Google’s search results for the phrase “afghanistan war”:

Notice there aren’t any ads on the page. This is because ads for “afghanistan war” generate such low revenues per query that Google doesn’t think it’s worth hurting the user experience with a cluttered page. Google can afford to do this on news queries (along with many other categories of queries) because their real business is selling ads on queries where the user likely has purchasing intent. Big money-making categories include travel, consumer electronics and malpractice lawyers. News queries are loss leaders.

It’s an historical accident that hard news categories like international and investigative reporting were part of profitable businesses. The internet upended this model by 1) providing a new delivery method for classified ads (mainly Craigslist), 2) increasing the supply of newspapers from 1-2 per location to thousands per location, thereby driving the willingness-to-pay for news dramatically down, and 3) unbundling news categories, making cross subsidization increasingly hard.

The internet exposed hard news for what it is: a lousy standalone business. Google arguably contributed to this in many indirect ways, including by helping users find substitute news sources. But the idea that Google takes profits directly from newspapers is simply misinformed.

Websites live or die based on how a small group of programmers at Google decides their sites should rank in Google’s main search results. As the “router” of the vast majority of traffic on the internet, Google’s secret ranking algorithm is probably the most powerful piece of software code on the planet.

Google talks a lot about openness and their commitment to open source software. What they are really doing is practicing a classic business strategy known as “commoditizing the complement”*.

Google makes 99% of their revenue by selling text ads for things like plane tickets, dvd players and malpractice lawyers. Many of these ads are syndicated to non-Google properties. But the anchor that gives Google their best “inventory” is the main search engine at Google.com. And the secret sauce behind Google.com is the algorithm for ranking search results. If Google is really committed to openness, it is this algorithm that they need to open source.

The alleged argument against doing so is that search spammers could study the algorithm to improve their spamming methods. This is an old argument in the security community, where relying on secrecy is known as “security through obscurity.” It is a practice generally associated with companies like Microsoft and widely opposed as ineffective and risky by security experts. When you open source something you give the bad guys more information, but you also enlist an army of good guys to help you fight them.

Until Google open sources what really matters – their search ranking algorithm – you should dismiss all their other open-source talk as empty posturing. And millions of websites will have to continue blindly relying on a small group of anonymous engineers in charge of the secret algorithm that determines their fate.

* You can understand a large portion of technology business strategy by understanding strategies around complements. One major point: companies generally try to reduce the price of their products complements (Joel Spolsky has an excellent discussion of the topic here). If you think of the consumer as having a willingness to pay a fixed N for product A plus complementary product B, then each side is fighting for a bigger piece of the pie. This is why, for example, cable companies and content companies are constantly battling. It is also why Google wants open source operating systems to win, and for broadband to be cheap and ubiquitous. [link to full post]
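The pie-splitting in the footnote above reduces to one line of arithmetic. A toy sketch (the dollar figures are invented for illustration, not from the post):

```python
# Toy model of complement pricing: a consumer will pay a fixed total N
# for product A plus its complement B. Whatever B costs, A's maker can
# capture the remainder -- so each side wants the other to be cheap.

def max_price_for_a(total_willingness_to_pay: float, price_of_b: float) -> float:
    """The most A can charge once the complement's price is fixed."""
    return max(0.0, total_willingness_to_pay - price_of_b)

# If users will pay $100/month total for "broadband + web services",
# every dollar broadband gets cheaper is a dollar web services can capture.
print(max_price_for_a(100, 60))  # broadband at $60 leaves $40 for services
print(max_price_for_a(100, 20))  # cheap broadband leaves $80
```

This is why Google pushing the price of operating systems and broadband toward zero expands the slice left over for its own ad-funded services.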

Consumersearch is owned by About.com, which in turn is owned by the New York Times.

So how did consumersearch.com get the top organic spot? Most SEO experts I talk to (e.g. SEOMoz’s Rand Fishkin) think inbound links from a large number of domains still matter far more than other factors. One of the best tools for finding inbound links is Yahoo Site Explorer (which, sadly, is supposed to be killed soon). Using this tool, here’s one of the sites linking to the dishwasher section of Consumersearch:
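The “links from many distinct domains” heuristic is easy to sketch: given a list of inbound-link URLs (the kind of list a tool like Yahoo Site Explorer reports), count unique referring domains rather than raw links. The sample URLs below are made up for illustration:

```python
# Count distinct referring domains in a backlink list -- a crude proxy
# for the link-diversity signal SEO experts believe matters most.

from urllib.parse import urlparse

def unique_referring_domains(backlink_urls):
    domains = set()
    for url in backlink_urls:
        host = urlparse(url).netloc.lower()
        # treat "www.example.com" and "example.com" as the same domain
        domains.add(host.removeprefix("www."))
    return len(domains)

backlinks = [
    "http://www.example-blog.com/dishwashers",
    "http://example-blog.com/reviews",        # same domain as above
    "http://another-site.org/kitchen",
    "http://forum.example.net/thread",
]
print(unique_referring_domains(backlinks))  # 3 distinct domains
```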

(Yes, this site’s CSS looks scarily like my own blog – that’s because we both use a generic WordPress template).

This site appears to have two goals: 1) fool Google into thinking it’s a blog about dishwashers and 2) link to consumersearch.com.

Who owns this site? The Whois records are private. (Supposedly the reason Google became a domain registrar a few years ago was to peer behind the domain name privacy veil and weed out sites like this.)

I spent a little time analyzing the “blog” text (it’s actually pretty funny – I encourage you to read it). It looks like the “blog posts” are fragments from places like Wikipedia run through some obfuscator (perhaps by machine translating from English to another language and back?). The site was impressively assembled from various sources. For example, the “comments” to the “blog entries” were extracted from Yahoo Answers:

Here is the source of this text on Yahoo Answers:

The key is to have enough dishwasher-related text to look like it’s a blog about dishwashers, while also having enough text diversity to avoid being detected by Google as duplicative or automatically generated content.
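One standard technique for flagging duplicative content (Google’s actual method is secret; this is just a common textbook approach) is w-shingling: slide a window of w words over each text and compare the resulting sets with Jaccard similarity. A scraped-and-lightly-obfuscated page scores high against its source; genuinely different text scores near zero:

```python
# w-shingling + Jaccard similarity: a simple near-duplicate detector.

def shingles(text: str, w: int = 3) -> set:
    """All w-word windows of the text, as a set of tuples."""
    words = text.lower().split()
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}

def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

original  = "the quiet dishwasher cleans dishes with very little water"
scraped   = "the quiet dishwasher cleans dishes with very little noise"
unrelated = "afghanistan war coverage is a loss leader for search engines"

print(jaccard(shingles(original), shingles(scraped)))    # high overlap
print(jaccard(shingles(original), shingles(unrelated)))  # no overlap
```

The spammer’s obfuscation game is exactly to push that similarity score below the detector’s threshold while keeping the keywords intact.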

So who created this fake blog? It could have been Consumersearch, or a “black hat” SEO consultant, or someone in an affiliate program that Consumersearch doesn’t even know about. I’m not trying to imply that Consumersearch did anything wrong. The problem is systemic. When you have a multibillion-dollar economy built around keywords and links, the ultimate “products” optimize for just that: keywords and links. The incentive to create quality content diminishes.

Google has created a multibillion-dollar economy based on keywords. We use keywords to find things and advertisers use keywords to find customers. As Michael Arrington points out, this is leading to increasing amounts of low quality, keyword-stuffed content. The end result is a very spammy internet. (It was depressing to see Tim Armstrong cite Demand Media, a giant domain-name owner and robotic content factory, as a model for the new AOL.)

On Twitter you have to ‘game’ people, not algorithms. Look how many followers @demandmedia has. A lot less then you guys: @arrington @jason

These are both sound points. Lost amid this discussion, however, is that the links people tend to share on social networks – news, blog posts, videos – are in categories Google barely makes money on. (The same point also seems lost on Rupert Murdoch and news organizations who accuse Google of profiting off their misery).

Searches related to news, blog posts, funny videos, etc. are mostly loss leaders for Google. Google’s real business is selling ads for plane tickets, dvd players, and malpractice lawyers. (I realize this might be depressing to some internet idealists, but it’s a reality.) Online advertising revenue is directly correlated with finding users who have purchasing intent. Google’s true primary competitive threats are product-related sites, especially Amazon. As it gets harder to find a washing machine on Google, people will skip search and go directly to Amazon and other product-related sites.

This is not to say that the links shared on social networks can’t be extremely valuable. But most likely they will be valuable as critical inputs to better search-ranking algorithms. Cody’s point that it’s harder to game humans than machines is very true, but remember that Google’s algorithm was always meant to be based on human-created links. As the spammers have become more sophisticated, the good guys have come to need new mechanisms to determine which links are from trustworthy humans. Social networks might be those new mechanisms, but that doesn’t mean they’ll displace search as the primary method for navigating the web.
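The idea of links from trustworthy humans as a ranking input can be sketched as a PageRank-style iteration where each link is weighted by a trust score for its source. Everything here — the graph, the trust scores, the damping factor — is invented for illustration; it is not Google’s algorithm, just the textbook link-analysis idea with a hypothetical trust signal bolted on:

```python
# PageRank-style iteration over a link graph, with each outbound link
# weighted by a trust score for its source page. Low-trust sources
# (e.g. suspected spam blogs) pass on little ranking value.

def ranked(links, trust, damping=0.85, iters=50):
    pages = {p for p, _ in links} | {q for _, q in links}
    rank = {p: 1.0 / len(pages) for p in pages}
    # total trust-weighted outbound link mass per source
    out_weight = {p: sum(trust[s] for s, _ in links if s == p) for p in pages}
    for _ in range(iters):
        new = {p: (1 - damping) / len(pages) for p in pages}
        for src, dst in links:
            if out_weight[src]:
                new[dst] += damping * rank[src] * trust[src] / out_weight[src]
        rank = new
    return rank

# Hypothetical graph: a real blog, a spam blog, and two targets.
links = [("blogA", "consumersearch"), ("spamblog", "consumersearch"),
         ("blogA", "wikipedia"), ("wikipedia", "blogA")]
trust = {"blogA": 1.0, "spamblog": 0.1, "wikipedia": 1.0, "consumersearch": 1.0}
scores = ranked(links, trust)
print(max(scores, key=scores.get))
```

The design point is that social networks could supply the `trust` vector — a signal about which link creators are real, reputable humans — without replacing link analysis itself.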

“SEO” (==”Search Engine Optimization”) is a term widely used to mean “getting users to your site via organic search traffic.” I don’t like the term at all. For one thing, it’s been frequently associated with illicit techniques like link trading and search engine spamming. It is also associated with consultants who don’t do much beyond very basic stuff your own developers should be able to do. But the most pernicious aspect to the phrase is that the word “optimization” suggests that SEO is a finishing touch, something you bolt on, instead of central to the design and development of your site. Unfortunately, I think the term is so widespread that we are stuck with it.

SEO is extremely important because normal users – those who don’t live and breathe technology – only type a few of their favorite websites directly into the URL bar and for everything else go to search engines, most likely Google*. In the 90s, people talked a lot about “home pages” and “site flow.” This matters if you are getting most of your traffic from people typing in your URL directly. For most startups, however, this isn’t the case, at least for the first few years. Instead, the flow you should be thinking about is users going to Google, typing in a keyphrase and landing on one of your internal pages.

The biggest choice you have to make when approaching SEO is whether you want to be a Google optimist or a Google pessimist**. Being an optimist means trusting that the smart people in the core algorithm team in Mountain View are doing their job well – that, in general, good content rises to the top.

The best way to be a Google optimist is to think of search engines as information marketplaces – matchmakers between users “demanding” information and websites “supplying” it. This means thinking hard about what users are looking for today, what they will be looking for in the future, how they express those intentions through keyphrases, where there are gaps in the supply of that information, and how you can create content and an experience to fill those gaps.
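The marketplace framing suggests a simple screen: score candidate keyphrases by estimated searcher demand relative to competing supply. The numbers and the scoring formula below are invented for illustration — real keyword research works from actual query-volume data, not made-up figures:

```python
# Toy "information marketplace" screen: rank keyphrases by demand
# (estimated monthly searches) per unit of supply (competing pages).

def opportunity(monthly_searches: int, competing_pages: int) -> float:
    # +1 avoids division by zero for uncontested phrases
    return monthly_searches / (competing_pages + 1)

candidates = {
    "dishwasher": (250_000, 30_000_000),          # huge demand, huge supply
    "dishwasher reviews": (40_000, 900_000),
    "quiet dishwasher under 500": (2_000, 8_000), # small gap, little supply
}
for phrase, (demand, supply) in sorted(
        candidates.items(), key=lambda kv: -opportunity(*kv[1])):
    print(f"{phrase}: {opportunity(demand, supply):.4f}")
```

The long-tail phrase wins: that is the “gap in the supply of information” an optimist tries to fill with genuinely useful content.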

All this said, there does remain a technical, “optimization” side to SEO. Internal URL structure, text on your landing pages, and all those other things discussed by SEO consultants do matter. Luckily, most good SEO practices are also good UI/UX practices. Personally I like to do all of these things in house by asking our programmers and designers to include search sites like SEOMoz, Search Engine Land, and Matt Cutts in their daily reading list.
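One concrete example of the “internal URL structure” item: keyword-bearing, human-readable URLs. A minimal slug generator (a common practice, not something the post prescribes):

```python
# Turn a page title into a clean, keyword-bearing URL slug.

import re

def slugify(title: str) -> str:
    slug = title.lower()
    slug = re.sub(r"[^a-z0-9]+", "-", slug)  # runs of non-alphanumerics -> one hyphen
    return slug.strip("-")                   # drop leading/trailing hyphens

print(slugify("Best Quiet Dishwashers (2010 Review)"))
# best-quiet-dishwashers-2010-review
```

A URL like that tells both users and crawlers what the page is about, which is exactly the kind of change that helps search ranking and usability at once.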

* I’m just going to drop the illusion here that most people optimize for anything besides Google. ComScore says Google has ~70% market share but everyone I know gets >90% of their search traffic from Google. At any rate, in my experience, if you optimize for Google, Bing/Yahoo will give you SEO love about 1-6 months later.

** Even if you choose to be a pessimist, I strongly recommend you stay far away from so-called black hat techniques, especially schemes like link trading and paid text ads that are meant to trick crawlers. Among other things, this can get your site banned for life from Google.