The way to online success is through being easily found in search engines such as Google, Yahoo!, and Microsoft Live Search. While developers have historically thought of search as a marketing activity, technical architecture has now become critical for search success. Found is the authoritative place to discover best practices for this nascent industry and gain a thorough understanding of why search-friendly architecture is absolutely mission-critical to businesses of all sizes. No spammy tricks. Just solid foundational coding tactics and actionable data that will ensure search engines can easily crawl, index, and rank your site’s content.

Maybe it’s just me, but these two posts completely miss the point. To me it’s clear. If you’re in business and want to sell stuff, then keep your site simple. Use strategic tags and keywords, and people will find you and buy. Make sure that each page that needs to be found has a link from your home page, and the spiders will find it…

Hi nmw, totally agree. The industry’s been around as long as the web. I think that developers probably haven’t been thinking about it quite as long as others, since a lot of SEO was originally seen as part of marketing. But as the web gets more complex, it gets more important to figure out the tough technical challenges that can come up when thinking about search (using load balancing, rewriting URLs using proxies, building a custom CMS that enables marketers to do what they need, etc.).

Had I written the release, I likely would have used different wording there to clarify a bit more.

JG, having worked at Google, I still am very focused on search quality, so I totally hear you on that worry. This conference is not about gaming algorithms. It’s all about building solid infrastructure that, among other things, ensures search engines can crawl and index the site. I’ve spent tons of time over the last few years talking with developers about issues they face with this, and the ones I’ve talked to are looking for real solutions. They don’t want to just be told “don’t use tracking parameters in URLs” or “don’t use Flash”. They want solutions that work for the development environment and for users. And that’s where the idea of this conference came from.

The information retrieval industry has been around far longer than the web.

Martin Luther wrote to princes, pleading with them to establish and maintain libraries. Granted, many libraries were still quite small, and even 200 years after the printing press was invented, the level of literacy was still quite low. You should recall that when the “founding fathers” of the United States put a “literacy” requirement into the right to vote, that was a sign that information was the basis upon which Americans sought to build: an approach very supportive of learning and science rather than blind faith.

But even without zooming out that far, it is totally obvious to anyone who wasn’t just born yesterday that “cryptology”, “information theory” (Shannon/Weaver)… yes, Vannevar Bush’s Atlantic article (and let’s not forget Hollerith cards! ;)… these things all significantly predated the web.

Also granted: pointing to the Ancient Greeks or the Egyptians, or to the first forms of writing that were etched into stones in Mesopotamia (to represent livestock, an “information retrieval” system), is a little bit extreme.

However, saying that SEO (which is — as I have said time and again — also a completely ridiculous term, insofar as “SEO” is primarily focused on “optimizing” for Google’s algorithms, not search in general) — saying that SEO or Google invented search sounds an awful lot like “Al Gore invented the Internet”.

This conference doesn’t look like much more than a meeting of Google hackers — and since “Google” is meaningless, it will be little more than a flash in the pan.

JG, having worked at Google, I still am very focused on search quality, so I totally hear you on that worry. This conference is not about gaming algorithms. It’s all about building solid infrastructure that, among other things, ensures search engines can crawl and index the site.

Vanessa, I hear what you’re saying, and you’re coming at it from a very pragmatic standpoint. By itself, that’s not a bad thing.

But I have a more idealistic view of search that says that a search engine should be responsible for retrieving the information as it is, rather than making the information have to adapt itself to the search engine.

After all, if you go back and look at how “information retrieval” historically arose, that is exactly what was happening. Unlike databases and SQL, information retrieval is all about dealing with the structure (or lack thereof) of information as it natively exists, and coming up with clever parsers and algorithms that are capable of dealing with information in situ.

While in the short term it may be very pragmatic to develop an infrastructure that allows webmasters to conform to Google standards, or even to G/Y/M consortium standards, in the long run that means everyone on the web has to adapt to those G/Y/M standards, or no longer be findable. That’s the wrong place to put the onus, and (I believe) is very anti-democratic, anti-web. Google should be the one carrying the burden, not every single web publisher across the whole web.

Stated another way: Sites that search the web should not be trying to change the web. Search engines should be reactionary, only.

Second, by essentially forcing every web site out there to conform to Google/consortium standards, there is a chilling effect on future 3rd party innovation. Since webmasters are altering the structure of their original data, to conform to existing search engine practices, it makes it that much more difficult to come up with some radically new, interesting way of dealing with the heterogeneity of the web. Because that heterogeneity will essentially have disappeared.

I have been forecasting this for a long time already — I think the hour for web 3.0 has finally arrived.

This bifurcation of “publishers” vs. “users” will ultimately lead to the web of community-based publishing, and all the talk of “social, social, social” will ultimately result in communities defined by “location, location, location”.

JG, That may be the case with some types of optimization, but I don’t agree that it’s the case with technical site architecture. Practically speaking, if a company wants to do business in any environment, they should understand how to operate in that environment.

If you operate a bricks and mortar retail store, it helps to open it in a busy location, with a big sign on the door so people know what’s inside, and for that matter, with a door.

Similarly, if a business operates online, they should understand that if they build a site that is blocked with robots.txt, or the home page is one big image that contains the text and no ALT text, or the pages are built with AJAX in such a way that the URL doesn’t change when the user navigates from page to page, then searchers won’t find that site.
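The robots.txt case is easy to check for yourself. Here is a minimal sketch in Python, using the standard library's `urllib.robotparser`, that asks whether a given robots.txt lets a crawler fetch a URL. The robots.txt contents, the `example.com` URL, and the helper name `is_crawlable` are all made up for illustration, and real crawlers may interpret edge cases differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks every crawler from the whole site.
BLOCK_EVERYTHING = """\
User-agent: *
Disallow: /
"""

# Hypothetical robots.txt that blocks nothing (an empty Disallow allows all).
BLOCK_NOTHING = """\
User-agent: *
Disallow:
"""


def is_crawlable(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if this robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/products/"
    print("blocked site crawlable:", is_crawlable(BLOCK_EVERYTHING, page))
    print("open site crawlable:", is_crawlable(BLOCK_NOTHING, page))
```

Running a check like this against your own robots.txt before launch is one cheap way to catch an accidental site-wide block.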

If the pages don’t have title tags, then the searcher may not know what that site is about when looking at the search results page.

Search engines want to index all the great content out there and wish their systems could get to sites that inadvertently block them for technical reasons. Certainly the search engines are working to get better at getting to this content as it’s in their best interest, but as a business owner, it’s also in your business interest to understand how to be successful.

Going back to that untitled page example, historically, the search results page would list that page as “untitled”. That’s like not having a sign on your physical store. Search engines want to show the most useful results possible so they’ve developed their own methods to improve this situation. They’ll now create a title using the anchor text to a site or the dmoz.org title. This is better (generally) than untitled, but this is like the physical store owner letting a third-party put any sign they want over their door.
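For the untitled page problem, a site owner can audit their own pages rather than wait to see what a search engine makes up. The following is a rough sketch in Python using the standard library's `html.parser`. The class name `TitleFinder`, the helper `page_title`, and the sample HTML are assumptions for illustration, not anything a search engine actually runs.

```python
from html.parser import HTMLParser
from typing import Optional


class TitleFinder(HTMLParser):
    """Collects the text found inside <title>...</title>."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def page_title(html: str) -> Optional[str]:
    """Return the page's title text, or None if it is missing or empty."""
    finder = TitleFinder()
    finder.feed(html)
    finder.close()
    return finder.title.strip() or None


if __name__ == "__main__":
    # Hypothetical pages: one with a sign over the door, one without.
    titled = "<html><head><title>Acme Widgets</title></head><body></body></html>"
    untitled = "<html><head></head><body><p>Widgets for sale</p></body></html>"
    print(page_title(titled))
    print(page_title(untitled))
```

A page for which `page_title` returns `None` is one where a third party ends up choosing the sign.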

Understanding search when evolving your web site is useful for all kinds of reasons. By looking at what people are searching for and what pages those searchers stay on vs. bounce off of, you can make sure you are speaking in the language of your customer, are offering what they are looking for, and can pinpoint areas of your site that might not be as useful.

For this conference, we are particularly focusing on technical challenges that may inhibit indexing in search. For instance, if a site uses tracking parameters in URLs, it may be giving search engines multiple URLs to the same page. Search engines choose one to index and from a search engine perspective, everything is fine. The page is listed and the searcher can find it. However, from a business owner perspective, what if the version of the URL that’s indexed has an ad tracking parameter that pays on a CPC or CPA basis? That means that every time someone clicks on that search result in organic search, the company has to pay out. A site came to me with that problem just last week. That company wants the canonical URL indexed in organic search, not the ad version. Online businesses face problems like this all the time.
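To make the tracking-parameter case concrete, here is a minimal sketch in Python of reducing several URL variants to one canonical form by dropping known tracking parameters. The parameter names in `TRACKING_PARAMS` (including the hypothetical `ad_id`), the `example.com` URLs, and the function name `canonical_url` are assumptions for illustration; a real site would use its own list and would also need to tell search engines which version it prefers.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed list of tracking parameters; every site's list will differ.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "ad_id"}


def canonical_url(url: str) -> str:
    """Return url with the tracking parameters removed from its query string."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in TRACKING_PARAMS
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


if __name__ == "__main__":
    # Two hypothetical URLs that serve the same page.
    print(canonical_url("https://example.com/widgets?id=7&ad_id=abc123"))
    print(canonical_url("https://example.com/widgets?id=7"))
```

Both inputs reduce to the same URL, which is the version the business would want listed in organic search.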

Fundamentally, there are lots of technical issues that can hinder crawling and indexing. By building a site that solves for these issues, you can not only ensure searchers can find you, but you can improve accessibility and usability as well. In my estimation, this can only improve the web.

Practically speaking, if a company wants to do business in any environment, they should understand how to operate in that environment.

Woah, let’s stop right there, before we get into any of the details, and understand what it is you’re saying, here. If a company wants to do business? Who said anything about companies and businesses? I’m talking about information that has been published on the web. I’m talking about citizens and users, wanting to search the web, in order to find information relevant to their needs. Not the commercial web, but the entire web. Not just commercial information, but all information. In whatever form that information takes.

And how is information presented, across the web as a whole? Is it always presented as clean, business-storefront home pages, with pristine (and on-topic relevant) title tags? No. A lot of the time, the relevant information that I (as a user) am seeking is found in the comments section to various blog threads. Or in online forums. Or in a PDF document. Or inside of a video. These things don’t always have title tags. Does that mean, because they don’t follow that Google-centric convention, they will not be ranked on the first page (buried), and therefore never found by users? Or conversely, in order to be found, I am going to have to start including [title] tags in all of the comments that I make on this blog? Or somehow convince John Battelle to change the title tag on this blog entry to “JG comments about X”? I mean, who is Google really trying to support here.. businesses or users?

If you operate a bricks and mortar retail store, it helps to open it in a busy location, with a big sign on the door so people know what’s inside, and for that matter, with a door.

Ah, but the web is different. At least it was supposed to be. The web was supposed to be driven by long-tail forces. Search is supposed to drive users to places that they wouldn’t or couldn’t have found on their own. Search isn’t supposed to drive users to busy main streets with lots of neon lights. Those sites are already easily locatable, without a search engine. Do you really need a search engine to locate bestbuy.com? No. Search is supposed to get me past that raucous main street, and out into the individual homes and quiet boulevards, to find individuals, smaller establishments, rarer items, in situ.

If the street on which relevant information lives is a small street (e.g. like many streets in the downtown areas of European cities), and isn’t wide enough for a large Trailer Truck search engine to fit down, that shouldn’t matter. The inhabitants of that street shouldn’t have to re-architect their whole street, just because the search engine can’t fit. The search engine should get out of its larger vehicle, and drive down the street in a VW bug or even on a bicycle, in order to get me there.

Search is supposed to make the world flatter. Or at least make wider streets narrower and narrower streets wider, so that relevant information doesn’t have to relocate itself to busy main streets. And so that things that aren’t large-scale businesses can still be found, without having to spend thousands of dollars on an O’Reilly conference. That’s why it’s called search. The search engine brings the user to relevant information, no matter where that relevant information is located.. rather than the relevant information having to uproot itself and move to where the users are.

Similarly, if a business operates online, they should understand that if they build a site that is blocked with robots.txt, or the home page is one big image that contains the text and no ALT text, or the pages are built with AJAX in such a way that the URL doesn’t change when the user navigates from page to page, then searchers won’t find that site.

Will searchers not find that site, because it is not relevant? Or because Google refuses to deal with it, because it doesn’t fit Google’s indexing model, and Google would rather spend its time developing calendars and chat widgets than really dealing with the complexity of information available on the web?

Google recently rolled out OCR (optical character recognition) for scanned books, right? So why can’t it apply those same algorithms to home pages that are one big image with no ALT text? And not get all pissy about it and give that page less rank-juice. Because relevant information is relevant information, no matter what form it takes.

I apologize if this was too much of a rant. I know you mean well, and this isn’t personal, of course. I just feel very strongly about the bass-ackwardness of the way that search engines are going about this process. The web wasn’t supposed to turn into a main street. The web was supposed to be different.