Way back in the WWW’s early Jurassic, microcomputer-based Web development tools sneakily began poisoning the formerly ideal world of the Internet. All of a sudden we saw ‘.htm’ URIs, because CP/M and later PC-DOS file extensions were limited to three characters. Truncating the ‘language’ part of HTML was bad enough. Actually, fucking with well-established naming conventions wasn’t just a malady, but a symptom of a worse worldwide pandemic.

None of those cheap but fancy PC-based Web design tools came with a mapping of objects (locally stored as files back then) to URIs pointing to Web resources. Despite Tim Berners-Lee’s warnings (like “It is the duty of a Webmaster to allocate URIs which you will be able to stand by in 2 years, in 20 years, in 200 years. This needs thought, and organization, and commitment.”), the technology used to create a resource determined its unique identifier (URI). That’s as absurd as wearing diapers your whole life long.

Newbie Web designers grew up with this flawed concept, and never bothered to research the Web’s fundamentals. In their limited view of the Web, a URI was a mirrored version of a file name and its location on their local machine, and everything served from /cgi-bin/ had to be blocked in robots.txt, because all dynamic stuff was evil.

Today, those former newbies consider themselves oldtimers. Actually, they’re still greenhorns, because they’ve never learned that URIs have nothing to do with files, directories, or a Web resource’s (current) underlying technology (as in .php3 for PHP version 3.x, .shtml for SSI, …).

Technology evolves, even changes, but (valuable) contents tend to stay. URIs should solely address a piece of content; they must not change when the technology used to serve those contents changes. That means strings like ‘.html’ or technology-specific folder names must not be used in URIs.
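To make the point concrete, here’s a minimal sketch of the idea: the URI is a stable key, and whatever currently serves the content sits behind a lookup that can change at will. The handler names and routing table below are made up for illustration, not from any particular framework.

```python
# Hypothetical sketch: a URI is a stable key; the technology behind it
# (static file, PHP script, CMS handler, ...) can change without ever
# touching the URI itself.

def serve_static(slug):
    # stand-in for "read a static file from disk"
    return f"<html>static page for {slug}</html>"

def serve_cms(slug):
    # stand-in for "render the page from a CMS"
    return f"<html>CMS-rendered page for {slug}</html>"

# Yesterday this content was a static file; today it comes from a CMS.
# Only the right-hand side of the table changed -- the URI did not.
routes = {
    "/tom-clancy-all-titles": serve_cms,   # was serve_static before migration
    "/about": serve_static,
}

def handle(uri):
    handler = routes.get(uri)
    return handler(uri.lstrip("/")) if handler else "404 Not Found"
```

No ‘.html’, no ‘.php3’, no folder names: when the backend changes again, every bookmark and inbound link keeps working.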

Many of those notorious greenhorns offer their equally ignorant clients Web development and SEO services today. They might have managed to handle dynamic contents by now (thanks to osCommerce, WordPress and other CMSs), but they’re still stuck with ancient paradigms that were never meant to exist on the Internet.

They might have discovered that search engines are capable of crawling and indexing dynamic contents (URIs with query strings) nowadays, but they still treat them as dumb bots — as if Googlebot or Slurp weren’t more sophisticated than Altavista’s Scooter of 1998.

They might even develop trendy crap (version 2.0 with nifty rounded corners) today, but they still don’t get IT. Whatever IT is, it doesn’t deserve a URI like /category/vendor/product/color/size/crap.htm.

Why hierarchical URIs (expressing breadcrumbs or whatnot) are utter crap (SEO-wise as well as from a developer’s POV) is explained here:

If it’s about SEO and it’s there, it’s most probably bullshit. If it’s bullshit, avoid it.

If you plan to spam the SEO blogosphere with your half-assed newbie thoughts (especially when you’re an unconvinceable ‘oldtimer’), consider obeying this rule of thumb:

The top minus one reason to publish SEO stupidity is: You’ll end up here.

Of course that doesn’t mean newbies shouldn’t speak out. I’m just sick of newbies who sell their half-assed brain farts as SEO advice to anyone. Noobs should read, ask, listen, learn, practice, evolve. Until they become pros. As a plain Web developer, I can tell from my own experience that listening to SEO professionals is worth every minute of your time.

17 Comments to "SEO Bullshit: Mimicking a file system in URIs"

As a plain Web developer, I can tell from my own experience that listening to SEO professionals is worth every minute of your time.

Therein lies the problem: I’ve never met an SEO who hesitates to call themselves a professional. The newbs need a way to distinguish between the SEO professionals they should listen to, and the “SEO professionals” they should ignore.

Darren, given the huge amount of crap produced by the SEO blogosphere, on webmaster hangouts, and wherenot, most probably the only way to separate bullshit from wisdom and worthy advice would be a white list.

and here i thought i was going to read something about virtual folders… instead i read that /category/vendor/product/color/size/crap.htm is crap, yet i’ll argue that such is ok for something truly static and temporary while something dynamic and permanent gets built, with thanks for some knowledge gained here

just here to say your content is the shit, care to weigh in on the “best practice” for categories found here?

This looks like it will become a new fav of mine. I have been a fan of Sebastian for a good while now and SEOBS is now up there as well.

While I have been doing some amateur league programming for years, mostly demo type stuff to sell a concept and then hire a pro to do the real work, I’m not new to SEO as a concept but very new to it in practice.

Sebastian you hit it right on the head in this article, I as a user don’t want to traverse more than 2-3 layers of a site before I get where I want to be, but as a designer find myself time and again creating categorization hierarchies that are many layers deep and totally unnecessary.

Google definitely promotes this problem still, even in the most recent version of the Google SEO Starter Guide, v1.1 November 2008, under the heading of ‘Make your site easier to navigate’ a categorized hierarchical structure is illustrated up to 4 layers deep and described as “The directory structure for our small website…”

While the text is clearly speaking about page flow and navigation, calling the illustration a directory structure and implying that the URI implementation should match only further propagates this problem.

Keep up the good work guys, this article has certainly opened my eyes to a problem I had never really considered.
You’re right Sebastian, I never really dug into the intended usage docs, time to go see what else I missed.

Mark, I didn’t say you can’t use subdirectories to store files. I said that storage location and URIs have nothing to do with each other. You can even store your stuff outside the Web server’s reach and provide meaningful URIs. Just because a webmaster manages content in a hierarchical directory structure, that doesn’t mean that using this hierarchy in URIs is a good idea.

Having everything “flat”, that is, having the URI’s path identify the resource without using slashes, is an option, and just that. You can do it, combine it with slash-delimited paths, query strings, … whatever. Always do what’s best for the actual site, and don’t listen to crappy advice that tells you otherwise, even when it comes from a major search engine and is beginner-level stuff trying to bring a first understanding of what information architecture is to the noobs.
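A tiny sketch of the storage/URI decoupling described above: the files can live in whatever directory mess is convenient, even outside the document root, while a flat URI maps to them through a lookup table. Paths and names here are invented for illustration.

```python
import os
import tempfile

# Hypothetical sketch: content lives in an arbitrary directory tree
# (stand-in for "outside the Web server's reach"); flat, meaningful URIs
# map to it via a lookup table. The URI reveals nothing about storage.

storage = tempfile.mkdtemp()  # pretend this is outside the webroot
path = os.path.join(storage, "archive", "2008", "item-11.txt")
os.makedirs(os.path.dirname(path))
with open(path, "w") as f:
    f.write("Tom Clancy: all titles")

uri_to_file = {"/tom-clancy-all-titles": path}

def serve(uri):
    with open(uri_to_file[uri]) as f:
        return f.read()
```

Reorganize the directories tomorrow, update the table, and the URI stands.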

Just bear in mind that neither search engines nor human visitors care much about your URIs - they follow links, and only links. So if it’s difficult to create a hierarchical structure (usually it’s raping common sense at some point), then don’t do it, or do it just for parts of your content where it makes sense.

I have to disagree with you for several reasons.
1. Google is now displaying breadcrumbs in SERP and they are more likely to do so based on hierarchies.
2. Hierarchically designed IA makes it MUCH easier to analyze indexation and web traffic.
3. While internal linking structures are clearly driving SERP, it is much easier to understand and visualize your internal linking structures if you have a hierarchical structure.
4. Someday, you may need to move some content around or do other tricky and inconvenient things. You will thank yourself if you took the time to build out a really solid IA first.
5. Believe it or not, some users do see the URL and it helps establish information scent.

Jonah, you don’t need to disagree. Just get familiar with the concept, and think a bit further.

Google scrapes breadcrumbs from links, not from directory structures in URIs. For example Google can perfectly understand a path to the root on a page with a query string like ?category=widgets&state=ny&country=us, or even /item=11, provided there’s a meaningful breadcrumb navigation given on this page. Breadcrumbs describe a default navigation path (back to the root, usually), regardless of whether there’s a hierarchy or not. Often there’s more than one logical way from the root to a particular page. That’s why setting one of many possible paths in stone is a bad idea.
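The point that breadcrumbs come from the navigation, not from the URI, can be sketched like this: derive the trail from a parent-relation graph, so the same breadcrumb works whether the page is served from a flat URI or a query string. The page keys and labels are invented for illustration.

```python
# Hypothetical sketch: breadcrumbs are computed from a navigation graph
# (parent relations), not from slashes in the URI.

parent = {
    "all-titles": "tom-clancy",
    "tom-clancy": "authors-us",
    "authors-us": "authors",
    "authors": "books",
    "books": None,  # site root of this section
}
label = {
    "books": "Books",
    "authors": "Authors",
    "authors-us": "USA",
    "tom-clancy": "Tom Clancy",
    "all-titles": "All titles",
}

def breadcrumb(page):
    trail = []
    while page:
        trail.append(label[page])
        page = parent[page]
    return ":".join(reversed(trail))
```

Whether the URI is /tom-clancy-all-titles or ?item=11, the breadcrumb “Books:Authors:USA:Tom Clancy:All titles” comes out the same, and you’re free to wire up a different default path later without touching a single URI.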

When you put that as a SOP, it’s simply not true. As long as there’s a hierarchy in real life, then it can make sense, but there are more ways to implement it than pseudo file system hierarchies. Lots of things in real life aren’t organized in natural hierarchies, though. Artificial hierarchies aren’t helpful, esp. not in analytics.

If that were true, you could provide me with a model to organize all Twitter users in a hierarchy. You can’t do that.

The opposite is true. When you have to move content, it helps when the content isn’t organized in a hierarchy that has impact on navigation and whatnot. I agree that a solid IA helps, but a solid IA is an IA that doesn’t rely on (artificial) hierarchies, but works with networks of intermeshed nodes and similar concepts instead.

A URI like /tom-clancy-all-titles is way more meaningful, more useful, more bookmarker friendly… than /books/authors/us/clancy-tom/titles/all. The page served from /tom-clancy-all-titles can very well have a breadcrumb navigation like Books:Authors:USA:Tom Clancy:All titles.

Whether or not you actually have “/” to mimic a file structure, if your parameters don’t have a hierarchy and a consistent order, you are asking for trouble. If some URLs say state=ca&city=san-francisco and others say city=san-francisco&state=ca, you are going to create havoc and canonicalization nightmares. Meanwhile, many tools are much easier to use when you have hierarchies in place, and it is certainly easier to handle things like robots.txt, htaccess, etc.
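The consistent-order part of this comment is easy to enforce in code, no hierarchy required. A minimal sketch, assuming alphabetical order is the one canonical order you pick:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical sketch: normalize query parameters into one fixed
# (here: alphabetical) order, so parameter-order variants of the same
# resource collapse into a single canonical URL.

def canonicalize(url):
    parts = urlsplit(url)
    params = sorted(parse_qsl(parts.query))  # one consistent order
    return urlunsplit(parts._replace(query=urlencode(params)))
```

With that in place, ?state=ca&city=san-francisco and ?city=san-francisco&state=ca resolve to the same canonical URL, which kills the duplicate-content problem without imposing any hierarchy on the parameters.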

While I agree that a site like twitter doesn’t really conform to this approach, it is still very solid SEO for 90% of larger sites to build URLs based on a hierarchical IA.

Consistent order, yes, absolutely. Hierarchy? Nope, technically you can do it right using just one query string parameter with totally meaningless values. As for tools only working with URI hierarchies … well, if I’d build IAs based on the requirements of crappy reporting tools, I’d play in the wrong game.

April 9, 2010
Sebastian,
I respect the insight you have on a variety of meaningful topics. The point the article made about URIs not needing to match underlying technology is irrefutable. My opinion is also that URIs must be concise*. I think otherwise about hierarchical URIs.