The author's views are entirely his or her own (excluding the unlikely event of hypnosis) and may not always reflect the views of Moz.

Hello there. You look lovely. I’m Hannah and I'm an SEO Consultant for Distilled. I'm British, which means I sometimes spell things strangely; we like to make things more complicated than they really need to be over here. This is my first post for SEOmoz – I hope you find it useful.

Whenever I kick off a new project with a client, they are typically very interested in how I might be able to get them some lovely links. They’re also pretty keen for me to create them some lovely shiny content. Sadly, most aren’t too interested in information architecture. Many don’t realise how important it is.

To be honest, up until fairly recently I was one of those people. Most of the sites which I had worked on previously were in the insurance niche. Now typically these sorts of sites don’t really have duplicate content issues. Likewise I had never encountered any problems with indexation. I secretly wondered what those other SEOs were whining about (bunch of big girl’s blouses).

But then... A rude awakening.

I’ll not name names (that’s just not nice), but I had a client who was part-way through a brand new site build. I figured the technical part of the project would be pretty straightforward; after all, when someone’s building a brand new site they’re bound to have given some serious thought to information architecture, right? ...Right? ...Bueller? ...Bueller? ...Anyone?

Sadly not. The proposed architecture was riddled with so many issues it made my head spin. They would either have a lot of duplicate content or perhaps little or no content – it wasn’t quite clear which (and neither scenario made me jump for joy). They were likely to struggle with indexing. There were gaps you could drive a bus through in their landing page strategy. Their site was going to be a big old mess.

There was much lamenting, wailing, tearing of hair and gnashing of teeth... Then I calmed down.

What follows is a collection of the challenges I faced and how I dealt with them, plus definitions and explanations which I found useful when trying to fix these issues... Hopefully it’ll save you some pain. Once more unto the breach, dear friends...

The Challenge... No one cares but me

Yep, I came up against a whole heap of resistance when trying to fix these issues. No one really understood or cared about the situation. There was a lot of talk about how important the customer journey was; there was a lot of talk about brand experience – but SEO? Hmmm, well it wasn’t really getting much of a look in. The CMS being used for the build was apparently ‘SEO-friendly’ and there would be a sitemap, so the general consensus seemed to be that we were ‘all good’ for SEO thanks.

The Counter-Challenge – Education & Myth Busting

In my experience if you want to facilitate change, you’ll need to be prepared to do some serious ‘selling in’ of your ideas. But, the first step is to help people understand what the issues are, and as such, education is key. So, why should people care about information architecture?

Here’s what I went with...

Information architecture (or how the information on the site is organised) is important from a search perspective in two key ways:

It enables the search engines to index all of the pages on the site

It provides suitable landing pages for all of the keywords (or search phrases) that you might wish to rank for

Without sound information architecture your site may not get indexed properly, and if a site isn’t indexed, then clearly you’ll have no chance whatsoever of ranking. Likewise, without suitable landing pages for your selected key phrases, you’ll struggle to rank for those keywords.

From an SEO perspective we’re also seeking to ensure that we’re not creating duplicate content (i.e. the same content available via more than one URL), as ultimately this causes ranking issues: you end up with more than one page from your site competing for the same search result.

Finally, as links equal strength when it comes to SEO we’re also looking to ensure that we have strong internal linking within the site in order to maximise the strength of our most important pages (i.e. the pages which we really want to rank). Of course, external links will play a major part here, but ensuring we’re passing internal ‘link juice’ is also important.

I also had to do a little myth busting. The most pervasive myth was the magical power of the sitemap. There was a strong belief that the sitemap would cure all ills – that provided it included all the pages they wanted to get indexed, they’d duly get indexed and everything would be golden. I’m sure I don’t need to tell you that this isn’t the case. Sure, sitemaps are helpful, but they aren’t a cure-all, and I certainly wouldn’t recommend that anyone rely on a sitemap to get their content indexed. More importantly, even if the sitemap assisted with indexation, there was still the issue of providing suitable landing pages for all of the keywords which they wanted to rank for.

Key Takeaways

If the search engines can’t index your content you will not rank.

If you don’t have a page for each keyword (or at least each sub-set of keywords – you can of course target more than one keyword per page), again you’ll struggle to rank.

A lack of rankings means a lack of traffic. A lack of traffic will likely mean a lack of revenue.

A sitemap will not fix this.

So, by this point they were finally pretty much onboard with why this was important. Yay! Time to sell in the solution (cue fanfare) - Faceted Navigation!

...Wait, what? What is that?

Faceted Navigation

Faceted navigation allows users to select and de-select various facets in order to search / browse for what they are looking for. As such, it allows visitors to take multiple navigational paths to reach their desired end goal.

Let’s imagine that you’re shopping for a t-shirt. You might want to browse t-shirts by size (i.e. only those in your size), by colour, by designer, by price etc. To find the t-shirt you want it would be really handy if the website you were browsing allowed you to narrow down your search using some or all of those facets. It might look a little something like this:

Now I think this is pretty darn lovely from a user’s perspective. Additionally, the flexibility this sort of structure gives you helps you to solve the ‘page for each keyword / sub-set of keywords you want to target’ issue. Whilst it may look fairly simple on paper there are quite a few things to think about when tackling this. Here are some of the things I came up against, and how I dealt with them...

1. How many facets do you need in order to get everything indexed?

Ideally your deepest facet should contain no more than 100 products. This will assist you greatly in getting all of your products indexed. (NB whilst most SEOs are comfortable that the search engines will crawl more than 100 links on any given page, I prefer to stick with 100 product links as most websites will have a number of navigation links on every page in any case. Sticking to a maximum of 100 product links will help keep the total number of links on any given page at a sensible level).

By ‘deepest’ I mean however many folders down you decide to go. Let’s stick with hannahstshirts.com as an example – here you may decide to use the following facets:

Womens

T Shirt Type

Designer

An example deep facet page: hannahstshirts.com/womens/v-neck/a-wear/ - on this page, visitors would see all women’s v neck t-shirts from A Wear.

Now this type of page should have no more than 100 products on it, so provided that none of your designers offer more than 100 of a particular style of t-shirt then this is as deep as you need to go. If this isn’t the case you’ll need to add in another facet – e.g. colour.
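As a rough illustration of that check (the facet names and product counts here are entirely made up), you could sanity-check your deepest facet pages with a quick script before committing to a structure:

```python
# Hypothetical product counts for each deepest facet combination
# (gender / t-shirt type / designer). Counts are invented for illustration.
product_counts = {
    ("womens", "v-neck", "a-wear"): 42,
    ("womens", "crew-neck", "bench"): 87,
    ("mens", "v-neck", "bench"): 130,  # over the limit -- needs another facet
}

MAX_PRODUCTS = 100  # keeps total links per page at a sensible level

def facets_needing_split(counts, limit=MAX_PRODUCTS):
    """Return the facet combinations whose pages list too many products."""
    return [facet for facet, n in counts.items() if n > limit]

print(facets_needing_split(product_counts))
# [('mens', 'v-neck', 'bench')] -- add e.g. a colour facet below this level
```

Any combination the script flags is one where another facet (colour, in the article's example) is needed to bring each page back under the limit.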

2. Facets versus filters

There will probably be further search / browse options which you want to offer visitors to your site that you don’t really care about from a search perspective. For example – it’s really useful for visitors to be able to browse only items which are available in their size; but you may decide that you’re not particularly worried about the search engines indexing these pages. That’s where filters come in. These filters should be implemented using JavaScript or no-indexed to prevent these pages from getting indexed.

3. Do you have pages to enable you to rank for all of the keywords that are important to you?

This is really linked to the previous two points. Again using the example above – if your facets were Womens, T-Shirt Type and Designer; but you had a burning desire to rank for the term ‘white women’s t-shirts’ – then bad news, friend. As colour is a filter rather than a facet you don’t have an indexable page for that phrase. If you want to rank for these sorts of keywords you’ll need to make colour a facet, not a filter.
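To make that gap concrete, here's a hedged sketch (the attribute names and keyword decompositions are invented, not from any real tool) of checking whether each target keyword maps onto an indexable facet page rather than a JavaScript-only filter:

```python
# Which attributes are crawlable facets vs. JavaScript-only filters --
# an assumed setup, adjust to match your own site's configuration.
FACETS = {"gender", "type", "designer"}
FILTERS = {"colour", "size"}

# Target keywords, decomposed into the attributes they rely on.
target_keywords = {
    "womens v-neck t-shirts": {"gender", "type"},
    "white womens t-shirts": {"gender", "colour"},  # colour is only a filter
}

def uncovered_keywords(keywords, facets):
    """Keywords that depend on at least one non-facet attribute,
    and therefore have no indexable landing page."""
    return [kw for kw, attrs in keywords.items() if not attrs <= facets]

print(uncovered_keywords(target_keywords, FACETS))
# ['white womens t-shirts'] -- promote colour to a facet to rank for this
```

Anything this flags is a keyword you can only target by promoting the relevant filter to a full facet.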

4. Pagination

At the top level, e.g. ‘Womens’, you’ll return a number of pages of results. Now really you don’t want these pages indexed. Page 2 onwards of a given set of results is rarely an awesome result for a user; plus, of course, you’ll effectively have more than one indexed page competing for the same keyword in the SERPs. It’s bad all round. Therefore use Ajax or JavaScript to display page two onwards.

5. Sorting

Likewise, you may decide to offer sorting options – e.g. sort by price, sort by rating etc. These are great for users, but a potential duplicate content love fest for search. You don’t want the various sorted versions of the same page being indexed separately, so use JavaScript or Ajax.

Because there are multiple navigational paths a user can take, if you’re not careful there will be duplicate URLs for the same content. For example, if you wanted to see all of the women’s v-neck t-shirts by Bench you could go via:

www.hannahstshirts.com/womens/v-neck/bench

www.hannahstshirts.com/womens/bench/v-neck

Plus, depending on your site structure you might also be able to go via:

www.hannahstshirts.com/bench/womens/v-neck

www.hannahstshirts.com/bench/v-neck/womens

www.hannahstshirts.com/v-neck/bench/womens

www.hannahstshirts.com/v-neck/womens/bench

Uh oh. Imagine how many permutations of this you’ll have across the site. Bad times. You’ll need to make sure that no matter which route a user takes to reach a particular page, there is only one indexable URL. Now hopefully, you’ll either be custom building something awesome, or be using a CMS which will allow you to do this. If not? You’ll have to 301 all the variants back to one indexable URL.
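One way to enforce "one indexable URL per page", assuming you can hook into request handling (this is a simplified sketch, not a drop-in implementation), is to impose a single fixed facet order – alphabetical is used here purely as an arbitrary convention – and 301 every other permutation to it:

```python
def canonical_path(path):
    """Sort the facet segments into one fixed (alphabetical) order so that
    every permutation of the same facets maps to one canonical URL."""
    segments = [s for s in path.strip("/").split("/") if s]
    return "/" + "/".join(sorted(segments)) + "/"

def handle_request(path):
    """Return (status, location): a 301 to the canonical form,
    or a 200 if the requested path is already canonical."""
    canon = canonical_path(path)
    if path != canon:
        return 301, canon   # redirect the duplicate permutation
    return 200, path        # serve the one indexable page

print(handle_request("/womens/v-neck/bench/"))  # (301, '/bench/v-neck/womens/')
print(handle_request("/bench/v-neck/womens/"))  # (200, '/bench/v-neck/womens/')
```

However a CMS implements it, the invariant is the same: all six permutations above should collapse to a single URL that returns a 200, with the rest 301ing to it.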

Right, we’re nearly there, I promise. If you’re still reading then you definitely deserve a cookie. Possibly two.

Content’s Still King (well, nearly)

So, let’s imagine that you’ve finally got there. You’ve got a lovely looking faceted navigation. You’ve got all of the keyword targeted pages you need. You’ve defeated the duplicate content demons. You are made of win.

Don’t stumble at the final hurdle. Despite your best intentions, you still have a site with a lot of pages which look quite similar: lists of products which are available on a variety of other pages. Doesn’t feel all that unique, huh? You’ll need to create some unique content for each of these pages, and the more important the page is to you, the more awesome this content needs to be.

Key Takeaways

Use as many facets as you need to ensure that your deepest faceted pages contain 100 products or fewer AND to ensure you have all the pages you need to target the keywords you want to rank for.

Hi Hannah, I liked your first post (spiced with British humor) very much. It is a perfect example of all the upcoming problems we have to deal with, and brings lots of solutions. Especially I liked that you pointed out that each keyword should have an individual landing page. The cheat sheet is very useful, too.

About 5% of my search engine traffic lands on paginated results beyond the first page. Most of this is long tail from the partial product descriptions on the paginated results. Will using 'noindex follow' help increase the importance of the detail pages that have the full description and also increase the importance of page 1 of each category?

That's a great question. Some of it depends on how crawlable your site is. If search engines can easily get to your product detail pages, that same content is there and - therefore - it should be able to rank for those long-tail searches. With a few good links, it should probably rank better than a paginated category page. However, if you are receiving a large amount of product-related traffic to paginated category pages, it could be a sign that your product detail pages themselves are either difficult for bots to get to, or don't have enough links. Two good links into a paginated category page could cause that to show up instead of a product page with no links.

You could try noindex,follow ing those pages for a couple of weeks and see if your overall traffic has dropped, or if your product pages picked up the slack. Sometimes it's fine to allow paginated category pages to be indexed (not with faceted navigation, but I'm assuming you're not using that), especially if the copy (category description, seo text, etc...) only appears on the first page and that subsequent paginated pages don't share the same title and meta description as the primary category page. Usually I'd recommend noindex,follow but 5% could be a huge amount of traffic to "give up" just because the standard SEO advice says to.

Yeah, I think 5% is pretty significant traffic for paginated results past page one, so I'm approaching this cautiously. A few days ago I 'noindex,followed' one of my categories and I'm going to wait and see the results before implementing it site-wide.

As I said below - personally I prefer the 301 option - I think it's cleaner to effectively remove these pages. Also rel-canonical could be ignored as it's not a directive; but yes you could use this option - just be careful with implementation :)

I think we may be talking at cross-purposes here. In terms of pagination I'd recommend not paginating - so you load links to all of the associated products on any given page, and use Ajax / JavaScript to paginate for the users.

However, if you didn't want to do that, you could, as you suggest, paginate the results and use rel-canonical. However, as I said before, I'd be concerned about doing this as it's not a directive. Particularly as I'm not sure, in the case of pagination, that pages 2 onwards are actually the same as page 1 – they would have different products on them, for example. As such I'd worry that the rel-canonical would be ignored. I've not tested this though, so couldn't say for sure :)

Noindex, follow is not helpful on a huge site because it will waste your crawl budget – the number of pages a search engine will crawl each time it visits your site. You want the search engines to spend that budget crawling pages you want to be indexed.

Hey Hannah, great post. Just a question though. In the case where duplicate content is created because of a product being accessible via multiple navigational paths, wouldn't the rel="canonical" link element be preferable to a 301 redirect?

Personally I prefer the 301 option – I think it's cleaner to effectively remove these pages. Also, rel-canonical could be ignored as it's not a directive; but yes, you could use this option – just be careful with implementation. I've seen sites mess this up with pretty disastrous consequences.

My first thought was to use rel=canonical, but I can see that 301s would keep things "clean" and perhaps reduce indexing time for the tired spider! The only slight snag might be that if you have really user friendly URLs and a savvy user, they might wonder what is going on when they jump URLs.

This topic - information architecture and faceted navigation - is really one of the most important and yet misunderstood ones when it comes to SEO, and your suggestions and that wonderful PDF from ProSEO London are really helpful.

One of the most common problems is getting first the client and later the devs to understand the importance of the facet hubs. In fact, they are not only essential in helping the spiders index as much of your site as possible, but also – if very well crafted – they can represent a big asset for middle/long tail optimisation.

The problem is that, usually, clients go straight to (super)optimising the product pages, which is a rational instinct, but one that misses the point about spider navigation and discovery.

Thanks again Hannah.

post scriptum: if you like her style, then you should hear Hannah speaking at conferences... she has the great gift of making the most boring thing amusing :)

Just wondering – you mentioned that you should use JavaScript and Ajax to reduce the potential for duplicate content issues. What happens when the search engines work out a way to read these pages? ...Duplicate content issues. Surely noindex, follow is the most sensible way of future-proofing these ideas?

Quick question about your tip on pagination: you suggest "Page 2 onwards of a given set of results is rarely an awesome result for a user [...] Therefore use Ajax or JavaScript to display page two and onwards".

If we hide these pages from search engines, how will crawlers ever find the actual product pages? Should we rely on sitemaps to point bots to products?

It's not an issue because you're not relying on your top level facets to get all of your products indexed. You're quite right, your top level facets will probably have tons of links, but we're not worried about the bots following all of those links in any case.

That's why I say it's important to make sure that your deepest facets have 100 products or fewer on them - that's what we're relying on to get all of the individual products indexed :)

As it pertains to pagination of multiple results, suggesting the use of ajax/javascript could you elaborate on what you mean here?

From my experience, the introduction of non-crawlable/indexable and linkable URLs and content is intrinsically dangerous, as code will often get re-used beyond its original intent. I understand that by using Ajax/recursive JavaScript you are intentionally masking the duplicate content pages. But it seems fraught with trouble if, for instance, you actually wanted a search results page(ination) to index/rank. Cooks.com does this to great effect. What do you think?

I'm suggesting that you use Ajax / JavaScript to show and hide content for the users. Say there were 200 products associated with a particular facet – all of these are loaded on to the page, but to make it more friendly for users, you paginate using Ajax / JavaScript. So, essentially you're not creating multiple pages and having to hide them – only one page exists.

User-experience and SEO go hand-in-hand and it frequently boils down to site architecture. Users and search spiders have to be able to find the information they are looking for and find it quickly. If your site can't deliver, they'll just go somewhere else.

I like the Ajax approach – it's clean, with no additional product pages – but to play devil's advocate, what happens if you have a client with many paginated product pages who is looking for a cost-effective solution to reduce the problems associated with having 5 pages of 20 products?

Possibilities mentioned in the comments above were:

canonical on page 1 and canonical to page 1 on all additional pages

no follow on the pagination links

no index on everything but the first page

combination of all the above

What are your thoughts on that as I know many of my smaller clients have not always got big budgets and it's good to have the bargain basement approach that may not be so fancy but provides the same benefits.

Also, has anyone got any feedback on how to effectively hide links so the navigation can be much larger than 100 links for a given page? Not in any way to game the search engines, but to present a well designed 100-link-per-page navigation to search engine bots while allowing more links for users (something like this: http://www.notonthehighstreet.com/).

Even if you elect to noindex the subsequent pages, you probably don't want to nofollow the pagination links. You still want the search engines to crawl down to each and every leaf page (e.g. product), so let them follow without indexing.
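The pattern described in this comment thread – index page one, noindex-but-follow the rest – could be sketched like this (a simplified illustrative helper, not any particular CMS's API):

```python
def robots_meta_for_page(page_number):
    """Meta robots value for a paginated category page: index only page 1,
    but let bots keep following links so every product page gets crawled."""
    if page_number <= 1:
        return "index,follow"
    return "noindex,follow"

# Render the tag each paginated template would emit.
for page in (1, 2, 3):
    tag = '<meta name="robots" content="%s">' % robots_meta_for_page(page)
    print(page, tag)
```

The key point from the comment above is the `follow` half: dropping it would cut off the crawl path to products that only appear on later pages.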

Canonical tags on subsequent pages pointing to the first page are interesting, and I've seen some large and well established sites doing it. Never tried it though, and there is an article on the SEOmoz blog advising against it:

"Don't put a rel=canonical directive on paginated results pointing back to the top page in an attempt to flow link juice to that URL. You'll either misdirect the engines into thinking you have only a single page of results or convince them that your directives aren't worth following (as they find clearly unique content on those pages)."

Here you can read some more advice on this very interesting and challenging subject:

Great post Hannah. I had the same issue with an old client site where we had the same pages accessible via multiple URLs. Unfortunately there was no quick option to disable this, so eventually we managed to fix it with a manual patch that sorts the directories within the URL into alphabetical order.

Some definite food for thought here and very relevant for the kind of Drupal based sites I work on/with. A lot of different views in Drupal have independent URLs which will in fact feature duplicate content.

Great post! I run a silly pet stuff retailer, and have been wrestling with this issue for awhile. As the SEO guy/developer/marketing guy, I get to write our platform to do whatever the heck I like. We're rolling out a system soon that deals with this exact issue. Here's what we've come up with:

Our product tags are organized according to categories, such as Brand, Color, Materials, Season, etc. That lets us "facet" everything as above. Each tag category has an order in the URL (or at least it will when we launch). That keeps the order of directories (really just URL rewrites) in a single, canonical order to rule them all. Like... domain.com/dog-sweaters/puppia/red/winter.
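The fixed tag-category ordering this commenter describes might look something like the following sketch (the category priorities and tags are invented for illustration, not taken from their actual platform):

```python
# Assumed ordering of tag categories in the URL -- lower number comes first.
CATEGORY_ORDER = {"product-type": 0, "brand": 1, "colour": 2, "season": 3}

def canonical_url(domain, tags):
    """Build the one canonical path by sorting each (category, tag) pair
    into the site-wide category order."""
    ordered = sorted(tags, key=lambda pair: CATEGORY_ORDER[pair[0]])
    return "https://" + domain + "/" + "/".join(tag for _, tag in ordered)

url = canonical_url("domain.com", [
    ("brand", "puppia"), ("product-type", "dog-sweaters"),
    ("season", "winter"), ("colour", "red"),
])
print(url)  # https://domain.com/dog-sweaters/puppia/red/winter
```

Because the order comes from the category rather than the request, every combination of the same tags rewrites to one URL – the same invariant the article's 301 approach enforces.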

We're pretty pumped to see how it affects our rankings. We currently have duplicate & buried content issues out the wazoo.

I face this time and time again, I am a webmaster for a manufacturer who has their own website and customers copy and paste the content. I now have a writer in place to keep re-generating new content to ensure the site stays fresh, unique and up to date.

Hello Hannah, in the past I worked as an IA designing faceted navigation and search for enterprise search and intranets to improve findability.

Now I would like to use that expertise to manage an e-commerce web site, but I was worried about taking SEO-unfriendly actions, as if there were a trade-off between user friendly faceted navigation and SEO friendly information architecture. Until now I couldn't find any answers to my doubts, and I have never proposed this IA solution because I couldn't manage all the emerging issues that you have explained so well.

Great article and very topical to me as I'm currently dealing with a duplicate content nightmare. Unfortunately the client's CMS is so awkward I can't implement any of the changes you suggest! Any tips on how to talk a client into switching e-commerce platforms? :D

Faceted navigation can be mind-boggling... I wish there was a good visualisation tool that can wrangle 1mil+ page websites and complex navigational structures. So far every such tool I tried either failed or produced wrong results.

This brought back a lot of memories for me, I used to work for a catalogue company and trying to explain SEO and architecture to people who just didn't understand or care was frustrating to say the least. Ecommerce is always a challenging area, but there are some great tips here. - Jenni

Like most mozzers point out, I think it's also important to keep UX as the top priority, and to make sure that everyone working on the project – webmasters, web devs, designers, SEOs, etc. – is on board with that too. That is what will get pages to rank on page 1 as well as convert all of your users into sales. Most of the time ;)

Excellent article Hannah. Presenting your story in a warts and all way really engages the reader.

I never expected politics and change management to play such an important role in SEO. It's a reminder of how SEOs need excellent people and communication skills. No doubt we'll see more SEO directors on company boards in coming years and those firms will perform better as a direct result.

It's good to see Distilled is a rising star in the global SEO community. Not just because you're British, but because you're in my neighbourhood.

Information architecture is certainly a critical component of SEO. It also happens to be one of the most challenging aspects of working with clients. When a new site is being developed (or a site redeveloped) you at least have the opportunity to promote buy-in to the "information architecture for SEO" concepts, but in many cases you are working with production sites that have numerous architecture issues and site owners with no intention of making major changes to the code.

My approach in these situations is typically two-pronged: identify the most search engine unfriendly aspects for correction, and encourage the creation of new content which is search engine friendly. When stakeholders are opposed to eliminating non-SEF content, I opt for removing those pages from the index.

The bigger problem is often convincing them of the need to create new content which can fill those important keyword/ranking gaps. Many times this takes the form of viewing such an approach as "creating content solely for the purpose of SEO". At this point you have to convince them that no matter how much they like the content on non-SEF pages, it is adding little value; adding SEF content is the only way to increase visibility on the searches most likely to drive business results. Having the ability to create such content is another major issue which probably deserves its own blog post.

Unfortunately, there is still a common misconception that SEO is a magic bullet which can take a site with poor information architecture and lacking content and 'magically' drive beneficial business results.

If you figure out how to use the facet pages for SEO, it makes the process easier. There are companies that have 'products' to fix SEO, as you say, with a 'magic bullet'. I have no idea how that works, except that it can't. IA is a fundamental part of an application. Expecting otherwise is like turning dirt into gold with alchemy – it doesn't work.

Last week I came across a web design firm that claimed it used an 'SEO friendly CMS' and that 'SEO was built right in to their site designs'. The amazing part was that most of the content and navigation was handled through Flash! These kinds of marketing pitches really do wonders for the SEO community ;-) That being said, I generally think designing a CMS from scratch is a less desirable approach than modifying an existing CMS to make it conform with SEO best practices, as much of the functionality you would be creating from scratch is fine in a CMS like Drupal, for instance, so that is going to be a waste of resources. It should go without saying that even with a CMS which allows for SEO best practices there is still a lot of SEO work left to do. Unfortunately, the whole concept of building SEO into a site as some sort of fire-and-forget SEO strategy is still a pervasive myth.