Site Search, Dynamic Content, and SEO

I recently shared at SearchFest how large-scale site search and dynamic content can be problematic for SEO. Not surprisingly, there were many questions at the conference and online afterwards asking me for more specific information. This article is an attempt to bring clarity to the subject.

It’s a difficult topic for a few reasons. One is that many of the trends SEO professionals see are the result of industry happenings that are not publicly known. There is a lot that happens in the world of enterprise SEO that will never make front page news on SearchEngineLand. There are reasons why this information should always be kept confidential; RKG is not in the business of “outing” anyone else in the industry; we are in the business of taking care of our clients and staying on the cutting edge of search. While it’s tempting to talk openly about some of the things we know, it’s prudent to keep certain things confidential.

But there are things we can learn, and share, from what happens inside the world of SEO. In fact, it’s often learning about the inside stuff that gives us glimpses into trends and the future.

Bad search result pages can be like a Potemkin village.

“Search Results In Our Search Results”

Google is famous for saying, “We don’t want search results in our search results.” I remember speaking at SMX Advanced a few years ago and showing examples of what I felt were great site search SEO strategies that Epicurious was using. A representative from Google (someone I very much respect) was on the panel with me, and invoked the infamous quote. It’s been a few years since that panel, but only in the last several months have we seen Google get more proactive in discounting certain types of search results and dynamic content. The words “certain types of search results” are very important here. Not all search results are ‘bad’ for SEO. Jamey Barlow, SEO extraordinaire at RKG, says it best:

“Are these pages truly relevant and helpful to a user? Otherwise you are creating these Potemkin pages that are obviously designed to fool the search engines. That’s a bad business to be in and it ignores the first rule of SEO: that value to the user is value to the web.”

It makes sense that Google would not want blatantly low-quality search result pages ranking well, especially if their data show lower overall user engagement. Some of this action may fall to Panda-related algorithm updates. When there is insufficient content on a search result page, and that result is not particularly relevant for the terms the site is “going for” in their title and headings (for example), Google should be able to discount this automagically with their classifiers.

Site search carrying the SEO load; categories weak.

It may be more than that in the cases we’ve seen, however. The first example is a Fortune 10 website and household name that has made use of search results to a large extent. While it’s known within the organization that this isn’t a sustainable strategy, their reality has been formed more passively than aggressively. Within large enterprise companies, projects are not easily implemented (especially big projects). Often a current strength or weakness in SEO is the culmination of months or years of not getting the right things done, rather than a proactive agenda to drive SEO in a particular way.

When we investigated this site several months ago, we found a large dependence on site search pages for their non-branded organic traffic. While products performed well (a great sign), category pages were stunningly weak. The combination of too much dependence on site search pages and weak category pages left them in a highly unsustainable situation.

Looking at data today, we see this large site and a Fortune 50 competitor of theirs both dropping precipitously in the number of organic keywords from Google. This data comes from SEMRush and can at best be considered directional. It is not precise data. In fact, it could be completely wrong. However, based on what we know about the industry and what these sites were doing, it seems unlikely to be a coincidence that their traffic decreased around the time we heard of Google looking more closely at site search and dynamic content.

Two big ecommerce brands with site search and/or dynamic pages.

It’s Not Just Site Search

It’s not only site search that can be problematic. Dynamic content – pages generated on the fly – can also pose problems. This is not a blanket rule that must be followed in every case. Every company and SEO must take into consideration their experience, the site’s strengths and weaknesses, and their appetite for risk when making a decision about using dynamically generated pages. Done right, they have the potential to be powerful tools. Done wrong, they can create large issues for websites.

Dynamic content and, for that matter, site search are SEO techniques that have been heavily relied on in the past. Four or five years ago they were a fairly novel approach and fewer sites were using them. Today, it’s all too common to see them done poorly. There are good technologies available, but I suspect Google is taking a close look at all forms of dynamic content if they’re designed with SEO in mind. It’s a slippery slope and a controversial topic, to be sure.

How To Do Site Search and Dynamic Content Right

In our experience, site search pages tend to convert higher than conventional category landing pages on ecommerce sites. Certainly there will be exceptions, but we have seen the patterns too frequently to ignore. If the pages are fast, it makes intuitive sense: shoppers want to see everything a store offers right in front of them, rather than a curated list that may not include what they’re looking for. This introduces a bit of tension: high-quality category pages can rank very well, but shoppers convert higher on search result pages. What should you do?

Think about methods to add quality – and relevance – to your site search pages. A site search that is highly relevant, optimized in all the right places (URL, title, headings), and has unique content is no longer a poor quality page; it now has potential to be a targeted and valuable page. Maile Ohye writes,

“…if your site design surfaces category pages similarly to search result pages, adding valuable content to the page makes the content more helpful to the searcher (and no longer just search results).”

The key goes back to what Jamey Barlow said: it’s about making a relevant, quality experience. Too many SEO strategies rely on site search because it’s easy and it scales well, but they forget to think about the overall user experience.

I’m not saying every site should create unique content for each of its major search result pages. But sites that rely heavily on site search (especially if it is already powering category and sub-category selection) stand to benefit the most from adding content and other quality signals to those pages.

Dynamic content is harder to get right. By nature, anything automated will have some sacrifice. Compared to human-created content, dynamically generated pages usually won’t be as high quality, or as relevant, or include as much originality. With automated tools you benefit from exactly that: automation. You also benefit from scale and efficiency. What you lose is quality, potentially relevance, and potentially the user experience. To me the most important question to ask is, are these pages high-quality and will our users love them?

The other major issue with site search and dynamic content is that they both can introduce duplication. Site search is infamous for creating infinite variations of pages with the same content but slightly different URLs. Dynamic content can cannibalize a site’s ‘natural’ pages by creating slightly overlapping topical themes and keyword targets that compete with each other. We’ve seen both cause problems.
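To make the URL-variation problem concrete, here is a minimal sketch of how you might spot site search URLs that collapse onto the same content. It assumes query-string search URLs on a hypothetical example.com store, and the parameter names in IGNORED_PARAMS are illustrative assumptions, not a definitive list; which parameters truly leave the result set unchanged depends on the site.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: parameters that change presentation but not the result set.
IGNORED_PARAMS = {"sort", "view", "sessionid"}


def normalize_search_url(url: str) -> str:
    """Return a normalized form of a site search URL for duplicate detection."""
    parts = urlsplit(url)
    pairs = [
        (key.lower(), value.strip().lower())
        for key, value in parse_qsl(parts.query)
        if key.lower() not in IGNORED_PARAMS
    ]
    pairs.sort()  # parameter order should not create a "new" page
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path, urlencode(pairs), "")
    )


def group_duplicates(urls):
    """Map each normalized URL to the raw URLs that collapse onto it (2+ only)."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalize_search_url(url)].append(url)
    return {norm: raw for norm, raw in groups.items() if len(raw) > 1}
```

Run against a crawl export or a log file, the groups returned here are candidates for a canonical tag or a parameter handling rule rather than proof of duplication; the pages still need a human look.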

Technical Methods for Handling Site Search

GWT parameter handling tools are effective.

Keep in mind the following tools:

Rel Canonical

Meta Noindex (Follow and Nofollow)

Robots.txt Disallow

Nofollow (link attribute)

Webmaster Tools Parameter Handling

Rel Prev, Next

Let’s talk through the particular benefits of each.

Rel Canonical: this is your go-to for everything duplicate content. Rel canonical tags work much like a “soft 301” and will appropriately pass equity while removing the duplicate URL from Google’s index. Bing follows these clumsily in our experience, and still doesn’t support them cross-domain. On the downside, anything annotated with rel canonical must be crawled to be counted: this does nothing to make search engine crawling more efficient.
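When auditing canonical annotations across many site search pages, it helps to pull the tag out of each page programmatically. The following is a minimal sketch using only Python’s standard library; the class and function names are our own for illustration, and treating conflicting canonical tags as “no usable canonical” is an assumption of this sketch, not a statement about how any engine resolves them.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect the href of every <link rel="canonical"> in a document."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values and attributes.get("href"):
            self.canonicals.append(attributes["href"])


def find_canonical(html: str):
    """Return the canonical URL, or None if it is absent or ambiguous."""
    finder = CanonicalFinder()
    finder.feed(html)
    unique = set(finder.canonicals)
    return unique.pop() if len(unique) == 1 else None
```

Comparing the value returned here against the URL that was actually requested is a quick way to find search result pages whose canonical points somewhere unexpected.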

Meta Noindex: think of this as a method to noindex a URL at the meta level, rather than the link level with nofollow, which we’ll cover below. URLs marked with meta noindex will still get crawled, and unless the annotation specifies “nofollow” as well, the links within a noindex’d page will also be crawled. Internal PageRank can still flow through the links on pages marked with ‘noindex, follow’. This can be an effective tool and we continue to recommend it in certain cases. However, like the rel canonical tag, meta noindex’d URLs must be crawled to be counted.
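The difference between ‘noindex, follow’ and ‘noindex, nofollow’ can be sketched as a small parser for the content attribute of a meta robots tag. The function name is hypothetical. The sketch assumes the common defaults (a page is indexable and its links are followed unless told otherwise) and treats “none” as shorthand for “noindex, nofollow”; other directives are simply ignored here.

```python
def parse_meta_robots(content: str) -> dict:
    """Interpret a meta robots content string as index/follow flags."""
    directives = {
        token.strip().lower() for token in content.split(",") if token.strip()
    }
    blocked = "none" in directives  # shorthand for "noindex, nofollow"
    return {
        "index": not (blocked or "noindex" in directives),
        "follow": not (blocked or "nofollow" in directives),
    }
```

For example, parse_meta_robots("noindex, follow") reports index as False and follow as True, which is the configuration that keeps a page out of the index while still letting its links be crawled.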

Robots.txt: the sledgehammer of SEO, disallow rules here will put a brick wall between your content and Googlebot. This can be a very good thing, but proceed with extreme caution: it is not a subtle tool. Robots.txt is quite effective at blocking Bing and Google (and whoever adheres to web standards) from crawling, but it is not as strong with regard to indexing signals. Robots.txt excluded pages don’t pass any equity, do not get crawled, and if they’re indexed may stay in the index or may fall out slowly over time. More frequently they become what we term “suppressed listings” in Google’s index, where there is no title or snippet information, only a URL. This happens when Googlebot finds a link (usually on another site) to a robots.txt excluded URL and cannot crawl it.
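Because a disallow rule is so blunt, it is worth testing what it blocks before deploying it. Here is a minimal sketch using Python’s standard urllib.robotparser module. The /search path and the example.com URLs are assumptions for illustration. Note that this parser does simple prefix matching and may not mirror every pattern-matching extension a given search engine supports, so treat it as a sanity check only.

```python
from urllib.robotparser import RobotFileParser

# Assumption: site search lives under /search on a hypothetical store.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""


def is_crawlable(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt text allows user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With the rules above, a URL such as http://www.example.com/search?q=bar+stools is reported as blocked, while http://www.example.com/bar-stools remains crawlable.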

Nofollow: the nofollow link attribute is a strange little animal. It does so many things: it discounts links that “aren’t trusted” or that are paid. It stops equity from passing. It generally (with exceptions) stops Googlebot and Bingbot from crawling. It is a very fine tool, however, and its greatest strength is the ability to do this at the link level rather than the meta level. For cart pages, certain overhead facets, for sorts, tags and the like, nofollow is still a tremendously useful little tool.
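Since nofollow works link by link, an audit usually starts by listing which links on a template carry it. The sketch below uses only Python’s standard library; the class and function names are hypothetical, and the sample paths (/cart, sort parameters) are illustrative assumptions.

```python
from html.parser import HTMLParser


class LinkAudit(HTMLParser):
    """Split the links on a page into followed and nofollowed hrefs."""

    def __init__(self):
        super().__init__()
        self.followed = []
        self.nofollowed = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if not href:
            return
        # rel can hold several space-separated values, e.g. "nofollow noopener".
        rel_values = (attributes.get("rel") or "").lower().split()
        target = self.nofollowed if "nofollow" in rel_values else self.followed
        target.append(href)


def audit_links(html: str):
    """Return (followed, nofollowed) lists of hrefs found in the HTML."""
    audit = LinkAudit()
    audit.feed(html)
    return audit.followed, audit.nofollowed
```

A quick review of the nofollowed list shows whether cart, sort, and facet links are annotated as intended, and whether anything important has been nofollowed by accident.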

Parameter Handling: entire posts could be written on the Google and Bing parameter handling tools. They are fantastic, especially Google’s, and can work quite effectively. They are focused entirely on how the engines crawl, not on indexation. Even so, because they control which URLs get crawled, they can have great influence over indexing. See this useful article by RKG’s own Ben Goodsell for more details.

All that said, what’s the right approach for site search pages? Typically you’ll want to use a combination of tools. Robots.txt is the most emphatic and easiest method (if you can live with the PageRank vacuum). However, if you already have tens of thousands of site search pages indexed you’ll want to use meta noindex (just keep in mind crawling bloat). Parameter handling can be very effective, too, provided your URL query strings are encoded in a series of field-value pairs.
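The “field-value pairs” condition is easy to check before investing time in parameter handling. Here is a minimal sketch, with a hypothetical function name and example.com URLs: it returns the parameter names when every piece of the query string has the form field=value, and None when the URL has no query string or encodes its filters some other way (in the path, for instance).

```python
from urllib.parse import urlsplit


def query_fields(url: str):
    """Return parameter names if the query is all field=value pairs, else None."""
    query = urlsplit(url).query
    if not query:
        return None
    fields = []
    for chunk in query.split("&"):
        name, separator, _value = chunk.partition("=")
        if not separator or not name:
            return None  # a piece without "field=" breaks the pattern
        fields.append(name)
    return fields
```

The names that come back (q, color, sort, and so on) are the parameters you would then review one at a time in the parameter handling tools.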

In every case, I would look specifically at the site in question and make a recommendation based on its particular situation. Unfortunately, “it depends” is the only responsible answer here.

Comments

“Robots.txt excluded pages … if they’re indexed may stay in the index or may fall out slowly over time. More frequently they become what we term, “suppressed listings” in Google’s index… only a URL. This happens when Googlebot finds a link… to a robots.txt excluded URL and cannot crawl it.”

What happens if you place a rel canonical link on an indexed, excluded page? Do you think that could speed up the process, so that Google would get the hint to drop that URL?

When deciding if one wants internal site search result pages indexed or not, one also needs to take the target market into consideration. More specifically, one needs to know if there’s demand for such phrases and act/block accordingly. For example, is there a great demand for “black bar stools” or “brushed steel bar stools”? If yes, then better build content-rich, user-valued pages for those searches and keep them indexed. I’ve worked with a couple of clients in the past where we’ve seen extremely specific search engine queries – and those had the highest conversion rates.

Internal site search result pages require quite a bit of analysis and resources, but they can be rewarding.

Hi Adam, I appreciate that you cannot disclose the names of the sites that you mention in the article. Could you however share with us on which date last year you noticed their drop in long tail keywords? Best, Hervé

I am investigating a drop that occurred on the 17th of Nov in the UK (a month after the drop of the Fortune 10 site you are referring to in your article). Our site is exposed to a high “internal search” risk and I am looking for possible similarities in the drop patterns.

The main characteristic of our drop is that it is site wide with a modest traffic drop (-20%).

Would that drop pattern fit the ones you have encountered? Maybe not for the Fortune 10 site – as I would guess that Fortune 10 sites behave differently than smaller ones – but the other site you referred to…

Adam, great post and was a great follow up read after your Robots.txt post from last year. I wanted to get your take on blog category pages. I’ve seen numerous instances of these pages doing very well organically in the past. To me they seem like curated hyper niche mini-sites. So if you have a blog about SEO and a category for Title Tags, then that Title Tags category page is a nice hyper relevant and hyper niche resource. And I agree with and have implemented what you advised as far as developing category and site search pages out more to add value and it works beautifully!

But more to the point, do you advise against indexing category pages in blogs and why? I know a guy that has a publishing empire of sorts and he has the meta noindex on all his category pages, from an SEO plugin he uses no doubt. And I don’t really see good PR flow throughout the site and in individual blog posts. My assumption is that it’s because he is blocking his category pages.

Miguel, I think it depends a bit on how much content the blog has published and how many categories have been created. A blog with too little content and too many categories creates a lot of thin, low-quality pages. A blog with a ton of content categorized lovingly creates category pages that can be quite valuable. An even more valuable addition to these pages would be a bit of content about the category and writing contained therein.

Thanks for the great article. You’ve included a lot of info here that I hadn’t thought about before, such as meta noindexing and rel canonical. I agree 100% with the assertion that Google is looking to rank high-quality, relevant content and is getting really good at filtering out a lot of the junk out there. It’s worth spending the time getting great content created!

Generally, the site search field is considered one of the most significant elements of navigation. I have learned that implementing search on your own website is simple if you take advantage of the tools and the technical knowledge. You have just proven right everything I learned.