Content Pruning – the Technique That Will Protect Your Rankings from Google Panda

Time has passed since the Panda 4.0 update, and there is still a lot of talk around the subject. At the time, headlines were filled with words of panic foretelling impending doom. Yet time proved that the latest Google Panda shouldn’t be discussed in terms of penalties; targeting and deranking low-quality content are the more appropriate concepts to bring into the discussion. So the Panda update should mainly affect those who do not comply with Google’s idea of valuable content: high-quality pages that are both useful and entertaining to the reader.

It’s all the rage in the SEO community, and specialists are looking for ways to adapt their strategies around it. Most webmasters rely on digital marketing strategies that produce content on a regular basis. That production may be as simple as a blog or a press-release section on the site; at a larger scale, it becomes an SEO strategy in its own right, one that should also prune old content that no longer generates traffic. If we were to reduce the SEO formula to its extreme, you could say that your success comes down to the number of backlinks and the number of indexed pages. With this raw formula in mind, underperforming content that gets indexed by Google might pull the whole site down in the rankings. The problem is that many webmasters only look forward when optimizing their site for search engines, leaving the old content behind.

New and up-to-date content will always be the focal point, but people tend to ignore the old content. That is how a site turns into a huge pile of pages that gather little to no traffic.

After Panda 4.0 rolled out, this forgotten bundle of underperforming content may actually drag your site’s ranking down. That’s why content pruning should become an essential part of your ranking strategy. While many SEOs would have disputed this point of view in the past, I think Google Panda 4.0 showed it is a real possibility.

What Is Content Pruning?

Since Google rolled out the Panda 4.0 update, some webmasters have been forced to reduce their number of indexed pages and have begun pruning low-quality content that added no value to the site. Making the decision to remove pages from the Google Index is not easy and, if handled improperly, it can go very wrong.

Your site’s obsolete or low-quality content may be one of the problems behind a drop in rankings.

Google has put a lot of effort into making its search algorithm detect quality content much as a human visitor would when reading it. This means that low-quality pages can affect the performance of the whole site: even if the website itself plays by the rules, low-quality indexed content may ruin the ranking of the entire batch.

Content pruning isn’t a technique that should be considered only by small sites or emerging businesses. Reuters, for instance, is one of the biggest and best-known international news agencies, an important player in the field since the last century. Yet even they should prune their content in order to give users the best possible experience. A look at their indexed pages reveals plenty that searchers can hardly find and that therefore shouldn’t be indexed. Moreover, those pages offer no valuable information, contain duplicate content (heavily targeted by the Panda algorithm) and definitely hold the whole site back rather than push it forward in the rankings.

Why Should I Prune My Own Content?

Every site has its evergreen content, which attracts organic traffic, and some pages that are deadwood. The Google Index should only contain pages that you are interested in ranking; otherwise you might end up diluting your rankings. Pages filled with obsolete or low-quality content aren’t useful or interesting to visitors, so they should be pruned for the sake of your website’s health.

Keep the Google Index fresh with info that is worthwhile to be ranked and which helps your users.

Low-quality pages that make for a stale user experience are unacceptable from Google’s point of view. Your site should be cleaned of them, because they create a poor and confusing experience for the searcher. You might look like an established site with lots of pages, but if your content isn’t on point, it only means you’re writing for the search engine, not for the user.

How Should I Prune My Own Content?

To successfully prune your own content you should follow these steps:

1. Identify Your Site’s Google Indexed Pages

For this task you have two methods that you may use in order to identify all your site’s indexed pages:

Method A. The first one, explained below, uses Google Webmaster Tools. The disadvantage of this method is that the list of pages it displays is not limited to indexed pages: it includes every page found by GoogleBot during crawling. However, you do get the total number of indexed pages and a graph of its evolution in the Index Status report.

You can access the whole set of internal links for your website by going to Search Traffic > Internal Links in Google Webmaster Tools. From there you can download the data in CSV or Google Docs format. That gives you a list of all the pages on your website (indexed or not) along with the number of links pointing to each, which is a great starting point for discovering which pages should be subjected to content pruning.
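Once you have that export, a few lines of code can sort the pages by incoming internal links so the weakest candidates surface first. A minimal sketch in Python, assuming hypothetical column names (“Target page”, “Links”) and made-up sample data standing in for the real CSV export:

```python
import csv
import io

# Hypothetical stand-in for the Internal Links CSV export;
# real column headers may differ depending on the export.
sample_csv = """Target page,Links
/evergreen-guide,120
/old-press-release-2009,2
/blog/panda-update,45
/tag/misc,1
"""

# Parse the export and sort pages by incoming internal links,
# fewest first -- the pages at the top are pruning candidates.
rows = list(csv.DictReader(io.StringIO(sample_csv)))
rows.sort(key=lambda r: int(r["Links"]))

# Flag pages with very few internal links for review.
pruning_candidates = [r["Target page"] for r in rows if int(r["Links"]) < 5]
print(pruning_candidates)
```

The threshold of 5 links is arbitrary; pick one that fits your site’s size and review the flagged pages manually before doing anything to them.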

This method is recommended for very big sites that have a lot of pages.

Method B. Site:Domain Query and Extract Only Indexed Pages.

While the first method returns every page GoogleBot crawled on your site, this one delivers a list containing only the indexed pages. By using the command site:examplesite.com you will receive only the results that are actually displayed on the search engine results page.

To speed up the process you need to adjust a few settings in the search engine. Go to the Search settings page and set Results per page to 100, so you see more results at once. You should also check the Never show Instant results option under Google Instant predictions; given that you’re going to see many more results per page, you want to remove any obstacles you might encounter.

The next step is done with the help of a bookmarklet that scans the results displayed in the SERP and opens another window with a numbered list containing the links and their anchor texts. To install the bookmarklet, drag it onto your bookmarks bar.

Make sure you are viewing the SERP as you activate the bookmarklet: it will generate a list with all the links and anchor texts on the page. Remember, if a site has more than 100 results, you have to repeat the process for each results page, and for big sites this may take a while. Be thorough and try to capture as many indexed links as possible.
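If you prefer working offline, the same extraction the bookmarklet performs can be done on a saved copy of the results page. A rough sketch using Python’s standard-library HTML parser, with a tiny inline snippet standing in for a real saved SERP:

```python
from html.parser import HTMLParser

class SerpLinkExtractor(HTMLParser):
    """Collects (url, anchor text) pairs from an HTML page --
    roughly what the SERP bookmarklet does."""
    def __init__(self):
        super().__init__()
        self.links = []      # list of (href, anchor_text) tuples
        self._href = None    # href of the <a> tag we are inside, if any
        self._text = []      # text fragments collected inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None

# Tiny stand-in for a saved site:examplesite.com results page.
html = ('<a href="http://examplesite.com/page-1">Page One</a>'
        '<a href="http://examplesite.com/page-2">Page Two</a>')

parser = SerpLinkExtractor()
parser.feed(html)
print(parser.links)
```

A real SERP contains many non-result links (navigation, ads), so in practice you would filter the collected hrefs down to those on your own domain.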

2. Identify Your Site’s Low Ranking Pages

To identify the low-ranking pages you can use Google’s own free tool, Webmaster Tools. It provides accurate data on the number of impressions, clicks, click-through rate and average position. The feature that lets you view all this data is called Search Queries, found under the Search Traffic category. There you will find a list of the top pages on your website, and identifying low-ranking pages becomes easy: order the list by number of clicks to see the worst-performing pages.

3. Identify Underperforming Pages

Understanding a site’s structure and identifying the obsolete paths and pages are critical in order to “clean up” your content. You can use metrics from Google Webmaster Tools, such as the number of clicks or impressions over a certain period, to gauge users’ interest in reading a page or taking an action on your site. For instance, you can use the Clicks column to surface pages that received zero clicks for a given query, which gives you some insight into what your users are actually into.

You can also use these metrics for data analysis of your content’s performance. For instance, download the data provided by GWT in CSV or Google Docs format, mark the pages that are underperforming, keep an eye on them, and start filtering the content posted there to get the most out of your content pruning campaign.
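The zero-click filter described above is easy to automate once the data is in CSV form. A minimal sketch, again assuming hypothetical column names and sample data in place of the real export:

```python
import csv
import io

# Hypothetical "Search Queries > Top pages" export; column names
# are illustrative and may differ from the real CSV headers.
sample = """Page,Impressions,Clicks
/evergreen-guide,5000,400
/old-press-release-2009,300,0
/blog/panda-update,1200,90
/tag/misc,50,0
"""

underperforming = []
for row in csv.DictReader(io.StringIO(sample)):
    impressions = int(row["Impressions"])
    clicks = int(row["Clicks"])
    # Pages that get shown in the SERP but never clicked
    # are candidates for pruning or a content overhaul.
    if impressions > 0 and clicks == 0:
        underperforming.append(row["Page"])

print(underperforming)
```

From here you could extend the check to low click-through rates rather than strictly zero clicks, depending on how aggressive you want the pruning to be.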

4. No-Index the Underperforming Pages

After you have successfully identified the pages that bring no added value, you should start no-indexing them. If you want those pages to disappear from Google’s radar, you need to keep them out of the index entirely. Here are two ways to do this:

You can disallow those pages in the robots.txt file to tell Google not to crawl that part of the website. Keep in mind an important caveat if you plan on using the robots.txt method: disallowing a URL stops Google from crawling it, but a page that is already indexed may remain in the index for a while. Be sure you know exactly what you are blocking; you can easily break your rankings with it!
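As a sketch, a robots.txt rule for this might look like the following, where the folder names are purely hypothetical examples of low-value sections:

```text
# Block crawling of hypothetical low-value sections only --
# never disallow the whole site by accident.
User-agent: *
Disallow: /tag/
Disallow: /press-releases-archive/

# Beware: a stray "Disallow: /" would block the entire site.
```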

You can tag certain pages. After you identify the pages that should be de-indexed, just apply the noindex meta tag to them. If you still want Google’s crawlers to follow the links on the page, add the <meta name="robots" content="noindex, follow"> tag. That tells Google not to index the page but to keep crawling the pages linked from it.
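In context, the tag sits in the head of each page you want pruned; a minimal fragment might look like this:

```html
<!-- Head of a hypothetical page marked for pruning:
     kept out of the index, but its links are still followed -->
<head>
  <meta name="robots" content="noindex, follow">
  <title>Old press release (to be pruned)</title>
</head>
```

Note that for the tag to take effect, the page must remain crawlable: if the same URL is also disallowed in robots.txt, Google may never see the noindex instruction.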

Don’t Jump the Gun

De-indexing pages, or parts of them, is quite a big decision, so think it through before you start implementing it. Even though content pruning may present itself as a workaround to the recent Google updates, use it with caution; as with everything in life, don’t abuse it. It’s natural to want to solve your issues swiftly, especially when you’re facing major ranking and traffic drops, but proceed gradually and know exactly what you are blocking, so that you don’t block entire folders from being crawled or keep the Google crawler away from important sections of your site.

Conclusion

Good content takes a lot of time to build. Pruning it might take even longer, but it pays off in the long run. While providing new content to your visitors should remain the main priority, you should still overhaul the old content; neglecting jaded content could harm your website’s ranking! A content pruning campaign is effective not just from a ranking point of view but also as part of a content marketing strategy. High-quality content increases the overall credibility of your site and improves the user experience, and therefore positively affects conversions.

He is the proud Founder & Chief Architect of cognitiveSEO, an SEO toolset focused on in-depth analysis of ranking signals. With over 8 years of experience in affiliate marketing and search engine optimization and 12 years in programming and web development, he has gone from web developer to super affiliate for large international networks. With a strong focus on everything search engine related, he developed strategies to withstand search engine updates. His passion for search engine marketing led him to create his own SEO toolset, trying to solve the issues he faces in the search engine optimization field.
