OnCrawl Afterwork - How to fix index bloating issues?

Last October 25th, we organized an OnCrawl Afterwork in New York with Vincent Malischewski, SEO Manager at Purch.
He talked about index bloating, one of the most common SEO problems that websites face today. It happens whenever Google indexes pages that should not be indexed, and it can affect almost any website, typically as a result of pagination issues or of letting Google index blog categories, tags, and archives. Vincent shows how to find and fix index bloating, with actionable use cases and key takeaways.

6.
Why is it an issue?
“From our point of view, our quality algorithms do look at the website overall, so they do look at everything that’s indexed.”
(Webmaster Trends Analyst @Google)

7.
Index Bloating @Purch (and @ any other company that generates a lot of content)
Some of our websites are 15+ years old. There is A TON of outdated content.
TomsGuide.com in 2002
Yet Google is still crawling, processing and indexing all this data.
Don’t get me started on the forums…

9.
Other methods we’re using (even #3 ಠ_ಠ)
Depending on your website, business or technical limitations, you can:
• Remove the content (status code: 404 or 410 will be slightly faster)
• Re-arrange: merge, redirect to make stronger pages (preferred solution)
• Don’t do anything and hope for the best (not recommended)
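
The first two options above come down to which HTTP response an old URL returns. Here is a minimal sketch, assuming you keep a set of removed paths and a map of merged paths to their stronger target page. The function name `response_for` and both data structures are hypothetical, not something from the talk. It uses 410 for removed content; 404 would work the same way.

```python
def response_for(path, removed, redirects):
    """Return (status_code, headers) for an archived path, or None to serve it normally.

    `removed` is a set of paths whose content was deleted (hypothetical input).
    `redirects` maps a merged path to the stronger page that replaced it (hypothetical input).
    """
    if path in redirects:
        # Merged content: permanent redirect to the stronger page.
        return 301, {"Location": redirects[path]}
    if path in removed:
        # Removed content: 410 Gone (404 Not Found is the other option on the slide).
        return 410, {}
    # Anything else is still live content.
    return None


# Illustrative data only.
removed = {"/news/2002/old-story"}
redirects = {"/reviews/gpu-part-1": "/reviews/gpu-complete-guide"}

merged = response_for("/reviews/gpu-part-1", removed, redirects)
gone = response_for("/news/2002/old-story", removed, redirects)
live = response_for("/reviews/new-laptop", removed, redirects)
```

Checking redirects before removals means a page that was merged elsewhere always sends visitors and crawlers to the new location instead of a dead end.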

10.
Identifying the content
This method really depends on your website. Because of the mass of content and the number of brands we’re dealing with, we used a simple rule for archiving:
- Remove pages with no SEO visits over the past X months
OnCrawl will easily provide this data via the Data Explorer, using either the logs or GA data (I’d advise using GA if you don’t have too many pages).
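
The archiving rule can be sketched in a few lines once you have an export of the last organic visit per URL. This is only an illustration: the input shape (URL mapped to the date of its last SEO visit, or `None` if none was recorded), the function names and the sample dates are all assumptions, not the OnCrawl or GA export format.

```python
from datetime import date


def months_between(earlier, later):
    """Whole calendar months elapsed from `earlier` to `later`."""
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    if later.day < earlier.day:
        # The last month is not complete yet.
        months -= 1
    return months


def pages_to_archive(last_seo_visit, today, max_months):
    """Return URLs with no SEO visit in the past `max_months` months, sorted."""
    stale = []
    for url, last_visit in last_seo_visit.items():
        # A URL with no recorded organic visit at all is also a candidate.
        if last_visit is None or months_between(last_visit, today) >= max_months:
            stale.append(url)
    return sorted(stale)


# Hypothetical export: URL -> date of last organic visit (None = never seen).
sample = {
    "/reviews/old-gpu-2002": date(2015, 1, 10),
    "/reviews/new-laptop": date(2017, 10, 1),
    "/forum/thread-123": None,
}
candidates = pages_to_archive(sample, today=date(2017, 10, 25), max_months=12)
```

Here `max_months` plays the role of X in the rule. The list it produces is a set of candidates to review, not a list to delete blindly.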

11.
Identifying the content
You can identify indexed pages with a crawl: below, the “indexability breakdown” report (segmentation is important).