Hey friend! Have fun exploring Q&A, but in order to ask your own
questions, comment, or give thumbs up, you need to be logged in to your
Moz Pro account.
You can also earn access by receiving 500
MozPoints
from participating in YouMoz and the Moz Blog!

Removing Dynamic "noindex" URL's from Index

6 months ago my clients site was overhauled and the user generated searches had an index tag on them. I switched that to noindex but didn't get it fast enough to avoid being 100's of pages indexed in Google.

It's been months since switching to the noindex tag and the pages are still indexed. What would you recommend? Google crawls my site daily - but never the pages that I want removed from the index.

I am trying to avoid submitting hundreds of these dynamic URL's to the removal tool in webmaster tools. Suggestions?

8 Responses

You could try adding the pages you want to remove to your robots.txt file. Since you're not linking to them, and it's very unlikely that Googlebot will index those pages naturally now, this might be a better way of telling it which pages to explicitly not index.

I'm not really sure how quickly this will trigger Google to remove those pages from the index - but they do reference robots.txt on the actual "Remove URLs" page of WMT ---> "Use robots.txt to specify how search engines should crawl your site, or request removal of URLs from Google's search results ..."

For that technique, you'd want to add something like this for all of the pages you want to remove:

Disallow: /oldpage1toremove.php

That should work. If it doesn't, then I would probably just submit the requests through the "Remove URLs" tool.

Don't use GWMT's removal tool to remove URLs which should not be in the index (unless those expose sensitive information). Best practise is to exclude them in robots.txt and to also ensure that the pages either 404 or have a noindex,noarchive tag.

Is there a crawl path to them currently? One issue I see a lot is that a bunch of pages get indexed, the path is found and cut off, NOINDEX (canonical, 301, etc.) is added, but then the pages never get re-crawled. Since they don't get recrawled, the page-level directive never gets honored.

If there's a URL parameter involved, you could use parameter-handling in GWT - it's not a perfect solution, but it sometimes seems to work without a re-crawl.

The other option would be to create a new XML sitemap with all of the bad/indexed URLs. This may push Google to re-crawl them and then see the tags to deindex. It's a bit safer than re-opening the crawl paths.

If they are being crawled and Google is just ignoring the NOINDEX for some reason, I'd try to 301 or canonical those pages to a primary search page, if that's feasible (probably canonical, since you don't want the users to 301). Sometimes, if a signal isn't working for that long, you just have to shake Google and try a different signal. Even following their exact recommendations, it rarely works as planned at large scale.

The search pages I've been testing out having the canonical point to the default search page. I've seen a slight drop in these pages - but I guess I just have to be more patient.

For the other pages the path is no longer there like you were mentioning. I like the idea of setting up the XML sitemap, I never even thought of making a bad/indexed page sitemap. I will give that a shot! Thankfully this will be a quick job with the importXml function in google spreadsheets! Great tip, hopefully it'll work.

I created sitemaps for the pages I want removed using the google spreadsheet importXML functions, which saved a lot of time.

It took a couple weeks but all of the pages, and similar pages, have successfully been removed from the index. Even the similar pages I didn't get a chance to put in the sitemap yet (importXML limits the results to 100).

Hey friend! Have fun exploring Q&A, but in order to ask your own
questions, comment, or give thumbs up, you need to be logged in to your
Moz Pro account.
You can also earn access by receiving 500
MozPoints
from participating in YouMoz and the Moz Blog!
Learn more.