We have a relatively new website (no external links, PR0, DA17) and we are making a major change in content. On the one hand we have to incorporate new content, and on the other we have to eliminate much of the old content.

When we finally make this change, for several reasons, more than 100,000 URLs will give a 404 error. The main reason is that the URL pattern of products will change from something like example.com/1871-producto1/ to example.com/producto1/

How would you recommend we proceed in order to minimize page-not-found errors?

In most cases, we will do a 301 redirect. But we do not want to abuse these redirects.

3 Answers

As others have said, use 301 redirects. However, instead of setting up bulk 301 redirects manually (and I'm sure you already know this), you could use a regex to match the old URL pattern and redirect traffic to the correct page. For example, redirecting example.com/product/abc to example.com will give you no benefit at all, whereas matching the /product/ part of example.com/product/abc and redirecting to /product-abc/ would obviously be much better. This way Google will know which URL is the new one, and all current traffic that would have hit a 404 is redirected to the correct page, so there is no major loss in traffic.
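To illustrate the regex approach with the URL pattern from the question, here is a minimal sketch in Python. It assumes every old product path is a numeric ID, a hyphen, and then the slug (as in /1871-producto1/); the names `OLD_PRODUCT_PATH` and `new_path_for` are made up for this example, and the same pattern could just as well live in your web server's rewrite rules.

```python
import re
from typing import Optional

# Assumption: old product URLs look like /<numeric id>-<slug>/, e.g. /1871-producto1/
OLD_PRODUCT_PATH = re.compile(r"^/(\d+)-([^/]+)/?$")


def new_path_for(old_path: str) -> Optional[str]:
    """Return the 301 target for an old product path, or None if it does not match."""
    match = OLD_PRODUCT_PATH.match(old_path)
    if match is None:
        return None
    # Drop the numeric ID and keep only the slug.
    return f"/{match.group(2)}/"


print(new_path_for("/1871-producto1/"))
print(new_path_for("/producto1/"))
```

One rule like this covers every product URL, so there is no need to maintain 100,000 individual redirects. Note that a slug which itself starts with digits and a hyphen would also match, so check the pattern against your real URLs first.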

Personally, I would not leave massive amounts of 404s on the website and let them fade out. If I visit a big website and get presented with a 404 error, I tend to think that if they're not going to redirect me to a relevant page, then why should I bother? It's much more professional to redirect the user to an appropriate, related page than to leave the user sitting and wondering why your "website isn't working".

The 404s will disappear from the index, and it will only affect your rank for a short amount of time. Return a 301 response AS MUCH AS YOU CAN; every page you can save is some PR you've saved.

For all the pages that do return a 404, serve a custom 404 page. It will help a tiny bit (because it shows you understand UX, according to Google). 404 pages will get removed from the index faster than blocked pages, for a simple reason: a 404 header tells the crawler that the page does not exist, so it is removed from the index. With no crawler access, the page might still be there for users, so it is not removed from the index.
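A rough sketch of how the two pieces fit together, written as a bare WSGI application in Python: old product URLs get a 301, and everything else that no longer exists gets a custom page that still carries a real 404 status. The URL pattern, the `KNOWN_PATHS` set, and the HTML bodies are all assumptions made for the example, not anything from your site.

```python
import re

# Assumption: old product URLs look like /<numeric id>-<slug>/
OLD_PRODUCT_PATH = re.compile(r"^/\d+-([^/]+)/?$")

# Hypothetical set of pages that exist after the migration.
KNOWN_PATHS = {"/", "/producto1/"}

NOT_FOUND_PAGE = (
    b"<h1>Page not found</h1>"
    b"<p>This page has moved or been removed. Try the <a href=\"/\">home page</a>.</p>"
)


def app(environ, start_response):
    """Minimal WSGI app: 301 for old product URLs, custom 404 for the rest."""
    path = environ.get("PATH_INFO", "/")

    match = OLD_PRODUCT_PATH.match(path)
    if match:
        # Permanent redirect to the new URL without the numeric ID.
        start_response("301 Moved Permanently", [("Location", f"/{match.group(1)}/")])
        return [b""]

    if path in KNOWN_PATHS:
        start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
        return [b"<h1>OK</h1>"]

    # Custom page for the visitor, but the status line is still 404,
    # so crawlers see that the page does not exist.
    start_response("404 Not Found", [("Content-Type", "text/html; charset=utf-8")])
    return [NOT_FOUND_PAGE]
```

The point to keep is the status line: a friendly page sent with a 200 status would hide the fact that the URL is gone.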

BTW: You're not abusing the 301 if you use it for what it's meant for: indicating that the content has been permanently relocated ;)

The Google Webmaster blog page you linked to explains how to deal with URL removals via the removal tool. You should not need the removal tool for normal site maintenance. And you should not block 404 pages with robots.txt, because Google will then not be able to crawl the pages and see the 404 response.

See here:

Don’t use the URL tool to get rid of pages in these circumstances:

To clean up cruft, like old pages that 404. If you recently changed
your site and now have some outdated URLs in the index, Google's
crawlers will see this as we recrawl your URLs, and those pages will
naturally drop out of our search results. There's no need to request
an urgent removal.