We think it might be another site that is using an extremely bad SEO policy based on other sites' branding and keywords:

thirdsite/buscador/tetonas-oursite

The question is: if other sites are generating these URLs, how can we prevent this?

Why is the tag being generated if no link was added to the other site?

What should we do with these errors? 301? 410 Gone?

I have read all the similar Q&A here, but none of them seems to solve our problem. It is not likely to be a bad ad (we inspected them all). Maybe some old content that Google suddenly decided to recrawl? Maybe a third party's bad SEO policy? Maybe all of these?

When you say "other site", what exactly do you mean? Is it one website that you control? Don't control? Multiple websites???
– John Conde♦, Oct 29 '13 at 13:52

Third-party sites: direct competitors or others. Some URLs seem to be a combination of our video title and a third-party site's video title; others are just our-video-title/feed (the feed no longer exists). Disavow? I have no idea...
– Natália, Oct 29 '13 at 14:12

5 Answers

Google Webmaster Tools warns against using 301s to deal with nonexistent URLs. Google recommends letting these return 404 and drop out of its index naturally.

I know how frustrating it can be, because I've had the same issue. If you 301 a bad incoming backlink to a page you DO have, it can signal to Google that these links are (or were) real. Only use a 301 if a legitimate page existed and its URL changed, or a 410 if you had a real page that you decided to drop without moving its content elsewhere.
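The rule of thumb above can be written as a small decision function. This is a minimal sketch: the `PageHistory` categories and the `status_for_missing_url` name are hypothetical, and you would still need your own records to know what used to live at a given URL.

```python
from enum import Enum


class PageHistory(Enum):
    """Hypothetical classification of what used to live at a requested URL."""
    NEVER_EXISTED = "never_existed"  # e.g. a bogus URL invented by a third-party link
    MOVED = "moved"                  # a real page whose URL changed
    REMOVED = "removed"              # a real page deliberately dropped, content not moved


def status_for_missing_url(history: PageHistory) -> int:
    """Pick an HTTP status for a URL the site cannot serve directly."""
    if history is PageHistory.MOVED:
        return 301  # permanent redirect to the page's new URL
    if history is PageHistory.REMOVED:
        return 410  # Gone: the resource existed but was intentionally removed
    return 404      # Not Found: never a real page, so let it drop out of the index
```

The point of keeping 404 as the default branch is that a bad backlink never gets "promoted" to a redirect unless you know a real page was behind it.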

Hello Jimmy, thanks for your contribution, but the errors are increasing, not decreasing, and I fear that if we wait longer, things will not get any better. Last week we had 1,000 new errors and I really don't know what to do. It might be people adding toxic backlinks on purpose, or maybe old links that have suddenly been recrawled; I do not know. Apart from waiting, any other advice? Disavow? Linke
– Natália, Nov 11 '13 at 11:02

With that in mind, 404 is a very appropriate response for those URLs. Whatever the link points to, your server can't make sense of it.

410 would not be an appropriate response. That would mean that you had the resource and it is now gone.

A 301 redirect could also be appropriate. I generally redirect away from unknown query parameters. You might consider redirecting oursite.com/\&.* to the home page. Of course, Google would then just treat it as a "soft 404", and it would still show up in the Webmaster Tools error report.
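A rough sketch of "redirect away from unknown query parameters", assuming a hypothetical whitelist `KNOWN_PARAMS` of the parameters your site really uses. It returns the URL to 301 to, or `None` when the request can be served unchanged; bogus `/&...` paths (the `oursite.com/\&.*` case) go to the home page.

```python
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical whitelist: the query parameters this site actually uses.
KNOWN_PARAMS = {"page", "sort"}


def redirect_target(url: str) -> Optional[str]:
    """Return a URL to 301 to, or None if the request should be served as-is."""
    parts = urlsplit(url)
    # Bogus paths such as /&q=...&sa=X are sent to the home page.
    if parts.path.startswith("/&"):
        return urlunsplit((parts.scheme, parts.netloc, "/", "", ""))
    params = parse_qsl(parts.query, keep_blank_values=True)
    kept = [(k, v) for k, v in params if k in KNOWN_PARAMS]
    if len(kept) == len(params):
        return None  # no unknown parameters, so no redirect is needed
    # Drop the unknown parameters and keep the known ones in their original order.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))
```

As mentioned above, Google may still report the home-page redirects as soft 404s, so this mostly helps real visitors rather than the error report.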

Another possibility would be to redirect those queries to your site search. Redirecting oursite.com/&q=videos+caseros+sexo+pornos+gratis&sa=X&ei=R... to oursite.com/search?q=videos+caseros+sexo+pornos+gratis would actually show results for any content that you do have, so any real users who happened upon the link would be happy. Because site search should be blocked in your robots.txt file, Googlebot would not crawl any of the URLs after the redirect and would therefore stop complaining about 404s.
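The search redirect could look roughly like this. It assumes your site search lives at `/search` and takes a `q` parameter (both hypothetical; adjust to your setup). Because these URLs have no `?`, the junk sits in the path rather than the query string, so the sketch parses whatever follows `/&`.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit


def search_redirect(url: str, search_path: str = "/search") -> Optional[str]:
    """Map a bogus URL like /&q=terms&sa=X&ei=... to the site's own search page.

    search_path is an assumption; use whatever path your site search lives at.
    """
    path = urlsplit(url).path
    if not path.startswith("/&"):
        return None
    # Parse the pseudo query string that follows "/&" in the path.
    params = parse_qs(path[2:])
    terms = params.get("q")
    if not terms:
        return None
    # Keep only the search terms; drop tracking junk such as sa= and ei=.
    return f"{search_path}?{urlencode({'q': terms[0]})}"
```

For example, `search_redirect("http://oursite.com/&q=home+videos&sa=X&ei=R")` returns `"/search?q=home+videos"`, which you would then send as the `Location` of a 301.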

Another option might be to just block the URLs in robots.txt outright. You could use a wildcard match that is understood by Googlebot:

User-agent: *
Disallow: /*&q=
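Before deploying a wildcard rule it is worth checking what it actually blocks. Python's `urllib.robotparser` does not implement the `*` and `$` wildcards, so this is a small hand-rolled matcher following Google's documented interpretation: a prefix match where `*` matches any run of characters and a trailing `$` anchors the end. It is only a sketch for testing patterns, not a full robots.txt parser; it ignores `Allow` rules and user-agent groups.

```python
import re


def robots_pattern_matches(pattern: str, path: str) -> bool:
    """Test a robots.txt path pattern against a path (including its query string).

    '*' matches any run of characters (including none) and a trailing '$'
    anchors the end of the URL; otherwise the rule is a prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece and join the pieces with ".*" for every '*'.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match anchors only at the start, which gives the prefix behaviour.
    return re.match(regex, path) is not None
```

`robots_pattern_matches("/*&q=", "/&q=home+videos&sa=X")` is `True`, but so is `robots_pattern_matches("/*&q=", "/videos?sort=new&q=cats")`, so make sure no legitimate URL on your site carries an `&q=` parameter before adding the rule.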

Webmaster Tools might still complain about the ones it found before you put that rule in place, but Googlebot would not crawl any new ones.

What you don't want to do is submit a list of 250k+ URLs to Google's disavow tool. They will never look at it. From what I understand, the disavow tool has been overused anyway and doesn't bear much fruit.

I would recommend trying to find a pattern in the bad URLs. Check the "Linked from" tab and see which sites are sending this bad traffic. We had a thousand errors coming to our site from just 4 domains. In that situation you can use disavow to tell Google "don't pay attention to any links coming from this site to ours."
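One way to look for that pattern is to tally the "Linked from" URLs by domain. This sketch assumes you have exported those URLs from Webmaster Tools into a plain list of strings (the export step and the function names are hypothetical). The second function writes `domain:` lines in the format Google's disavow file uses, where `#` starts a comment.

```python
from collections import Counter
from typing import List, Tuple
from urllib.parse import urlsplit


def top_linking_domains(linked_from: List[str], limit: int = 10) -> List[Tuple[str, int]]:
    """Count which domains the 'Linked from' URLs come from, most frequent first."""
    domains: Counter = Counter()
    for url in linked_from:
        host = (urlsplit(url).hostname or "").lower()
        if host.startswith("www."):
            host = host[4:]  # treat www.example.com and example.com as one site
        if host:
            domains[host] += 1
    return domains.most_common(limit)


def disavow_lines(domains: List[str]) -> str:
    """Build disavow-file text: a comment line, then one 'domain:' entry per line."""
    lines = ["# Domains found linking to non-existent URLs on our site"]
    lines += [f"domain:{d}" for d in sorted(domains)]
    return "\n".join(lines) + "\n"
```

Only disavow domains you have actually reviewed; the counts just tell you where to look first.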

If they stem from your own site, then Google is probably indexing search results or pagination pages. In that case you should use some form of the canonical tag and similar solutions, define your URL parameters in Google Webmaster Tools (GWMT), and check for internal links to bogus pages.
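A minimal sketch of the canonical-tag idea for parameterised pages, assuming the version of a page without tracking or sorting parameters is the one you want indexed. The `keep` set (here just `page`) is hypothetical: which parameters genuinely change the content depends on your site.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def canonical_link(url: str, keep: frozenset = frozenset({"page"})) -> str:
    """Return a <link rel="canonical"> tag for url, dropping parameters not in keep."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k in keep]
    canonical = urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))
    # Escape the URL for use inside an HTML attribute (e.g. & becomes &amp;).
    return f'<link rel="canonical" href="{escape(canonical, quote=True)}">'
```

Keeping `page` rather than stripping everything avoids pointing every paginated page at page 1, which would hide the content on later pages.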

I'm afraid there is no "quick fix" here, but rather a myriad of things you need to do to slowly untangle this web. I would be willing to bet your situation is a little of both (bad backlinks and architectural issues).