How do I prevent search engines from picking up my page in their index?

I have a website that should not, and must not, end up in the index of any search engine (Google, Yahoo, Bing)!

So before I put the page online (because afterwards it would probably be too late faster than I can react), I would like to ask what can be done effectively to keep a page out of the search index, provided there are any meaningful options at all.


SmartUser

By default, or with the meta tag <meta name="robots" content="index,follow" />, you allow search engines to crawl your page, to take the page into their index, and to follow the links available on your website. Instead, you can write:

<meta name="robots" content="noindex,nofollow" />

Through this meta tag, you tell the search engines that the page should not be included in their index, and if the search engines are well-behaved, they will even delete your page from their index after they have discovered the tag.
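For the tag to take effect, it has to appear in the <head> section of every page that should stay out of the index. A minimal sketch of such a page (title and body content are placeholders):

```html
<!DOCTYPE html>
<html>
<head>
  <title>Private Page</title>
  <!-- tells crawlers: do not index this page and do not follow its links -->
  <meta name="robots" content="noindex,nofollow" />
</head>
<body>
  ...
</body>
</html>
```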

If you have links on your page that the search engines should not follow, you can use this:

<a href="page.htm" rel="nofollow">Page</a>

The rel="nofollow" tells the search engine that it should not follow this link. However, there are also search engines that ignore this information and visit the page anyway. Because, after all, your pages can still be opened and visited even though they are marked with these tags.

04/09/2012 at 12:51


Axuter

You can also write instructions into a robots.txt. The robots.txt must be available at yourdomain.com/robots.txt and is a plain text file in which you can simply type directives for the crawlers.

To exclude web crawlers from all areas and files of your website, just write this in the robots.txt:

User-agent: *
Disallow: /

If crawlers now want to index your site, they first look into the robots.txt and find the statement there that they should take nothing from your site. Then they leave.
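Conversely, if only certain areas should be locked, the robots.txt can list them individually; each Disallow line names one directory or file to exclude. A sketch with placeholder paths:

```text
User-agent: *
Disallow: /private/
Disallow: /drafts/
Disallow: /internal.htm
```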

However, if you only want to exclude a few pages or specific directories, you can also specify that in the robots.txt. That is a bit much for this comment, so I have written a little tutorial on this subject.

06/09/2012 at 08:04

Computer Expert

Unfortunately, there are also some search engines that do not comply with the instructions given in the robots.txt, and they also ignore your meta tags.

In this case, only a .htaccess file in the appropriate directory can help. It may look like this:

order allow,deny
deny from 123.456.789.000
deny from 100.100.100.100
deny from 200.123
allow from all

This in your .htaccess file blocks all requests from the two IPs 123.456.789.000 and 100.100.100.100. The fourth line ensures that all requests from IP addresses beginning with 200.123 are also blocked, such as 200.123.1.1 or 200.123.10.27, for instance.
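Note that order/deny/allow is the older Apache 2.2 syntax. On Apache 2.4 and newer, the same blocking is usually written with the Require directive of mod_authz_core; a sketch with the same example addresses:

```apache
# Apache 2.4+ equivalent: allow everyone except the listed addresses
<RequireAll>
    Require all granted
    Require not ip 100.100.100.100
    Require not ip 200.123
</RequireAll>
```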

Only in this way can you be sure that the spiders in question really cannot access your site! The only problem is that you need a list of all the "bad", or at least the most important, spiders and crawlers, and such lists change constantly. For a list that is reasonably up to date, you should search the internet.

08/09/2012 at 14:29


Stefan Trost

I would consider protecting your site with a password altogether and not putting the page publicly on the internet at all.
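On an Apache server, such a password protection can be set up via HTTP Basic Authentication in the .htaccess, for example. A sketch, where the realm name and the server path to the .htpasswd file are placeholders:

```apache
# protect this directory with a username/password prompt
AuthType Basic
AuthName "Members only"
AuthUserFile /full/server/path/to/.htpasswd
Require valid-user
```

The .htpasswd file containing the allowed users and their password hashes can be created with the htpasswd command line tool that ships with Apache.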

It seems to me that your website is not meant to be found by search engines and should therefore probably only be available to a small circle of people. You can simply give those people the password for the site. With this solution, you neither have the problem of excluding search engines from your website nor the risk that secret information published on the page becomes public.

10/09/2012 at 01:41

Important Note

Please note: The contributions published on askingbox.com are contributions of users and should not substitute professional advice. They are not independently verified and do not necessarily reflect the opinion of askingbox.com.
