Hiding text from search engines

Hi.

A user of a site I run is complaining that the site displays her full name (which she registered under) and that her posts and personal page at my site now show up at Google when you google for her name. I can understand why some people don't want that to happen. But I also don't want to lose the good Google placement my site has (I've got a high PageRank, so her page at my site will show up as the top result for her name).

Does anyone here have a good idea how to hide, for example, author names in a forum application, in such a way that the browser renders them but search engines will not pick them up?

One idea I'm contemplating is splitting the user names from the page itself, putting them into a JavaScript method which is loaded from a separate URL, and placing the names into labeled "span" elements after loading the page. That would, in effect, mean I'd have to generate each page twice from the database driving it -- once to render it and once to gather up all the names and bundle them into the JavaScript.
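For what it's worth, here is a rough sketch of the separate-script idea. It assumes each name placeholder is a span with a unique id. The helper name `build_names_script` and ids such as `author-1` are made up for illustration. If the render pass records the id-to-name mapping as it goes, a second trip to the database may not be needed.

```python
import json


def build_names_script(names_by_span_id):
    """Return JavaScript source that fills each placeholder span with its name.

    names_by_span_id maps a span id (hypothetical, e.g. "author-1") to the
    display name that should appear inside that span.
    """
    # With default settings, json.dumps output is usable as a JavaScript
    # object literal for plain string keys and values.
    payload = json.dumps(names_by_span_id, sort_keys=True)
    return (
        "(function (names) {\n"
        "  for (var id in names) {\n"
        "    var el = document.getElementById(id);\n"
        "    if (el) { el.textContent = names[id]; }\n"
        "  }\n"
        "})(" + payload + ");\n"
    )


# Mapping collected while rendering the page (example data only).
collected = {"author-1": "Jane Doe", "author-2": "John Smith"}
script_source = build_names_script(collected)
```

The result would be served from its own URL and included after the spans. Whether a given search engine fetches or indexes that script is not something this sketch can guarantee.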

Any in-page substitution scheme I can think of would involve placing the names in JavaScript variables, where they would be indexed along with the page again. So that's no use. Rendering to an image is also out of the question, due to font and presentation issues.

"A user of a site I run is complaining that the site displays her full name (which she registered under) and that her posts and personal page at my site now show up at Google when you google for her name. I can understand why some people don't want that to happen. But I also don't want to lose the good Google placement my site has (I've got a high PageRank, so her page at my site will show up as the top result for her name)."

Are you talking about the info displayed with her domain (through her registrar)? If so, you could use a service such as domainsbyproxy to hide that info.

I don't see how you are going to get rid of her name. Especially because you want to keep the search results the same.

What makes you think that search engines would index JavaScript variables? Even if that's the case, you could add some simple obfuscation, like rot13.

Pakter
Saturday, November 25, 2006

"A user of a site I run is complaining that the site displays her full name (which she registered under) and that her posts and personal page at my site now show up at Google when you google for her name."

I agree that the user changing her name would be the obvious solution.

Justin -- no, it's a site where people have blogs. Among other things, the user account info contains a real name, and I'm displaying that as part of the blog (i.e. "Blog operated by Yourname Here") since I find that a bit more personal and friendly than just using the login handle everywhere. The names are being picked up by Google and some people don't like that. I don't need to have the users' names indexable but I DO want to have the content indexable (it's about travel, so search is one thing that drives people to the site).

Pakter - thanks! I was thinking along the same lines. Put the data into the page in spans, in rot13, and then in onLoad() iterate over the page's DOM and decode the contents. I was under the impression that Google would also index the JavaScript parts between "<script>" and "</script>" tags as long as the contents were in the same page.
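A minimal sketch of the server-side half of the rot13 idea, assuming the page is generated by Python code. The function names and the `rot13-name` class are hypothetical hooks for whatever onLoad() decoder the page uses. The decoder itself is not shown.

```python
import codecs
import html


def rot13(text):
    """Apply rot13; applying it twice returns the original text."""
    return codecs.encode(text, "rot_13")


def name_span(real_name):
    """Return a span whose text is the rot13-encoded name, HTML-escaped.

    The class name "rot13-name" is a hypothetical marker for the decoder.
    """
    return '<span class="rot13-name">' + html.escape(rot13(real_name)) + "</span>"


encoded = rot13("Jane Doe")    # "Wnar Qbr"
markup = name_span("Jane Doe")
```

Note that rot13 only shifts the letters A-Z and a-z, so names with accented or non-Latin characters would stay partly or fully readable. It is obfuscation, not encryption.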

Not the best solution, but another possibility is using images. For the user signature, rendering the username as an image might work.

But if the name appears in the content of a post as plain text, then the problem remains. (Well, you can check every post, try to find usernames, and replace them with images, but that does not seem like a solid solution.)

How about giving users the option of having their site indexed or not, using robots.txt or an equivalent? If the expectation is that part of the content on a site is indexed and part is not, then this doesn't look feasible; it's presumably all or nothing.
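The per-user opt-out could be sketched roughly like this, assuming each blog lives under its own path. The `/blogs/<handle>/` layout and the function name are assumptions for illustration, not anything the site in question is known to use.

```python
def build_robots_txt(opted_out_handles):
    """Return robots.txt text that disallows the blogs of opted-out users.

    Assumes a hypothetical URL layout of /blogs/<handle>/ for each blog.
    """
    lines = ["User-agent: *"]
    for handle in sorted(opted_out_handles):
        lines.append("Disallow: /blogs/" + handle + "/")
    if len(lines) == 1:
        # An empty Disallow line means nothing is excluded.
        lines.append("Disallow:")
    return "\n".join(lines) + "\n"


robots = build_robots_txt(["jdoe", "asmith"])
```

This excludes a user's whole blog from well-behaved crawlers, names and content alike, which is the all-or-nothing trade-off described above.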

Arethuza
Sunday, November 26, 2006

I've gone with the UserAgent idea. Obviously, I also include other UserAgents in my check (including archive.org).

I realize, as Nick also pointed out, that this is potentially a "black-hat-style" approach which COULD trigger a red light with Google. It still seems to me to be the best solution for the problem at hand. Simply not including the names in the page works in all cases, is independent of JavaScript being activated and leaves the browsing experience unchanged for the user.
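A minimal sketch of what such a User-Agent check might look like, assuming the byline is built server-side in Python. The token list is illustrative only and far from complete. The function names are hypothetical, and the caveat about this possibly being treated as cloaking still applies.

```python
# Illustrative substrings only; real crawlers use many more User-Agent strings.
CRAWLER_TOKENS = ("googlebot", "msnbot", "slurp", "ia_archiver")


def is_crawler(user_agent):
    """Return True if the User-Agent header contains a known crawler token."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in CRAWLER_TOKENS)


def render_byline(real_name, user_agent):
    """Return the byline text, leaving it out entirely for crawlers."""
    if is_crawler(user_agent):
        return ""
    return "Blog operated by " + real_name
```

A missing or unrecognized User-Agent falls through to the normal byline here, so any crawler not on the list would still see the name.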

Not having the content indexed (or leaving that up to the users) would be a problem for me. First of all, I'm running AdSense on the site, so Google needs to be able to index the content. Also, I want to bring people to the site based on their searches. I'm providing the site as a free service. I don't want to introduce "premium paid membership" with the option of hiding your name or anything like that.

An alternative would have been to only show the full names to logged-in users. Obviously, search engines won't be able to log in. But that would make the site unwelcoming to new users ("Blog created by [UNDISCLOSED NAME -- PLEASE LOG IN]" ...? No thank you).

As for Google banning the site due to UserAgent fiddling, I'm hoping that the change is minuscule enough to let me get away with it.

Unless this lady is paying an obscene amount of money or has an incredibly unique name, I don't think it's worth fixing the problem.

Imagine you're John Smith and you type your name into Google's search box.

Fundamentally, she chose to post to a public forum that's indexed by Google. You've made no secret of that fact. It's just mere coincidence that her posts come up first, and as soon as someone starts contributing more valuable web pages that contain her name, her posts will drop down the rankings. If she didn't want people to be able to find her stuff using her name, she shouldn't have used it in the first place.

Having said that, I do think it's worth trying to appease her by offering to change her name within those posts. Just keep in mind that Google does cache web pages and even if you did, it may be months before she stops showing up in searches.

I'm also against the whole "hide text" idea because it would be too easy to abuse and I'm sure Google would put a stop to it quickly. Imagine crafting a page with lots of Disney-relevant keywords while hiding the pornographic words from search engines.

TheDavid
Monday, November 27, 2006

Anyone who deliberately and voluntarily puts their own "personal information" on the internet and then gets shocked that their personal information is on the internet isn't smart enough to be worth caring about.

Problem solved. ;)

Honestly, "I was walking down the street and someone saw me" just isn't a valid invasion of privacy complaint, and "I wrote something and put it up in public and then someone read it" is a stupid complaint.

Monday, November 27, 2006

Are you explicitly asking for their real name when they sign up and then displaying it without warning? If so, that's a very blatant violation of a user's privacy and I can understand why they'd be upset. It should go without saying that when a user gives any sort of real life information -- real name, phone number, location, email address, etc. -- that this information should be held private unless permission is explicitly granted.