If you require a username and password to access specific content, why would you want to allow a spider access to it?

The SE doesn't want to index pages that the average user cannot see. If I click on the result and get a login box, I'm not going to be a happy camper with the SE for sending me there...

Hi Scottie: I do not think the use of search engines is limited to "average users"; it extends to less-than-average and better-than-average users and all shades in between. I also fail to see why ALL the content on a site should not be indexed by the search engines.

If you click on that result and get a log-in or register box you may not be a happy camper, but it is up to the site how they want to allow access to their content. As an example, a site may have valuable papers available for interested parties, but as a condition of access it wants you to register first (maybe even pay a fee). If you are not willing to register to access the content, that is your decision, but IMO your opinion should not be forced on all users.

This is a very normal situation with the large online research companies who require a subscription and payment before you can access their online content. Do you believe that their research and papers should not be indexed so that a person can find them if they want them?

One would have to question the effectiveness of that, Mel. While I'm sure it depends on your target audience, I think it's a mistake to assume that people are stupid, and spoofing a user-agent isn't exactly rocket science these days. Take one not-stupid user, a forum or two that targets the same market, and all of your content just became free.

If it were my client, I would advise them to write teasers or summaries that were publicly available to allow search engines and prospective subscribers to read them.

If I'm looking for your information and I only get a login box, I'd assume the content wasn't there anymore and move on to the next result. Or I'd just pull up the Google cache and read it without logging in.

If however, I came to an abstract or summary of what was contained in the pay-only version, I would be more inclined to get my credit card out and sign up, knowing that what I want is in that document.
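A minimal sketch of the teaser approach, assuming a hypothetical `Article` record and a `render` helper (neither comes from any real site or framework). The point it illustrates is that the choice between abstract and full text depends only on whether the visitor is logged in, never on the user-agent, so a spider and an anonymous searcher see exactly the same page:

```python
from dataclasses import dataclass


@dataclass
class Article:
    """Hypothetical record for one paid paper."""
    title: str
    abstract: str  # public teaser, safe for spiders and anonymous visitors
    body: str      # full text, subscribers only


def render(article: Article, is_subscriber: bool) -> str:
    """Return the page text for a visitor.

    The decision uses login state only, so search engine spiders
    (which are not logged in) get the same abstract a searcher gets.
    """
    if is_subscriber:
        return f"{article.title}\n\n{article.body}"
    return (
        f"{article.title}\n\n{article.abstract}\n\n"
        "[Subscribe or log in to read the full paper]"
    )


paper = Article(
    title="Example Paper",
    abstract="A short summary of what the paper covers.",
    body="The full paid text of the paper.",
)
print(render(paper, is_subscriber=False))
```

Because nothing here looks at who is asking beyond the login, there is nothing for a spoofed user-agent to unlock.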

This is a very normal situation with the large online research companies who require a subscription and payment before you can access their online content. Do you believe that their research and papers should not be indexed so that a person can find them if they want them?

Hmm...that's a very interesting situation.

The question is really what do the search engines think of this? Do they want to index your password-protected content which will not be available to the user unless they pay?

If they said yes to this, then I would say what you're doing is fine. My gut tells me that they would not actually want to index that stuff. But I could definitely be wrong. I am interested in discussing this with some search engine reps, however, and I am going to try really hard to ask some of them in San Jose in a few weeks.

We really would need their answer on this to be sure.

To be on the safe side, I would suggest doing what Scottie recommends. In fact, I've recommended the same thing to clients in the past. It never would have occurred to me to have a search engine index password-protected stuff.

I hope that the engine reps will let me know their rules about this, and will let you know what they say.

(Note to Scottie...make sure I remember this at the conference, and feel free to ask them yourself. I get a bit brain dead at these things from staying up too late and drinking too much!)

If you click on that result and get a log-in or register box you may not be a happy camper, but it is up to the site how they want to allow access to their content.

It's also up to the search engine what content they want to index. A search engine needs to see what the searcher will see to make this decision.

This is a very normal situation with the large online research companies who require a subscription and payment before you can access their online content. Do you believe that their research and papers should not be indexed so that a person can find them if they want them?

Yes, that's exactly what I think! It's also what most free access, free inclusion, general purpose search engines think, IMO.

Some engines, e.g. the old Northern Light, allow "special collections" of paid content to be searched, separately to the free search - not as part of it.

These days, if you use a PFI program you may be OK. I suggest checking with your PFI provider first! The general rule of thumb, though, is that search engines want searchers to see what the spider saw without having to offer any kind of payment.

Scottie's solution is the generally accepted workaround.

FWIW I don't see any problem with using Content Delivery to remove session IDs from URLs. I wouldn't call that cloaking - just as I wouldn't call it cloaking to use Content Delivery to add session IDs to URLs for browsers that supported sessions ... it amounts to the same thing.
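For illustration, removing session IDs from a URL can be sketched with Python's standard `urllib.parse`. The parameter names in `SESSION_PARAMS` and the example URL are assumptions, not taken from any particular forum package; a real site would list whatever names its own platform emits:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical session parameter names (compared case-insensitively).
SESSION_PARAMS = {"sid", "sessionid", "phpsessid"}


def strip_session_id(url: str) -> str:
    """Return the URL with any session-ID query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    # Rebuild the URL with the remaining parameters in their original order.
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(strip_session_id("http://example.com/forum/topic.php?id=42&sid=abc123"))
```

The page content is untouched; only the link changes, so every visit resolves to one stable URL instead of a new one per session.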

Scottie, I think, is right. And the proof is probably right in these forums.

How many people here subscribe (or have subscribed) to the Member's Area at searchenginewatch.com? Would you have done so if Danny's free content hadn't proven to you he would deliver?

And Mel? In RL, my experience has been that thieves are fairly rare. On the Internet, however, they are rampant when it comes to content or intellectual property rights. Most seem to believe that anything composed of bits and bytes, from music to software to private pages, is theirs for the taking.

Well if you do many searches you will often come up against links to articles in research sites that require payment to access them, so one would assume that the search engines do have a way of indexing them even though they are PW protected.

FWIW IMO the purpose of a search engine is to find and deliver relevant links to users in response to their queries. I have never seen any search engine say that they will only index content that is free, and if PW protected content is not indexed then the search engine has not done as good a job as it could have in indexing the web. It might be noted that content which requires payment may well be of better quality than that which is free.

There are other similar situations also, I know of quite a few sites where payment is not required to access content but registration is.

Ron:
Someone who is going to set up and spoof a user agent to read a single web page is beyond my experience, but I suppose there may be those who do this just for the challenge. FWIW I believe that there are many, many more honest users than thieves, but the thieves get much, much more publicity and this is contributing to a general sense of unease.

As a test, how many of you have been successfully ripped off by an online credit card scam for instance?

I would love to see less publicity about how the web is such a lawless, dangerous place and more about how the great majority of users find it a great and useful place.

If it were my client, I would advise them to write teasers or summaries that were publicly available to allow search engines and prospective subscribers to read them.

If I'm looking for your information and I only get a login box, I'd assume the content wasn't there anymore and move on to the next result. Or I'd just pull up the Google cache and read it without logging in.

If however, I came to an abstract or summary of what was contained in the pay-only version, I would be more inclined to get my credit card out and sign up, knowing that what I want is in that document.

Hi Scottie:

I agree that an abstract of the content is a great way to both give users some indication of what they may find behind the veil, and to give the search engines something to chew on.

Why would you assume that the content was no longer there if you were asked to register to view it??

At any rate pulling up the Google cache is not an option for .pdf pages, which many with valuable content use to prevent copying.

Why would you assume that the content was no longer there if you were asked to register to view it??

I would assume it was no longer there because obviously it was there when the search engine spider came by but now it has been moved or removed and there is a login screen in place of the content.

I'd just move on to the next result until I found what I wanted. How many people do you think would assume that the content behind that login was exactly what they wanted and trust it to be there enough to give a credit card number?

Actually, that would make a very interesting usability test! If you are interested, I'll run one just for fun! PM me a search query that will return a login screen and let me run it by some test subjects and see how they react.

Well if you do many searches you will often come up against links to articles in research sites that require payment to access them, so one would assume that the search engines do have a way of indexing them even though they are PW protected.

No, they can't without something further taking place. For it to take place, one of two things has almost certainly happened:

1) The site has cloaked, or
2) There is a commercial arrangement between the site and the search engine (e.g. PFI)

There are other alternatives. For example, a well known Web forum precludes unregistered access from several ISPs, some of whose users abuse the forum with robots. I commonly access the Web with one such ISP, so when my search results include listings from that forum I am required to register or sign in before I can view the content. In this case the forum hasn't necessarily made provision for spiders, but it is performing a kind of Content Delivery to exclude many thousands of humans, which could be interpreted similarly. In this case, though, the content isn't really password-protected. There is another mechanism in place that makes it appear that way.

I would have to cast my vote for Scottie's abstract of the article, study, etc. In doing so, you allow the user to peruse the information to check first that the article is relevant, and then that it will provide the citation they need.

From slaving over many research papers in college, I can tell you that many article references looked great. The articles themselves, however, did nothing to support my research. When there was an abstract available, it made more sense to read that first, rather than an entire study or paper on the subject.

I also think that converting people to a subscription would be more effective if an abstract were presented, rather than a page with only a login/password/registration required. Obviously, I'm not going to pay for something that I can't be sure that I need. Paying for an article that is related, but counter to your thesis, would be a real kick in the pants. Being able to evaluate the content prior to purchase is the ideal conversion scenario (in a paid subscription model).