
WGET and the OPTIONS Indexes directive

So I just discovered wget, and how powerful this tool potentially is. I would like to know how to safeguard against it, if that is at all possible. I am not really sure how it works; I just figured it out, and I am able to recursively download from a couple of my domains. I haven't tested it on my PHP code, just images, so I don't know how the server will actually send the PHP: as PHP source, or as the HTML that the PHP script outputs. Since it all goes over HTTP, I think it will just send the HTML markup, but I am not sure.
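For reference, the kind of recursive pull I am talking about is something like this (example.com and the path are just placeholders):

    # -r = follow links recursively, -np = don't ascend to the parent directory,
    # -l 2 = limit the recursion depth to two levels
    wget -r -np -l 2 http://example.com/images/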

Will denying Indexes with the Options directive safeguard against wget, or do I have to do some more advanced configuration? Help here is appreciated.

Originally Posted by JamesOxford

Will denying Indexes with the Options directive safeguard against wget, or do I have to do some more advanced configuration? Help here is appreciated.

In general, unless you have an explicit need to list the files, you should disable indexing. Spiders can still crawl your pages to retrieve the images/files you use on them (wget can do this), but with indexes disabled they can't get a listing of everything in your folders and follow it recursively. They also can't see the source of your PHP files, because those are parsed by the server when they are requested. An exception would be if you named something .phps, or gave it an extension that Apache doesn't handle (.phpbak, for example).

To disable indexes for your site put this in an .htaccess in the document root:
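The directive itself is just the standard Options directive; a minimal example, assuming your host allows Options to be overridden in .htaccess:

    # Turn off automatic directory listings for this directory and everything below it
    Options -Indexes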

Again, thanks for your help. If I disable indexes in an .htaccess file in the root directory, would I be able to override it in a sub-directory or no? There are a couple of places where indexes are convenient.

In directories where I did want indexes, would denying spiders in a robots.txt file and setting a valid-user requirement with basic authentication be sufficient to stop recursive downloads of the entire folder?

Originally Posted by JamesOxford

If I disable indexes in an .htaccess file in the root directory, would I be able to override it in a sub-directory or no?

Yep.
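A sketch of that override, assuming AllowOverride Options (or All) is in effect for the sub-directory:

    # .htaccess in the sub-directory where you want listings back
    Options +Indexes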

Originally Posted by JamesOxford

In directories where I did want indexes, would denying spiders in a robots.txt file and setting a valid-user requirement with basic authentication be sufficient to stop recursive downloads of the entire folder?

No, not really. robots.txt is more of a suggestion and only well-behaved spiders will follow it. You may just end up making it easier for people to find the directories you don't want indexed... so they can index them. That is, if you are worried about bad robots to begin with.
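To make that concrete: robots.txt is a public file that anyone can fetch, so an entry like the one below (the path is made up) is as much a roadmap for a badly behaved crawler as it is a rule for a polite one:

    # Anyone can read this by requesting http://yoursite/robots.txt
    User-agent: *
    Disallow: /private-stuff/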

At this point it is more of a hypothetical than a true concern. The basic authentication won't stop them? Won't they get a 401 instead of a 200 OK if they try to access the directory without authenticating?
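For reference, the kind of setup being asked about would look roughly like this (the AuthUserFile path and the realm name are placeholders); an unauthenticated request, whether from wget or a browser, gets a 401 Unauthorized instead of the content:

    # .htaccess in the directory that stays listable but password-protected
    # (the .htpasswd path below is only an example)
    AuthType Basic
    AuthName "Restricted listing"
    AuthUserFile /home/youruser/.htpasswd
    Require valid-user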