I have a pretty basic question. I developed a neat little website which I'm ready to upload, but it still needs a bit of work. The designer needs the HTML to do his work, so the site has to go online. Besides that, I have to correct a couple of details, set up the friendly URLs, etc.

What's the best way to set the site up on its definitive hosting, with its definitive domain, while blocking it from unknown users and without hurting SEO and that kind of thing? If I were to just upload it as-is, the unfinished website might get crawled by a search engine bot.

6 Answers

What I usually do (and this works well for structured websites) is include a PHP script at the top of every page whose access I want to deny to users and bots. The include checks whether the session variable 'auth' is set to 1. If it isn't, it redirects the visitor to the authentication page (just a form asking for a username and a password).

I keep a small credentials file containing the username and the password (hashed with PHP's crypt('your password') function).

The authentication script then verifies that the submitted username and password match what is stored in that file.
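A minimal sketch of that setup, assuming hypothetical file names (guard.php, auth.php) and a credentials file kept outside the web root with one username:hash pair per line:

    <?php
    // guard.php — include this at the top of every page that should stay private.
    session_start();

    if (empty($_SESSION['auth']) || $_SESSION['auth'] !== 1) {
        header('Location: /auth.php');   // send unknown visitors to the login form
        exit;
    }

    <?php
    // auth.php — checks the submitted credentials against the stored crypt() hash.
    session_start();

    $credfile = '/path/outside/webroot/credentials';   // placeholder path

    if ($_SERVER['REQUEST_METHOD'] === 'POST'
        && isset($_POST['username'], $_POST['password'])) {
        foreach (file($credfile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $line) {
            list($user, $hash) = explode(':', $line, 2);
            // crypt() reuses the stored hash as its salt, so the result matches
            // the stored hash only when the password is right.
            if ($user === $_POST['username']
                && hash_equals($hash, crypt($_POST['password'], $hash))) {
                $_SESSION['auth'] = 1;
                header('Location: /index.php');
                exit;
            }
        }
        $error = 'Wrong username or password.';
    }
    ?>
    <?php if (isset($error)) echo htmlspecialchars($error); ?>
    <form method="post">
        <input name="username" placeholder="Username">
        <input name="password" type="password" placeholder="Password">
        <button type="submit">Log in</button>
    </form>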

Been through this one already. The moment your site is up and online, expect it to be scanned and indexed. A robots.txt with a deny-all sounds good until Baidu and Yandex show up: you get indexed anyway. There are several other search engines out there that barely pay attention to robots.txt or the robots meta tag, and hackers just use the robots.txt file to tell them where to concentrate their snooping. And I can tell you that getting stuff to drop out of the index once this has happened is a PITA. Baidu still comes calling for files it should never have indexed and is still being denied access to them.
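For reference, the deny-all robots.txt being dismissed here is just this, and only well-behaved crawlers honour it:

    # robots.txt — politely asks every crawler to stay out; it is advisory only.
    User-agent: *
    Disallow: /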

The best ways are basic authentication (which can be problematic if your application uses Flash image uploaders, and it always requires a login) or, if your site's content isn't that flamingly sensitive, a 403 rule in .htaccess that allows access only to the IP addresses on the need-to-work-on-it list. Both accomplish the same thing, turning everyone else away with an access-denied error; one requires fiddling with a login, the other is just automatic.
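A sketch of the .htaccess allowlist variant, with placeholder IP addresses:

    # Block everyone except the machines that need to work on the site.
    # (Apache 2.2 syntax; on Apache 2.4 use "Require ip ..." instead.)
    Order deny,allow
    Deny from all
    # your own address and the designer's address (placeholders):
    Allow from 203.0.113.10
    Allow from 198.51.100.25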

Once you get through, sitemap your website and submit it to the big three to unleash the flood. All three hitting at once tells you whether your website's hardware is sufficient for real-time traffic. Adding Baidu to the big three tells you whether both your application and your web server are up to snuff, as the combined traffic can easily take your website down.

The gold standard, if you really are a web developer and serious about it, is to have your own test server that completely bans web crawlers. That way you don't have Google et al. calling around and being told to shove off, which might be a good thing for your SEO.

Judging from web server access logs, websites nowadays exist mainly to service web indexers; the customers are only a side note.

The best way is to set up a dedicated test environment. You will need it in the future anyway, once you start planning updates and changes to your site.

However, if you (for whatever reason) cannot afford or don't want a test environment, what about setting up some sort of user authentication? Then only the people you grant access to will see the unfinished site.