I've been building a website that requires some strict user-input to get the content the user wants. Sorry if that sounds vague, but I'll try to explain what I mean below.

domain.com/req/subreq/sub-subreq/sub-req4/

This is the structure of a request. Expressed as a regex, the allowed characters are:

domain.com/[a-z]+/[a-z]+/[a-z-]+/[a-z0-9-]+/

Now I was wondering: what HTTP response should I give when a user doesn't use the correct syntax? A 404 is for 'explaining' to the user's browser that there is no page with that title, for example:

domain.com/english/category/entertainment/yuiopasdfg/ - this will give a 404 as yuiopasdfg does not exist.

But when a user does not use the correct syntax, shouldn't the server response be e.g. 400 "The request cannot be fulfilled due to bad syntax." or 501 "The server either does not recognize the request method, or it lacks the ability to fulfill the request."? Although 501 would be a server fault.

What do you think?

Thanks in advance, xtaste

(PS: Sorry if I'm being unclear, it's been a while since I wrote in English )

But when a user does not use the correct syntax, shouldn't the server response be e.g. 400 "The request cannot be fulfilled due to bad syntax."

That seems like an acceptable way to respond. It's probably a good idea to still send a page in the body of the 400 response that tells the user specifically what they did wrong and how they can fix it.
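As a rough illustration of that suggestion, here is a minimal PHP sketch of a 400 body that tells the user what went wrong. The function name badRequestBody and the wording of the message are hypothetical, not something from the original poster's code; the expected format is taken from the URL structure described in the question.

```php
<?php
// Hypothetical helper: builds the HTML body to send along with a 400 status.
// The rejected request is escaped so it can be echoed back safely.
function badRequestBody(string $request): string
{
    $shown = htmlspecialchars($request, ENT_QUOTES, 'UTF-8');

    return "<h1>400 Bad Request</h1>\n"
        . "<p>The address <code>{$shown}</code> is not in a valid format.</p>\n"
        . "<p>Addresses look like language/category/subcategory/page-title "
        . "and may only contain lowercase letters, digits and hyphens.</p>\n";
}

// Typical use inside the front controller (not executed here):
//   http_response_code(400);
//   echo badRequestBody($_GET['request'] ?? '');
```

Keeping the message in a function that only returns a string makes it easy to reuse the same wording for every syntax error.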

Please remember that Apache is simply a file server so its status codes are just that: 404 is simply a file not found code.

Do not despair, though, as you have a great start on your 404 handler script ... well, the error portion of your redirection script.

First, I'm sure that you know that your regex values must be converted into key/value pairs (for your redirection script). Then, before you go to your database for the content to be presented, simply check that the relevant (key) fields contain the values specified by the request. That will allow you to specify an error message (if required) regarding each of your key/value pairs.

Jeff: I'll be sure to include a message with every error document the user gets to see. Thanks for the tip (I almost forgot the obvious part).

DK: Thanks. It's true that it's only a server message, but I was just wondering if there's a difference between types of 404s. If a page doesn't exist and MAYBE never will, then of course a 404 is the right message to display. But when the user's request could never, under any circumstances, produce a 200 OK, maybe we should let the user know? To me there's a difference between "Hey, the article isn't there anymore." and "Oops, I made a typo, I'm going to fix the typo and give it another try."

I think I know what you mean, but to be sure, here's the structure;

We use the example page: domain.com/en/cars/volvo/whats-new-5/

We redirect all URLs to index.php?request=your-request

After that we do a regex check: preg_match('#^([a-z]+)(/[a-z]+)?(/[a-z-]+)?(/[a-z0-9-]+)?$#', $_GET['request']) (the front page is a static page).

If the regex does not match (preg_match returns 0), then - this was my idea - we respond with a 400 error.
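A minimal sketch of that syntax check in PHP might look like the following. The function name parseRequest is hypothetical, and the pattern is a slight variation on the one quoted above: the slashes sit in non-capturing groups so that only the segment names are captured. It also assumes the request value arrives without a trailing slash, as the quoted pattern does.

```php
<?php
// Hypothetical helper: returns the path segments of a syntactically valid
// request, or null when the syntax is wrong (the caller would then send 400).
function parseRequest(string $request): ?array
{
    // Same character classes as the pattern in the thread; the slashes are
    // moved into non-capturing groups so the captures are the bare segments.
    $pattern = '#^([a-z]+)(?:/([a-z]+))?(?:/([a-z-]+))?(?:/([a-z0-9-]+))?$#';

    if (preg_match($pattern, $request, $matches) !== 1) {
        return null;
    }

    // Drop the full match; trailing optional groups that did not take part
    // in the match are not present in $matches at all.
    return array_slice($matches, 1);
}

// Typical use inside index.php (not executed here):
//   $segments = parseRequest($_GET['request'] ?? '');
//   if ($segments === null) { http_response_code(400); /* error page */ }
```

Separating "is the syntax valid" from "does the page exist" is what lets the script answer 400 for the first case and keep 404 for the second.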

After that we check every part of the URL against database material cached in arrays (no need to query the database every time; as you can imagine, the allowed links will not be that dynamic):

$lang['en'] = array('region' => 'gb','title' => 'English');

... and in the English sitemap ...

$sitemap['cars'] = array('c' => 12, 'volvo' => 1, 'opel' => 2);

(The 'c' value is for the number of the category.)

It will check the requested segments and assign numbers to them, and then there will be a connection to the database to check whether the requested page "whats-new-5" exists. If not, a 404 error is displayed. If the page does exist, we begin loading the data.

NOTE: We use cached database information because of the many database requests we will be making. It's an information/feedback system and we want our website to be as fast as possible, so we cache any information that is not dynamic in the long run.
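The lookup against the cached arrays could be sketched roughly as follows. The function name resolveSegments and the shape of its return value are hypothetical; the arrays mirror the ones quoted above, and the sketch only covers full language/category/subcategory paths. The final database check for the page title is left out.

```php
<?php
// Cached lookup tables, shaped like the arrays quoted in the thread.
$lang = ['en' => ['region' => 'gb', 'title' => 'English']];
$sitemap = ['cars' => ['c' => 12, 'volvo' => 1, 'opel' => 2]];

// Hypothetical helper: maps language/category/subcategory to the cached
// numbers. Returns null when any part is unknown, in which case the caller
// can answer 404 without touching the database.
function resolveSegments(array $segments, array $lang, array $sitemap): ?array
{
    if (count($segments) < 3) {
        return null; // this sketch only handles full paths
    }

    [$language, $category, $sub] = $segments;

    // 'c' holds the category number, so it is not a valid subcategory name.
    if ($sub === 'c' || !isset($lang[$language], $sitemap[$category][$sub])) {
        return null;
    }

    return [
        'category' => $sitemap[$category]['c'],
        'sub'      => $sitemap[$category][$sub],
    ];
}
```

Only when this returns numbers does the script need to ask the database whether a page such as "whats-new-5" exists in that category.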

Aha! More specificity! With that, we can build a list of acceptable values within your regex:

domain.com/[a-z]+/[a-z]+/[a-z-]+/[a-z0-9-]+/

for

domain.com/en/cars/volvo/whats-new-5/

Your PHP code suggests that the first atom (provided you enclose the regex elements between the /'s in parentheses) is simply en, the second is cars and the third is (volvo|opel). For 'whats-new-5' (and the trailing '/'), though, your regex is the best you can do, which means mod_rewrite still cannot determine a 404 for the %{REQUEST_URI} on that last value; that has to be left to index.php. Therefore, you still need a RewriteMap (or, for everyone's use, a handler file) to sort the acceptable URI values from the unacceptable before handing off to your index.php. That handler file SHOULD set the response code it wants Apache to give before the redirection to index.php (or the 404 handler file).

BTW, trailing slashes tell Apache that the URI is a directory (and it should serve that directory's DirectoryIndex). This "problem" can be overcome by using MultiViews but I consider that a very bad thing to use (because it picks a "directory" from the path if it matches a filename and serves that file). Worse, it changes the directory level of relative links which might be in your scripts making it necessary to use absolute links. Of course, your index.php file knows that so you don't have a problem but many members would (for their own websites).

So glad that yet another SitePoint member has been helped with the tutorial.

About your question, I'd first have to ask whether you're using a CMS or not (and which one, although I'm only familiar with WP), as index.php's handling of requests is already a complicated process. It is best not to add a new module to it to determine whether the 'sub-req4' exists in your database (and is linked to the language and car fields). Therefore, I'd say that a handler file would be the only safe way to determine that every key/value pair exists in a single record before handing off to the index.php script, as it would appear to be the only way to be sure that an appropriate error message could be provided.

The question I must ask is whether all this additional processing is worth the effort to write the error notification handler script and process it before the hand-off to the index.php script. IMHO, it is not. My creation of the http://wilderness-wally.com website, where I use the article's title to generate the links to the script then use the generated title to provide the script, does NOT provide for "unnecessary hand holding" of visitors who refuse to use the website's links (failure to find the record in the database will cause the Home Page to be sent). As with Security, it's a matter of the tradeoff in terms of Cost vs Ease of Use.