One of the biggest ongoing tasks for any site owner is getting ranked in search engines, which means optimizing the site for them. SEO (Search Engine Optimization) is often talked about these days. It is the practice of optimizing your site for, you guessed it, search engines.

One of the biggest parts of SEO is making your links "crawlable". Most, if not all, search engines use robots to crawl sites and then classify the content and keywords into a database. Google even keeps an archived copy of each page in case the site goes down or the page is pulled, which is a very useful way to reach information that is no longer available. With non-crawlable links, robots won't get very far on your site, which minimizes the number of pages that end up in the search engine's index.

I'm not sure about all search robots, but I know that Google does not like to follow links with ?, = and & in them, which dynamic sites such as forums depend on. A link like forums.php?act=viewmember&member=35424 will most likely be ignored by bots. Worse yet, some links such as forums.php?sid=348920fs7c37f70vcv2 may be followed by some bots, but the value can change on every page, since it is simply a session ID. This is a known cause of robots getting stuck on a site: the robot treats each of these links as a different page, when it may actually be going around in a loop. But Googlebot, the bot for Google (the search engine most webmasters want to concentrate on), usually does not follow these dynamic links. They must be changed to something better!
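To see why session IDs can trap a crawler, here's a minimal PHP sketch that normalizes a URL by dropping a session parameter. Two links that a bot would treat as separate pages collapse to the same page once the session ID is ignored. The parameter name sid and the function strip_session_id are assumptions for illustration, and the sketch only keeps the path and query of the URL.

```php
<?php
// Hypothetical helper: remove a session-ID query parameter (assumed to be
// named "sid") so URLs that differ only by session collapse to one page.
// Only the path and query are kept; scheme and host are dropped.
function strip_session_id(string $url, string $param = 'sid'): string
{
    $parts = parse_url($url);
    $path  = $parts['path'] ?? '';
    if (!isset($parts['query'])) {
        return $path;
    }
    parse_str($parts['query'], $query);
    unset($query[$param]);
    return $query ? $path . '?' . http_build_query($query) : $path;
}

$a = 'forums.php?sid=348920fs7c37f70vcv2&act=viewtopic&topic=5';
$b = 'forums.php?sid=9a1b2c3d4e5f&act=viewtopic&topic=5';

// Compared as raw strings, these look like two different pages...
var_dump($a === $b);
// ...but without the session ID they are the same page.
var_dump(strip_session_id($a) === strip_session_id($b));
```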

But that's how the forum is made!
There are ways to eliminate these types of URLs and still keep the site dynamic. There are 3 main ways to do it, and they all come down to one Apache module that will make IIS users wish they had chosen the path of righteousness, which is Linux and Apache: mod_rewrite. It is an Apache module that should be installed by default on most web hosts, and it lets you rewrite URLs on the fly. On a typical Apache setup, a request for index.php/something will still run index.php, and the /something part is not lost but passed down to the script being run. (Strictly speaking, that last part is Apache's path info handling, controlled by the AcceptPathInfo directive, rather than mod_rewrite itself.)

3 ways - all functional
The 3 ways are mostly different methods of handling mod_rewrite. I'm sure there are other ways than these 3, but we will stick to these 3 in this article.

Generic "catch all" Redirect Script
One way to do this is to have a redirect script that takes any /act/view/member/342-style URL, plugs the values into the matching variables, and redirects to index.php?act=view&member=342. By redirect I mean a Location header redirect, not a meta refresh (a meta refresh may also work, though I never tried it; the Location header is faster and more efficient).

So let's say you name this script show.php and the forum is index.php. You would change all the forum links to point to show.php/[specialcode], where [specialcode] is any identifier that tells show.php what you are viewing and which ID (for example, that you are viewing a topic, and which topic). The easiest way is to simply replace ?, = and & with /, so instead of index.php?act=viewtopic&topic=39432793 you have show.php/act/viewtopic/topic/39432793, and the script simply turns the slashes back into the right characters in the right order and does a Location redirect. This was the first method I learned and it is not the best, so I won't bother with the PHP code; let's move on to a better version of this method.
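If you do want to try this approach, a minimal show.php might look like the following sketch. It assumes the forum script lives at /index.php in the site root and that everything after show.php is a series of name/value pairs; the function name path_to_query_url is hypothetical. The redirect target is root-relative on purpose: a plain index.php would be resolved by the browser against the fake folders in the URL.

```php
<?php
// Sketch of a "catch all" show.php redirect.
// Assumptions: the real forum script is /index.php, and the extra path is
// name/value pairs, e.g.
//   /act/viewtopic/topic/39432793 -> /index.php?act=viewtopic&topic=39432793
function path_to_query_url(string $pathInfo, string $target = '/index.php'): string
{
    // Split on slashes and drop empty segments (e.g. the leading one).
    $segments = array_values(array_filter(explode('/', $pathInfo), 'strlen'));
    $params = [];
    // Walk the segments two at a time: name, value, name, value...
    // A trailing name without a value is ignored.
    for ($i = 0; $i + 1 < count($segments); $i += 2) {
        $params[$segments[$i]] = $segments[$i + 1];
    }
    return $params ? $target . '?' . http_build_query($params) : $target;
}

// Only redirect when running under a web server, not from the command line.
// PHP sends a 302 for a Location header unless another 3xx status was set.
if (PHP_SAPI !== 'cli') {
    header('Location: ' . path_to_query_url($_SERVER['PATH_INFO'] ?? ''));
    exit;
}
```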

In-Script Handling
Instead of pointing all your links at a different script, you can make the script itself handle the /part. You can also shorten your URLs this way, since you can design the URL handler to work strictly with the types of URLs you need.

For example, let's say your script has categories and items. You could use something like index.php/category/item, where category is the ID of the category and item is the ID of the item. Nothing in between, just 2 "folders". Here's some sample code that would handle this:
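A minimal sketch of that idea, assuming PHP 7.1 or later and a server that fills in $_SERVER['PATH_INFO']. The helper name parse_cat_item and the integer casts are illustrative additions, not part of any library.

```php
<?php
// Sketch of in-script handling for URLs like index.php/category/item.
// Assumption: the server passes the extra path to PHP in
// $_SERVER['PATH_INFO'] (it is not set for a plain index.php request,
// or when running from the command line).
function parse_cat_item(string $pathInfo): array
{
    // "/12/345" splits into ["", "12", "345"]: element 0 is empty.
    $pathdata = explode('/', $pathInfo);
    // Cast to int so the IDs are plain integers (a part with no leading
    // digits becomes 0). Missing parts also default to 0.
    $catid  = isset($pathdata[1]) ? (int) $pathdata[1] : 0;
    $itemid = isset($pathdata[2]) ? (int) $pathdata[2] : 0;
    return [$catid, $itemid];
}

[$catid, $itemid] = parse_cat_item($_SERVER['PATH_INFO'] ?? '');
```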

You would put that at the top of your script. The $_SERVER['PATH_INFO'] variable holds whatever comes after the filename, so when you request index.php/stuff, it holds /stuff. A server that does not accept path info would instead look for a file or folder called stuff under a folder called index.php, giving you a 404 error. So if you want to test whether your site supports this, simply request a script with a slash and some random characters after it: it should run that script and not 404 on you.

The first part of the script simply fetches the path info and splits it into an array on the slashes, so $pathdata is now an array. Element 0 will be empty since there's nothing before the first slash, element 1 will hold whatever is after the first slash, element 2 whatever is after the second slash, and so on. $catid and $itemid are simply assigned these values and can now be used throughout the script.

.htaccess
This is probably the most efficient method of all: without modifying any PHP code, you can make Google-friendly URLs just by editing a .htaccess file.

Here's some sample code that you would put in it.

The first 2 lines are just what you need to get Apache's mod_rewrite engine activated. Then comes the RewriteRule, which uses regular expressions (POSIX in Apache 1.3, Perl-compatible in Apache 2) to do its thing.

The RewriteRule is where all the magic happens. In this example, all requests for .htm files are rewritten to the matching .php files. (.*) means "anything", and $1 refers back to whatever that "anything" matched; the 1 means it is the first parenthesized group in the pattern.

You would put this in a .htaccess file in the folder you want it to apply to. So if you have an html folder with a bunch of .php files, you could put it there and magically make them work when requested with a .htm extension.
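Assuming a rule along the lines of RewriteRule ^(.*)\.htm$ $1.php, here's a PHP sketch that applies the same regular expression with preg_replace, to show what the pattern and the $1 back-reference do to a request path. It only simulates the rewrite; Apache applies the real rule before PHP ever runs, and the function name is hypothetical.

```php
<?php
// Simulates what a rule like
//   RewriteRule ^(.*)\.htm$ $1.php
// does to a request path. The exact pattern is an assumption based on
// the description above.
function rewrite_htm_to_php(string $path): string
{
    // (.*) captures "anything" before ".htm"; $1 puts that capture back.
    // Paths that don't end in ".htm" are returned unchanged.
    return preg_replace('/^(.*)\.htm$/', '$1.php', $path);
}

echo rewrite_htm_to_php('html/about.htm'), "\n";
echo rewrite_htm_to_php('html/about.php'), "\n";
```

The first call prints html/about.php; the second path does not match the pattern, so it comes back as-is.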

You can go on this way with as many variables as you want. It's very tricky to get working at first, but once you get the hang of it, mod_rewrite in .htaccess is quite powerful. You get about as much functionality with it as you get with Sendmail. Now, we can all argue about whether that is good or bad, since Sendmail is pretty complicated. :D

You can also have multiple rules per folder, so you can rewrite the URLs of a whole forum by adding rules for the forums, topics, profiles, etc. And don't forget topics with multiple pages and such!
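Here's a PHP sketch of what a forum-wide rule set could look like, with several captured variables per rule and a separate rule for multi-page topics. The URL layout (forum/…, topic/…/page/…, member/…) and the act values are assumptions for illustration, and the rules are simulated with preg_replace rather than written as .htaccess lines. Like RewriteRule with the [L] flag, the first matching rule wins, so the more specific paged-topic rule is listed before the plain topic rule.

```php
<?php
// Hypothetical forum URL scheme, simulated in PHP. Each entry mirrors a
// RewriteRule: pattern => target. First match wins (like the [L] flag).
const FORUM_RULES = [
    '#^forum/([0-9]+)$#'               => 'index.php?act=viewforum&forum=$1',
    '#^topic/([0-9]+)/page/([0-9]+)$#' => 'index.php?act=viewtopic&topic=$1&page=$2',
    '#^topic/([0-9]+)$#'               => 'index.php?act=viewtopic&topic=$1',
    '#^member/([0-9]+)$#'              => 'index.php?act=viewmember&member=$1',
];

// Returns the rewritten URL, or null if no rule matches.
function rewrite_forum_url(string $path): ?string
{
    foreach (FORUM_RULES as $pattern => $target) {
        if (preg_match($pattern, $path)) {
            return preg_replace($pattern, $target, $path);
        }
    }
    return null;
}

echo rewrite_forum_url('topic/39432793/page/2'), "\n";
```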

If you make all your URLs search-engine friendly, you should notice quite an increase in robot visits, which yields more pages indexed and, of course, more traffic.

One thing to note, though: when you use these URLs, especially ones with fake folders, the browser has no way to tell them apart from real folders, so any relative paths will have to be changed to absolute ones. The easiest way is to make them relative to the site's root: start all your URLs with / followed by the folder the file is in. That way, if you change your domain or have multiple domains pointing to the site, the links will still work just like relative links.
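To see why relative paths break, here's a simplified PHP sketch of how a browser resolves a link found on a page. It is deliberately limited: it assumes the page path starts with /, and it only handles plain relative names and root-relative paths (no ../, query strings, or full URLs). The function name resolve_link is hypothetical.

```php
<?php
// Simplified model of browser link resolution. Assumes $pagePath starts
// with "/"; handles only plain relative names and root-relative paths.
function resolve_link(string $pagePath, string $href): string
{
    if (strpos($href, '/') === 0) {
        return $href; // root-relative: used as-is on the current domain
    }
    // Relative: keep everything up to and including the last "/".
    $dir = substr($pagePath, 0, strrpos($pagePath, '/') + 1);
    return $dir . $href;
}

// To the browser, "/index.php/3/42" looks like file "42" in folder "/index.php/3/".
echo resolve_link('/index.php/3/42', 'style.css'), "\n";
echo resolve_link('/index.php/3/42', '/style.css'), "\n";
```

The relative link ends up pointing at /index.php/3/style.css, a path that doesn't exist, while the root-relative /style.css still points at the real file.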

This article has been kept simple, just enough to get your feet wet with the concept; you can explore further by choosing one of the 3 methods and implementing it. These techniques are perfect for blogs, forums, directories, shopping carts and many other dynamic systems.

Red Squirrel
Owner

12 Comments

Latest comments (newest first)

Posted by Red Squirrel on August 08th 2006 (11:35) Or do like I did for my sig and make php files .jpg.

Posted by Chris Vogel on August 08th 2006 (11:32) “Clean URLs” are about more than search-engine optimisation. They also make your site more usable. After all, which would you rather type:

The clean one has fewer characters and a more natural construction, meaning less chance of mistakes. You also no longer use file extensions in your URL. They’ll only confuse normal users, and what if you decide to use something other than PHP in the future? You could associate .php with something else, but…eww.

Posted by Red Squirrel on August 08th 2006 (10:03) good to know it works!

Posted by eduardor2k on August 08th 2006 (07:07) Hi, I managed to make it work... OK, sort of. I've put the .htaccess in the root with this.

Thanks anyway; even if I didn't get an answer to my post, everything works fine.

Eduardo

Posted by eduardor2k on August 08th 2006 (14:40) Hi, I'm trying this system because since I implemented my new portal, which uses URLs in the style index.php?pagina=XXXXX, I've lost my PageRank: before it was 5, now it's 1 or 0...