Over the past several days, there has been an interesting discussion on the wp-testers mailing list (though it really belonged on the wp-hackers list, but that’s beside the point) about permalink structures in WordPress. The original question came from matthijs, who asked why WordPress was storing rewrite rules for every page on his site in a database option. Further discussion revealed that this was a side-effect of his particular permalink structure, and it turned up some really good information about good and bad permalink patterns. This information could be important for sites that use non-standard URL structures, and I thought it deserved a summary.

First, let’s look at the original question and the situation that brought it about:

Recently I discovered that the current way wordpress handles permalinks is not scalable. All rewrite_rules are at the moment held in a single database field in the wp_options table. If you have a few dozens pages and posts, you have maybe a few hundred rewrite_rules in it and all is well. But as soon as you start to have a few hundred pages and attachments, the amount of rewrite_rules explodes as well as the field size. This also depends on the permalinks settings. On one of my sites I can’t even open the database field to take a look because my browser and text editor crash because of its size.

Before anyone starts to panic, let me say that this is not a general problem in WordPress. This person had a particular permalink structure which forced WordPress to store extra rules for every page. This situation can be avoided by choosing a permalink pattern which allows WordPress to find your posts in an efficient way.

WordPress gives site builders a lot of flexibility in how their post URLs are created. There are several attributes which can be used, and ordered how the person likes. The default “pretty permalink” structure looks like this:

/%year%/%monthnum%/%day%/%postname%/

Which results in permalink URLs that look like:

http://example.com/2009/01/22/hello-world/
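
To make the mapping from structure to URL concrete, here is a minimal Python sketch (not WordPress code) that fills in a few structure tags from a post record. The `build_permalink` function and the shape of the `post` dictionary are hypothetical names chosen for illustration, and only a handful of the tags are handled.

```python
from datetime import datetime

def build_permalink(structure, post, home="http://example.com"):
    """Fill each %tag% in a permalink structure with values from a post.

    Illustrative only: `post` is a hypothetical dict, not a WordPress object.
    """
    published = post["date"]
    values = {
        "%year%": f"{published.year:04d}",
        "%monthnum%": f"{published.month:02d}",
        "%day%": f"{published.day:02d}",
        "%postname%": post["slug"],
        "%post_id%": str(post["id"]),
    }
    path = structure
    for tag, value in values.items():
        path = path.replace(tag, value)
    return home + path

post = {"id": 1, "slug": "hello-world", "date": datetime(2009, 1, 22)}
url = build_permalink("/%year%/%monthnum%/%day%/%postname%/", post)
print(url)
```

With the default structure and a post published on January 22, 2009, this should produce the example URL shown above.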

There are several structure tags which can be used to form permalinks: %year%, %monthnum%, %day%, %hour%, %minute%, %second%, %postname%, %post_id%, %category%, %tag%, and %author%. As mentioned earlier, this gives a lot of flexibility in how your URLs can appear. However, Ryan Boren pointed out:

Verbose rules are used for structures beginning with %category%, %tag%, %postname%, and %author%. Avoiding such structures is best.

This important note was subsequently added to the Codex page about Using Permalinks:

For performance reasons, it is not a good idea to start your permalink structure with the category, tag, author, or postname fields. The reason is that these are text fields, and using them at the beginning of your permalink structure it takes more time for WordPress to distinguish your Post URLs from Page URLs (which always use the text “page slug” as the URL), and to compensate, WordPress stores a lot of extra information in its database (so much that sites with lots of Pages have experienced difficulties). So, it is best to start your permalink structure with a numeric field, such as the year or post ID.

This would be a problem for any dynamic CMS, not just WordPress. If there isn’t some way to narrow down the information in the URL and map it to a specific page or post, the system must perform a lot of database searches to find the correct entry. Otto provides a really good hypothetical example:

Actually, I think this deserves a bit more discussion… Let’s consider a permalink like %category%/%postname%.

So you’re handed a URL like /mycat/mypost. You start by parsing it into mycat and mypost. You don’t know what these are. They’re just strings to you. So, first, you have to consider what “mycat” is.

First, you query to see if “mycat” is a pagename. This is a select from wp_posts where post_slug = mycat and post_type = page. No joy there.

Next, you query to see if “mycat” is a category. This is a select from wp_terms join wp_term_taxonomy on (term_id = term_id) where term = mycat and taxonomy = category. Hey, we found a mycat, so that’s good. Unfortunately, this just tells us that it’s a category, which is rather useless in retrieving the actual post we’re looking for. So we ignore the category.

Now, we move on to the “mypost”. Again, we start querying:
1. Is it a page? select from wp_posts where post_slug = mypost and post_type = page. Nope.
2. Is it a category? select from wp_terms join wp_term_taxonomy on (term_id = term_id) where term = mypost and taxonomy = category. Nope.
3. Is it a post? select from wp_posts where post_slug = mypost and post_type = post. Bingo.

The whole goal is to determine the specific post being asked for. The category is not helpful in this respect, and we have to do a couple queries just to figure out that we need to ignore it. Five queries to determine what the post is with this structure. Five queries, two of them expensive (joins ain’t cheap). And these have to happen on every load of a post on your site.
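
Otto's hypothetical walk-through can be sketched in a few lines of Python. This is a toy model of the naive lookup he describes, not what WordPress does: the `PAGES`, `CATEGORIES`, and `POSTS` sets stand in for database tables, and each recorded tuple stands in for one query.

```python
# Stand-ins for database tables (assumed data for the example).
PAGES = set()              # no page has the slug "mycat" or "mypost"
CATEGORIES = {"mycat"}
POSTS = {"mypost"}

def resolve(path):
    """Resolve a URL path segment by segment, recording every lookup."""
    queries = []
    found = None
    for segment in path.strip("/").split("/"):
        queries.append(("page", segment))          # is it a page?
        if segment in PAGES:
            found = ("page", segment)
            break
        queries.append(("category", segment))      # is it a category? (join)
        if segment in CATEGORIES:
            continue   # a category alone does not identify the post
        queries.append(("post", segment))          # is it a post?
        if segment in POSTS:
            found = ("post", segment)
            break
    return found, queries

found, queries = resolve("/mycat/mypost")
print(found, len(queries))
```

Counting the recorded lookups for `/mycat/mypost` gives the same five queries as in the description, two of which are the category joins.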

Otto then goes on to explain that this isn’t what WordPress actually does. Instead, when WordPress detects that you have an inefficient permalink structure, it stores extra rewrite rules in an option in the database, which it then refers to when presenting a page.

To finish up, let’s look at a couple of quick examples.

Bad:

/%postname%/%post_id%/
/%category%/%postname%/

Better:

/%post_id%/%postname%/
/%year%/%category%/%postname%/
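
As a rough way to check a structure against Ryan Boren's note, the following Python sketch looks at the first structure tag and flags the four text-based ones. The function names are hypothetical, and looking only at the first tag (ignoring any fixed text in front of it) is an assumption of this sketch rather than something confirmed in the discussion.

```python
import re

# The four tags named in the mailing list discussion.
VERBOSE_TAGS = {"%category%", "%tag%", "%postname%", "%author%"}

def first_tag(structure):
    """Return the first %tag% in a permalink structure, or None."""
    match = re.search(r"%[a-z_]+%", structure)
    return match.group(0) if match else None

def needs_verbose_rules(structure):
    """True if the structure leads with one of the text-based tags."""
    return first_tag(structure) in VERBOSE_TAGS

for structure in ["/%postname%/%post_id%/", "/%category%/%postname%/",
                  "/%post_id%/%postname%/", "/%year%/%category%/%postname%/"]:
    print(structure, needs_verbose_rules(structure))
```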

In conclusion, when building a site’s permalink structure, choosing carefully can help WordPress locate your articles in the most efficient way possible.

About Dougal Campbell

Dougal is a web developer, and a "Developer Emeritus" for the WordPress platform. When he's not coding PHP, Perl, CSS, JavaScript, or whatnot, he spends time with his wife, three children, a dog, and a cat in their Atlanta area home.

Does WP automatically do the 301 redirects or are you using a plugin? If it does it by default, would it do it as well if you went from /%postname% to a permalink that included the year and month before the postname?

I do have a redirect plugin, but that’s for manual redirects (ha ha for when i dropped the category from the URL). Anyway, I think it’s doing it automatically by matching the truncated string to the nearest match in the database (if you truncate to one letter – EG .co.uk/a – it matches the post with the next URL alphabetically). Er, not going to redo my permalinks to test your question though!

The WordPress Codex still shows that structure in several of its examples of custom permalinks. There is a note about it not being a good structure to use but it is below the examples. I hope they remove them because it’s confusing!

Thanks for the post! I started my blog off over a year ago using /%postname%, from a performance standpoint, should I change mine to /%year%/%monthnum%/%postname% ? I chose that because I like it better than a few of the other configurations. If so, could you recommend a plugin for the redirects and how long until the plugin would no longer be needed? Thanks!

Actually, the options table is a pretty good place for it. For sites big enough for this to even become a problem, they’d probably want to run an object caching plugin with memcached, xcache, or something similar, so the option would stay cached in memory anyways. And again, it’s only a “problem” (which it really isn’t, normally) for those who choose certain permalink structures which are hard to resolve.

Well, I suppose I care because SEs might not care about the URL from an algorithm / primary relevancy-factor point of view. On the other hand, URLs may get truncated when used as links and are truncated when shown in SERPS. So stripping out irrelevant characters seems (well, seemed!) sensible from a ‘secondary’ SEO point of view. Hence post title only.

Plus I hate the idea of category/title as a URL. Conceptually. WP’s idea of assigning a multi-category post to the category with the lowest ID just seems wrong from a usability point of view …

Also, by coincidence, I just read this: http://yoast.com/articles/wordpress-seo/ – Jump to the 1.1 bit and see him recommending using /post/ or /cat/post/. That’s fairly common seo/wordpress advice I think … Even if not right …

VERY interesting post, I was just thinking of changing my permalink from %year%/%monthnum%/%day%/%postname%/ to %category%/%postname% !
I read somewhere that the former structure helps in search engines, and indeed it seems more logical and worthy to use category name than date on a URL.

So, /%year%/%category%/%postname%/ would be more worthy because it helps WP to find our posts? And if I don’t wanna use dates at all, how about %post_id%/%category%/%postname%?

Another question: how does WP choose which category to use for %category%? It seems random, but can’t we change it during post creation? Using the category in the URL is useless if we want a specific category and WP forces us to use another one…
Or are all of a post’s categories valid, so that any of them works fine?

Could you clear it out for me? I fear changing my permalink structure to then regret and mess with everything…

I have two sites that are powered by WordPress; a blog (160+ posts) and a content-rich website (660+ pages, rapidly growing).

Both permalink structures have the %postname% only; no category, no ugly numbers. This way, folks immediately know what the page is about; plus, I accomplish maximum SEO and the shortest URLs at the same time. In addition, folks are not intimidated by the thought that they would read outdated stuff when looking at the URLs.

Now, before someone tries to convince me that this would be an exotic approach — no chance…

Anyway, very interesting article as I, too, just discovered that my Firefox browser also crashes when trying to open that database table matthijs talks about. Frightening, isn’t it?

Is this really a performance problem? Has this ever been tested? What is the actual difference in page loading times between a large site using just %postname% vs. that same site using a standard date-based structure?

Oh, poo. Would it not be an idea to include this as a note on the Permalink settings page? My football club site uses just the postname, mainly because we like it but also because years don’t matter so much to us – we work in seasons.

I don’t think I made it clear in my article, but even using a “bad” permalink structure, you aren’t likely to run into any problems, for most sites. As mentioned by @Marcus, you can use even just %postname% without problems, even on sites with hundreds of posts or pages.

Sure, your server has to do a little extra work to sift through the rules for a match, but it should work fine unless you have *thousands* of potential matches, and your server is already highly overloaded.

But since most of us would like to be able to handle as much traffic as possible, this information is just something to keep in mind if you want to make your server just a little bit happier.

I think %postname% is fine as long as you prefix it with a numeric filter such as year or month. Computers filter numeric values faster than varied text. Thus, by filtering efficiently at first, you increase the performance and limit the proper record set.

Very interesting, although it looks a bit as if we were shooting with plastic guns at war Zeppelins. How about turning the stuff around and shoot with big cannons at flies : stellar wordpress performance.

What if you added the word “article” or “blog” before the %postname% parameter in the Custom Structure field? … Would this result in extra information being inserted into the wp_options table in the database? Would you still need to use something like the %year% placeholder?

Actually I guess I have enough info to do it, just look in wp_options. However, I’m still developing the site at the moment, locally, and it has next to zero content. Just trying to work out the best permalink structure to go with from the start.

This seems the best if it does indeed work. I want %category% and %postname% in the url for seo and logical-looking-urlness. I *don’t* want some arbitrary number, dates or post ids, before the category and postname, as this takes away from both seo and logical-looking-urlness.

And I want to keep WordPress happy.

So going with /article/%category%/%postname%/ seems like the best of every world.

It’s mildly annoying as direct competitors to the site I’m building are running /%category%/%postname%/. However, maybe I can add to seo with what I use for /article/ (I will be picking something different, but a fixed string of course).

One solid reason for using nice looking permalinks is to get numerical id’s out of the urls. Dates are on the fence.

Putting dates in the urls is a big no-no for my site. It makes the content look…would you believe it…dated! And this shit is timeless. Hahaha!

Sorry. *cough* The main content of the site does not revolve around, nor does it want referencing by, the date it was created.

Exactly. The whole idea of using category anywhere in your URL seems wrong to me unless you impose a really strict rule about one category per post. I started out like that, and then I found I couldn’t keep to it. But who can predict their IA that far in advance with a blog …

You guys are making this too complicated.. MySQL, PHP, and computers LOVE numbers.. just put a unique number (like an article ID) in the first parameter of the URL.. example: /10022/I_love_this_article_from_my_blog/.. MySQL will find this FAST (searching for 10022) and the person reading the blog will like the pretty permalink.. as far as categories go, if you want people to find categories in your URL then list the category AFTER the number and before the pretty post name.. but again, why make it difficult? You are looking for database efficiency and speed here, right?

If you’re just using %postname%, then what happens when somebody passes in a URL that looks like “/category/whatever” trying to see a category archive? Also, Posts are different than Pages, which also have bare URLs. How to distinguish between them with that structure?

In short, your idea doesn’t work because WordPress is more complex than that, and capable of more than you’re probably using it for.

To be honest, and perhaps I should shut up as a result, I never really saw a difference between posts and pages. Pages are just posts that aren’t in the loop or a category (or they’re in the category of not-in-a-category). So why worry about distinguishing them? As for distinguishing category/whatever, well it’s got a / in it. So (1) don’t put category in there as it’s nasty in a world where posts can live in more than one category. And (2) look for the / if you need to …

Have to confess, though, that you wrote the guide to 2.7 upgrade, and I didn’t. So I guess you know best …

Pages are slightly more complicated than that. Yes, internally, they live in the wp_posts table. However, their handling is different enough to set them apart from the rest of the system. Especially with regards to permalink structure, as they have several special features. They don’t obey any settings with permalinks: they live under the site root + page slug URL. They have hierarchical layout. They have special templates that they (and only they) can use. They are undated (or rather, their dates are ignored and not often used). They have no categories or tag capabilities. And so on. They’re very special cases, regardless of where they live.

Because then there is ambiguity between posts and pages, for one thing.

And when there is no “efficient” identifier to work with, the database has to do a full table scan to find the correct post. When using date or ID based identifiers, the database indexes come into play, and it can find the correct post more directly.

There’s a lot of confusing mojo going on under the hood to map your URL into the items that WordPress needs to fetch from the database to build your pages.

Given a URL, WordPress first applies the rewrite rules to the URL. The rewrite rules are generated based on several things, and they essentially allow WordPress to take the URL and determine that /2009/02/05/my-post breaks down into year=2009, month=02, etc…
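
As a rough illustration of that breakdown, here is a Python sketch of a single rewrite rule for the date-based structure, written as a regular expression with named groups. The pattern and the `parse` helper are assumptions made up for this example; the real rule set is generated by WordPress and is more extensive.

```python
import re

# One hand-written rule for /%year%/%monthnum%/%day%/%postname%/ (illustrative).
RULE = re.compile(
    r"^/(?P<year>\d{4})/(?P<month>\d{2})/(?P<day>\d{2})/(?P<name>[^/]+)/?$"
)

def parse(path):
    """Return the query components for a path, or None if the rule does not match."""
    match = RULE.match(path)
    return match.groupdict() if match else None

components = parse("/2009/02/05/my-post")
print(components)
print(parse("/about/"))   # not numeric, so this rule does not apply
```

Because the first segment must be four digits, a non-numeric path such as `/about/` falls straight through this rule without any database lookup.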

Because WordPress has a built in set of things called “Pages”, these can interfere with the rewrite rules when the first parameter is non-numeric. For the numeric matches like year, WordPress can simply look at it and say “hey, that’s a number, it’s probably not a Page, and hey, it fits in with the %year% he has in the permalink structure, so let’s go with that”. But a Page will have a URL like example.com/blog/pagename. So WordPress can see “hey, that’s not a number, it must be a Page name, let’s skip ahead here”.

So, when you use the four non-numeric items (%category%, %tag%, %postname%, or %author%) first, then it’s hard for WordPress to distinguish between one of those and a Page name immediately. This is why the rewrite_rules option expands. Suddenly, WordPress can’t use the numeric shortcut any more. Now, it has to take that string and compare it against all of the possible Page names and see whether it fits or not. This is a big performance hit, obviously, but it’s also a big list of Pages that it has to generate as well. Internally, this is called “use_verbose_page_rules” and it defaults to off, only getting turned on when there’s no other choice.
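
To show why the stored rule list grows with the number of Pages, here is a simplified Python sketch of the idea, assuming a /%postname%/ structure: one explicit rule is generated per Page, and anything that matches none of them is treated as a post name. The helper names and the exact rule format are hypothetical; they are not the contents of the real rewrite_rules option.

```python
import re

def verbose_page_rules(page_slugs):
    """Build one explicit rule per Page, so the list grows with the Page count."""
    return [
        (re.compile(r"^/" + re.escape(slug) + r"/?$"), {"pagename": slug})
        for slug in page_slugs
    ]

def match_request(path, page_rules):
    """Check the per-Page rules first, then fall back to treating it as a post."""
    for pattern, query in page_rules:
        if pattern.match(path):
            return query
    return {"name": path.strip("/")}

pages = ["about", "contact", "about/team"]
rules = verbose_page_rules(pages)
print(len(rules))
print(match_request("/about/", rules))
print(match_request("/hello-world/", rules))
```

A site with 660 Pages would carry at least 660 such entries in this model, which is the kind of growth the original question described.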

Anyway, regardless of what happens, the end result of the rewrite handling is that the URL is broken down into components (year=2009, month=02, etc…), and then this is filtered through to the WP_Query system, which determines how specific of a page it can get. The most specific wins. Meaning that if there is enough information to get a single post, then it gets a single post. If it only has category, then it gets a category archive. If it has year and month, then it gets a monthly archive, and so forth.
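
The "most specific wins" idea can be sketched as a simple cascade in Python. This is only a toy model of the decision described above, with hypothetical key names; the real WP_Query logic handles many more cases.

```python
def classify(query):
    """Pick the most specific kind of request the components can identify."""
    if query.get("name"):
        return "single post"          # enough information for one post
    if query.get("year") and query.get("month"):
        return "monthly archive"
    if query.get("year"):
        return "yearly archive"
    if query.get("category"):
        return "category archive"
    return "unknown"

print(classify({"year": "2009", "month": "02", "day": "05", "name": "my-post"}))
print(classify({"year": "2009", "month": "02"}))
print(classify({"category": "news"}))
```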

Again, using a simple URL structure like /%postname%/ is not going to cause problems for 99% of the sites out there. It might detract from site performance, if your site has a huge number of pages, and if your site handles a large number of hits per second.

Also, WordPress is using a lookup table under the conditions mentioned, that’s what the whole deal about rewrite rules stored in the options table is about. It’s a set of URL -> page mappings put there in order to improve performance, so that we don’t have to do a whole buttload of database searches against non-indexed fields.

Like Shane, I’ve simply been using /%postname%, thinking that the simplest url would be the best.
OK, so now if I change that and put /%year% in front of it, will WP automatically take care of the redirects?
Also, what will happen in the Search Engines? Are there other consequences to changing the permalink structure?

I’m new to WordPress, need to get my link structure done ASAP, and don’t have enough privileges on the machine that’s hosting the site to play around (and my home Ubuntu box is misbehaving); I am a WordPress Administrator, so I can see the settings pages, they just can’t write to the files. Could someone please let me know what to tell my SysAdmin?

I am using WordPress to store a few dozen pages of info useful to three teams in my work group. The docs are strictly divided into four categories corresponding to “Everyone” plus each of three teams. I’m creating the docs as “Pages”. URLs with meaningful names that don’t require one to know what year the page was created or a random number would make sense. After reading earlier comments, I’m no longer sure I can edit the slug for a Page (as opposed to a Post). Is that true? I’d like to use a URL like “example.com/docs/foo”, where “docs” is my WordPress root and “foo” is a short version of the page name “How to Foo for the Baz”. I could then list all the group’s pages under a page called “groupname”. I don’t care about SEO (it’s all internal), but would like to search locally.

As was said already, great article. I’ve been telling people in IRC to avoid category whenever they can, but when you say it’s useless you’re absolutely right. It’s great to know there’s also a serious speed reason to avoid it.

Great information thank you, I just installed wp on my blog and was playing around with the structure of the rewrites, I settled on leaving the year, month, date, and title. But after reading this i am going to add category as well into my permalink structure thanks!

This article is very helpful. I am setting up a new blog and want to get it right from the get go. I understand that from this article you are indicating to do something like date/postname or id/postname. I am curious though why all the SEO guru blogs, like Matt Cutts, etc are using simply blogname? Have they not run into this problem? They post tons of stuff and it hasn’t seemed to be an issue for them? Thanks for your time in advance!

Just to reiterate, using one of those ‘bad’ structures is only a problem if your site has many, many posts/pages. And there are plenty of WordPress admins who will tell you that they run high-traffic sites with that same structure, with *lots* of posts, and have no problems at all.

And the use of a caching plugin like WP Super Cache will just about nullify the problem anyways, since it will cache the pages to disk, bypassing the need for the PHP + MySQL processing to look up the page mapping, anyways.

Currently I am set on default, but would like to change it – would Year & Post name be good? Also how do I avoid all links out there be re-directed correctly to avoid any broken links? Is there a plug in anyone uses for this? That is my biggest question. Also by doing this does google have an easier time indexing you?

Supposing that it is important for a site with thousands of pages to have /%category%/%pagename%/, an efficient strategy should be able to be hardcoded somehow by:
1. Making pages urls start with a static dir, ie. “domain.com/page/myPage”
2. After checking for that static page indicator, force the rewrite rules to start from the back of the URL…
ie. “domain.com/category1/category2/mypostname” …
parse from right to left, and just lookup “mypostname” straight away.
Since category slugs must be unique anyways, category pages should also work ..
ie. “domain.com/category1/category2″ … should just lookup category2.
Am I missing something here?

Dougal,
Thanks for tackling this question! I have wondered about it for some time. @lorelleonwp gave me the same answer, but I wanted more detail about why the database is less efficient when you use only %postname%. You have supplied the detail! @willnorris directed me to your site.

Regarding @suzecampbell ‘s tweet about voice to text software, I know little about it, but today a friend was raving about http://www.livescribe.com/ so I thought I would share the link. I have no affiliation with the tool and neither does she, other than that she plans to buy one.

OK. I switched to /%post_id%/%postname%/ structure from just /%postname%/ and now the permalinks to my Pages don’t work. Gives a 404 Page Not Found error. Once I switch back to my original structure, it works fine. What is going on?

Well, sure. If you switch to a new permalink structure, the old links won’t work anymore. They’re called “permalinks” because they’re supposed to be permanent. Changing the format is not something you can do lightly.

You’ll need some sort of extra plugin in order to redirect old permalinks if you want to switch to a new structure.

Could your redirect plugin be erroneously trying to redirect your pages as well as posts? In your old permalink structure, posts and pages shared the same URL namespace. If the plugin doesn’t take that into account, it could cause problems.

I removed the /%post_name%/ from the old structure field in the Permalink Redirect plugin settings and now it works. Now I just have the first structure I started out with i.e. /year/month/day/post-name/ in the old structure field.

Strangely, the /post-name/ URLs still get redirected to the new structure /post-id/post-name/. What gives?

Why doesn’t WordPress create a simple mapping table of ‘pretty url’ to ‘non-pretty url’ instead of using very slow regular expressions? Then when a url comes in, it can look that up in an indexed table and very quickly determine which page/post to load.

I wish I had discovered this a few years ago. There is lots of discussion here and on other sites about which is best. That is great for new sites. But for a site like mine with hundreds of posts, changing the PL structure to one with a number at the front causes all my posts to be not found. I have not figured out how to fix this yet, so am leaving the structure as /%category%/%postname%/ for now.

I have about 1200 posts on my blog so going back and making corrections would be a monstrous task at this point. If I ever create a new WordPress blog I’ll keep the permalink structure in mind though…thanks!

I’m using WP as A CMS.
I set a Static page as home.
set Permalink Structure to: /%postname%/
my pages address are like: mysite.com/page3
my blog home is also mysite.com/blog
The problem:
I want individual blog post to be like this: mysite.com/blog/post5
right now they are like this: mysite.com/post5
how can I do so?

I think this discussion is obsolete. Why not do away with “permalinks” in the DB?

In my opinion, the whole concept of storing “permalinks” – a hardcoded link for each page in a database is mad! Even if WP had a mapping table (post <=> link) the server must still look through potentially thousands of records with each request. On top of that comes the headache if anything needs to be changed at some point.

I’m puzzled as to why WP doesn’t just use some general rewrite rules that you can specify and which are mapped to querystring parameters, just like in .htaccess. The same rules could be applied in a function to create the URLs for display on the site.

Say you had a CMS for shops. Each shop was a post. The shops had products that was also posts. Here is an example in the syntax used by htaccess, but rules could be handled by the WP parsing function just the same:

Remember, the problem discussed here only applies when the configured permalink structure causes ambiguity between matching permalinks for posts and pages. Under most normal circumstances, the WordPress rewrite rules are very efficient.

This inefficient URL parsing scheme is why you should avoid doing large sites in WP. We have a client with over 500 pages and the regular expressions that WP has to parse for every page load are well over 1meg of text. In fact, WP is getting so slow due to its size that it barely lets us delete any pages… the irony. Why not create a simple look up table of URL to page/post like Drupal does? Regular expressions are definitely not the best approach, and having to serialize them every page load is worse. This is not a problem ‘every CMS’ has, only WP.

Soooo… How’s about the new Permalink option ‘Post name’ since 3.4? Is that still as inefficient and should it then be avoided or has something changed in the way WP parses requests or stores rewrite rules?