Enhance rel_canonical function, add filter

Description

I think it's a bit shortsighted to think that only singular pages need the canonical tag output in the <head>. Considering the fact that just about any page on your site can be accessed with a malformed URL, I think it's time to enhance this function.

The attached patch is just a first pass. But I think it gets us started in the right direction. There's also a filter before output, so themes and plugins can further enhance the output of this function (related #14458).

For the official rel_canonical() function, we definitely shouldn't canonicalize back to page 1 of an archive. That suggestion was to benefit our users at the recommendation of a very well respected SEO expert, Greg Boser. Yoast said he and Greg will have to fight about that the next time they see each other :-)

So yes, my patch as is won't work. But it's a start. When I get some time, I'll update the canonical generation to account for paging.

No redirect. The canonical tag helps search engines know that when people link to the homepage with a malformed URL, you want pagerank to pass to http://example.com/ and not the other URL (seen by Google as two different pages).

I've attached a better patch, based on the code in my WordPress SEO plugin. The pagination canonical Genesis uses is at best aggressive optimization, and not based on the canonical standards, so we should indeed not include that.

This patch makes sure WordPress outputs a canonical on:

singular posts, pages, etc.

the homepage & frontpage

taxonomy, category & tag archives

post type archives

dated archives

It also deals with pagination, both for paged posts and pages and all paginated archives.
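To make the pagination handling concrete, here is a minimal sketch of how a paginated canonical might be derived by hand. It is not the code from the patch: `example_paged_canonical()` and its parameters are hypothetical, and it assumes the default "page" pagination base and a permalink structure with trailing slashes.

```php
<?php
/**
 * Hypothetical helper, not part of WordPress core or of the attached patch.
 * Appends pagination to an already-canonical base URL.
 *
 * Assumptions: default "page" pagination base, trailing-slash permalinks,
 * "page" query var for paged posts and "paged" for paged archives.
 */
function example_paged_canonical( string $base_url, int $page, bool $is_singular, bool $pretty_permalinks ): string {
	// Page 1 is the base URL itself.
	if ( $page < 2 ) {
		return $base_url;
	}

	if ( $pretty_permalinks ) {
		// Paged posts use /N/, paged archives use /page/N/.
		$suffix = $is_singular ? $page . '/' : 'page/' . $page . '/';
		return rtrim( $base_url, '/' ) . '/' . $suffix;
	}

	// Plain permalinks: add the paging query var to the query string.
	$query_var = $is_singular ? 'page' : 'paged';
	$separator = ( false === strpos( $base_url, '?' ) ) ? '?' : '&';
	return $base_url . $separator . $query_var . '=' . $page;
}

// Page 3 of a category archive with pretty permalinks.
echo example_paged_canonical( 'http://example.com/category/news/', 3, false, true ), "\n";
// Page 2 of a multi-page post with plain permalinks.
echo example_paged_canonical( 'http://example.com/?p=42', 2, true, false ), "\n";
```

A real implementation would presumably read the pagination base from $wp_rewrite rather than hard-coding it, which is exactly where custom bases could trip this up.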

joostdevalk,
Please take a look at my plugin, Meta SEO Pack, in particular the MetaSeoPack::get_canonical() method. It generates a canonical URL for all WP pages, including multi-paged content (this is a bit of a tricky area). I haven't updated it for a while, so some updates will be needed there (in particular, use get_search_link() for the search results page and add support for custom post types if needed). It should give you a few hints on how to improve your patch.

What's the status on this ticket? It was submitted and patched prior to the feature freeze for 3.3, but the milestone hasn't been changed. Do we need a certain number of definitive "patch works for me" messages before it can be committed?

I've applied this patch, merged it with HEAD, and gone over each line of code. And I've done this no fewer than four times in the last six weeks. I can't bring myself to commit the whole thing at this point, and for two main reasons.

I think adding rel=canonical to non-singular pages needs some very careful consideration. get_current_archive_url() sounds like a fantastic idea in theory, but it could be damaging to a complicated site pretty easily.

I am worried about how a complicated drill-down URL (say, with multiple taxonomies, or a query string, or something else) would end up with an improper canonical link. This is perhaps the greatest issue when it comes to archive pages, and that's likely the only reason why we stuck to singular the first time around.

I think the pagination aspects (both rel=canonical respecting pagination, and rel=next/prev handling pagination) are much more solid. I think they have the potential to cause problems for non-singular pages, but probably not as bad as a faulty rel=canonical could.

However, I am irked that the patch avoids using any API functions for creating pagination links, and instead calculates them on its own. I understand this is no fault of the patch; rather, our APIs around pagination links (both singular "page" links and archive "paged" links) are all over the place.

There are also a number of filters and hooks scattered about these functions, which make me question things like pagination bases and other points of customization that could break. I would like to go through these functions, potentially in 3.5, and clean them up, and make them obvious about what is going on.

The one thing that seems most safe is a single, targeted patch that adds support for singularly paginated items to rel_canonical(). And it is getting too late in the cycle to make such changes, so I would rather try to do all of this in one effort for 3.5. Sorry, yoast.

Happy to work on this patch more if someone (@nacin?) is willing to bless the task and provide me with support on how to fix this. I'd love for us to finally dig it all out and fix all the related nonsense old code in core...

I have a concern about get_current_archive_link(), which appears in canonical.6.patch, namely that it doesn't account for custom rewrite rules added by plugins. This is also a problem for wp_get_canonical_url(), actually, with regard to custom rewrite endpoints (anything other than page is stripped).

What about an algorithm like the following to determine the default canonical URL for a given request:

<?php
$added_query_vars = $wp->query_vars;
if ( ! $wp_rewrite->permalink_structure || empty( $wp->request ) ) {
	$url = home_url( '/' );
} else {
	$url = home_url( user_trailingslashit( $wp->request ) );
	parse_str( $wp->matched_query, $matched_query_vars );
	foreach ( $wp->query_vars as $key => $value ) {
		// Remove query vars that were matched in the rewrite rules for the request.
		if ( isset( $matched_query_vars[ $key ] ) ) {
			unset( $added_query_vars[ $key ] );
		}
	}
}

This ensures that custom rewrite rules and endpoints are honored, as well as all public query vars.
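For anyone who wants to poke at the idea outside of WordPress, here is a rough standalone sketch of the same approach. `example_default_canonical()` is hypothetical: its parameters stand in for home_url(), $wp_rewrite->permalink_structure, $wp->request, $wp->matched_query and $wp->query_vars, and it assumes trailing-slash permalinks. The final step, appending the leftover query vars to the URL, is my assumption about what should happen with $added_query_vars.

```php
<?php
/**
 * Hypothetical, WordPress-free rendering of the algorithm described above.
 *
 * @param string $home           Stand-in for the site home URL.
 * @param bool   $has_permalinks Stand-in for a non-empty permalink structure.
 * @param string $request        Stand-in for $wp->request.
 * @param string $matched_query  Stand-in for $wp->matched_query.
 * @param array  $query_vars     Stand-in for $wp->query_vars.
 */
function example_default_canonical( string $home, bool $has_permalinks, string $request, string $matched_query, array $query_vars ): string {
	$added_query_vars = $query_vars;

	if ( ! $has_permalinks || '' === $request ) {
		$url = rtrim( $home, '/' ) . '/';
	} else {
		// Assumes a permalink structure that ends in a trailing slash.
		$url = rtrim( $home, '/' ) . '/' . trim( $request, '/' ) . '/';

		parse_str( $matched_query, $matched_query_vars );
		foreach ( array_keys( $query_vars ) as $key ) {
			// Drop query vars already expressed by the matched rewrite rule.
			if ( isset( $matched_query_vars[ $key ] ) ) {
				unset( $added_query_vars[ $key ] );
			}
		}
	}

	// Assumption: whatever is left over is appended as a query string.
	if ( $added_query_vars ) {
		$url .= '?' . http_build_query( $added_query_vars );
	}

	return $url;
}

// A category archive reached via a rewrite rule, plus one extra query var.
echo example_default_canonical(
	'http://example.com',
	true,
	'category/news',
	'category_name=news',
	array( 'category_name' => 'news', 'orderby' => 'title' )
), "\n";
```

With plain permalinks the request is empty, so every query var is kept and the result is the home URL plus the full query string.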

Note: I'm looking into this for the sake of adding canonical support to the AMP plugin, as the AMP spec requires a rel=canonical link on every AMP response, even if it points to itself.