Follow this Blog!

Jan 28, 2012

Amazon Say I have a database table called 'article' and it contains the absolute URL of each article in column 'page_url' (e.g. http://www.mensfashionforless.com/zara-black-leather-jacket.html) and I'd like to change the contents of 'page_url' so that it contains ONLY the path of the url (e.g. /zara-black-leather-jacket.html).

How do I do that in pure MySQL commands?

Solution
First you add a new column called 'page_url_old' to preserve the old absolute URLs just in case you want them back later. Remember: Always keep a backup unless you have a good reason not to.

alter table article add column page_url_old varchar(255) default '' not null
Next you copy the contents of 'page_url' over to 'page_url_old':

update article set page_url_old=page_url
Next let's update the table so that 'page_url' column now contains the paths of the URLs. HOWEVER before we do that ALWAYS use 'select' first to see what the results are in case you've made a mistake in your SQL syntax.

To that end here's the logic. We find the last '/' in the URL and extract the sub string from that slash all the way to the end of the URL. We'll use SQL's reverse(), left(), locate() functions to accomplish this task as follows:

select reverse(left(reverse(page_url), locate('/', reverse(page_url)) -1)) from article
You may be wondering why not use substr(). It's because we need to find the LAST slash but MySQL does not have such a function. So we need to reverse the page_url and find the slash index. Then we extract the sub string from the beginning up to the slash; then we reverse the sub string to get the correct path.

An Example
Here's an example if you are still confused. Say the original page_url is 'http://www.mensfashionforless.com/zara-black-leather-jacket.html'. Here's the process of extracting the path from this URL:

You execute this 'select' command and realize it's returning the correct results. Now you can be confident that by employing the identical logic in an 'update' clause you'll be able to set 'page_url' to the correct values, as follows: