Tower of Jade

HREF rewriter in Python

While I was working at DFAS-Cleveland there was a discussion about moving our web pages from a Microsoft based hosting solution to a UNIX based hosting solution. The biggest problem we faced in this move was a strange one: Case sensitivity. Our existing solution didn't care about case of HREF values, but the UNIX system did. (A quick summary: UNIX sees THISFILE.HTML and thisfile.html as two separate files.)

Why did that matter? Great question. An HREF tells a link where to go when a user clicks on it. If the HREF for the link is THISFILE.HTML, but the file on the server is named thisfile.html, a UNIX based server won't find the correct file, users are unhappy, and eventually the world will explode*.

Now the real fun part is making sure all the links have the correct naming convention. Actually it's not fun at all. It's tedious and boring and would take forever on the DFAS site. So I wrote a Python program to do it for me. I've since cleaned it up a little and removed all the DFAS-specific stuff, so I thought I'd put it here for future reference or for someone to experiment with.

I make no promises that this will work for your system. If the script doesn't work, eats your homework, destroys your computer, kicks your dog, makes your cat explode, or otherwise causes you any kind of harm in any way (including mental stress), you cannot hold me liable, sue me, or otherwise pass the problem on to me in any way.

What you need

Python installed on your computer.

A copy of your website on your computer.

What it does

This program will start in a directory you specify (see start variable), open each file in the directory and all subdirectories, scan through the file for HREF values, convert the HREF values as specified in the clean function, and then write any changes back to the file. Read the file specified in the log variable to see what happened or see any errors.

Using it

The program wasn't intended to be interactive. Just save a copy of the script to your computer, start a command window, and type python cleanup.py. That's it. (You might be able to use some of these functions elsewhere in a reusable library or interactive product, I just never had the need to do it.)