
The first version of my library script

logged into the library web site using all my family’s library card numbers,

gathered the information on all the items we had on loan,

formatted it into a nice HTML table, and

emailed my wife and me the results.

The script ran every morning and was really useful, because it put all the information in one spot and let us consolidate our trips to the library. No more “Oh, I should have returned that book, too; I didn’t know it was due in a few days.”
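As a rough illustration of the last step in that list, here is a minimal Python 3 sketch of wrapping an HTML table in an email message. It is not the code from the script itself; the function name `build_message`, the subject line, and the example.com addresses are all made up for the example.

```python
from email.message import EmailMessage


def build_message(html, sender, recipients, subject='Library items'):
    """Wrap an HTML fragment in a message with the usual headers."""
    msg = EmailMessage()
    msg['From'] = sender
    msg['To'] = ', '.join(recipients)
    msg['Subject'] = subject
    # Mark the body as text/html so mail clients render the table.
    msg.set_content(html, subtype='html')
    return msg


if __name__ == '__main__':
    # Hypothetical addresses and a placeholder table.
    msg = build_message('<table><tr><td>A book</td></tr></table>',
                        'me@example.com',
                        ['me@example.com', 'spouse@example.com'])
    # Print headers and body so the output can be piped to a mailer.
    print(msg.as_string())
```

Because the headers are part of the printed output, something like `python3 checkcards.py | /usr/sbin/sendmail -t` can deliver it; the `-t` option tells sendmail to take the recipients from the message headers.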

Here’s what the daily email looked like on my iPhone:

About a week after I’d debugged the script and started running it regularly, it failed because the library changed its login page. OK, that’s life in the screen-scraping world. I decided to dig into the code that evening and adapt it to the new login page. By the time I got started, the login page had changed again, back to almost the same state it had been in the day before.

That seemed weird, but I made the few changes necessary to get the login working again and also cleaned up a new chunk of code I’d been working on to include the items we’d put holds on. I described the expanded script in a post at the end of February. Here’s what the new holds section of the email looked like on my iPhone:

The pink background for the items ready to be picked up is also used for items that are due (or overdue).
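The shared highlighting rule can be sketched in a few lines of Python 3. This is only an illustration under assumed details: the shade of pink, the two-day warning window, the `'Ready'` status string, and the function names are all hypothetical rather than taken from the script.

```python
from datetime import date, timedelta
from html import escape

# Assumed shade of pink, applied to a whole table row.
PINK = ' style="background-color: #fdd;"'


def row(cells, highlight):
    """Build one table row, with the pink background if highlight is true."""
    tds = ''.join('<td>%s</td>' % escape(cell) for cell in cells)
    return '<tr%s>%s</tr>' % (PINK if highlight else '', tds)


def loan_rows(loans, today, warn_days=2):
    """Rows for (title, due_date) pairs, soonest due first.

    Anything overdue or due within warn_days gets highlighted.
    """
    cutoff = today + timedelta(days=warn_days)
    return [row([title, due.isoformat()], due <= cutoff)
            for title, due in sorted(loans, key=lambda loan: loan[1])]


def hold_rows(holds):
    """Rows for (title, status) pairs; holds ready for pickup are highlighted."""
    return [row([title, status], status == 'Ready')
            for title, status in holds]
```

Both sections go through the same `row` helper, so a single style covers “due soon” and “ready to pick up.”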

Several days ago, the library changed its login page again, and this time the changes were significant and didn’t disappear after a day or two. I emailed a politely worded WTF message to the webmaster and asked if the login page was going to settle down. (I also took the opportunity to complain about a few broken links on the login page as well as some poor formatting due to a typo in one of the HTML tags, all of which were promptly fixed.) The webmaster apologized for the back and forth and gave the impression, without saying so outright, that the new login page is here to stay.

So I rewrote the checkcards.py script again. This time, I used the Python mechanize library the way it was meant to be used, as a way of creating automated browser sessions. The earlier versions of the script used it simply as a cookie handler. An added benefit of this change is that it should still work even if the library goes back to its earlier login page. Here’s the code:

The primary change from last time is the addition of Lines 53–60. This section of the code sets up a “browser” that enters the library card number and PIN into the appropriate form fields and clicks the Submit button. From that point on, the script parses the loan and hold information via BeautifulSoup and sorts and formats it as before. My first post on this script explains in detail how the output is piped to sendmail and how I get it to run automatically every morning.
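For readers who haven’t used mechanize as a browser, the login step looks roughly like the Python 3 sketch below. It is not a copy of Lines 53–60: the URL, the field names `code` and `pin`, and the assumption that the login form is the first form on the page are all placeholders that would have to be matched to a real library’s page.

```python
# Hypothetical login address; substitute the real one.
LOGIN_URL = 'https://library.example.org/patroninfo'


def login_fields(card, pin):
    """Map the (assumed) form field names to one patron's credentials."""
    return {'code': card, 'pin': pin}


def fetch_account_page(card, pin, url=LOGIN_URL):
    """Log in as one patron and return the raw HTML of the page that follows."""
    # Third-party library; imported here so the rest of the module
    # loads even on a machine where mechanize isn't installed.
    import mechanize

    br = mechanize.Browser()
    br.open(url)
    # Assumption: the login form is the first form on the page.
    br.select_form(nr=0)
    for name, value in login_fields(card, pin).items():
        br[name] = value
    # Equivalent to clicking the form's Submit button.
    response = br.submit()
    return response.read()
```

Since the browser fills in whatever form the page presents, by field name, this style is likely to tolerate cosmetic changes to the login page better than a hand-built request would. The returned HTML is what would then be handed to BeautifulSoup.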

As I said in my first post about this script, the only people who can use this script directly are those who use my library. But the general outline of the script should be useful to anyone who wants to make his own library tracker.

Update 3/16/09
I added the encode('utf8') parts to Lines 169 and 170 to handle accented characters in the data from the library. I didn’t bother to encode the other printed strings because I’m in control of their contents and they don’t have any characters outside the ASCII range. Presumably, Python 3 will take care of such encoding automatically.
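To show what that encoding step does to an accented character, here is a small Python 3 sketch. The title is made-up sample data, not something from the library’s pages.

```python
# Made-up title; \u00e9 is an e with an acute accent.
title = 'Les Mis\u00e9rables'

# Encoding turns the text into bytes; the accented letter
# becomes the two-byte UTF-8 sequence C3 A9.
encoded = title.encode('utf8')
print(encoded)

# Decoding with the same codec recovers the original text.
decoded = encoded.decode('utf8')
```

The encoded form is one byte longer than the text has characters, because the accented letter takes two bytes while the plain ASCII letters take one each.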