Python and stuff

An Agile Time Keeping App

I saw the comedy theater I intern at could really benefit from a computerized way of tracking when interns show up. So, I decided to practice my agile programming and whip something up as quickly as possible. I considered using a web framework, but decided that Django would be overkill for such a simple project.

So, where to start? I need to keep track of users, who have passwords. I decided a dictionary was the way to go.
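A minimal sketch of that starting point might look like the following. The usernames, passwords, and the `check_login` name are made up for illustration; this is a reconstruction of the idea, not the original prototype code.

```python
# Hypothetical first pass: usernames mapped to plain-text passwords.
users = {
    "alice": "correct horse",
    "bob": "hunter2",
}

def check_login(username, password):
    """Return True only if the user exists and the password matches."""
    return username in users and users[username] == password
```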

This is super simple, demonstrates the basic functionality I need, and is set up to incrementally expand functionality and test along the way. So where to go from here? I need a few things still. I need the passwords to not be stored in plain text. I need the users dictionary to be update-able and to persist after the program shuts down. I need the program to do something more useful when the user logs in successfully. I need an interface. And I need these things in pretty much this order. Why work out the interface before I have all the basics of what the interface will call? Why worry about logging times before I really have a way of creating, storing, and securely authenticating users?

Passwords

The secure way to store a passphrase is as a hash. A hash cannot be reversed*, so the only way to check a passphrase is to hash the attempt and compare the result to the stored hash. I decided to use a simple MD5 hash, since it comes with Python. So I wrote a hashing function:

from hashlib import md5

def saltyhash(password):
    '''Salt the password and hash it a bunch. Security through obscurity. No password cracking
    programs are pre-configured to do this exact hash.'''
    result = password + '_._' + password
    for x in range(25):
        # encode so this also runs on Python 3, where md5 needs bytes
        result = md5(result.encode('utf-8')).hexdigest()
    return result

Now I always call saltyhash on a password before doing anything with it (storing or checking it).
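To show where those two calls sit, here is a small sketch. The `add_user` and `check_login` names are hypothetical, and saltyhash is repeated (in a form that runs on Python 3) only so the block works by itself.

```python
from hashlib import md5

def saltyhash(password):
    """Same scheme as above, with an encode step so it runs on Python 3."""
    result = password + '_._' + password
    for _ in range(25):
        result = md5(result.encode('utf-8')).hexdigest()
    return result

users = {}  # username -> hashed password, never the plain text

def add_user(username, password):
    """Store only the hash of the new user's password."""
    users[username] = saltyhash(password)

def check_login(username, password):
    """Hash the attempt and compare it to the stored hash."""
    return username in users and users[username] == saltyhash(password)
```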

Storing User Passwords

Now that I have passwords worth storing, it is time to figure out how to store them. My default method is pickling the data, in this case a dictionary. But I don’t know of a reason to prefer the pickle module over json here, so I went with json since it comes up in more situations. Their usage is practically identical.

Since there is nothing interesting about serializing data, writing it to a file, and retrieving it, I won’t bother discussing it further here. See the more complete code later in this post for details.

So, instead of a user database, I have a JSON-serialized dictionary stored in a file. This makes perfect sense for how simple this project is and how little data I have to keep track of.
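For the curious, the save and load steps could be sketched roughly like this. The filename and the function names are assumptions made for the example.

```python
import json
import os

USERS_FILE = "users.json"  # hypothetical filename

def load_users(path=USERS_FILE):
    """Return the saved users dictionary, or an empty one if no file exists yet."""
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return json.load(f)

def save_users(users, path=USERS_FILE):
    """Write the users dictionary to disk as JSON."""
    with open(path, "w") as f:
        json.dump(users, f)
```

Calling save_users after every change to the dictionary is enough to make it persist between runs.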

Logging times

Now, to do something useful with the successful login. This turns out to also be trivially simple. I want to log the user, the current date/time, and some user input (any important notes). The agile thing to do is log this data into a tab-delimited CSV. This logs the data accurately in a human-readable format and allows the admins to manually edit and add information through tools they are familiar with. Maybe I’ll want to use a Google Doc in the future, but I’d rather get feedback from the people I am making this program for before getting into that. I decided the easiest format for the (potentially) technically naive admins was a separate CSV for each user. Maybe use a naming scheme like “username_timesheet.csv”? Sounds good.
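A rough sketch of that logging step, using the csv module with a tab delimiter, is below. The `log_time` name, the `directory` parameter, and the timestamp format are assumptions for illustration.

```python
import csv
import os
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp format

def log_time(username, note, directory="."):
    """Append one tab-delimited row (user, timestamp, note) to the user's timesheet."""
    path = os.path.join(directory, username + "_timesheet.csv")
    row = [username, datetime.now().strftime(TIME_FORMAT), note]
    # newline="" lets the csv module control line endings (Python 3)
    with open(path, "a", newline="") as f:
        csv.writer(f, delimiter="\t").writerow(row)
    return path
```

Opening the file in append mode means each login adds a row without touching anything an admin has edited by hand.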

The time comes from datetime.datetime.now(), which should be formatted with strftime. If these timestamps ever need to be parsed out again, the same format string that strftime used can be given to strptime to get a datetime object back. So, I’d define this format right up at the top.
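The round trip looks something like this. The particular format string is an assumption; any strftime format works as long as both calls share it.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed; defined once at the top

now = datetime.now()
stamp = now.strftime(TIME_FORMAT)                # datetime -> string for the log
parsed = datetime.strptime(stamp, TIME_FORMAT)   # string -> datetime again

# Microseconds are dropped because this format does not include them.
assert parsed == now.replace(microsecond=0)
```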

User interface

Maybe I do want a web interface. Maybe I want something pretty on the desktop made with Tkinter or something. But maybe not. For now, I could leave it as a command-line program called from the bash shell, but I do need to sell this a little bit, and ‘./tnmtimesheet.py username’ is a bad interface for anyone but developers. I decided to take a middle road and learn something in the process. I used the cmd module, as it provides tab completion and help text, and lets me define default behavior.

With the bash command line as the interface, I know without even asking that the theater folks will want a different interface. With the cmd interface, they may just want it tidied up a bit or they may want something graphical. I also used getpass, which is like raw_input but doesn’t echo what you type.
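As a sketch of how cmd and getpass fit together, assuming a hypothetical `TimesheetShell` class with a login command and a quit command (the password check itself is left as a placeholder):

```python
import cmd
import getpass

class TimesheetShell(cmd.Cmd):
    """Hypothetical minimal shell: one login command and a way out."""
    prompt = "timesheet> "

    def do_login(self, username):
        """login USERNAME: prompt for a password without echoing it."""
        password = getpass.getpass("Password for %s: " % username)
        # Placeholder: a real version would hash and check the password here.
        print("Got a password of length %d" % len(password))

    def do_quit(self, arg):
        """quit: leave the shell."""
        return True  # a true return value ends cmdloop()

# To try it interactively: TimesheetShell().cmdloop()
```

The docstrings double as help text, so typing “help login” at the prompt prints the first one.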

I did decide to get a little fancy with this since I am just seeing cmd for the first time. It is easy to make custom tab completion suggestions for arguments to a command. If I wrote a do_login(username) function, then I could also write a complete_login() function that gets called when Tab is pressed after “login” has been typed. So typing “login u[tab]” gives me “login username”. But I want users to just type their username without typing “login”, because that will be easier for people. So, instead of writing a do_login(username), I’ll use default(username). But completedefault(), despite my expectation, doesn’t get called for commands. It is only called for completing arguments to commands that do not have a complete_ function defined. Makes sense.
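For reference, the do_login plus complete_login pairing described above could be sketched like this. The class name and the username list are made up; cmd passes the four arguments shown to any complete_ method.

```python
import cmd

USERNAMES = ["alice", "bob", "barbara"]  # hypothetical user list

class LoginShell(cmd.Cmd):
    def do_login(self, username):
        """login USERNAME: placeholder for the real login logic."""
        print("Logging in %s" % username)

    def complete_login(self, text, line, begidx, endidx):
        """Return the usernames that start with the partial argument."""
        return [name for name in USERNAMES if name.startswith(text)]
```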

So I dug into the source code for cmd.py, and it looked like I needed to override Cmd.completenames(). This is the function complete() picks when the text being completed is the command name itself rather than an argument.
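A sketch of that override follows. The class name and username list are hypothetical, and keeping the regular command names in the suggestions is an assumption; the override could just as well return usernames only.

```python
import cmd

USERNAMES = ["alice", "bob", "barbara"]  # hypothetical user list

class UsernameShell(cmd.Cmd):
    def default(self, line):
        """Treat any input that is not a command as a username."""
        print("Logging in %s" % line)

    def completenames(self, text, *ignored):
        """Offer usernames alongside the regular command names."""
        commands = cmd.Cmd.completenames(self, text, *ignored)
        users = [name for name in USERNAMES if name.startswith(text)]
        return commands + users
```

With this in place, typing “b[tab][tab]” at the prompt suggests bob and barbara, while “h[tab]” still completes to help.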

This may be a bad idea, because I may have more commands in the future, and admins may want to be able to create_user, delete_user, get_user_stats, change_user_password, and they would want tab completion for those. But maybe I’ll deal with this by making a second “admin” shell and keeping it separate from the “login” shell altogether. Or maybe they’ll want a GUI. So, no use spending more time refining this until I get feedback.

One more step I need to make is to set up file permissions so that someone can’t just get the json file with the password hashes in it, or cheat and modify their timesheets manually. Again, I’ll make sure this program is close to what the theater admins want before testing that.

After tweaking some details, the full code of this prototype is as follows.

*Password hashing is really interesting to me. My understanding is like this: For many mathematical operations, the forward operation is much easier than its inverse. Any 10th grader can square a decimal number exactly with pen and paper, but taking the square root is much more complicated. The best method I know of is Newton’s method, and it involves guessing and then adjusting the guess over and over. Plus, on taking the square root, you can’t be sure if the original number was positive or negative anyway. A good hashing algorithm is this to the extreme. It takes a bit for the computer to do it forwards, is practically impossible to invert, and wouldn’t give a unique answer even if you could. It is even impossible to use a method like Newton’s, where each guess informs the next as you zero in on the exact answer.
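To make the guess-and-adjust idea concrete, here is a small sketch of Newton’s method for square roots. The function name, starting guess, and tolerance are arbitrary choices for the example.

```python
def newton_sqrt(n, guess=1.0, tolerance=1e-12):
    """Approximate sqrt(n) for n > 0 by repeatedly refining a guess."""
    while abs(guess * guess - n) > tolerance * n:
        # Average the guess with n / guess; each step lands closer to the root.
        guess = (guess + n / guess) / 2.0
    return guess
```

Each guess tells you which way to move next, which is exactly the kind of feedback a good hash is designed not to give.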