Normally you’d use this algorithm by instantiating the class with the required parameters and then calling the run method:

alg = Algorithm("test.txt", 0.67, 20)
alg.run()

That’s fine for using interactively from a Python console, or for writing nice scripts to automatically vary parameters (e.g. trying all thresholds from 0.1 to 1.0 in steps of 0.1), but sometimes it’d be nice to be able to run the algorithm from a file with the right parameters in it. This’d be particularly useful for users who aren’t so experienced with Python, but it can also help with reproducibility: a parameter file stored in the same folder as your outputs lets you easily rerun the processing.

For a while I’ve been trying to work out how to support both parameter files and the standard way of calling the class (as in the example above) without lots of repeated code, and I think I’ve found a way to do it that works fairly well. I’ve added an extra function to the class which writes out a parameter file:
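A minimal sketch of what such a function could look like. The class body, the parameter names `filename`, `threshold` and `iterations`, and the method name `write_params` are all assumptions made for illustration; `filenames`, `m` and `c` stand in for the internal attributes of the class.

```python
import io


class Algorithm:
    # Internal attributes that are not constructor parameters.
    _NOT_PARAMS = ("filenames", "m", "c")

    def __init__(self, filename, threshold=0.5, iterations=10):
        self.filename = filename
        self.threshold = threshold
        self.iterations = iterations
        # Hypothetical internal state, filled in while the algorithm runs.
        self.filenames = []
        self.m = None
        self.c = None

    def write_params(self, stream):
        """Write every parameter to `stream` as a line of valid Python."""
        for name, value in sorted(vars(self).items()):
            if name in self._NOT_PARAMS:
                continue
            if isinstance(value, float):
                stream.write("%s = %.2f\n" % (name, value))
            elif isinstance(value, int):
                stream.write("%s = %d\n" % (name, value))
            else:
                # repr() gives a valid Python literal for strings, lists, etc.
                stream.write("%s = %s\n" % (name, repr(value)))


# Write to an in-memory buffer here; a real run would pass an open file.
buffer = io.StringIO()
Algorithm("test.txt", 0.67, 20).write_params(buffer)
print(buffer.getvalue())
```

Writing to a stream rather than a fixed path is just a convenience for the sketch; passing an open file works the same way.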

This function is generic enough to be used with almost any class: it simply writes out the contents of all variables stored in the class. The only bit that’ll need modifying is the bit that excludes certain variables (in this case filenames, m and c, which are not parameters but internal attributes used in the class – in an updated version of this I’ll change these parameters to start with an _, and then they’ll be really easy to filter out).

The key thing is that, through the use of the repr() function, the parameter file is valid Python code, and if you run it then it will just set a load of variables corresponding to the parameters. In fact, the code to write out the parameters could be even simpler, just using repr() for every parameter, but to make the parameter file a bit nicer to look at, I decided to print out floats and ints separately with sensible formatting (two decimal places is the right accuracy for the parameters in the particular algorithm I was using; yours may differ). One of the other benefits of using configuration files that are valid Python code is that you can use any Python you want in there (string interpolation or even loops), plus you can put in comments. The disadvantage is that it’s not a particularly secure way of dealing with parameter files, because reading one executes whatever code it contains, but for scientific algorithms this isn’t normally a major problem.
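To illustrate, here is a sketch of a hand-written parameter file that uses comments, string formatting and a computed value. The parameter names and values are hypothetical, and the file contents are held in a string only so that the example is self-contained.

```python
# Contents of a hypothetical parameter file.
PARAM_FILE = '''
# Parameters for the first test run
site = "test"
filename = "%s.txt" % site      # string interpolation works
threshold = 0.67
iterations = 4 * 5              # so does any other expression
'''

params = {}
exec(PARAM_FILE, params)
# exec() adds a '__builtins__' entry to the dictionary; drop it.
params.pop("__builtins__", None)
print(params)
```

Note that helper variables such as `site` also end up in the dictionary, so they would need to be removed before the dictionary is passed on to a constructor.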

The result of writing the parameter file as valid Python code is that it is very simple to read it in:

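params = {}
# execfile(filename, params) exists only in Python 2; this is the Python 3 form
with open(filename) as f:
    exec(f.read(), params)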

This creates an empty dictionary, then executes the file and places all of the variables into that dictionary, giving us exactly what we’d want: a dictionary of all of our parameters (plus a __builtins__ entry that Python adds automatically, which needs removing). Because they’re written out from the class instance itself, any issues with default values will already have been dealt with, and the values written out will be the exact values used. Now we’ve got this dictionary, we can simply use ** to expand it into keyword arguments for the __init__ function, and we’ve got a function that will read parameter files and create the object for us:
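A sketch of such a function, written as a class method. The name `from_params` and the constructor signature are assumptions for illustration. `exec()` is used here because `execfile` exists only in Python 2; `exec()` adds a `__builtins__` entry to the dictionary, which has to be removed before expanding it with `**`.

```python
import os
import tempfile


class Algorithm:
    def __init__(self, filename, threshold=0.5, iterations=10):
        self.filename = filename
        self.threshold = threshold
        self.iterations = iterations

    @classmethod
    def from_params(cls, path):
        """Create an instance from a parameter file of Python assignments."""
        params = {}
        with open(path) as f:
            exec(f.read(), params)
        # exec() inserts '__builtins__' into the dictionary; it is not a parameter.
        params.pop("__builtins__", None)
        return cls(**params)


# Write a small parameter file to a temporary folder and read it back.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "params.txt")
    with open(path, "w") as f:
        f.write("filename = 'test.txt'\nthreshold = 0.67\niterations = 20\n")
    alg = Algorithm.from_params(path)

print(alg.filename, alg.threshold, alg.iterations)
```

Because the dictionary keys become keyword arguments, the variable names in the file have to match the argument names of `__init__` exactly.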

So, if we put all of this together we get code which automatically writes out a parameter file when a class is instantiated, and a class method that can instantiate a class from a parameter file. Here’s the final code, followed by an example of usage:
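What follows is a sketch of how the pieces could fit together rather than a definitive listing. The parameter names, the method names `write_params` and `from_params`, and the choice to write the file to `<filename>_params.txt` are all assumptions made for illustration.

```python
import os
import tempfile


class Algorithm:
    # Internal attributes, not constructor parameters, so never written out.
    _NOT_PARAMS = ("filenames", "m", "c")

    def __init__(self, filename, threshold=0.5, iterations=10):
        self.filename = filename
        self.threshold = threshold
        self.iterations = iterations
        self.filenames = []
        self.m = None
        self.c = None
        # Record the exact values in use next to the input file (assumed naming).
        self.write_params(filename + "_params.txt")

    def write_params(self, path):
        """Write all parameters to `path` as valid Python assignments."""
        with open(path, "w") as f:
            for name, value in sorted(vars(self).items()):
                if name in self._NOT_PARAMS:
                    continue
                if isinstance(value, float):
                    f.write("%s = %.2f\n" % (name, value))
                elif isinstance(value, int):
                    f.write("%s = %d\n" % (name, value))
                else:
                    f.write("%s = %r\n" % (name, value))

    @classmethod
    def from_params(cls, path):
        """Create an instance from a parameter file written by write_params."""
        params = {}
        with open(path) as f:
            exec(f.read(), params)
        params.pop("__builtins__", None)  # added by exec(), not a parameter
        return cls(**params)

    def run(self):
        pass  # the real processing would go here


# Example usage: the first instance writes the file, the second is built from it.
with tempfile.TemporaryDirectory() as folder:
    data = os.path.join(folder, "test.txt")
    first = Algorithm(data, 0.67, 20)  # writes test.txt_params.txt
    second = Algorithm.from_params(data + "_params.txt")
    second.run()

print(second.threshold, second.iterations)
```

One consequence of writing the file in `__init__` is that an object created through `from_params` rewrites the same file with the same contents, which is harmless here.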

I think the main benefit from my point of view is that the resulting parameter file is human-readable and human-writeable. Therefore someone who doesn’t know Python (e.g. my supervisor) can write a parameter file and give it to me, and I can run the algorithm with it. Similarly, anyone who can open a text file can read the parameters, and you can easily put comments in the file too.