Sunday, 15 September 2013

I have a list of files in a text file, and I want to load this list into some
kind of data structure. The list is quite long, and loading it requires instantiating
100,000 objects in Python, all of the same type. I found out that depending on
what kind of object is used, the time it takes to instantiate all these can
vary greatly. Essentially, each line of the file is composed of tab-separated
fields, which are split into a list with Python's str.split() method. The
question therefore is: what should I do with that list?
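As a minimal sketch of that starting point, the snippet below reads tab-separated lines and splits each one with str.split(). The function name load_records and the in-memory sample data are assumptions made for illustration; the real input is a text file on disk.

```python
import io


def load_records(handle):
    """Split each non-empty line of a text stream into its tab-separated fields."""
    records = []
    for line in handle:
        line = line.rstrip("\r\n")  # drop the line terminator, keep inner tabs
        if line:
            records.append(line.split("\t"))
    return records


# Hypothetical sample standing in for the real file of 100,000 lines.
sample = io.StringIO("a\tb\tc\nd\te\tf\n")
records = load_records(sample)
```

Each element of records is the plain list produced by str.split(), which is the baseline object in the comparison that follows.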

The object must hold a few values, so basically a list or a tuple would be enough.
However, I need to perform various operations on those values, so additional
methods would be handy and justify the use of a more complex object.

The Contenders

These are the objects I compared:

A simple list, as returned by str.split(). It is not very handy, but will
serve as a reference.

A simple tuple, no more handy than the list, but it may exhibit better
performance (or not).

The Benchmark

Each class is instantiated 100,000 times in a loop, with the same, constant
input data: ["a", "b", "c"]; the newly created object is then appended to a
list. This process is timed by calling time.clock() before and after it and
retaining the difference between the two values. The time.clock() function has
quite a poor resolution, but is immune to the process being put to sleep by
the operating system's scheduler.

This is then repeated 10 times, and the smallest of these 10 values is
retained as the performance of the process.
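The procedure described above could be sketched roughly as follows. This is not the original benchmark script: the helper names time_once, best_of and keep_list are made up for illustration, only the plain list and plain tuple are covered, and time.process_time() is swapped in for time.clock(), which was removed in Python 3.8.

```python
import time

FIELDS = ["a", "b", "c"]  # same constant input for every instantiation


def time_once(factory, count=100_000):
    """Return the CPU time spent building `count` objects and appending them to a list."""
    objects = []
    start = time.process_time()  # assumption: stand-in for the removed time.clock()
    for _ in range(count):
        objects.append(factory(FIELDS))
    return time.process_time() - start


def best_of(factory, repeats=10, count=100_000):
    """Repeat the measurement and keep the smallest value."""
    return min(time_once(factory, count) for _ in range(repeats))


def keep_list(fields):
    """Baseline: reuse the list as-is, with no additional instantiation."""
    return fields


timings = {
    "list": best_of(keep_list),
    "tuple": best_of(tuple),
}
```

Taking the minimum rather than the mean is a common choice for micro-benchmarks, since slower runs mostly reflect interference from other activity on the machine.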

The Results

The results from the benchmark are shown relative to the speed of using a
simple list. As expected, the use of a simple list is the fastest, since
it requires no additional object instantiation. Below are the results:

Relative time  Object
1.000          list
2.455          tuple
3.273          Tuple
3.455          List
4.636          Slots
5.818          NamedTuple
6.364          OldClass
6.455          Class
6.909          TupleCustomInit
7.091          TupleCustomInitTuple
7.545          ListCustomInit
7.818          ListCustomInitList

Conclusion

One can draw several conclusions from this experiment:

Not instantiating anything is much faster: even instantiating a simple tuple
out of the original list increases the run time by roughly 150% (2.455 times
the baseline).
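For reference, the percentage can be derived from the relative figures reported in the results. The short calculation below uses only the list and tuple values; the variable names are chosen for illustration.

```python
# Relative run times taken from the results (list is the baseline).
relative = {"list": 1.000, "tuple": 2.455}

# Increase over the baseline, expressed as a percentage.
increase = (relative["tuple"] - relative["list"]) / relative["list"] * 100
print(f"{increase:.1f}%")  # prints 145.5%
```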