Off-topic, but when you find yourself writing lines like times.append(datetime.datetime.fromtimestamp(time.mktime(time.strptime(timestmp[i], time_format)))), it's a good sign that something should be changed.
– Glenn Maynard, Mar 7 '11 at 19:14

5 Answers

First, you should run your sample script with Python's built-in profiler to see where the problem actually lies. You can do this from the command line:

python -m cProfile myscript.py
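If the report is long, it helps to sort it; a minimal example, assuming a reasonably recent Python where cProfile accepts the -s option:

python -m cProfile -s cumulative myscript.py

Sorting by cumulative time usually surfaces the expensive call chains first.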

Secondly, what jumps out at me at least: why is that loop at the bottom necessary? Is there a technical reason it can't be done while reading mydata.txt, in the loop you have above the instantiation of the numpy arrays?

Thirdly, you should create the datetime objects directly, since datetime also supports strptime. You don't need to build a time tuple, convert it to a timestamp, and then make a datetime from that timestamp.
Your loop at the bottom can just be re-written like this:
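(A minimal reconstruction, using the timestmp, time_format and times names from the question:)

import datetime

times = []
for i in range(len(timestmp)):
    # one call instead of strptime -> mktime -> fromtimestamp
    times.append(datetime.datetime.strptime(timestmp[i], time_format))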

Thanks, this is good advice. After profiling, I've realized that the bottleneck isn't reading from the file (as I had incorrectly assumed) but in fact the datetime type conversion.
– Pete W, Mar 7 '11 at 19:31

You're right that the separate loop at the bottom for the datetime conversion is not necessary. I've now included it directly within the main loop at the top; however, it didn't seem to have much impact on the timings.
– Pete W, Mar 7 '11 at 19:52

Re: use of time.mktime and time.strptime instead of datetime.datetime.strptime directly: I think the reason was that originally I had to run this on Python 2.4, which didn't support datetime.datetime.strptime. But you're right, on 2.5 or later your method is much neater.
– Pete W, Mar 7 '11 at 19:59

Thanks for all the responses. The bottleneck turned out to be the datetime conversion. In the end, I managed to get a dramatic speedup: since most of the datetimes were repeated (e.g. several thousand with equal value), I only did a type conversion on numpy.unique(times), vastly reducing the number of conversions required. The key was really in profiling my code properly (which I should have done in the first place... I live and learn).
– Pete W, Mar 7 '11 at 21:40
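For reference, that de-duplication trick might look something like the sketch below; the return_inverse flag of numpy.unique gives the index mapping back into the full array (timestmp and time_format are the names from the question):

import datetime
import numpy

# convert each distinct time string once, then broadcast back
uniq, inverse = numpy.unique(timestmp, return_inverse=True)
uniq_times = numpy.array([datetime.datetime.strptime(s, time_format) for s in uniq])
times = uniq_times[inverse]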

But more probable, I think, is that you have a lot of data conversions. In particular, the last loop for the time conversion will take a long time if you have millions of conversions! If you can do it all in one step (read + convert), plus take Terseus's advice about not copying the arrays into numpy counterparts, you will reduce execution times.
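A sketch of the read-and-convert-in-one-pass idea; the two-column, comma-separated layout and the time format are assumptions, since the contents of mydata.txt aren't shown here:

import datetime

time_format = "%Y-%m-%d %H:%M:%S"  # assumed; use the format from the original script

times = []
values = []
with open("mydata.txt") as f:
    for line in f:
        # assumed layout: "timestamp,value" per line
        stamp, value = line.strip().split(",")
        times.append(datetime.datetime.strptime(stamp, time_format))
        values.append(float(value))

This way the file is traversed once, and the datetime objects are built as the lines come in instead of in a second pass.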