We typically log our data in UTC to make things easy, but now and then I get a data set from somewhere else that is NOT UTC. Now that I am being forced to think ever so slightly I am quickly becoming confused about what exactly is going on in my time conversions. Typically, when I convert from string to seconds I do this:
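The original snippet didn't survive the quoting. A hypothetical reconstruction, consistent with the functions discussed in the rest of this thread (the timestamp string and format below are made up for illustration), might look like:

```python
import calendar
import time

# Hypothetical reconstruction (the original code was lost in quoting):
# convert a timestamp string to seconds since the epoch, treating the
# parsed fields as UTC.
stamp = "2013-05-09 21:07:00"  # example string; the real format is unknown
parsed = time.strptime(stamp, "%Y-%m-%d %H:%M:%S")
seconds = calendar.timegm(parsed)  # interprets the parsed fields as UTC

# Comparing this against time.time() is presumably where the confusion
# about "the current time corrected to UTC" comes from.
print(seconds, time.time())
```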

I get the current time corrected to UTC. I am hoping this only occurs because my local time zone is stored in the time.time() call, but I am unsure. Every time I think I understand what is going on I re-confuse myself. Any clarification on this will be appreciated.

time.time() returns the current seconds since the epoch. It's just a float with no time zone information attached to it. time.gmtime() creates a struct_time with the fields expressed in UTC. You probably want to be using time.localtime() -- it works the same way as gmtime() but returns the time for your current time zone.
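A quick sketch of the distinction being made here:

```python
import time

now = time.time()            # float seconds since the epoch; no zone attached
utc = time.gmtime(now)       # struct_time with fields expressed in UTC
local = time.localtime(now)  # same instant, fields expressed in the local zone

# Both structs describe the same moment; only the field values differ,
# by the local zone's offset from UTC.
print(utc.tm_hour, local.tm_hour)
```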

The only sane way to work with multiple time zones is to convert everything to UTC as soon as you get it, use UTC throughout your application, and convert to the local time zone just before displaying/printing it. Note that time.time() just gives you seconds since the epoch, with no time zone attached at all; if you want the current UTC time as an actual date/time object, use datetime.datetime.utcnow(). If you're getting dates/times for other time zones and time zone info isn't specified in the time format, you'll need to attach it when you're creating the object. stdlib support for that isn't great, but pytz fixes that.
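A minimal stdlib sketch of the UTC-internally pattern (using an explicitly aware datetime.now(timezone.utc) rather than the naive utcnow(); pytz plays the same role when you need named zones like "US/Eastern"):

```python
from datetime import datetime, timezone

# Get "now" as an aware UTC datetime and keep UTC internally.
now_utc = datetime.now(timezone.utc)

# Convert only at the display boundary: astimezone() with no argument
# re-expresses the same instant in the system's local zone.
local_now = now_utc.astimezone()
print(now_utc.isoformat(), local_now.isoformat())
```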

Last edited by setrofim on Thu May 09, 2013 9:07 pm, edited 1 time in total.

Well, gmtime creates a UTC time from a timestamp and timegm returns a timestamp for a UTC time; they're basically inverse operations. So if you pass a timestamp to gmtime and then pass the result to timegm, you're going to get the same timestamp back. The intermediate time object holds UTC field values, though, so if you're not in UTC it will differ from your local wall-clock time by your offset from UTC.
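The round trip described above can be demonstrated directly (the timestamp is arbitrary):

```python
import calendar
import time

ts = 1368133620                     # an arbitrary Unix timestamp
struct_utc = time.gmtime(ts)        # timestamp -> struct_time in UTC
back = calendar.timegm(struct_utc)  # struct_time in UTC -> timestamp
assert back == ts                   # gmtime and timegm are inverses

# The intermediate struct holds UTC field values; if your wall clock is
# not UTC, those fields differ from local time by your UTC offset.
print(struct_utc.tm_hour)  # 21 (UTC), regardless of your local zone
```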

All of this behind-the-scenes time zone shifting is frustrating and confusing. It would make more sense to just convert time to seconds, datetime, etc. without trying to "fix" the locale/UTC offset. While this may work (timegm and gmtime wash each other out), I really dislike it. Is there any method of converting my time strings to seconds since the epoch without specifying a time zone AND without having the time "changed" to UTC (my time stamps are stored as UTC, so no change is required!)?

I was looking into pytz yesterday. It may be something I end up implementing. Thanks for the tip.

tnknepp wrote:All of this behind-the-scenes time zone shifting is frustrating and confusing. It would make more sense to just convert time to seconds, datetime, etc. without trying to "fix" the locale/UTC offset.

There is no actual "conversion" taking place. A timestamp does not have time zone information -- it's literally just a float that is the number of seconds since the epoch. The gm functions just assume UTC, because that's what they are for. If you don't want UTC, don't use the gm functions. If you want your local time zone, you can use the localtime function instead; for other time zones, you'll have to attach time zone information manually.

The easiest way to work with dates in various time zones is to parse the dates/times into datetime objects, e.g. using datetime.strptime. If the time zone is not parsed as part of that, the resulting object will be "naive", i.e. not aware of its time zone. You will then have to attach time zone information manually by providing an appropriate tzinfo object. That is what pytz is for. Once your datetime objects are time zone aware, they will behave correctly and you can pass them around your program.
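A small sketch of parsing a naive datetime and attaching zone info. A fixed stdlib offset stands in for a named zone here to keep the example self-contained; with pytz, which the post recommends, you would use tz.localize(naive) with a named zone instead:

```python
from datetime import datetime, timezone, timedelta

# Parse a string with no zone information -> a "naive" datetime.
naive = datetime.strptime("2013-05-09 21:07:00", "%Y-%m-%d %H:%M:%S")
assert naive.tzinfo is None

# Attach zone info manually. A fixed offset is used as a stand-in for
# US/Eastern daylight time (UTC-4); pytz would supply the real named zone.
eastern = timezone(timedelta(hours=-4))
aware = naive.replace(tzinfo=eastern)
print(aware.astimezone(timezone.utc))  # 2013-05-10 01:07:00+00:00
```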

The problem is that you may eventually have to do something that loses the time zone information. E.g. you might have to pass the datetime into an API that expects a Unix time stamp, or you might have to persist it in a format that does not support time zone information. If that happens, you will have to track tz info separately, and that quickly becomes very confusing and error-prone. That is why, if you have to deal with multiple time zones, the best practice is to convert everything to UTC, always work with UTC internally in your program, and convert into your local (or whatever) time zone at the last possible moment (this is similar to working with unicode and encodings).
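A sketch of the "convert at the last possible moment" case for an API that wants a Unix timestamp:

```python
import calendar
from datetime import datetime, timezone

# An aware UTC datetime, as recommended for internal use.
dt = datetime(2013, 5, 9, 21, 7, 0, tzinfo=timezone.utc)

# Convert only when invoking the timestamp-based API. utctimetuple()
# normalizes the fields to UTC first, so this is safe whatever zone
# the aware datetime happens to carry.
unix_ts = calendar.timegm(dt.utctimetuple())
print(unix_ts)  # 1368133620
```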

I agree completely that everything should be UTC time stamped, which is what we do. All of our data are written to text files, so when I analyze data I am reading date/time strings. So my real concern is that what I am doing is not 100% right, and will bite me big time sometime down the road. What I typically do is this to convert my strings to seconds:
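The code in question didn't survive the quoting either; a hypothetical reconstruction consistent with point 3 below (the timestamp string and format are made up for illustration) might be:

```python
import calendar
import time

# Hypothetical reconstruction of the steps discussed below: parse the
# string, convert to seconds treating the fields as UTC, then (the
# questionable step) feed those seconds to localtime().
parsed = time.strptime("2013-05-09 21:07:00", "%Y-%m-%d %H:%M:%S")
seconds = calendar.timegm(parsed)  # fields treated as UTC
shifted = time.localtime(seconds)  # fields re-expressed in the LOCAL zone

# In a UTC-4 zone, shifted.tm_hour is 17 rather than 21: the "lost
# four hours" described below.
print(shifted.tm_hour)
```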

3. By calling localtime(), since Python thinks my string was UTC due to timegm(), I lose four hours to adjust to local time?

When I am so unfortunate as to receive data that is timestamped in another time zone, I import it using timegm() as above, then apply the correction factor myself. It's kinda kludgey, but it happens so infrequently that I have never had the desire to learn a proper way of handling time zone shifts. If what I am doing works how I think it should (i.e. you answer "yes" to points 1-3 above) then I'm happy.

Yes. The doc string for timegm states "Unrelated but handy function to calculate Unix timestamp from GMT." (GMT is the same thing as UTC for this purpose.) So it assumes UTC. Note also that time.strptime returns a struct_time that does not allow specifying time zone information, so the fact that it's not in your string is irrelevant.

Yes. Basically, when time zone info is absent, the assumption is UTC.

Basically, you're using the wrong tools for the job.

When working with dates/times in Python, avoid using raw timestamps and use one of stdlib's classes (struct_time, date, datetime) -- unless you really are interested in the seconds and not the actual date/time, e.g. when timing something (or if you actually need a timestamp to pass to an API, in which case convert at the last moment, i.e. as you're invoking the API).

When working with dates/times in different time zones, use datetime (and, if you value your sanity, pytz).
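A minimal sketch of that advice, using only the stdlib (a fixed UTC tzinfo here; pytz would supply named zones):

```python
from datetime import datetime, timedelta, timezone

# Work with datetime objects instead of raw second counts: arithmetic
# and comparisons stay readable and time zone aware.
start = datetime(2013, 5, 9, 21, 7, tzinfo=timezone.utc)
deadline = start + timedelta(hours=6)
print(deadline.isoformat())  # 2013-05-10T03:07:00+00:00
```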

Thanks. That clears up a lot of confusion, and provided relief that I haven't completely hosed my code.

I don't NEED the seconds; actually, I'd much prefer using datetime. Thus far I have used seconds to fit everything into one array or matrix when doing analysis. I'm using Python's numpy for my data analysis (trying to break from MATLAB, where I became accustomed to having dates represented by number values -- MATLAB references dates to days since the BC/AD switch), and I would like to have my data in the same array or matrix as my dates. Do you have an alternate recommendation for keeping dates in datetime format with my data?