Working with Date-times and Time Zones in R

The purpose of this blog is to work with dates in R that have times and time zones. All variables of class Date that are imported into the AnalytixAgility platform are represented by a date-time. An open dataset will be used throughout this blog to provide examples.

Data Source

The dataset used in this blog is the lakers data from the R package lubridate. It contains play-by-play statistics for each LA Lakers basketball game in the 2008-2009 season, including the date of the game and the time on the game clock at which each play was made.

Learning Outcomes

This blog will introduce users to working with date-times and time zones in R, using both built-in functions and functionality provided by the lubridate package. It will introduce:

Objects of classes POSIX (to include POSIXct and POSIXlt)

Extracting date and time components from POSIX objects

Creating POSIXct objects using the ISOdate() function

Time zones

Workflow

Step 1- Reading in the data

The data used for this blog is from the R package lubridate. The code chunk below shows how to read this data into R. This automatically assigns the data to a variable lakers.

library(lubridate)
data(lakers)

The first few rows of this data can be viewed using the head() function:
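A minimal sketch of that call, repeating the loading step so that it runs on its own. The extra str() line is my addition, included only to show the types of the two columns used later in this blog.

```r
library(lubridate)  # the lakers dataset ships with this package
data(lakers)

# Print the first six rows (the default); pass n = ... to change the number shown
head(lakers)

# Compact view of the two columns this blog focuses on
str(lakers[, c("date", "time")])
```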

Step 2 – Data Manipulation

For the examples in this blog, I am going to go through a few data manipulation steps so that we focus on a very small subset of the lakers data, containing only the date and time fields we are interested in. Firstly, many of the dates in this data are duplicates, and I want the dates in my sample to all be different. So in the code chunk below, I create a subset that keeps only the first row for each date. This is assigned to the variable lakers_unique_dates. See the documentation for duplicated() for more information.

lakers_unique_dates <- lakers[!duplicated(lakers$date), ]

The code chunk below takes a subset of lakers_unique_dates that includes only the date and time fields, which is all we need for the purposes of this blog. We will then consider only the first 5 rows of this subset for the examples in the blog.
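One way this subsetting step might look is sketched below. The object name lakers_unique_dates_times_subset is the one used later in this blog; the column selection by name is my assumption about how the subset was taken.

```r
library(lubridate)
data(lakers)

# Keep the first row seen for each distinct date
lakers_unique_dates <- lakers[!duplicated(lakers$date), ]

# Keep only the date and time columns, and only the first five rows
lakers_unique_dates_times_subset <- lakers_unique_dates[1:5, c("date", "time")]
lakers_unique_dates_times_subset
```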

The next code chunk converts the integers in the date field to character strings and then to class Date. See Working with Dates in R. When using the as.Date() function, we must specify the format of the input. Here it is “%Y%m%d”, a character string with no separators between the year, month and day components. A table of the different formats available can be found in Working with Dates in R.
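A sketch of this conversion is given below. To keep it self-contained, it uses a hand-built stand-in for the five-row subset; the values are an assumption, copied from the outputs printed later in this blog rather than read from the lakers data.

```r
# Hand-built stand-in for the five-row subset (assumption: values taken
# from the outputs shown later in this blog)
lakers_unique_dates_times_subset <- data.frame(
  date = c(20081028L, 20081029L, 20081101L, 20081105L, 20081109L),
  time = "12:00",
  stringsAsFactors = FALSE
)

# Integer -> character -> Date, stating the input format explicitly
date_strings <- as.character(lakers_unique_dates_times_subset$date)
lakers_unique_dates_times_subset$date <- as.Date(date_strings, format = "%Y%m%d")

class(lakers_unique_dates_times_subset$date)
lakers_unique_dates_times_subset
```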

The next code chunk will combine the date and time fields from the variable lakers_unique_dates_times_subset into one object. This will be assigned to the object dates_times, where the elements are character strings:
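A minimal sketch using paste(), which is my assumption about how the two fields were combined. The dates and times vectors are hypothetical stand-ins for the two columns of the subset.

```r
# Stand-ins for the date and time columns of the subset (assumed values)
dates <- as.Date(c("2008-10-28", "2008-10-29", "2008-11-01",
                   "2008-11-05", "2008-11-09"))
times <- rep("12:00", length(dates))

# paste() coerces the Date values to "YYYY-MM-DD" strings and joins with a space
dates_times <- paste(dates, times)
dates_times
class(dates_times)
```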

Step 3- Objects of class POSIX

Objects of class POSIX can be thought of as more precise than objects of class Date, because time is stored to the nearest second rather than to the nearest day. Any column of class Date that is imported into the RA is actually represented by class POSIX (basically a date represented by a date-time). So in the RA, an object of class Date will be represented by a POSIX object whose time components are zero. There are two POSIX date-time classes, which differ slightly in the way they store their elements:

Class POSIXct represents the (signed) number of seconds since the beginning of 1970 (in UTC) as a numeric vector. A time zone can be attached to the object. If no time zone is specified, as.POSIXct() uses the session's local time zone; the outputs in this blog show “GMT” (Greenwich Mean Time), which for practical purposes matches “UTC” (Coordinated Universal Time).

Class POSIXlt is a named list of vectors representing components such as year, month, day, hours, minutes and seconds. It can also carry a time zone. Time zones will be covered in Step 6.

To convert to class POSIXct, use the as.POSIXct() function. The code chunk below shows the application of this function to the object dates_times.
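A sketch of the conversion follows. The dates_times values are reconstructed from the outputs in this blog, and passing tz = "GMT" and format explicitly is my assumption, made so that the result does not depend on the local time zone of the machine running the code.

```r
# Character date-times, as produced in Step 2 (assumed values)
dates_times <- c("2008-10-28 12:00", "2008-10-29 12:00", "2008-11-01 12:00",
                 "2008-11-05 12:00", "2008-11-09 12:00")

# Convert to POSIXct; without tz, the session's local time zone would be used
type_posixct <- as.POSIXct(dates_times, format = "%Y-%m-%d %H:%M", tz = "GMT")
type_posixct
class(type_posixct)
```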

The difference between the storage of elements in class POSIXct and POSIXlt can be seen from using the unclass() function in the code chunk below, see unclass. It can be seen that an object of class POSIXct is stored as a numeric vector, while an object of class POSIXlt is stored as a list.
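The comparison could be sketched as follows, using a single date-time for brevity. The name type_posixlt is hypothetical, introduced here only for illustration.

```r
type_posixct <- as.POSIXct("2008-10-28 12:00:00", tz = "GMT")
type_posixlt <- as.POSIXlt(type_posixct)  # hypothetical name for the list form

# POSIXct: a single number of seconds since 1970-01-01 00:00:00 UTC
unclass(type_posixct)

# POSIXlt: a list with one element per component (sec, min, hour, mday, ...)
unclass(type_posixlt)
names(unclass(type_posixlt))
```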

Step 4- Extracting Components from Date-times

There are a couple of useful built-in functions in R which can be used to extract components from date-times.

The strptime() function converts character strings to class POSIXlt, from which the required component can then be extracted with the $ operator. In the code chunk below, we begin with the object type_posixct, which is of class POSIXct. Applying the strptime() function to this, we see that it is converted to class POSIXlt. In the format string, the codes “%Y, %m, %d, %H, %M, %S” represent years, months, days, hours, minutes and seconds respectively.
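A sketch of the year extraction is shown below for a single date-time. Two details are worth hedging: calling format() first is my choice, so that strptime() receives an unambiguous character string, and the year element of a POSIXlt object counts years since 1900, so 1900 is added back.

```r
type_posixct <- as.POSIXct("2008-10-28 12:00:00", tz = "GMT")

# Turn the POSIXct value into text, then parse that text into POSIXlt
as_text <- format(type_posixct, "%Y-%m-%d %H:%M:%S")
type_posixlt <- strptime(as_text, format = "%Y-%m-%d %H:%M:%S", tz = "GMT")
class(type_posixlt)

# $year holds years since 1900, so add 1900 to get the calendar year
type_posixlt$year + 1900
```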

For the other date components of month and day of month, use the syntax “mon” and “mday” respectively. Note that “mon” counts from 0 (January) to 11 (December), so add 1 to get the calendar month.

To extract the hours component, use the following syntax:

strptime(type_posixct, format = "%Y-%m-%d %H:%M:%S")$hour

## [1] 12 12 12 12 12

For the other time components of minutes and seconds, use the syntax “min” and “sec” respectively.

The strftime() function converts POSIX objects to character vectors. The code chunk below shows how to extract the date component from the object type_posixct. We can see that the new object is of class character.

strftime(type_posixct, format = "%Y%m%d")

## [1] "20081028" "20081029" "20081101" "20081105" "20081109"

class(strftime(type_posixct, format = "%Y%m%d"))

## [1] "character"

The next code chunk shows how to extract the time component from the object type_posixct:

strftime(type_posixct, format = "%H:%M:%S")

## [1] "12:00:00" "12:00:00" "12:00:00" "12:00:00" "12:00:00"

The following code chunks will generate a set of objects that we will go on to use in Step 5. To extract the year components from type_posixct:

years <- strftime(type_posixct, format = "%Y")
years

## [1] "2008" "2008" "2008" "2008" "2008"

To extract the month components from type_posixct:

months <- strftime(type_posixct, format = "%m")
months

## [1] "10" "10" "11" "11" "11"

To extract the day components from type_posixct:

days <- strftime(type_posixct, format = "%d")
days

## [1] "28" "29" "01" "05" "09"

To extract the hours components from type_posixct:

hours <- strftime(type_posixct, format = "%H")
hours

## [1] "12" "12" "12" "12" "12"

To extract the minutes components from type_posixct:

mins <- strftime(type_posixct, format = "%M")
mins

## [1] "00" "00" "00" "00" "00"

To extract the seconds components from type_posixct:

secs <- strftime(type_posixct, format = "%S")
secs

## [1] "00" "00" "00" "00" "00"

The package lubridate can also be used for extracting components, without needing to specify a format as above. This package was loaded in Step 1, so we do not need to load it again here. It provides extraction functions such as year() and week(), which can be applied to a single date or to a vector of dates. For month and weekday there is the added option of returning the full or abbreviated names. To find the abbreviated months:
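A short sketch of these lubridate calls, with type_posixct rebuilt from the values shown in this blog so that the example stands alone. The label and abbr arguments belong to lubridate's month() and wday(); the printed names depend on your locale.

```r
library(lubridate)

type_posixct <- as.POSIXct(
  c("2008-10-28 12:00:00", "2008-10-29 12:00:00", "2008-11-01 12:00:00",
    "2008-11-05 12:00:00", "2008-11-09 12:00:00"),
  tz = "GMT"
)

year(type_posixct)                               # numeric years
month(type_posixct, label = TRUE, abbr = TRUE)   # abbreviated month names
month(type_posixct, label = TRUE, abbr = FALSE)  # full month names
wday(type_posixct, label = TRUE, abbr = TRUE)    # abbreviated weekday names
```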

Step 5- Creating POSIXct dates using the ISOdate() function

The ISOdate() function can be used to convert the components of a date-time into a POSIXct object. It takes the components in the order year, month, day, hour, minute, second. Trying to convert an invalid date-time results in NA. It can be used on a single set of date-time components or on entire vectors of components. We will first use it on a single set of components, those of the first element of the object type_posixct:

type_posixct[1]

## [1] "2008-10-28 12:00:00 GMT"

ISOdate(2008, 10, 28, 12, 0, 0)

## [1] "2008-10-28 12:00:00 GMT"

As an alternative approach we can use the vectors of years, months, days, etc that were generated in Step 4.
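A sketch of the vectorised call is below. The component vectors are re-created here with the values printed in Step 4; converting them with as.integer() is my choice for clarity, and the name rebuilt is hypothetical.

```r
# Component vectors as produced by strftime() in Step 4 (character strings)
years  <- c("2008", "2008", "2008", "2008", "2008")
months <- c("10", "10", "11", "11", "11")
days   <- c("28", "29", "01", "05", "09")
hours  <- rep("12", 5)
mins   <- rep("00", 5)
secs   <- rep("00", 5)

# ISOdate() is vectorised, so each position across the vectors forms one date-time
rebuilt <- ISOdate(as.integer(years), as.integer(months), as.integer(days),
                   as.integer(hours), as.integer(mins), as.integer(secs))
rebuilt

# An invalid date (30 February) gives NA rather than an error
ISOdate(2008, 2, 30)
```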

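Step 6- Time zones

ISOdate() uses GMT (Greenwich Mean Time) as its default time zone, whereas as.POSIXct() uses the session's local time zone unless the tz argument is supplied. The outputs in this blog show GMT, which for practical purposes matches UTC (Coordinated Universal Time). Basically, the clock time is given as it appears in UTC.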

To display a date-time in a different time zone, use the format() function with its tz argument. The result is a character vector rather than a POSIXct object. To show the object type_posixct in US time:
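A minimal sketch for one date-time follows. The zone name "America/Los_Angeles" is my assumption for "US time" (chosen because the Lakers play in Los Angeles), and the names la_text and la_posixct are hypothetical.

```r
type_posixct <- as.POSIXct("2008-10-28 12:00:00", tz = "GMT")

# Same instant, shown as Los Angeles clock time; the result is a character string
la_text <- format(type_posixct, format = "%Y-%m-%d %H:%M:%S",
                  tz = "America/Los_Angeles", usetz = TRUE)
la_text
class(la_text)

# To stay in class POSIXct, change only the time zone attribute used for display
la_posixct <- type_posixct
attr(la_posixct, "tzone") <- "America/Los_Angeles"
la_posixct
la_posixct == type_posixct  # the underlying instant is unchanged
```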

force_tz(), from the lubridate package, keeps the displayed clock time the same but assigns it to a new time zone, so the underlying instant changes. In the example below we can see that the clock times are unchanged, and only the change in time zone label tells us that the instants are different.
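A sketch of force_tz() on a single date-time is below. As before, "America/Los_Angeles" is an assumed example zone and forced is a hypothetical name.

```r
library(lubridate)

type_posixct <- as.POSIXct("2008-10-28 12:00:00", tz = "GMT")

# Keep the clock reading (12:00:00) but declare it to be Los Angeles time
forced <- force_tz(type_posixct, tzone = "America/Los_Angeles")
forced

# The clock reading matches, but the two objects are different instants
format(forced, "%H:%M:%S")
difftime(forced, type_posixct, units = "hours")
```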

Care must be taken when working with time zones, as daylight saving time may need to be considered. This is when, in certain parts of the world, the clocks are put forward an hour in the spring and put back an hour in the autumn. To avoid problems with this, it is best to use UTC, which does not observe daylight saving time.
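To illustrate the kind of surprise this can cause, here is a small base R sketch. It assumes the "America/Los_Angeles" zone, where the clocks went back on 2 November 2008; the object names are hypothetical. Adding exactly 24 hours across that change moves the local clock reading by an hour, while the same arithmetic in UTC does not.

```r
one_day <- 24 * 60 * 60  # seconds in 24 hours

# Noon on the day before the clocks go back in Los Angeles
la_noon  <- as.POSIXct("2008-11-01 12:00:00", tz = "America/Los_Angeles")
utc_noon <- as.POSIXct("2008-11-01 12:00:00", tz = "UTC")

# 24 hours later the Los Angeles clock reads 11:00, not 12:00
format(la_noon + one_day, "%Y-%m-%d %H:%M:%S", usetz = TRUE)

# In UTC the clock reading is unaffected
format(utc_noon + one_day, "%Y-%m-%d %H:%M:%S", usetz = TRUE)
```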

What’s next?

This post has covered the basics of working with date-times and time zones. Subsequent posts will look in more detail at formatting dates and date-times.