Today we are introducing tibbletime v0.0.2, and we’ve got a ton of new features in store for you. We have functions for converting to flexible time periods with the ~period formula~ and for creating custom rolling functions with rollify() (plus a bunch more new functionality!). We’ll take the new functionality for a spin with some weather data from the weatherData package. Beyond weather, the new tools make tibbletime useful in a broad range of applications such as forecasting, financial analysis, business analysis and more! We truly view tibbletime as the next phase of time series analysis in the tidyverse. If you like what we do, please connect with us on social media to stay up on the latest Business Science news, events and information!

Introduction

We are excited to announce the release of tibbletime v0.0.2 on CRAN. A lot of new
functionality has been added, including:

Generic period support: Perform time-based calculations using a number
of supported periods with the new ~period formula~.

Creating series: Use create_series() to quickly create a tbl_time
object initialized with a regular time series.

Rolling calculations: Turn any function into a rolling version of itself with rollify().

A number of smaller tweaks and helper functions to make life easier.
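To make “a regular time series” concrete, here is a minimal base R sketch of the kind of evenly spaced index that create_series() initializes. It is not tibbletime’s implementation: the helper make_regular_series() is hypothetical and returns a plain data frame rather than a tbl_time.

```r
# Hypothetical helper (not tibbletime's create_series()): returns a plain
# data frame whose `date` column is an evenly spaced sequence of dates.
make_regular_series <- function(from, to, by = "day") {
  data.frame(date = seq(as.Date(from), as.Date(to), by = by))
}

daily <- make_regular_series("2013-01-01", "2013-01-10")
nrow(daily)         # 10: one row per day, both endpoints included

every_2_days <- make_regular_series("2013-01-01", "2013-01-10", by = "2 days")
every_2_days$date   # 2013-01-01, 01-03, 01-05, 01-07, 01-09
```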

As we further develop tibbletime, it is becoming clearer that the package
is a tool that should be used alongside the rest of the tidyverse. The combination of the two makes time series analysis in the tidyverse much easier to do!

In this post

Today we will take a look at weather data for New York and San
Francisco from 2013. It will be an exploratory analysis
to show off some of the new features in tibbletime, but the package
itself has much broader application. As we will see, tibbletime’s time-based
functionality can be a valuable data manipulation tool to help with:

Product and sales forecasting

Financial analysis with custom rolling functions

Grouping data into time buckets to analyze change over time, which is great for any part of an organization including sales, marketing, manufacturing, and HR!

Data and packages

The datasets used are from a neat package called weatherData. While weatherData has functionality to pull weather data for a number of cities, we will use the built-in datasets. We encourage you to explore the weatherData API if you’re interested in collecting weather data.

To get started, load the following packages:

tibbletime: Time-aware tibbles for the tidyverse

tidyverse: Loads packages including dplyr, tidyr, purrr, and ggplot2

corrr: Tidy correlations

weatherData: Slick package for getting weather data

Also, load the datasets from weatherData, “NewYork2013” and “SFO2013”.

Combine and convert

To tidy up, we first join our datasets together using bind_rows(). Passing
a named list of tibbles and specifying the .id argument allows bind_rows() to create a new City reference column for us.
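The idea behind the .id argument can be sketched without dplyr. The base R snippet below uses two small made-up tables as stand-ins for NewYork2013 and SFO2013 (the real datasets have more rows and columns), assumes the list names NYC and SFO, and tags each table with its list name before stacking them, which is roughly what bind_rows() does with a named list and .id = "City".

```r
# Made-up stand-ins for NewYork2013 and SFO2013 (illustrative values only).
ny  <- data.frame(Time = as.POSIXct(c("2013-01-01 00:51:00", "2013-01-01 01:51:00"), tz = "UTC"),
                  Temperature = c(30, 31))
sfo <- data.frame(Time = as.POSIXct(c("2013-01-01 00:56:00", "2013-01-01 01:56:00"), tz = "UTC"),
                  Temperature = c(50, 51))

tables <- list(NYC = ny, SFO = sfo)

# Prepend a City column holding each table's list name, then stack the tables.
# This mimics the .id behaviour of dplyr::bind_rows() for a named list.
tagged  <- Map(function(df, city) data.frame(City = city, df, stringsAsFactors = FALSE),
               tables, names(tables))
weather <- do.call(rbind, unname(tagged))
weather
```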

Period conversion

The first new idea to introduce is the ~period formula~. This tells the tibbletime functions how you want to time-group your data. It is specified
as multiple ~ period, with examples being 1~d for “every 1 day” and 4~m for “every 4 months.”
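Because the period formula is an ordinary R formula, its two halves can be pulled apart with standard formula indexing. The sketch below only illustrates that idea: parse_period() is a hypothetical helper, and tibbletime’s own parsing (including which period letters it accepts) may differ.

```r
# Hypothetical helper: split a two-sided period formula such as 2~d into parts.
# A formula is a call object: element 2 is the left-hand side (the multiple)
# and element 3 is the right-hand side (here, a bare period symbol).
parse_period <- function(f) {
  stopifnot(inherits(f, "formula"), length(f) == 3)
  list(multiple = eval(f[[2]]), unit = as.character(f[[3]]))
}

str(parse_period(1~d))   # multiple 1, unit "d"
str(parse_period(4~m))   # multiple 4, unit "m"
```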

# Changing to 1 row every 2 days.
# The minimum date per interval is selected by default
as_period(weather, 2~d)
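As a rough base R illustration of what a 2~d conversion does, the sketch below assigns toy hourly timestamps to 2-day buckets counted from the first day and keeps the earliest timestamp in each bucket. The bucket alignment is an assumption made for illustration; tibbletime may align its intervals differently.

```r
# Toy hourly timestamps covering five days (illustrative only).
times <- seq(as.POSIXct("2013-01-01 00:51:00", tz = "UTC"),
             by = "hour", length.out = 24 * 5)

# Whole days elapsed since the first timestamp's date, then 2-day bucket numbers.
day_index <- as.numeric(as.Date(times, tz = "UTC") - as.Date(times[1], tz = "UTC"))
bucket    <- day_index %/% 2

# `times` is sorted, so the first row in each bucket is its minimum date.
every_2_days <- times[!duplicated(bucket)]
every_2_days   # 2013-01-01 00:51, 2013-01-03 00:51, 2013-01-05 00:51 (UTC)
```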

In our original data, it looks like weather is an hourly dataset, with each new
data point coming in on the 51st minute of the hour for NYC and the 56th minute
for SFO. Unfortunately, a number of points don’t follow this. Check out the following rows:
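Rows like these can be flagged by comparing each timestamp’s minute with its city’s usual reporting minute. The sketch below uses a few invented timestamps and plain base R; the usual_minute lookup is a hypothetical name that simply encodes the 51 and 56 minute pattern described above.

```r
# Invented example rows; the third NYC reading is off its usual minute.
weather <- data.frame(
  City = c("NYC", "NYC", "NYC", "SFO", "SFO"),
  Time = as.POSIXct(c("2013-01-01 00:51:00", "2013-01-01 01:51:00",
                      "2013-01-01 02:10:00", "2013-01-01 00:56:00",
                      "2013-01-01 01:56:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Usual reporting minute per city, looked up by name for every row.
usual_minute <- c(NYC = 51, SFO = 56)

minute    <- as.integer(format(weather$Time, "%M"))
irregular <- weather[minute != usual_minute[weather$City], ]
irregular   # only the 02:10 NYC row
```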

Now that we have our data in an hourly format, we probably don’t care about
which minute it came in on. We can floor the date column using a helper function, time_floor(). Credit to Hadley Wickham, because this is essentially a convenient
wrapper around lubridate::floor_date(). Setting the period to 1~h floors
each row’s timestamp to the beginning of its hour.

# Time floor: Shift timestamps to a time-based floor
weather <- time_floor(weather, 1~h)
weather
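The same flooring can be sketched in base R with trunc(), which is handy for checking what 1~h should produce. This is an illustration with toy timestamps, not the code behind time_floor(), which the text describes as a wrapper around lubridate::floor_date().

```r
times <- as.POSIXct(c("2013-01-01 00:51:00", "2013-01-01 01:56:00",
                      "2013-01-01 02:10:00"), tz = "UTC")

# trunc() on POSIXct returns POSIXlt; convert back so it can live in a data frame column.
floored <- as.POSIXct(trunc(times, units = "hours"))
format(floored, "%H:%M")   # "00:00" "01:00" "02:00"
```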

A closer look at July

July seemed to be one of the hottest months for NYC, so let’s take a closer look at it.

To grab just the July dates, use time_filter(). If you haven’t seen this before, a time formula is used to specify the dates to filter for. The one-sided formula below expands to 2013-07-01 00:00:00 ~ 2013-07-31 23:59:59, covering every date in July.
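The expanded July bounds translate directly into an inclusive comparison. Here is a base R sketch with a handful of toy timestamps on either side of July; it illustrates the bounds only and does not call time_filter() itself.

```r
times <- as.POSIXct(c("2013-06-30 23:51:00", "2013-07-01 00:51:00",
                      "2013-07-31 23:51:00", "2013-08-01 00:51:00"), tz = "UTC")

# Inclusive bounds matching the July expansion described above.
july_start <- as.POSIXct("2013-07-01 00:00:00", tz = "UTC")
july_end   <- as.POSIXct("2013-07-31 23:59:59", tz = "UTC")

july_times <- times[times >= july_start & times <= july_end]
july_times   # keeps only the two July timestamps
```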

To visualize July’s weather, we will make a boxplot of the temperatures.
Specifically, we will slice July into intervals of 2 days, and create a series
of boxplots based on the data inside those intervals. To do this, we will
use time_collapse(), which collapses a column of dates into a column of the same
length, but where every row in a time interval shares the same date. You can use this resulting
column for further grouping or labeling operations.

# Every row where the date falls between
# (2013-07-01 00:00:00, 2013-07-02 23:59:59)
# shares the same date, and so on for the entire series
july_collapsed <- july %>%
  time_collapse(2~d)
july_collapsed
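A minimal base R sketch of the collapsing idea: each toy timestamp is assigned to a 2-day interval, and every row in that interval receives the same representative date. Using the interval’s earliest timestamp as the representative is an assumption of this sketch; tibbletime may choose a different date (for example, the last one in the interval).

```r
# Toy timestamps every 6 hours over four days (illustrative only).
times <- seq(as.POSIXct("2013-07-01 00:51:00", tz = "UTC"),
             by = "6 hours", length.out = 16)

# Number each 2-day interval, counting from the first timestamp's date.
day_index <- as.numeric(as.Date(times, tz = "UTC") - as.Date(times[1], tz = "UTC"))
interval  <- day_index %/% 2

# Give every row the earliest timestamp of its interval; the result keeps
# the same length as the input, so it can sit beside the original column.
collapsed <- ave(as.numeric(times), interval, FUN = min)
collapsed <- as.POSIXct(collapsed, origin = "1970-01-01", tz = "UTC")

unique(collapsed)   # 2013-07-01 00:51 and 2013-07-03 00:51 (UTC)
```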

Next, let’s look at monthly correlations. The general idea will be
to nest each month into its own data frame, apply correlate() to each
nested data frame, and then unnest the results. We will use time_nest() to easily perform the monthly nesting.

monthly_nest <- weather %>%
  spread(key = City, value = Temperature) %>%
  # Nest by month, don't add the original dates to the inner tibbles
  time_nest(1~m, keep_inner_dates = FALSE)
monthly_nest
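Without tibbletime or corrr, the nest-then-correlate pattern can be sketched with split() and sapply(). The wide data below is synthetic (random values with a built-in positive relationship), so the printed correlations say nothing about the real NYC and SFO temperatures; the point is the grouping mechanics.

```r
set.seed(1)
# Synthetic hourly wide data for two months: one column per city.
time <- seq(as.POSIXct("2013-01-01 00:00:00", tz = "UTC"),
            as.POSIXct("2013-02-28 23:00:00", tz = "UTC"), by = "hour")
nyc  <- rnorm(length(time), mean = 35)
sfo  <- 0.5 * nyc + rnorm(length(time), mean = 30)
wide <- data.frame(time, NYC = nyc, SFO = sfo)

# Nest-by-month analogue: split rows by year-month, then correlate within each group.
by_month    <- split(wide, format(wide$time, "%Y-%m"))
monthly_cor <- sapply(by_month, function(d) cor(d$NYC, d$SFO, use = "pairwise.complete.obs"))
monthly_cor
```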

It seems that summer and fall months tend to have higher correlation than colder months.

And finally, we will calculate the rolling correlation of NYC and SFO temperatures. The “width” of our roll will be roughly one month, which is 720 hours (30 days × 24 hours) since our data is hourly.

There are a number of ways to do this, but for this example
we introduce rollify(), which takes any function that you give it and creates a rolling version of it. The first argument to rollify() is the function that you want to modify, and it is passed to rollify() in the same way that you would pass a function to purrr::map(). The second argument is the window size. You then call the rolling function inside mutate() just as you would call the
non-rolling cor().
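To show what “a rolling version of a function” means, here is a deliberately simple base R function factory in the spirit of rollify(). rollify_sketch() is hypothetical: it is not tibbletime’s implementation, and it assumes every argument is a vector of the same length with the window trailing the current row.

```r
# Hypothetical, simplified stand-in for tibbletime::rollify(): returns a new
# function that applies `.f` to each trailing window of `window` elements,
# padding the first `window - 1` results with NA so the output length matches.
rollify_sketch <- function(.f, window = 1) {
  function(...) {
    args <- list(...)
    n <- length(args[[1]])
    out <- rep(NA_real_, n)
    if (n >= window) {
      for (i in window:n) {
        idx <- (i - window + 1):i
        # Slice every argument to the current window and call .f on the slices
        out[i] <- do.call(.f, lapply(args, function(a) a[idx]))
      }
    }
    out
  }
}

rolling_cor <- rollify_sketch(function(x, y) cor(x, y), window = 3)
rolling_cor(c(1, 2, 3, 4, 5), c(2, 4, 6, 8, 11))
```

Like the rolling cor() in the post, the returned function produces a vector as long as its inputs, so it can be called inside dplyr::mutate() on data frame columns; the first window - 1 values are NA because the window is not yet full.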

The correlation is clearly not stable throughout the year,
so that initial correlation value of .65 should be taken
with a grain of salt!

Rolling Functions: Pros/Cons and Recommendations

There are a number of ways to do rolling calculations, and our recommendation depends on your needs. If you are interested in:

Flexibility: Use rollify(). You can turn literally any function into a “tidy” rolling function. Think everything from rolling statistics to rolling regressions: whatever you can dream up, it can do. It is fast, but not quite as fast as Rcpp-based libraries.

Performance: Use the roll package, which uses RcppParallel as its backend, making it one of the fastest options available. The trade-off is flexibility: you cannot create custom rolling functions and are limited to the ones the package provides.
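The flexibility-versus-speed trade-off can be seen with base R alone. The sketch below computes the same trailing 3-point rolling mean twice: once with stats::filter(), a specialised routine implemented in compiled code that only handles weighted sums, and once with a generic loop that could call any function. It is an analogy for the roll package versus rollify(), not a benchmark of either.

```r
x <- c(3, 1, 4, 1, 5, 9, 2, 6)
window <- 3

# Specialised: equal weights with sides = 1 give a trailing rolling mean,
# computed in compiled code; NA until the window fills.
fast_mean <- as.numeric(stats::filter(x, rep(1 / window, window), sides = 1))

# Flexible: an explicit loop accepts any summary function (mean here, but it
# could equally be median() or a regression), at the cost of speed.
flexible_mean <- sapply(seq_along(x), function(i) {
  if (i < window) NA_real_ else mean(x[(i - window + 1):i])
})

all.equal(fast_mean, flexible_mean)   # TRUE
```

Calling stats::filter() with its namespace avoids the clash with dplyr::filter() when the tidyverse is loaded.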

Wrapping up

We’ve touched on a few of the new features in tibbletime v0.0.2. Notably:

rollify() for rolling functions

as_period() with generic periods

time_collapse() for collapsing date columns

A full changelog can be found in the NEWS file on GitHub or CRAN.

We are always open to new ideas and encourage you to submit an issue on our
GitHub repo.

Have fun with tibbletime!

Final thoughts

Mind you, this is only v0.0.2. We have a lot of work to do, but we couldn’t
wait any longer to share this. Feel free to kick the tires on tibbletime and let us know your thoughts. Please submit any comments, issues or bug reports to us on GitHub. Enjoy!

About Business Science

Business Science takes the headache out of data science. We specialize in applying machine learning and data science in business applications. We help businesses that seek to build out this capability but may not currently have the resources to implement predictive analytics. Business Science works with clients ranging from startups to Fortune 500 companies and seeks to guide organizations in expanding predictive analytics while executing on ROI-generating projects. Visit the Business Science website or contact us to learn more!
