Multivariate Linear Regression

Here I will fit a simple linear regression model over time series data.
The challenge here is to transform the data beforehand so that it better meets the linearity assumptions.

A short discussion of model evaluation follows.

# Load libraries
# creating a vector to be called
# in the lapply function
# silent = T will suppress the messages
x <- c('tidyr','ggplot2','lubridate','stringr','dplyr')
try(lapply(x, require, character.only = TRUE), silent = T)

# Load dataset
df <- read.csv('/home/vincenzograsso/Desktop/forecast-task/dataset.csv')

# r options for plotting in jupyter notebook - no need to run it on Rstudio or similar
options(repr.plot.width=5,repr.plot.height=3)

Outliers or Missing Data: who can tell

Very weird data here. Some values are distorted: what to do?
My first instinct would be to take the log, but some data points are exactly zero, where the log is not defined.

I should try a different transformation (… power transformation, still) that reshapes the data nicely and artificially gets rid of the zeros.
I could try a function like the pseudologarithm, but it maps 0 → 0, so it offers nothing special here.
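
To make the two options concrete, here is a minimal sketch of a shifted log and a pseudologarithm. The names `log_shift` and `pseudo_log`, the constant of 1, and the toy vector are all my assumptions for illustration; this is not necessarily how the helper used later in this post is defined.

```r
# Hypothetical helper (assumption): log(x + const), with undo = TRUE for the inverse.
log_shift <- function(x, const = 1, undo = FALSE) {
  if (undo) exp(x) - const else log(x + const)
}

# One common pseudologarithm: asinh(x / 2) approaches log(x) for large x
# and is defined at zero, where it returns zero.
pseudo_log <- function(x) asinh(x / 2)

y <- c(0, 1, 10, 100)             # toy values including a zero
z <- log_shift(y)                 # finite everywhere, unlike log(y)
back <- log_shift(z, undo = TRUE) # recovers y up to floating point error

print(z)
print(pseudo_log(y))
```

The shifted log needs its constant remembered so that predictions can be mapped back to the original scale, which is why the sketch bundles the inverse into the same function.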

Let's try adding some other covariates: interaction terms between time bins and day. To evaluate the model we could check the R squared and the RMSE. The R squared (the share of variance explained) is not a very good measure, since one can always inflate it artificially by adding extra covariates. We can rely instead on the RMSE, which makes a good general-purpose error metric for numerical predictions. In this case, we can simply print the metric. With the second model the error is in fact smaller: the RMSE drops from 0.9463 to 0.9433.
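
The comparison can be sketched on simulated data. Everything here is an assumption made for illustration: the toy data frame, the column names `timebin` and `weekday`, and the 200/100 train/test split. It is not the dataset or the models from this post.

```r
set.seed(1)
n <- 300
toy <- data.frame(
  timebin = factor(sample(1:4, n, replace = TRUE)),
  weekday = factor(sample(c("Mon", "Tue", "Wed"), n, replace = TRUE))
)
# Response depends on the time bin only, plus noise
toy$y <- as.numeric(toy$timebin) + rnorm(n)

train <- toy[1:200, ]
test  <- toy[201:300, ]

rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

m1 <- lm(y ~ timebin + weekday, data = train)  # main effects only
m2 <- lm(y ~ timebin * weekday, data = train)  # adds interaction terms

r2 <- c(m1 = summary(m1)$r.squared, m2 = summary(m2)$r.squared)
rmse_train <- c(m1 = rmse(train$y, fitted(m1)),
                m2 = rmse(train$y, fitted(m2)))
rmse_test <- c(m1 = rmse(test$y, predict(m1, newdata = test)),
               m2 = rmse(test$y, predict(m2, newdata = test)))

print(rbind(r2, rmse_train, rmse_test))
```

One caveat worth keeping in mind: for nested least-squares models, the RMSE computed on the training data can never get worse when covariates are added, just like the R squared. An RMSE on held-out rows, as in `rmse_test`, gives a fairer comparison.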

Generate predictions

# This is a dirty way to create prediction. I created a csv containing days and timebins
# for the week to forecast and I readjusted the type of variables (dummies to factor, day to POSIXct)
# in order to use the predict() command using the model_2 object.
forecast_df <- read.csv('~/forecast_df.csv')

# readjust the types
forecast_df$holiday_dummies <- as.factor(forecast_df$holiday_dummies)
forecast_df$day <- as.POSIXct(forecast_df$day)
forecast_df$forecast <- round(log_const(predict(model_2, newdata = forecast_df), undo=TRUE),0)
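
A slightly less dirty alternative to a hand-made csv would be to build the forecast grid in code. This is only a sketch: the start date, the number of time bins, the column name `timebin`, and the all-zero holiday flag are assumptions and would have to match whatever the fitted model actually expects.

```r
# Hypothetical forecast week and time bins (both assumptions)
days <- seq(as.Date("2020-01-06"), by = "day", length.out = 7)
timebins <- 1:4

# One row per (day, timebin) combination
forecast_grid <- data.frame(
  day = rep(days, each = length(timebins)),
  timebin = rep(timebins, times = length(days))
)

# Match the types used above: dummies as factor, day as POSIXct
forecast_grid$holiday_dummies <- factor(rep(0, nrow(forecast_grid)), levels = c(0, 1))
forecast_grid$day <- as.POSIXct(forecast_grid$day)

str(forecast_grid)
```

The resulting data frame could then be passed as `newdata` to `predict()` in the same way as the csv version.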