
I am often building models (classification or regression) in which some of the predictor variables are sequences, and I have been trying to find technique recommendations for summarizing them as effectively as possible for inclusion as predictors in the model.

As a concrete example, say a model is being built to predict whether a customer will leave the company in the next 90 days (anytime between t and t+90; thus a binary outcome). One of the available predictors is the level of the customer's financial balance for periods t_0 to t-1. Maybe this represents monthly observations for the prior 12 months (i.e. 12 measurements).

I am looking for ways to construct features from this series. I currently use descriptive statistics of each customer's series, such as the mean, high, low, and standard deviation, and I fit an OLS regression to get the trend. Are there other methods of calculating features? Other measures of change or volatility?
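To make this concrete, here is a minimal sketch of what I compute now (Python/NumPy; the array name and toy data are just for illustration):

```python
import numpy as np

def summary_features(series):
    """Descriptive features of one customer's balance series, plus an OLS trend slope."""
    t = np.arange(len(series))
    slope, intercept = np.polyfit(t, series, deg=1)  # OLS fit of balance on time index
    return {
        "mean": series.mean(),
        "high": series.max(),
        "low": series.min(),
        "std": series.std(ddof=1),
        "trend_slope": slope,
    }

# Toy data: each row is one customer's 12 monthly balance observations.
balances = np.random.default_rng(0).normal(1000, 200, size=(5, 12))
features = [summary_features(row) for row in balances]
```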

ADD:

As mentioned in a response below, I also considered (but forgot to add here) using Dynamic Time Warping (DTW) and then hierarchical clustering on the resulting distance matrix: creating some number of clusters and then using cluster membership as a feature. Scoring test data would likely have to follow a process where DTW was run between each new case and the cluster centroids, matching each new series to its closest centroid...
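A rough sketch of what I have in mind (a hand-rolled DTW plus SciPy hierarchical clustering; the names, toy data, and pointwise-mean "centroids" are my own simplifications):

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def dtw_distance(a, b):
    """Classic dynamic-programming DTW distance between two 1-D series."""
    D = np.full((len(a) + 1, len(b) + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[-1, -1]

rng = np.random.default_rng(0)
train = rng.normal(1000, 200, size=(30, 12))  # toy balance series, one row per customer
k = 3                                         # number of clusters to form

# Pairwise DTW distances, then average-linkage clustering on the condensed matrix.
n = len(train)
dist = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        dist[i, j] = dist[j, i] = dtw_distance(train[i], train[j])
labels = fcluster(linkage(squareform(dist), method="average"), t=k, criterion="maxclust")

# Pointwise-mean "centroid" per cluster; a new series gets the label of the nearest one.
centroids = np.array([train[labels == c].mean(axis=0) for c in np.unique(labels)])
def score_cluster(series):
    return np.unique(labels)[np.argmin([dtw_distance(series, c) for c in centroids])]
```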

What you're trying to do here is reduce the dimensionality of your features. You can search for dimensionality reduction to find several options, but one very popular technique is principal component analysis (PCA). Principal components are not interpretable like the features you've mentioned, but they do a good job of summarizing all of the information.
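A minimal sketch of what that could look like with scikit-learn (the `balances` array and toy data are my assumptions, not from the question):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

balances = np.random.default_rng(1).normal(1000, 200, size=(100, 12))  # toy data

X = StandardScaler().fit_transform(balances)  # PCA is sensitive to scale
pca = PCA(n_components=3).fit(X)              # keep the first few components
components = pca.transform(X)                 # use these columns as predictors
print(pca.explained_variance_ratio_)          # variance retained by each component
```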

Feature extraction is always a challenge and one of the less-addressed topics in the literature, since it is largely application-dependent.

Some ideas you can try (a brief code sketch follows this list):

Raw data, measured day by day. That's a somewhat obvious option, but it carries implications and needs extra preprocessing (normalisation) to make timelines of different lengths comparable.

Higher moments: skewness, kurtosis, etc.

Derivative(s): speed of evolution

The time span is not that large, but it may be worth trying some time-series analysis features, for example autocorrelation.

Some customised features, like breaking the timeline into weeks and measuring the quantities you already measure in each week separately. A non-linear classifier could then combine, e.g., first-week features with last-week features to gain insight into evolution over time.
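As promised above, here is a minimal Python/SciPy sketch of the moments, derivatives, autocorrelation, and per-window ideas (the function name and the exact feature choices are my own illustration):

```python
import numpy as np
from scipy.stats import kurtosis, skew

def extra_features(series):
    """Higher moments, derivatives, lag-1 autocorrelation, and per-window summaries."""
    diffs = np.diff(series)                    # first derivative: speed of evolution
    first, second = np.array_split(series, 2)  # split the timeline into two windows
    return {
        "skewness": skew(series),
        "kurtosis": kurtosis(series),
        "mean_diff": diffs.mean(),             # average period-over-period change
        "max_abs_diff": np.abs(diffs).max(),   # largest single-period move
        "autocorr_lag1": np.corrcoef(series[:-1], series[1:])[0, 1],
        "mean_first_half": first.mean(),       # lets a non-linear model compare
        "mean_second_half": second.mean(),     # early vs. late behaviour
    }
```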

Nice suggestions! Can you flesh out the use of derivatives more?
– B_Miner, Jun 24 '14 at 14:04

I agree completely with your first statement. I would LOVE to see a book written which collected case studies on feature engineering / extraction. The adage is that feature creation matters much more for predictive model performance than the latest and greatest algorithm.
– B_Miner, Jun 24 '14 at 14:07

At first glance, you need to extract features from your time series $x_{t-12}, \dots, x_{t-1}$. One possible approach is to compute summary metrics: average, dispersion, etc. But in doing so you will lose all the time-series-related information, and features extracted from the shape of the curve may be quite useful. I recommend looking through this article, in which the authors propose an algorithm for time-series clustering. I hope it will be useful. In addition to such clustering, you can add summary statistics to your feature list.
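For instance, a tiny sketch of that last step, appending shape-cluster dummies to the summary metrics (all names and data here are illustrative; `labels` could come from the clustering in the article, or from the DTW approach mentioned in the question):

```python
import numpy as np

rng = np.random.default_rng(0)
summary = rng.normal(size=(30, 4))    # toy summary metrics (mean, dispersion, ...)
labels = rng.integers(1, 4, size=30)  # toy shape-cluster assignments

# One-hot encode cluster membership and append it to the summary-metric features.
onehot = (labels[:, None] == np.unique(labels)[None, :]).astype(float)
X = np.hstack([summary, onehot])      # final design matrix for the model
```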

Thanks for the link. I had also considered using DTW and hierarchical clustering. I have experimented with the R package for DTW: jstatsoft.org/v31/i07/paper
– B_Miner, Jun 24 '14 at 13:58


I considered specifically creating n clusters and using the cluster membership as a feature.
– B_Miner, Jun 24 '14 at 13:59