I have a dataset containing many (hundreds of) short time series (3-30 observations each) of different lengths.
Each series is currently represented as numbers indicating how long it took until the next event.

To give you an idea:

0, 30, 34, 39, 45, 47, 51, 51
0, 5, 5, 7, 40

I'm sure that the values do not depend on time of day/week/month/... and that the time series do not influence each other. I do expect that the previous times between events are predictive of how long it will take to see the next event.

My data come from those hundreds of monitored devices, and all of these devices are identical.

What I'm trying to figure out is how to build a single model that predicts how long it will take for the next event to happen in a device.

My problem is that there are plenty of models for a single time series, or even for multiple time series if they are aligned in time (which is also not the case with my data).

So what model should I use to leverage as much of the data as possible?
If necessary, I could limit my analysis to a subset of the data with equal numbers of data points per time series.

1 Answer

The data that you describe are not time series data of the kind that are analysed by time series models (e.g. ARIMA models).

The good thing is that you don't need to limit the amount of data or force your data to resemble a time series. I would recommend reviewing some references on survival analysis. There you will find models intended for duration data (time-to-event data). Those models can be used to study, for example, the duration of strikes, the transitions between biological states, or the time elapsed between events, as is the case with your data.
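To make the duration-data framing concrete, here is a minimal sketch in Python (used only for illustration; the underlying idea is language-independent). It assumes the numbers in the question are cumulative event times per device (they are non-decreasing in the example), converts each series to inter-event gaps, pools the gaps across devices, and fits the simplest survival model, an exponential (constant-hazard) distribution, by maximum likelihood. This ignores the dependence on previous gaps that the asker expects, so it is only a pooled baseline:

```python
# Sketch: pool inter-event gaps across devices and fit an exponential
# (constant-hazard) duration model by maximum likelihood.
# Assumption: the values are cumulative event times per device,
# taken from the two example series in the question.

series = [
    [0, 30, 34, 39, 45, 47, 51, 51],
    [0, 5, 5, 7, 40],
]

# Convert each cumulative series to inter-event gaps (durations).
gaps = [t1 - t0 for s in series for t0, t1 in zip(s, s[1:])]

# Exponential MLE: rate = 1 / mean gap. By the memoryless property,
# the expected waiting time until the next event is the mean gap.
mean_gap = sum(gaps) / len(gaps)
print(f"pooled gaps: {gaps}")
print(f"expected time to next event: {mean_gap:.2f}")
```

Because the exponential model is memoryless, it predicts the same waiting time regardless of history; a Weibull or log-normal duration model (or a model conditioning on previous gaps) relaxes that restriction.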

In the R software there are many packages implementing these methods. The function glm and the function survreg in the survival package can be used to fit some of these models. This reference can be helpful as an introduction.
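Since the asker expects previous gaps to carry predictive value, one simple check before reaching for a full survival package is to regress each gap on the previous one, pooling pairs across devices. The Python sketch below (again only an illustration; the regressor and the linear form are my assumptions, not something from the question) fits next_gap = a + b * prev_gap by closed-form ordinary least squares:

```python
# Sketch: does the previous gap predict the next one?
# Pool (previous gap, next gap) pairs across devices and fit
# next = a + b * prev by ordinary least squares (closed form).
# Assumption: cumulative event times, as in the question's example.

series = [
    [0, 30, 34, 39, 45, 47, 51, 51],
    [0, 5, 5, 7, 40],
]

pairs = []
for s in series:
    g = [t1 - t0 for t0, t1 in zip(s, s[1:])]   # inter-event gaps
    pairs += list(zip(g, g[1:]))                # (prev gap, next gap)

x = [p for p, _ in pairs]
y = [q for _, q in pairs]
n = len(pairs)
mx, my = sum(x) / n, sum(y) / n

# OLS slope and intercept.
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) \
    / sum((xi - mx) ** 2 for xi in x)
a = my - b * mx

print(f"next_gap ~ {a:.2f} + {b:.2f} * prev_gap  (n={n} pairs)")
```

A slope near zero would suggest the pooled-baseline model is adequate; a clearly nonzero slope motivates a duration model with the previous gap as a covariate (which is exactly what survreg's formula interface allows).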