This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Abstract

Identifying discontinuities (or change-points) in otherwise stationary time series is a powerful analytic tool. This paper outlines a general strategy for identifying an unknown number of change-points using elementary principles of Bayesian statistics. Using a strategy of binary partitioning by marginal likelihood, a time series is recursively subdivided on the basis of whether adding divisions (and thus increasing model complexity) yields a justified improvement in the marginal model likelihood. When this approach is combined with the use of conjugate priors, it yields the Conjugate Partitioned Recursion (CPR) algorithm, which identifies change-points without computationally intensive numerical integration. Using the CPR algorithm, methods are described for specifying change-point models drawn from a host of familiar distributions, both discrete (binomial, geometric, Poisson) and continuous (exponential, Gaussian, uniform, and multiple linear regression), as well as multivariate distributions (multinomial, multivariate normal, and multivariate linear regression). Methods by which the CPR algorithm could be extended or modified are discussed, and several detailed applications to data published in psychology and biomedical engineering are described.
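The recursion described above can be sketched compactly. The following is an illustrative Python sketch, not the manuscript's Matlab implementation: it assumes binary (Bernoulli) data with a conjugate Beta(α0, α1) prior, computes the marginal likelihood of each candidate two-segment model against the one-segment model, and recurses wherever the best log Bayes factor exceeds a decision criterion (all function names here are hypothetical).

```python
from math import lgamma

def log_beta(a, b):
    """Log of the Beta function, via log-gamma for numerical stability."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def log_marginal(data, a0=1.0, a1=1.0):
    """Log marginal likelihood of binary data under a conjugate Beta(a0, a1) prior."""
    s = sum(data)
    f = len(data) - s
    return log_beta(a0 + s, a1 + f) - log_beta(a0, a1)

def find_changepoints(data, start=0, log_tau=0.0):
    """Recursive binary partitioning: split at the event c whose log Bayes factor
    (two-segment vs. one-segment marginal likelihood) best exceeds the criterion."""
    n = len(data)
    if n < 2:
        return []
    whole = log_marginal(data)
    best_c, best_k = None, log_tau
    for c in range(1, n):
        k = log_marginal(data[:c]) + log_marginal(data[c:]) - whole
        if k > best_k:
            best_c, best_k = c, k
    if best_c is None:
        return []
    return (find_changepoints(data[:best_c], start)
            + [start + best_c]
            + find_changepoints(data[best_c:], start + best_c))

# A clean shift from all-failures to all-successes is recovered at event 30:
print(find_changepoints([0] * 30 + [1] * 30))
```

Because the Beta prior is conjugate to the Bernoulli likelihood, each marginal likelihood is available in closed form, so no numerical integration is needed, which is the central computational advantage the abstract describes.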

Author Comment

See also the supplemental material for further analytic examples, a sensitivity analysis of the described algorithm, implementations of two operational modifications, further mathematical support, and a Matlab package that includes all data presented in the manuscript and implements the algorithm as a function.

This version includes a number of small changes throughout the manuscript.

Supplemental Information

Reaction times from a single subject learning a psychophysical task, originally reported by Palmeri (1997). The dashed line corresponds to a four-parameter “learning curve,” reported by Heathcote et al. (2000), while the solid lines interpret the same data as approximately linear, with two change-points.

Prior hyperparameters α0 and α1 (left) and posterior hyperparameters α0′ and α1′ (right) for the beta distribution, given three different prior distributions and the same empirical observations. See Equation 2 and the Supplement for details.

Maximum Likelihood vs. Marginal Model Likelihood

Likelihood as a function of the probability θ for the binomial distribution, given 20 observations, consisting of 10 successes and 10 failures. Darker areas indicate higher relative likelihoods. (Left Panel) When a single parameter is used to describe the data, the maximum likelihood estimator is θ̂ = 10/20 = 0.5. (Center Panel) The first ten observations and the second ten observations are examined independently, and assigned their own MLE parameters θ̂₁ = 4/10 = 0.4 and θ̂₂ = 6/10 = 0.6. This increase in model complexity improves the maximum likelihood, but also results in a smaller (and therefore less favorable) marginal model likelihood than that observed for the one-parameter model. (Right Panel) If splitting the observations into two groups instead reveals that θ̂₁ = 2/10 = 0.2 and θ̂₂ = 8/10 = 0.8, this results in greater maximum and marginal model likelihoods. All estimations in this figure assume the prior hyperparameters α0 = α1 = 1. On the basis of these results, choosing a two-parameter model is justified in the scenario depicted in the right panel, but not for the center panel, despite both having higher maximum likelihoods than the one-parameter model in the left panel.
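The three comparisons in this figure can be reproduced numerically. The sketch below (illustrative Python, not the manuscript's Matlab package) computes the Beta-binomial log marginal likelihoods under the α0 = α1 = 1 prior; the 0.4/0.6 split loses to the single-parameter model on marginal likelihood despite its higher maximum likelihood, while the 0.2/0.8 split wins on both.

```python
from math import lgamma

def log_beta(a, b):
    """Log of the Beta function, computed via log-gamma."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def log_marginal(successes, failures, a0=1.0, a1=1.0):
    """Beta-binomial log marginal likelihood (the shared ordering constant cancels
    out of every comparison, so it is omitted)."""
    return log_beta(a0 + successes, a1 + failures) - log_beta(a0, a1)

one_param    = log_marginal(10, 10)                      # left panel: one theta for all 20 trials
center_split = log_marginal(4, 6) + log_marginal(6, 4)   # center panel: theta1 = 0.4, theta2 = 0.6
right_split  = log_marginal(2, 8) + log_marginal(8, 2)   # right panel: theta1 = 0.2, theta2 = 0.8

# Only the right panel's split improves on the one-parameter marginal likelihood:
print(one_param, center_split, right_split)
```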

Uniform vs. Non-Uniform Event Times

An example of uniform vs. non-uniform segmentation. (Left) In the uniform case, each segment of time is an equally strong candidate for a putative change-point. In this example, the nine segments each have a common weight of 1/9. (Right) In the non-uniform case, a change-point is presumed to be located somewhere between t(1) and t(10), and if any point along that interval is as likely as any other, then longer segments are correspondingly more likely to contain change-points.
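The weighting in the non-uniform case can be sketched directly: if a change-point is equally likely anywhere in the interval, each inter-event segment's prior weight is proportional to its duration. A minimal Python sketch (the helper name is hypothetical, not part of the published Matlab package):

```python
def segment_weights(times):
    """Prior weight that a change-point falls within each inter-event interval,
    proportional to that interval's share of the total observation window."""
    total = times[-1] - times[0]
    return [(times[i + 1] - times[i]) / total for i in range(len(times) - 1)]

# Ten evenly spaced events yield nine equally weighted segments of 1/9 each,
# matching the uniform case in the left panel:
print(segment_weights(list(range(1, 11))))
```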

Values of log(k(c)·T(c)) given 100 binary observations [0,1,0,1,··· ,0,1], both uncorrected (left) and corrected (right). The dashed line represents the boundary between evidence favoring a change-point (positive) and evidence against one (negative) at event c.

Applying the CPR Algorithm To A String of Binary Data

The CPR algorithm applied to a binary string. (Top) The individual Bayes factors k(c) (here presented on a log scale) are calculated for each point, and there is a clear peak when c = 8. (Bottom) The algorithm is then recursively applied to the two segment halves, with weak support found when c = 34. Because this support is too weak to satisfy the decision criterion, it is not selected. The dashed line represents the boundary between evidence favoring a change-point (positive) and evidence against one (negative) at event c; however, note that a change-point is only added to the model if the posterior odds p1′/p0′ for a segment favor a change, which is not the case for either segment in the bottom panel.

Initial Learning For 25 Simultaneous Chains

Plot of item acquisition in 25 different lists in Jensen et al. (2013), learned using the Simultaneous Chain procedure. (Left) Estimated time of acquisition for each list item is indicated both by elevation and by shade of gray. (Right) Histogram showing frequency of inter-acquisition periods, in trials, for each item, as well as the number of trials overall needed to acquire all list items. Bars do not sum to 100% because acquisition did not occur in all lists.

Reaction Times As A Function Of Trials And Task Difficulty

Reaction times as a function of trials and task difficulty in Palmeri (1997). (Top) Each of the 6132 reaction times, color-coded according to their speed and positioned with respect to trial and stimulus numerosity. (Bottom) The model fit resulting from a multiple regression, subdivided according to the CPR algorithm.

Stimulus-Specific Reaction Times as a Function of Trials

Reaction times as a function of trials in Palmeri (1997) for specific stimuli in a single participant. Lines represent Bayesian regression estimates whose subdivisions were determined using the change-point algorithm. (Left) 6-dot stimulus learning for the most rapidly acquired stimulus (blue) and the least rapid (red). (Right) 11-dot stimulus learning for the most rapidly acquired stimulus (blue) and the least rapid (red).

3D Position Data, Assessed With Respect To A Multivariate Normal Distribution

3D position data collected by Kaluža et al. (2010). Individual points are multivariate, such that each observation comprises an x-coordinate (top panel), a y-coordinate (center panel), and a z-coordinate (bottom panel). Change-points were identified using the CPR algorithm assuming a multivariate normal distribution.

3D Movement Data Visualized With Respect To Time

3D depiction of a subset of the movement data from Kaluža et al. (2010) and its associated change-point model. Individual points represent discrete observations and are color-coded continuously with respect to time, as noted on the color bar. The points on the color bar denote the times at which change-points were detected by the CPR algorithm. Ellipses represent the means and covariances associated with particular segments of data, estimated post hoc using the robust method described by Campbell (1980). The thin colored line, which is drawn along the horizontal plane, represents a moving average of 50 points.

Detailed View of 3D Movement Data, Given a Multivariate Normal Model

Detailed plot of the data presented in Figure 11, represented in terms of recorded sensor values along each axis. Gray zones are identified by Kaluža et al. as ‘transitional periods’ (such as falling or sitting down), while dashed lines indicate change-points detected by the CPR algorithm, given a multivariate normal model.

Supplemental Material

A variety of non-psychological datasets are analyzed to demonstrate broad applications of the Conjugate Partitioned Recursion (CPR) algorithm. A method for performing a sensitivity analysis is demonstrated. A weakness in the algorithm is identified in cases of large-scale stationary distributions, and two modifications to the standard procedure are proposed to circumvent that weakness: the 'dicing' operation (whose function is strictly exploratory) and the 'forward-retrospective' algorithm, which makes assessments sequentially rather than recursively. Finally, the mathematical basis for the conjugate priors invoked in the main article and the formulas for empirical Bayes 'rule-of-thumb' priors are provided.

Supplemental: British Coal-Mining Disaster Analysis

British coal-mining disasters between 1851 and 1962, represented as days between disasters, which are approximately exponentially distributed (left) and number of disasters per year, which are approximately Poisson distributed (right). In both cases, the CPR algorithm, using the appropriate rule-of-thumb empirical prior, finds a single change-point. Each plot shows its distribution’s probability function as a color map, based on the posterior parameter estimates.
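For the yearly disaster counts, the conjugate Gamma prior yields the Poisson marginal likelihood in closed form, which is what lets the CPR algorithm score candidate splits without numerical integration. An illustrative Python sketch (the function name is hypothetical; the supplement's analysis uses the Matlab package):

```python
from math import lgamma, log

def log_marginal_poisson(counts, alpha=1.0, beta=1.0):
    """Log marginal likelihood of Poisson counts under a conjugate Gamma(alpha, beta)
    prior on the rate, with the rate integrated out analytically."""
    n, s = len(counts), sum(counts)
    return (alpha * log(beta) - lgamma(alpha)
            + lgamma(alpha + s) - (alpha + s) * log(beta + n)
            - sum(lgamma(x + 1) for x in counts))

# Splitting a clear rate shift is rewarded; splitting homogeneous data is penalized:
high, low = [5] * 10, [1] * 10
print(log_marginal_poisson(high) + log_marginal_poisson(low),
      log_marginal_poisson(high + low))
```

The same comparison of a two-segment against a one-segment marginal likelihood, applied at each candidate year, produces the single change-point shown in the right panel.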

Supplemental: US Treasury Bill Data

Nominal rates on three-month U.S. Treasury bills, calculated monthly from June 1947 to February 2013. (Top) Nominal rates, with red lines displaying the model inferred by a linear regression change-point analysis. (Bottom) First differences of nominal rates (xᵢ − xᵢ₋₁), with the means (red) and one standard deviation (light red) from a Gaussian change-point analysis.

Supplemental: Sensitivity Analysis with Respect to τ

Sensitivity analysis performed on two datasets from the main body text and three from the supplement. The decision criterion τ was varied between 1 and 100, and the SBIC was calculated for each resulting model, using Equation 3. Relative SBIC among the models is plotted in red (with the best model set at zero), and the number of data segments is plotted in blue.

Simulated data generated using a chaotic distribution of Gaussian parameters derived from Equation S4. (Left) The full dataset D, consisting of 5050 simulated observations that were generated in blocks of 50. The default CPR algorithm will not identify most of these change-points. (Right) A subset of D, with the true change-points denoted by dashed gray lines. The CPR algorithm will reliably identify these change-points if it is run only on this subset, despite failing to detect them when run on the complete dataset.

Supplemental: Simulation Data Analyzed Using the 'Dicing' Operation and the Forward-Retrospective Strategy

Change-points identified in simulated data using the alternative strategies laid out for dealing with stationary data. (Top) Trials for which change-points were identified, as a function of the number of subdivisions examined by the dicing operation. (Center) A histogram of how often, across 30 values for the dicing parameter, a precise change-point was identified. (Bottom) Trials for which change-points were identified using the forward-retrospective algorithm.

Matlab Implementation of the CPR Algorithm

Feature-rich Matlab function implementing the CPR algorithm for the ten conjugate priors described in the Supplement. The forward-retrospective variant described in the Supplement is implemented as a second function.

Additional Information

Competing Interests

Author Contributions

Greg Jensen conceived and designed the experiments, performed the experiments, analyzed the data, contributed reagents/materials/analysis tools, wrote the paper, and prepared the data visualizations.

Grant Disclosures

The following grant information was disclosed by the authors:

NIMH grant (5-R01-MH068073-10) awarded to Peter Balsam.

Funding

This work was supported by an NIMH grant (5-R01-MH068073-10) awarded to Peter Balsam. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
