Fitting multiple change-point models to data

https://doi.org/10.1016/S0167-9473(00)00068-2

Abstract

Change-point problems arise when different subsequences of a data series follow different statistical distributions – commonly of the same functional form but having different parameters. This paper develops an exact approach for finding maximum likelihood estimates of the change points and within-segment parameters when the functional form is within the general exponential family. The algorithm, a dynamic program, has execution time only linear in the number of segments and quadratic in the number of potential change points. The details are worked out for the normal, gamma, Poisson and binomial distributions.

Introduction

The change-point model is appropriate for some data sets with a natural ordering. In this model, the sequence of data can be broken into segments, with the observations following the same statistical model within each segment but different models in different segments. One example is a model in which the data follow a common distributional form (for example, normal) whose parameters (mean, variance or both) change from one segment to another. A more complex example is the discontinuous segmented regression model, in which the observations in each segment follow a linear regression whose parameters (slope and/or intercept) change from one segment to the next.

Change-point models involve three issues: the choice of suitable parametric forms for the within-segment models; the choice of segment boundaries, or change-points; and the determination of the appropriate number of change-points to use in modeling a specific data set. Our discussion focuses on the second of these. The third is outside the scope of this article, but we comment on it.

The best-known application of change-point modeling in data analysis is the regression tree. In the most widely used implementation (Breiman et al., 1984), the data set is ordered by a continuous or ordinal predictor and then split into two subsequences: those cases whose predictor value falls below some change-point and those whose predictor value is above it. The change-point is chosen to maximize the separation between the two subsequences. The same binary splitting algorithm is then applied to each of the subsequences, and repeated recursively until the subsequences can no longer be usefully subdivided. This is a “greedy” algorithm: it selects each change-point to maximize an immediate return. As is generally the case with greedy algorithms (and as we shall see later by example), this hierarchic binary splitting, though fast, usually fails to give the optimal splits when there are two or more of them.
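To make the greedy procedure concrete, here is a minimal sketch of hierarchic binary splitting for a normal mean. It assumes that "separation" is measured by the reduction in residual sum of squares. The stopping threshold `min_gain` is a hypothetical stand-in for "can no longer be usefully subdivided". This illustrates the idea only and is not the CART implementation of Breiman et al.

```python
from typing import List, Tuple


def rss(x: List[float]) -> float:
    """Residual sum of squares of x about its own mean (0 for an empty list)."""
    if not x:
        return 0.0
    mean = sum(x) / len(x)
    return sum((v - mean) ** 2 for v in x)


def best_single_split(x: List[float]) -> Tuple[int, float]:
    """Best binary split of x into x[:t] and x[t:]; returns (t, reduction in RSS).

    t == 0 means that no split reduces the RSS.
    """
    total = rss(x)
    best_t, best_gain = 0, 0.0
    for t in range(1, len(x)):
        gain = total - rss(x[:t]) - rss(x[t:])
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain


def greedy_split(x: List[float], start: int = 0, min_gain: float = 1.0) -> List[int]:
    """Recursive binary splitting; returns the 0-based indices where new segments start.

    min_gain is a hypothetical stopping rule, not part of the original text.
    """
    t, gain = best_single_split(x)
    if t == 0 or gain < min_gain:
        return []
    # Each half is split independently; earlier splits are never revisited.
    left = greedy_split(x[:t], start, min_gain)
    right = greedy_split(x[t:], start + t, min_gain)
    return left + [start + t] + right
```

For example, `greedy_split([0.0] * 5 + [10.0] * 5 + [20.0] * 5)` returns `[5, 10]`. Each split is fixed once made, which is exactly the greedy property that can lead this approach away from the jointly optimal set of splits.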

In this paper, we provide an exact and reasonably fast algorithm for performing a multiway split. We will do this, not only for the case of a normal mean (as used in regression trees) but for an arbitrary parameter in an exponential family model.

In the following sections, we will derive the likelihood equations for optimal multiway splitting of data following an exponential-family distribution. We then show that the resulting criterion satisfies Bellman's ‘Principle of Optimality’, from which it follows that the optimal splits can be found with a dynamic programming algorithm. Finally, we will work out the details for a number of common data-modeling distributions and illustrate them with actual data sets.

The exponential family provides a rich set of models for data. Familiar members of the family are the normal, exponential, gamma, binomial and Poisson distributions. The family also includes normal-error linear regression and some generalized linear models. Starting with the simpler (non-regression) models, we write the canonical form of the exponential family probability or density function as

$$f(x,\theta) = \exp[-\theta x + c(x) + d(\theta)].$$
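As a worked check on the canonical form, here are two familiar members rewritten to match f(x,θ) = exp[−θx + c(x) + d(θ)]. This is our own algebra, not reproduced from the paper. In both cases d′(θ) is the mean, as it must be: differentiating ∫ f(x,θ) dx = 1 with respect to θ gives E(X) = d′(θ).

```latex
% Normal with mean \mu and known variance \sigma^2
f(x,\mu) = (2\pi\sigma^2)^{-1/2}\exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right]
         = \exp\left[\frac{\mu}{\sigma^2}\,x - \frac{x^2}{2\sigma^2}
           - \tfrac{1}{2}\log(2\pi\sigma^2) - \frac{\mu^2}{2\sigma^2}\right],
\qquad \theta = -\frac{\mu}{\sigma^2},\quad
c(x) = -\frac{x^2}{2\sigma^2} - \tfrac{1}{2}\log(2\pi\sigma^2),\quad
d(\theta) = -\frac{\sigma^2\theta^2}{2},\quad d'(\theta) = -\sigma^2\theta = \mu.

% Poisson with rate \lambda
f(x,\lambda) = \frac{e^{-\lambda}\lambda^{x}}{x!}
             = \exp\left[x\log\lambda - \log x! - \lambda\right],
\qquad \theta = -\log\lambda,\quad c(x) = -\log x!,\quad
d(\theta) = -e^{-\theta},\quad d'(\theta) = e^{-\theta} = \lambda.
```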

The parameter $\theta$ and data $X$ may be either scalar or vector-valued. If vectors, they must be of the same dimension, and $\theta x$ denotes their inner product. Given a random sample of size $n$, $X_1, X_2, \ldots, X_n$, all mutually independent, the sufficient statistic for $\theta$ is

$$S = \sum_{i=1}^{n} X_i.$$

This statistic is the maximum likelihood estimator (MLE) of the parametric function $n\,d'(\theta)$, and it is also unbiased for it. Solving $d'(\hat\theta) = S/n$ gives the MLE of $\theta$; substituting $\hat\theta$ back into the likelihood gives the maximized likelihood.
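The recipe (solve d′(θ̂) = S/n, then substitute back) can be sketched for Poisson counts. This is a minimal example under our own assumptions. The function name is hypothetical, and it returns minus twice the maximized log-likelihood of one segment. The Σ c(xᵢ) terms are omitted: they add up to the same total for every segmentation of the full series, so they cannot affect which segmentation is best.

```python
import math


def poisson_segment_cost(s: float, n: int) -> float:
    """Minus twice the maximized Poisson log-likelihood of one segment.

    s is the segment total S (the sufficient statistic) and n its length.
    The sum of c(x_i) = -log(x_i!) is omitted because it is the same for
    every segmentation of the full series.
    """
    if n <= 0:
        raise ValueError("a segment must contain at least one observation")
    if s == 0:
        return 0.0  # MLE of the rate is 0, and s*log(s/n) -> 0
    rate_hat = s / n                 # solves d'(theta_hat) = S/n, with d'(theta) = exp(-theta)
    theta_hat = -math.log(rate_hat)  # canonical parameter: theta = -log(rate)
    d_hat = -rate_hat                # d(theta) = -exp(-theta)
    # Maximized log-likelihood (without the c terms) is -theta_hat*S + n*d(theta_hat)
    return -2.0 * (-theta_hat * s + n * d_hat)
```

The cost depends on the data only through S and n. The cost of any candidate segment can therefore be read off from cumulative sums of the observations.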

Section snippets

The change-point model

We now extend the formulation to the change-point model. In this model there are change-points $\tau_1, \tau_2, \ldots, \tau_{k-1}$ such that the observations $X_i$ with $\tau_{j-1} < i \le \tau_j$ follow the exponential family model with parameter $\theta_j$. In other words, the distributional form is the same for all segments, but the parameter changes whenever one crosses one of the change-points $\tau_j$.
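The exact fit can be sketched as a dynamic program for a fixed number k of segments. This sketch assumes the recursion F(r, m) = min over h of [F(r−1, h) + D(h+1, m)], where D(h+1, m) is minus twice the maximized log-likelihood of the segment X_{h+1}, …, X_m. Because independent segments contribute additively to that quantity, the best r-segment fit to the first m observations must contain a best (r−1)-segment fit to the first h of them. The function name and the `cost(s, m)` interface are our own, not the paper's. The triple loop takes time proportional to k·n², in line with the linear-in-segments, quadratic-in-positions cost stated in the abstract.

```python
from typing import Callable, List, Sequence, Tuple


def optimal_segmentation(
    x: Sequence[float], k: int, cost: Callable[[float, int], float]
) -> Tuple[float, List[int]]:
    """Exact k-segment fit by dynamic programming (illustrative sketch).

    cost(s, m) is minus twice the maximized log-likelihood of one segment
    whose sufficient statistic is s and whose length is m.
    Returns (F, taus): the minimized total cost and the change-points
    tau_1 < ... < tau_{k-1}; segment j is x_{tau_{j-1}+1}, ..., x_{tau_j} (1-based).
    """
    n = len(x)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= len(x)")

    # prefix[i] = x_1 + ... + x_i, so segment h+1..m has S = prefix[m] - prefix[h]
    prefix = [0.0]
    for v in x:
        prefix.append(prefix[-1] + v)

    def seg(h: int, m: int) -> float:
        return cost(prefix[m] - prefix[h], m - h)

    inf = float("inf")
    # F[r][m]: best total cost of r segments covering x_1..x_m
    # back[r][m]: where segment r-1 ends in that best fit
    F = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    for m in range(1, n + 1):
        F[1][m] = seg(0, m)
    for r in range(2, k + 1):
        for m in range(r, n + 1):
            for h in range(r - 1, m):  # the last segment is h+1..m
                val = F[r - 1][h] + seg(h, m)
                if val < F[r][m]:
                    F[r][m], back[r][m] = val, h

    # Follow the back-pointers from the full series to recover tau_{k-1}, ..., tau_1
    taus: List[int] = []
    m = n
    for r in range(k, 1, -1):
        m = back[r][m]
        taus.append(m)
    return F[k][n], taus[::-1]
```

Take the normal-mean cost `lambda s, m: -s * s / m`. It corresponds to unit variance, with the constant Σxᵢ² dropped. With this cost, `optimal_segmentation([0.0] * 5 + [10.0] * 5 + [20.0] * 5, 3, cost)` returns a total cost of -2500.0 and change-points `[5, 10]`. Any other segment cost with the same (s, m) interface, such as a Poisson one, can be passed in unchanged.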

As there are k−1 change-points, there are a total of k segments in this model. To simplify notation, we will

Particular applications

Change-point in a normal mean: We start with the familiar example of scalar normal data with constant variance, where the mean may change from one segment to the next. This problem and its DP solution are discussed in more detail by Hawkins and Merriam (1973, 1975). As this is the problem addressed by regression trees (Breiman et al., 1984), it is particularly interesting to compare their implementation with exact optimization.
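With constant variance σ², minus twice the log-likelihood of a segment equals its residual sum of squares divided by σ². This holds up to terms that are identical for every segmentation, so an exact search for normal-mean change-points only needs segment RSS values. The minimal sketch below uses a hypothetical helper, not code from the paper, that returns the RSS of any segment in constant time from prefix sums of x and x².

```python
from itertools import accumulate
from typing import Sequence


class SegmentRSS:
    """Constant-time residual sum of squares for any segment x[i:j] (0-based, half-open)."""

    def __init__(self, x: Sequence[float]):
        # p1[j] = x[0] + ... + x[j-1];  p2[j] = x[0]^2 + ... + x[j-1]^2
        self.p1 = [0.0, *accumulate(x)]
        self.p2 = [0.0, *accumulate(v * v for v in x)]

    def __call__(self, i: int, j: int) -> float:
        n = j - i
        if n <= 0:
            raise ValueError("empty segment")
        s = self.p1[j] - self.p1[i]
        ss = self.p2[j] - self.p2[i]
        return ss - s * s / n  # sum of squares about the segment mean
```

Subtracting two large, nearly equal sums loses precision when the data have a large mean relative to their spread. Centring the series first is a cheap safeguard, since it leaves every segment's RSS unchanged.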

Turning the normal density into canonical

Formal testing for the number of segments

F(k,n) is minus twice the maximized log-likelihood of the model fitting k segments to the full sequence of data. It therefore gives rise to generalized likelihood ratio (GLR) tests:

To test the null hypothesis of a single segment versus the alternative of k segments, the GLR statistic is F(1,n)−F(k,n).

To test the null hypothesis of at most (k−1) segments against the alternative of k, the GLR statistic is F(k−1,n)−F(k,n).
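Both statistics can be tabulated directly from the sequence F(1,n), F(2,n), …. The hypothetical helper below is a minimal sketch that computes the statistics only. It deliberately attaches no p-values, since the appropriate reference distribution is a separate question.

```python
from typing import Dict, Tuple


def glr_statistics(F: Dict[int, float]) -> Dict[int, Tuple[float, float]]:
    """GLR statistics from F[k] = F(k, n), minus twice the maximized log-likelihood.

    For each k >= 2, returns (F(1,n) - F(k,n), F(k-1,n) - F(k,n)):
    one segment versus k, and at most k-1 segments versus k.
    """
    stats: Dict[int, Tuple[float, float]] = {}
    for k in sorted(F):
        if k >= 2 and 1 in F and k - 1 in F:
            stats[k] = (F[1] - F[k], F[k - 1] - F[k])
    return stats
```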

On the face of it, the incremental change F(k−1,n)−F(k,n) should follow an

A regression-tree-type example

We start with a data set showing a non-linear relationship between a predictor and a dependent variable. In the absence of a parametric model, this data set might be analyzed with a regression tree. The data set is shown in Fig. 1a, and the optimal segmentations into 2, 3, …, 6 segments are shown in Fig. 1b. Table 1 shows the optimal segment boundaries, the pooled residual sum of squares F(r,n), and the change in residual sum of squares from one value of r to the next. Note that the

Conclusion

The change-point model for the general exponential family can be thought of as a generalized non-linear model. As such, one might expect fitting it to be computationally intensive in the number of non-linear parameters, the change-points. On the contrary, a dynamic programming formulation fits the model in time linear in the number of change-points (and quadratic in the number of potential change-point positions), making it quite a small task for moderate-size data sets.

We have discussed the single-parameter exponential family in some detail. The

Acknowledgements

The author is grateful to the referees for several suggestions for improving the paper.

References (19)

  • D.M. Hawkins et al. Zonation of sequences of heteroscedastic multivariate data. Comput. Geosci. (1979).
  • J.H. Venter et al. Finding multiple abrupt change points. Comput. Statist. Data Anal. (1996).
  • R.E. Bellman et al. Applied Dynamic Programming. (1962).
  • R. Bellman et al. Curve fitting by segmented straight lines. J. Amer. Statist. Assoc. (1969).
  • P.K. Bhattacharya. Some aspects of change-point analysis. In: E. Carlstein, H.G. Muller, D. Siegmund (Eds.),... (1994).
  • L. Breiman et al. Classification and Regression Trees. (1984).
  • J. Chen et al. Testing and locating variance change-points with applications to stock prices. J. Amer. Statist. Assoc. (1997).
  • A.L. Halpern. Multiple-changepoint testing for an alternating segments model of a binary sequence. Biometrics (2000).
  • D.M. Hawkins. On the choice of segments in piecewise approximation. J. Inst. Math. Appl. (1972).
There are more references available in the full text version of this article.


Work supported by the National Science Foundation under grant DMS 9803622.
