Fitting multiple change-point models to data

doi:10.1016/S0167-9473(00)00068-2

Computational Statistics & Data Analysis

Volume 37, Issue 3, 28 September 2001, Pages 323-341

Abstract

Change-point problems arise when different subsequences of a data series follow different statistical distributions – commonly of the same functional form but having different parameters. This paper develops an exact approach for finding maximum likelihood estimates of the change points and within-segment parameters when the functional form is within the general exponential family. The algorithm, a dynamic program, has execution time only linear in the number of segments and quadratic in the number of potential change points. The details are worked out for the normal, gamma, Poisson and binomial distributions.

Introduction

The change-point model is appropriate for some data sets with a natural ordering. The model assumes that the sequence of data can be broken down into segments, with the observations following the same statistical model within each segment but different models in different segments. One example of a change-point model is that in which the data follow a common distributional form (for example normal) whose parameters (mean, variance or both) change from one segment to another. Another, more complex, model is the discontinuous segmented regression model, in which the observations in each segment follow a linear regression but the parameter(s) of this regression (slopes and/or intercept) change from one segment to the next.

Change-point models involve three issues: the choice of suitable parametric forms for the within-segment models; the choice of segment boundaries, or change-points; and the determination of the appropriate number of change-points to use in modeling the specific data set. Our discussion focuses on the second of these questions. The third question is outside the scope of this article but will be commented upon.

The best-known application of change-point modeling in data analysis is that of regression trees. In the most widely used implementation (Breiman et al., 1984), the data set is ordered by a continuous or ordinal predictor and then split into two subsequences – those cases whose predictor value falls below some change-point and those whose predictor value is above the change-point. The change-point is chosen to maximize the separation between the two subsequences. The same binary splitting algorithm is then applied to each of the subsequences, and repeated recursively until the subsequences can no longer be usefully subdivided. This is a “greedy” algorithm – it seeks to select each change-point to maximize an immediate return. As is generally the case with greedy algorithms (and as we shall see later by example) this hierarchic binary splitting, though fast, usually fails to give the optimum splits if there are two or more of them.
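The hierarchic binary splitting described above can be sketched as follows. This is a minimal illustration, assuming scalar data, a residual-sum-of-squares splitting criterion (the normal-mean case) and hypothetical function names `sse` and `greedy_binary_split`; it is not the implementation of Breiman et al.

```python
import numpy as np

def sse(prefix, prefix_sq, i, j):
    """Residual sum of squares of x[i:j] about its own mean (half-open range)."""
    s = prefix[j] - prefix[i]
    return (prefix_sq[j] - prefix_sq[i]) - s * s / (j - i)

def greedy_binary_split(x, k):
    """Greedily split x into at most k segments, one binary split at a time.

    Returns (change_points, total_sse). A change point h means that one
    segment ends after the h-th observation.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Prefix sums make every segment cost an O(1) lookup.
    prefix = np.concatenate(([0.0], np.cumsum(x)))
    prefix_sq = np.concatenate(([0.0], np.cumsum(x * x)))
    bounds = [0, n]
    while len(bounds) - 1 < k:
        best = None  # (reduction in SSE, split position)
        for a, b in zip(bounds[:-1], bounds[1:]):
            whole = sse(prefix, prefix_sq, a, b)
            for h in range(a + 1, b):
                gain = whole - sse(prefix, prefix_sq, a, h) - sse(prefix, prefix_sq, h, b)
                if best is None or gain > best[0]:
                    best = (gain, h)
        if best is None:  # every segment has length 1; nothing left to split
            break
        # Earlier splits are never revisited: this is what makes the method greedy.
        bounds = sorted(bounds + [best[1]])
    total = sum(sse(prefix, prefix_sq, a, b) for a, b in zip(bounds[:-1], bounds[1:]))
    return bounds[1:-1], float(total)
```

Each pass commits to the single split with the largest immediate reduction and keeps it for good, so a poor early split cannot be undone by later ones.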

In this paper, we provide an exact and reasonably fast algorithm for performing a multiway split. We do this not only for the case of a normal mean (as used in regression trees) but also for an arbitrary parameter in an exponential family model.

In the following sections, we will derive the likelihood equations for optimal multiway splitting of data following an exponential-family distribution. We show that this problem satisfies Bellman's ‘Principle of Optimality’, from which it follows that the optimal splits can be found with a dynamic programming algorithm. Finally, we will work out the details for a number of common data modeling distributions and illustrate them with actual data sets.

The exponential family provides a rich set of models for data. Familiar members of the family are the normal distribution, the exponential, the gamma, the binomial and the Poisson. The family also includes normal-error linear regression and some generalized linear models. Starting with the simpler (non-regression) models, the canonical form of the exponential family distribution or density function is $f(x, \theta) = \exp[-\theta' x + c(x) + d(\theta)]$.

The parameter $\theta$ and data $X$ may be either scalar or vector-valued. If vectors, they must be of the same dimension. Given a random sample of size $n$, $X_{1}, X_{2}, \ldots, X_{n}$, all mutually independent, the sufficient statistic for $\theta$ is $S = \sum_{i=1}^{n} X_{i}$.

This statistic is the maximum likelihood estimator (MLE) of the parametric function $n d'(\theta)$, for which it is unbiased. Solving the equation $d'(\hat{\theta}) = S/n$ gives the MLE of $\theta$. Substituting this back into the likelihood gives the maximized likelihood.
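As a worked illustration of this recipe (a sketch added here, not the paper's own derivation), consider a Poisson sample with mean $\lambda$. Matching the Poisson probability function to the canonical form suggests the identifications $\theta = -\log\lambda$, $c(x) = -\log x!$ and $d(\theta) = -e^{-\theta}$:

$$
f(x, \theta) = \frac{\lambda^{x} e^{-\lambda}}{x!} = \exp\left[-\theta x - \log x! - e^{-\theta}\right], \qquad \theta = -\log\lambda .
$$

$$
d'(\theta) = e^{-\theta} = \lambda, \qquad d'(\hat{\theta}) = \frac{S}{n} \;\Longrightarrow\; \hat{\lambda} = \frac{S}{n} = \bar{X} .
$$

Under these identifications the likelihood equation reproduces the familiar result that the MLE of a Poisson mean is the sample mean.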

The change-point model

Now extend the formulation to the change-point model. In this model, there are a number of change points, $\tau_{1}, \tau_{2}, \ldots, \tau_{k-1}$, such that the observations $X_{i}$ with $\tau_{j-1} < i \le \tau_{j}$ follow the particular exponential family model with parameter $\theta_{j}$. In other words, the distributional form remains the same for all segments, but the parameter changes whenever one crosses over one of the change points $\tau_{j}$.
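Because the observations are independent, the log-likelihood for a fixed set of change points separates into one term per segment. The following sketch writes this out under the boundary conventions $\tau_{0} = 0$ and $\tau_{k} = n$, which are assumed here for illustration, with $n_{j} = \tau_{j} - \tau_{j-1}$ the length of segment $j$ and $S_{j}$ its within-segment sum:

$$
\log L = \sum_{j=1}^{k} \left[ -\theta_{j}' S_{j} + n_{j}\, d(\theta_{j}) \right] + \sum_{i=1}^{n} c(X_{i}), \qquad S_{j} = \sum_{i=\tau_{j-1}+1}^{\tau_{j}} X_{i} .
$$

$$
d'(\hat{\theta}_{j}) = \frac{S_{j}}{n_{j}}, \qquad j = 1, \ldots, k .
$$

For given change points, each $\hat{\theta}_{j}$ therefore depends only on the data in its own segment; the segments are coupled only through the choice of the change points themselves.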

As there are k−1 change-points, there are a total of k segments in this model. To simplify notation, we will

Particular applications

Changepoint in normal mean: We will start with the familiar example of scalar normal data with constant variance, where the mean may change from one segment to the next. This problem and the DP solution are discussed in more detail in Hawkins and Merriam (1973, 1975). As this is the problem addressed by regression trees (Breiman et al., 1984), it is particularly interesting to compare their implementation with exact optimization.
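For the normal-mean case, a dynamic program of the kind described can be sketched as follows. This is a minimal illustration rather than the author's code: it assumes that, with constant variance, the criterion for a segment reduces (up to constants) to its residual sum of squares, and it uses the recursion "best cost of r segments on the first m points = minimum over h of the best cost of r−1 segments on the first h points plus the cost of one segment covering points h+1 to m". The names `optimal_segmentation`, `q`, `F` and `back` are chosen here for illustration.

```python
import numpy as np

def optimal_segmentation(x, k):
    """Exact least-squares split of x into k segments by dynamic programming.

    Returns (change_points, cost). A change point h means that one segment
    ends after the h-th observation; cost is the pooled residual sum of squares.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= len(x)")
    # Prefix sums make every segment cost an O(1) lookup.
    prefix = np.concatenate(([0.0], np.cumsum(x)))
    prefix_sq = np.concatenate(([0.0], np.cumsum(x * x)))

    def q(i, j):
        """Residual sum of squares of x[i:j] about its own mean."""
        s = prefix[j] - prefix[i]
        return (prefix_sq[j] - prefix_sq[i]) - s * s / (j - i)

    # F[r, m]: best cost of fitting r segments to the first m observations.
    F = np.full((k + 1, n + 1), np.inf)
    back = np.zeros((k + 1, n + 1), dtype=int)
    for m in range(1, n + 1):
        F[1, m] = q(0, m)
    for r in range(2, k + 1):
        for m in range(r, n + 1):
            for h in range(r - 1, m):  # h = end of the first r-1 segments
                cand = F[r - 1, h] + q(h, m)
                if cand < F[r, m]:
                    F[r, m] = cand
                    back[r, m] = h
    # Trace the optimal change points back from the full sequence.
    change_points = []
    m = n
    for r in range(k, 1, -1):
        m = back[r, m]
        change_points.append(int(m))
    change_points.reverse()
    return change_points, float(F[k, n])
```

The three nested loops give a running time proportional to k times n squared, in line with the linear-in-segments, quadratic-in-positions cost stated in the abstract. Unlike binary splitting, all change points are chosen jointly.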

Turning the normal density into canonical

Formal testing for the number of segments

F(k,n) is the negative doubled maximized log-likelihood of the model fitting k segments to the full sequence of data. It therefore gives rise to generalized likelihood ratio (GLR) tests:

To test the null hypothesis of a single segment versus the alternative of k segments, the GLR statistic is F(1,n)−F(k,n).

To test the null hypothesis of at most (k−1) segments against the alternative of k, the GLR statistic is F(k−1,n)−F(k,n).

On the face of it, the incremental change F(k−1,n)−F(k,n) should follow an

A regression-tree-type example

We start with a data set showing a non-linear relationship between a predictor and a dependent variable. In the absence of a parametric model, this data set might be subjected to analysis with a regression tree. The data set is shown as Fig. 1a, and the optimal segmentation into 2,3,…,6 segments is Fig. 1b. Table 1 shows the optimal segment boundaries, the pooled residual sum of squares F(r,n) and the change in residual sum of squares as we go from one value of r to the next. Note that the

Conclusion

The change-point model for the general exponential family can be thought of as a generalized non-linear model. As such, it would seem to be computationally intensive in the number of non-linear parameters, namely the change-points. On the contrary, the model can be fitted in time linear in the number of change-points using a dynamic programming formulation, which makes it quite a small task for data sets of moderate size.

We have discussed the single-parameter exponential family in some detail. The

Acknowledgements

The author is grateful to the referees for several suggestions for improving the paper.

References (19)

D.M. Hawkins et al. Zonation of sequences of heteroscedastic multivariate data. Comput. Geosci. (1979)
J.H. Venter et al. Finding multiple abrupt change points. Comput. Statist. Data Anal. (1996)
R.E. Bellman et al. Applied Dynamic Programming. (1962)
R. Bellman et al. Curve fitting by segmented straight lines. J. Amer. Statist. Assoc. (1969)
P.K. Bhattacharya. Some aspects of change-point analysis. In: E. Carlstein, H.G. Muller, D. Siegmund (Eds.),... (1994)
L. Breiman et al. Classification and Regression Trees. (1984)
J. Chen et al. Testing and locating variance change-points with applications to stock prices. J. Amer. Statist. Assoc. (1997)
A.L. Halpern. Multiple-changepoint testing for an alternating segments model of a binary sequence. Biometrics (2000)
D.M. Hawkins. On the choice of segments in piecewise approximation. J. Inst. Math. Appl. (1972)

There are more references available in the full text version of this article.

^☆: Work supported by the National Science Foundation under grant DMS 9803622.
