Signal Processing

Volume 108, March 2015, Pages 576-588

Slope estimation in noisy piecewise linear functions

https://doi.org/10.1016/j.sigpro.2014.10.003

Highlights

  • A hidden Markov model formulation for estimating slopes in a noisy piecewise linear function is proposed.

  • A computationally tractable dynamic programming routine on a linearly growing trellis for fast maximum a posteriori estimation is developed.

  • Optimality analysis of this routine via simulations and a comparison with reasonable upper and lower bounds is presented.

  • An alternating maximization algorithm to estimate unknown model parameters from data is also derived.

  • Results show that the algorithm is general enough to encompass different applications.

Abstract

This paper discusses the development of a slope estimation algorithm called MAPSlope for piecewise linear data that is corrupted by Gaussian noise. The number and locations of slope change points (also known as breakpoints) are assumed to be unknown a priori, though the possible range of slope values is assumed to lie within known bounds. A stochastic hidden Markov model that is general enough to encompass real world sources of piecewise linear data is used to model the transitions between slope values, and the problem of slope estimation is addressed using a Bayesian maximum a posteriori approach. The set of possible slope values is discretized, enabling the design of a dynamic programming algorithm for posterior density maximization. Numerical simulations are used to justify the choice of a reasonable number of quantization levels and to analyze the mean squared error performance of the proposed algorithm. An alternating maximization algorithm is proposed for estimating the unknown model parameters, and a convergence result for the method is provided. Finally, results using data from political science, finance and medical imaging applications are presented to demonstrate the practical utility of this procedure.

Introduction

The need for piecewise linear regression arises in many different fields, as diverse as biology, geology, and the social sciences. This paper addresses the problem of direct estimation of slopes from piecewise linear data. An important application of interest for this paper is ultrasound shear wave elastography, where ultrasonic echoes are used to track the motion of an externally generated mechanical shear wave pulse traveling through multiple tissue interfaces [19]. The time of arrival of this shear wave pulse is recorded as a function of spatial coordinates in the ultrasound imaging plane and the reciprocal of the slope of this function gives an estimate of the speed of the wave. Breakpoints (where the slope changes) indicate tissue interfaces. These estimates are useful from a clinical perspective because they provide a way to quantify mechanical properties of tissue, thereby adding value to subjective judgments about the location and size of cancerous tumors.

A similar issue arises on larger spatial scales in seismology, where the time of arrival of seismic waves is tracked at different locations relative to the epicenter of an earthquake. The velocity of these waves provides information about the mechanical properties of the geological medium. Piecewise linear data also occurs in the study of the flow of soil through water streams, where it is referred to as bedload data [20].

Assume that the piecewise linear data is generated by the following discrete time hidden Markov model (HMM). The underlying (unknown) function takes on values $Z_n$ at each discrete index $1 \le n \le N$. This function value is obtained by accumulating the slope values $S_k$ up to the time index $n$. Zero mean Gaussian noise with variance $\sigma^2$ is added to each running sum, resulting in the observed function value $X_n$. Also, suppose that for any $n$, the probability of maintaining the previous slope value is $p$ and the probability of transitioning into a new slope value is $1-p$. These relations can be written mathematically as follows:
$$Z_0 = 0 \text{ with probability } 1, \qquad Z_n = Z_{n-1} + S_n, \qquad X_n = Z_n + w_n \quad \text{for } n = 1, \ldots, N,$$
where $w_n \stackrel{\text{iid}}{\sim} \mathcal{N}(0, \sigma^2)$. A Markov structure is imposed on the slope values as follows:
$$S_n = \begin{cases} S_{n-1} & \text{with probability } p \\ U_n & \text{with probability } 1-p \end{cases} \qquad \text{for } n = 2, \ldots, N,$$
where $U_n \sim \mathcal{U}\left(\{0, 1/(M-1), \ldots, (M-2)/(M-1), 1\} \setminus \{S_{n-1}\}\right)$ denotes a discrete uniform random variable taking on one of the $M-1$ remaining slope values, and the initial slope value is drawn uniformly as $S_1 \sim \mathcal{U}(\{0, 1/(M-1), \ldots, (M-2)/(M-1), 1\})$.
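For concreteness, the generative model above can be simulated in a few lines; the following Python sketch is illustrative only (the function and parameter names simulate_piecewise_linear, num_samples and num_levels are not from the paper, whose reference implementation is the MATLAB code cited in the footnote).

```python
import numpy as np

def simulate_piecewise_linear(num_samples, num_levels, p, sigma, seed=None):
    """Draw one realization (x, s) from the slope HMM described above.

    Slopes live on the grid {0, 1/(M-1), ..., 1}; with probability p the
    previous slope is kept, otherwise a new slope is drawn uniformly from
    the remaining M-1 grid values. Observations are the running sum of the
    slopes plus i.i.d. Gaussian noise of standard deviation sigma.
    """
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, num_levels)           # admissible slope values
    s = np.empty(num_samples)
    s[0] = rng.choice(grid)                            # S_1 uniform on the grid
    for n in range(1, num_samples):
        if rng.random() < p:
            s[n] = s[n - 1]                            # keep the previous slope
        else:
            s[n] = rng.choice(grid[grid != s[n - 1]])  # U_n uniform on the rest
    z = np.cumsum(s)                                   # Z_n = Z_{n-1} + S_n, Z_0 = 0
    x = z + rng.normal(0.0, sigma, num_samples)        # X_n = Z_n + w_n
    return x, s

# Example: 200 samples, M = 11 slope levels, persistence p = 0.95, noise 0.5.
x, s = simulate_piecewise_linear(200, 11, 0.95, 0.5, seed=0)
```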

Another implicit assumption is that the slopes take on values in a closed bounded interval $[s_l, s_u]$ with upper and lower limits $0 < s_l < s_u < \infty$ known a priori. For instance, in the ultrasound-based wave tracking application, the values of $s_l$ and $s_u$ can be obtained from the underlying physics, which dictates that such mechanical waves travel with speeds between 0.5 and 10 m/s in homogeneous tissue. With the knowledge of $s_l$ and $s_u$, the given data vector can be translated and rescaled so that all slope values lie in the interval $[0,1]$. Hence, without loss of generality, it suffices to design a slope estimation algorithm that operates with a finite set of slopes $S = \{0, 1/(M-1), \ldots, (M-2)/(M-1), 1\}$. Intuitively, this quantization step is justified because in the presence of noise it is impossible to detect the difference between slope values that differ only slightly.
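A minimal sketch of this translate-and-rescale step, assuming unit sample spacing and using hypothetical helper names (the first sample is used here as a crude estimate of the unknown offset):

```python
import numpy as np

def rescale_to_unit_slopes(x, s_lower, s_upper):
    """Shift and scale data whose local slopes lie in [s_lower, s_upper]
    (unit sample spacing) so that the slopes of the result lie in [0, 1]."""
    n = np.arange(len(x))
    # Subtract a ramp of slope s_lower, then divide by the slope range.
    return (x - x[0] - s_lower * n) / (s_upper - s_lower)

def slopes_to_physical_units(s_unit, s_lower, s_upper):
    """Map slope estimates from the [0, 1] scale back to physical units."""
    return s_lower + (s_upper - s_lower) * np.asarray(s_unit)
```

Any affine normalization that maps the slope range $[s_l, s_u]$ onto $[0,1]$ serves the same purpose; the inverse map recovers slope estimates in physical units after the algorithm has run.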

The main contributions of this paper are as follows:

  • (a)

    a hidden Markov model formulation of the slope estimation problem that is general enough to encompass different applications;

  • (b)

    a procedure for maximum a posteriori (MAP) estimation of slopes from this Markov model;

  • (c)

    a dynamic programming routine on a linearly growing trellis for fast MAP estimation;

  • (d)

    a mean squared error (MSE) optimality analysis of this routine via simulations and a comparison with reasonable upper and lower bounds;

  • (e)

    an alternating maximization algorithm that alternately maximizes an objective function with respect to the unknown sequence of slope values and unknown model parameters to jointly estimate both of them from data;

  • (f)

    a comparison of the performance of this algorithm with other methods in literature applied to real world data.

In many real world applications, the local slope values of an observed noisy function have interesting physical interpretations. Most of the existing methods do not directly address slope estimation; rather, they attempt to fit a model to the data. For instance, standard regression or spline-based methods can be used to fit a smooth function to the data and local slopes can be estimated from this fit. However, even if the function-fitting algorithm generates optimal fits (according to a cost function such as the minimum MSE), there is no guarantee that the local slope estimates obtained from this fit are themselves optimal. This paper bypasses the need for such post-processing by directly estimating the slopes and breakpoints. This is particularly useful when the slopes correspond directly to the variables of interest and the breakpoints correspond to where those variables change.

The topic of slope estimation from noisy data is quite old; an early paper can be traced back to 1964, when the popular Savitzky–Golay differentiator [10] was introduced. The main idea is to use a locally windowed least squares fit to estimate the slope at each data sample, where the window coefficients are chosen to satisfy a certain frequency response that mimics a high pass filter together with some level of noise averaging. Another similar technique used in statistics is locally weighted least squares regression (LOWESS) [22]. However, these methods undesirably smooth out the breakpoint locations when the data has sharp transitions or jumps. In contrast, the algorithm in the present paper avoids blurring the transitions by explicitly allowing for sharp slope changes through a Markov model.
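As a point of comparison (not part of the paper's method), a Savitzky–Golay derivative baseline can be obtained with a single library call; the window length and polynomial order below are arbitrary illustrative choices, and the resulting slope estimates are visibly smeared across the breakpoint.

```python
import numpy as np
from scipy.signal import savgol_filter

# Noisy piecewise linear signal with one breakpoint at n = 100 (unit spacing).
rng = np.random.default_rng(0)
n = np.arange(200)
z = np.where(n < 100, 0.2 * n, 0.2 * 100 + 0.8 * (n - 100))
x = z + rng.normal(0.0, 0.5, n.size)

# deriv=1 with delta=1.0 returns a local slope estimate at each sample; the
# 21-sample least squares window blurs the slope change over roughly 21 samples.
slope_sg = savgol_filter(x, window_length=21, polyorder=2, deriv=1, delta=1.0)
```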

In some situations, the raw data can be transformed by a preprocessing step so that it becomes piecewise linear. The simplest example is the case of piecewise constant data: the running sum (discrete integral) of such data yields a piecewise linear function. Ratkovic and Eng [21] discuss a statistical spline fitting approach combined with the Bayesian information criterion (BIC) to detect abrupt transitions in political approval ratings. Data from their paper is used in Section 6.1. As a special case, their method can be applied when function values stay almost constant over long intervals and occasionally shift to a new value. In another application, Bai and Perron [16] use statistical regression techniques to detect multiple regime shifts in interest rate data. The algorithm developed in the present paper provides numerical performance comparable to that of the Bai–Perron algorithm, as shown in Section 6.2.

Closely related problems of piecewise linear regression for noisy data have been addressed over the years. For example, an early paper by Hudson [1] focuses on a technique to obtain piecewise linear fits in a least-squares sense with only two segments. The breakpoint location is included as a parameter in the least squares optimization problem. The method is extended to multiple breakpoint locations on a case-by-case basis, which becomes combinatorially intractable as the number of breakpoints increases. Bellman [2] suggests a dynamic programming approach for the case when the number of breakpoints is unknown. However, this method requires a large number of grid points for accurate results. Gallant and Fuller [3] generalize this to the fitting of arbitrary polynomials with unknown breakpoints while requiring the function to be composed of segments with continuous first derivatives. They apply a nonlinear optimization routine (Gauss–Newton minimization) to fine-tune breakpoint locations while minimizing the squared error relative to the data. Another non-parametric approach involves the use of edge-preserving penalized optimization such as total variation minimization [4]. Denison et al. [5] use a Markov chain Monte Carlo approach to fit piecewise polynomials with different numbers and locations of knot points. Tishler and Zang [6] discuss a maximum likelihood approach to fit a convex piecewise linear function expressed as the point-wise maximum of a collection of affine functions with unknown coefficients. Maximum likelihood estimates are obtained by running a constrained optimization routine on a smoothed approximation of an MSE cost function to bypass non-differentiability issues. A similar data model coupled with data clustering heuristics is utilized in a more recent paper by Magnani and Boyd [7] on fitting convex piecewise linear functions.

The use of adaptive methods is an attractive way of handling the issue of an unknown number of breakpoints. One of the first algorithms using this technique was proposed by Friedman [8] under the name "multivariate adaptive regression splines (MARS)." Recursive partitioning is used to obtain better partitions of the set of data points at each iteration. Either a goodness of fit criterion or generalized cross validation is used to estimate the number of partitions. Along similar lines, Kolaczyk and Nowak [9] apply the method of recursive dyadic partitioning and fit a smooth function in each partition using maximum likelihood estimation. A penalty term for the number of partitions is introduced to trade off model complexity and quality of fit. In more recent work, Saucier and Audet [11] propose a different class of adaptively constructed basis functions that can capture the transition points in otherwise piecewise smooth functions.

In [15] Bai and Perron discuss the problem of detecting structural changes in data without requiring the estimated function to be piecewise linear or even continuous. Their related paper [16] discusses a dynamic programming approach to obtain a least sum of squares fit. The model order is determined by using the Akaike information criterion (AIC) [18] and they impose a minimum limit on the “run length” of each segment in the piecewise model. In contrast, the present paper proposes a dynamic program that generates optimum MAP estimates based on a stochastic finite state HMM.

In the signal processing literature, two kinds of paradigms have been applied to this problem: Bayesian estimation and pattern recognition approaches. Punskaya et al. [23] model the function using the number and locations of the breakpoints as free parameters with certain prior distributions. The posterior density of the parameters conditioned on the noisy data is estimated through Monte Carlo techniques. In response to this method, Fearnhead [24] proposes a direct method for estimating the parameters of the same model without resorting to Monte Carlo simulations, instead exploiting a Markov property of the model that allows calculation of the probability of future data points conditioned on the most recent breakpoint location. In the present paper, a Markov structure is imposed on the underlying slope values, which are chosen from a finite set, and the MAP algorithm estimates these slopes at each data sample.

The rest of this paper is organized as follows. The problem statement is discussed further in Section 2. A computationally tractable algorithm that uses the principle of dynamic programming is presented in Section 3. The issue of automatic selection of model parameters from data is addressed in Section 4. The problem of choosing the right number of quantization levels is analyzed through simulations and MSE distortion bounds in Section 5. A series of diverse applications are presented in Section 6, followed by some closing remarks in Section 7.

The notation $v_{i:j}$ is hereafter used to denote a vector $(v_i, v_{i+1}, \ldots, v_j)$ and the model parameters are denoted by $\theta = (\sigma^2, p)$.

Section snippets

Problem statement and stochastic formulation

A pictorial representation of the relationships between the various random variables of the piecewise linear data model for this paper is shown in Fig. 1. It can be seen that these relationships lead to an HMM with two hidden layers. Moreover, the cardinality of the state space of each of the random variables $Z_n$ increases with $n$. The next two subsections show why standard inference and parameter learning algorithms for HMMs cannot be directly applied to this problem.

Maximum a posteriori estimation of slope values

The maximum a posteriori slope sequence can be obtained by maximizing (4) as a function of the slope sequence, which is equivalent to the following optimization problem:
$$\hat{s}_{1:N} = \arg\max_{s_{1:N}} \left( -\frac{1}{2\sigma^2} \sum_{n=1}^{N} (x_n - z_n)^2 + \sum_{n=2}^{N} \log\left[ p\,\chi_{\{0\}}(s_n - s_{n-1}) + \frac{1-p}{M-1}\,\chi_{\{0\}^c}(s_n - s_{n-1}) \right] \right). \tag{5}$$
The term containing $p(x_{1:N} \mid \theta)$ is dropped because it does not depend on $s_{1:N}$. Besides producing the MAP optimal solution, the objective function has certain intuitively appealing characteristics. The first summation on the right-hand side of (5) is
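For illustration, the objective in (5) can be evaluated directly for any candidate slope sequence; the sketch below (with the hypothetical name map_objective) only computes this score and does not perform the trellis maximization described in Section 3.

```python
import numpy as np

def map_objective(s, x, p, sigma2, num_levels):
    """Evaluate the log-posterior objective of (5), up to additive constants,
    for a candidate slope sequence s on the M-level grid, given data x."""
    s = np.asarray(s, dtype=float)
    x = np.asarray(x, dtype=float)
    z = np.cumsum(s)                                   # implied noiseless signal z_n
    data_term = -np.sum((x - z) ** 2) / (2.0 * sigma2)
    stay = s[1:] == s[:-1]                             # indicator of s_n == s_{n-1}
    prior_term = np.sum(np.where(stay,
                                 np.log(p),
                                 np.log((1.0 - p) / (num_levels - 1))))
    return data_term + prior_term
```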

Alternating maximization algorithm for estimating model parameters

The following result provides a useful alternative re-estimation procedure that bypasses the computational issues of the full EM approach. It nevertheless retains a key advantage of an EM algorithm in that it increases the likelihood at each iteration. Under an additional minor technical assumption, it can also be shown that the procedure converges.

Theorem 2

Let $s_{1:N}$ be the current slope sequence estimate and $s'_{1:N}$ be the new sequence estimate obtained by running FastTrellis with the current model parameter
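The flavor of the alternating updates can be sketched as follows. Given a fixed slope sequence, maximizing the joint objective over $\theta = (\sigma^2, p)$ yields closed-form updates (the mean squared residual for $\sigma^2$, the fraction of slope-preserving transitions for $p$); the code below is an assumption-laden illustration, and fast_trellis_map is a placeholder for the MAP decoder of Section 3 rather than the paper's actual FastTrellis routine.

```python
import numpy as np

def reestimate_parameters(s, x):
    """Closed-form parameter update for a fixed slope sequence s and data x:
    sigma2 is the mean squared residual of the implied fit, and p is the
    fraction of transitions that keep the previous slope value."""
    s = np.asarray(s, dtype=float)
    z = np.cumsum(s)
    sigma2 = np.mean((np.asarray(x, dtype=float) - z) ** 2)
    p = np.clip(np.mean(s[1:] == s[:-1]), 1e-3, 1.0 - 1e-3)  # guard against log(0)
    return sigma2, p

def alternating_maximization(x, num_levels, fast_trellis_map,
                             sigma2=1.0, p=0.9, num_iters=20):
    """Alternate between MAP slope decoding (parameters fixed) and
    closed-form parameter re-estimation (slope sequence fixed)."""
    for _ in range(num_iters):
        s = fast_trellis_map(x, num_levels, p, sigma2)   # placeholder decoder
        sigma2, p = reestimate_parameters(s, x)
    return s, (sigma2, p)
```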

Mean squared error optimality analysis and model order selection

The aim of this section is to propose an empirical method for selecting the right number of quantization levels M for the set of slope values. Simple arguments from source coding theory are used to obtain upper and lower bounds for the MSE performance of the algorithm. This method can be applied directly to the sequence of estimated slope values.

Assume that the slope values originate from a continuous amplitude Markov source with amplitude levels in the interval [0,1]. The goal is to
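As a generic illustration of the source-coding-style argument (not necessarily the exact bounds derived in the paper), the distortion introduced by quantizing continuous slopes in $[0,1]$ to an $M$-level grid can be compared against the classical uniform-quantizer figure $\Delta^2/12$ with $\Delta = 1/(M-1)$:

```python
import numpy as np

def quantize_slopes(s_true, num_levels):
    """Round continuous slopes in [0, 1] to the nearest of the M grid values."""
    grid = np.linspace(0.0, 1.0, num_levels)
    idx = np.argmin(np.abs(np.asarray(s_true)[:, None] - grid), axis=1)
    return grid[idx]

# Quantization error alone (no observation noise) for a uniform slope source.
rng = np.random.default_rng(0)
s_true = rng.uniform(0.0, 1.0, 100_000)
for M in (5, 9, 17, 33):
    delta = 1.0 / (M - 1)
    mse = np.mean((quantize_slopes(s_true, M) - s_true) ** 2)
    print(f"M={M:3d}  empirical MSE={mse:.2e}  delta^2/12={delta**2 / 12:.2e}")
```

Halving the grid spacing reduces this distortion floor by roughly a factor of four; trading this off against the noise-induced estimation error is what governs the choice of $M$.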

Presidential approval ratings

United States presidential job approval ratings have been published by various agencies over the years. These ratings are usually quoted as percentage values that are computed from the results of opinion polls administered to a sample of the country's population. An analysis of the approval ratings for President George W. Bush is presented by Ratkovic and Eng [21]. The result of applying the MAPSlope algorithm to a snippet from the same dataset is shown in Fig. 6. A sudden transition in the

Conclusion

This paper presented the MAPSlope method for estimating slopes from noisy piecewise linear data which applies techniques from Bayesian estimation. An alternating maximization routine was presented for estimating unknown model parameters, and its convergence properties were analyzed theoretically. Further, MSE performance of this algorithm was tested using simulated data for different parameter values. The MSE performance was compared with suitable upper and lower bounds, and it was shown that

References (44)

  • A. Tishler et al., A new maximum likelihood algorithm for piecewise regression, J. Am. Stat. Assoc. (1981)
  • A. Magnani et al., Convex piecewise-linear fitting, Optim. Eng. (2009)
  • J. Friedman, Multivariate adaptive regression splines, Ann. Stat. (1991)
  • E. Kolaczyk et al., Multiscale generalized linear models for nonparametric function estimation, Biometrika (2005)
  • A. Savitzky et al., Smoothing and differentiation of data by simplified least squares procedures, Anal. Chem. (1964)
  • L. Rabiner, A tutorial on hidden Markov models and selected applications in speech recognition, Proc. IEEE (1989)
  • R. Garcia et al., An analysis of the real interest rate under regime shifts, Rev. Econ. Stat. (1996)
  • J. Hamilton, A new approach to the economic analysis of nonstationary time series and the business cycle, Econometrica (1989)
  • J. Bai et al., Estimating and testing linear models with multiple structural changes, Econometrica (1998)
  • J. Bai et al., Computation and analysis of multiple structural change models, J. Appl. Econ. (2003)
  • T.-J. Yen, A majorization–minimization approach to variable selection using spike and slab priors, Ann. Stat. (2011)
  • H. Akaike, A new look at the statistical model identification, IEEE Trans. Autom. Control (1974)

This work was supported in part by NIH-NCI Grants R01CA112192-S103 and R01CA112192-05.

1. Sample MATLAB code is available on the author's website at http://homepages.cae.wisc.edu/~ingle/mapslope.html.
