Wavelets in functional data analysis: Estimation of multidimensional curves and their derivatives

https://doi.org/10.1016/j.csda.2011.12.016

Abstract

A wavelet-based method is proposed to obtain accurate estimates of curves in more than one dimension and of their derivatives. By means of simulation studies, this novel method is compared to another locally-adaptive estimation technique for multidimensional functional data, based on free-knot regression splines. This comparison shows that the proposed method is particularly attractive when the curves to be estimated present strongly localized features. The multidimensional wavelet estimation method is thus applied to multi-lead electrocardiogram records, where strongly localized features are indeed expected.

Introduction

Functional Data Analysis (FDA) is the branch of statistics that focuses on data that can be seen as the observed values of a functional random variable (see, e.g., Ramsay and Silverman, 2005; Ferraty and Vieu, 2006). From a practical point of view, however, all data are observed on a discrete grid and are affected by measurement error. A crucial step of the analysis thus consists in estimating the continuous functional data from their discrete observations. Here we are especially interested in the estimation of multidimensional curves and their derivatives. Sangalli et al. (2009) proposed a smoothing technique based on free-knot splines that was shown to provide very accurate estimates of multidimensional curves and their derivatives, even when the curves are characterized by spatial inhomogeneities. In this work, we instead describe a technique based on wavelet basis expansions that is also capable of accurately estimating multidimensional curves presenting strongly localized features, such as peaks and oscillations.

Wavelet bases have already proved very useful in functional data analysis for one-dimensional curves, thanks largely to their natural local adaptivity, which allows them to accommodate a wide variety of functional forms. Besides the problem of curve estimation (see, e.g., Antoniadis et al., 1994), other settings where wavelets have been used include the estimation of parameters of stochastic processes (see, e.g., Frías and Ruiz-Medina, 2011, and references therein), functional regression (see, e.g., Aguilera et al., 2008), functional ANOVA (see, e.g., Yang and Nie, 2008) and functional classification (see, e.g., Wang et al., 2007; Berlinet et al., 2008; Antoniadis et al., 2010; Timmermans et al., 2011, and references therein). Given the strong increase in the recording of multidimensional functional data, it is of interest to extend the field of application of the above-mentioned methods to the multidimensional case, by developing a wavelet-based estimation technique that can accurately handle multidimensional curves.

As in Sangalli et al. (2009), since our data are noisy and discrete observations of some p-dimensional curve, we look for an estimate that is itself a proper p-dimensional curve. This means that we discard the simplistic idea of obtaining an estimate by juxtaposing p separate smoothings of the p functional coordinates of the curve. In fact, if the curve has a significant feature at some point of the physical space, we expect this to be reflected to some extent on all p coordinates concurrently; for instance, if the curve has a discontinuity in some of its derivatives at some point, this will be present on all p coordinates. For this reason, we develop a novel estimation procedure that takes into account all the space coordinates of the multidimensional curve simultaneously. Moreover, we also consider the case where the components of the error in the p dimensions are correlated, and show how to deal with this issue efficiently. The proposed estimation technique also provides consistent estimates of the curve derivatives. It should be noted that wavelet bases have so far mainly been applied to problems where there was no interest in derivatives, because of the absence of closed analytical forms for smooth wavelet bases, an issue that has until now restricted their application to a confined part of the FDA field. To overcome this limitation, we resort here to a numerical method (see Strang and Nguyen, 1996) that allows one to obtain derivatives of wavelet-estimated data. An additional contribution of the paper is a scheme for aligning an orthogonal basis so that the sampled curve values are a better approximation to the scaling coefficients needed to initialize the discrete wavelet transform, thus obtaining a more accurate estimation of curve derivatives. An extended abstract of this work appeared in Pigoli and Sangalli (2011).

Our research has been stimulated by the analysis of electrocardiogram (ECG) records, collected by the 118 Dispatch Center (the medical operating emergency unit) in Milano, Italy, as part of the PROMETEO project “PROgetto sull’area Milanese Elettrocardiogrammi Teletrasferiti dall’Extra Ospedaliero”. See Ieva et al. (2011) for details. The aim of this project is to bring forward the time of diagnosis in heart ischemia, in order to improve the prognosis of reperfusive treatments and reduce infarction complications. In particular, we consider a sample of multi-lead tele-transmitted ECG records, both physiological and pathological. The estimates derived here via the proposed multidimensional smoothing technique are then used in Ieva et al. (2011), where a semi-automatic diagnostic procedure based on the ECG morphology is proposed that is able to classify physiological and pathological traces.

ECG data have a multidimensional nature, because these records provide potential differences, named leads, between multiple electrodes; in fact, as will be described in Section 6, ECG traces can be seen as eight-dimensional functional data, whose eight coordinates, corresponding to eight leads, measure different projections of the same physical dynamics in different directions. Smoothing these data hence calls for a technique that takes the eight coordinates of the functional data into account simultaneously; besides helping to detect significant features that are reflected on more than one lead, thus enhancing pattern recognition, such a procedure provides coherent estimates, in which the different projections of the heart dynamics are mutually consistent. Moreover, as will be clarified later, the components of the error on the eight leads are correlated, an issue that can be appropriately taken into account within our technique, which works jointly on the p coordinates. It should also be noted that wavelet bases are particularly well suited to capturing ECG shapes, which are characterized by strong localized peaks and oscillations.

As mentioned above, we devote particular attention to the computation of estimates that are accompanied by good estimates of their derivatives. This pays off in Ieva et al. (2011), where it is shown that, to better study ECG morphology and efficiently distinguish between physiological and pathological ECG traces, it is necessary to take into account both the ECG traces and their first derivatives.

The paper is organized as follows. In Section 2, we briefly recall wavelet bases, review a numerical method that allows one to compute pointwise values of a wavelet and its derivatives, and summarize wavelet smoothing for one-dimensional functional data; in this section we also derive an optimal translation of the orthogonal basis so that the sampled curve values are a better approximation to the scaling coefficients at the finer scale. Section 3 extends wavelet-based estimation techniques to the case of curves in more than one dimension. Section 4 illustrates the good performance of the proposed technique, especially in the case of multidimensional functional data characterized by strongly localized features. In Section 5, we consider the case where the components of the error in the p dimensions are correlated. Section 6 is devoted to the application to the multi-lead ECG data that stimulated this research. Finally, some concluding remarks are drawn in Section 7.

Section snippets

An overview on wavelets

We briefly recall wavelet bases for $L^2(\mathbb{R})$. For a systematic introduction to wavelets, see, e.g., Mallat (1999) or Nason (2008). Wavelets are defined starting from an orthogonal multiresolution.

Definition 2.1

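Let $\{V_j\}_{j\in\mathbb{Z}}$ be a sequence of closed subspaces $V_j \subset L^2(\mathbb{R})$ and let $\varphi \in V_0$. An orthogonal multiresolution for $L^2(\mathbb{R})$ is a couple $(\{V_j\}_j, \varphi)$ such that:

  1. $V_j \subset V_{j+1}$;
  2. $\overline{\bigcup_{j} V_j} = L^2(\mathbb{R})$ and $\bigcap_{j=-\infty}^{+\infty} V_j = \{0\}$;
  3. $\{l \mapsto f(l)\} \in V_j \iff \{l \mapsto f(2l)\} \in V_{j+1}$;
  4. $\{\varphi(l-k)\}_{k\in\mathbb{Z}}$ is an orthonormal basis for $V_0$ and $\int_{\mathbb{R}} \varphi \neq 0$.

The projections of $f \in L^2(\mathbb{R})$ on the sequence $\{V_j\}_j$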
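As a concrete illustration of Definition 2.1, the following minimal sketch checks numerically that the Haar scaling function (the indicator of [0, 1)) behaves as the definition requires: its integer translates are orthonormal, its integral is non-zero, and it can be written in terms of its own dilations, which reflects the nesting of consecutive spaces. The Haar choice, the grid and all function names are assumptions made here for illustration only; the smooth bases used in the paper do not have such a closed form.

```python
import numpy as np

def phi(l):
    """Haar scaling function: indicator of the interval [0, 1)."""
    l = np.asarray(l, dtype=float)
    return ((l >= 0.0) & (l < 1.0)).astype(float)

# Midpoint grid on [-4, 4); with a dyadic step the Riemann sums below
# are exact for piecewise-constant functions with dyadic breakpoints.
step = 2.0 ** -10
grid = np.arange(-4.0, 4.0, step) + step / 2

def inner(g, h):
    """Riemann-sum approximation of the L2 inner product on the grid."""
    return float(np.sum(g * h) * step)

# Property 4: the translates phi(l - k) are orthonormal and phi has
# non-zero integral.
shifts = range(-2, 3)
gram = np.array([[inner(phi(grid - j), phi(grid - k)) for k in shifts]
                 for j in shifts])
integral = float(np.sum(phi(grid)) * step)

# Properties 1 and 3: phi belongs to the next finer space, since
# phi(l) = phi(2l) + phi(2l - 1).
refinement_gap = float(np.max(np.abs(phi(grid) - (phi(2 * grid) + phi(2 * grid - 1)))))

print(gram)
print(integral, refinement_gap)
```

The Gram matrix of the translates should be the identity and the refinement gap should vanish on the grid.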

Wavelet estimation for curves in more than one dimension

We now extend wavelet-based estimation techniques to the case of curves in more than one dimension. The function $f$ we want to estimate has the form $f:\mathbb{R}\to\mathbb{R}^p$, $l \mapsto (f_1(l),\dots,f_p(l))$, which describes parametric curves in $p$ dimensions. The observed values are generated by the model $w_i = f(l_i) + \varepsilon_i$, $i = 1,\dots,n = 2^J$, where the $\varepsilon_i$ are i.i.d. multinormal errors with mean the null vector $0 \in \mathbb{R}^p$ and variance–covariance matrix $\sigma^2 I_p$. Our aim is to estimate the function $f$ and its derivatives. As anticipated in the Introduction, we
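To make the observation model concrete, the sketch below simulates n = 2^J noisy observations of a p-dimensional curve with i.i.d. spherical Gaussian errors and then smooths all coordinates jointly. It is only an illustration under stated assumptions: it uses a hand-written Haar transform to stay self-contained (not the smooth basis used in the paper), it keeps or discards each p-dimensional coefficient vector as a whole according to its Euclidean norm, and the threshold value, the test curve and all function names are hypothetical choices rather than the authors' procedure.

```python
import numpy as np

def haar_dwt(x):
    """Orthonormal Haar transform along axis 0; x has shape (2**J, p)."""
    approx = np.asarray(x, dtype=float)
    details = []  # finest level first
    while approx.shape[0] > 1:
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))
        approx = (even + odd) / np.sqrt(2.0)
    return approx, details

def haar_idwt(approx, details):
    """Inverse of haar_dwt."""
    for d in reversed(details):  # coarsest level first
        out = np.empty((2 * approx.shape[0], approx.shape[1]))
        out[0::2] = (approx + d) / np.sqrt(2.0)
        out[1::2] = (approx - d) / np.sqrt(2.0)
        approx = out
    return approx

def joint_hard_threshold(details, thr):
    """Keep a p-dimensional coefficient vector only if its norm exceeds thr."""
    kept = []
    for d in details:
        norms = np.linalg.norm(d, axis=1, keepdims=True)
        kept.append(np.where(norms > thr, d, 0.0))
    return kept

# Simulate w_i = f(l_i) + eps_i with Var(eps_i) = sigma^2 I_p.
rng = np.random.default_rng(0)
J, p, sigma = 8, 3, 0.1
n = 2 ** J
l = np.arange(n) / n
inside = ((l >= 0.25) & (l < 0.5)).astype(float)
# Hypothetical test curve: a localized feature shared by all coordinates.
f = np.column_stack([1.0 * inside, -2.0 * inside, 0.5 * inside])
w = f + sigma * rng.standard_normal((n, p))

approx, details = haar_dwt(w)
thr = sigma * np.sqrt(p + 2.0 * np.log(n))  # illustrative threshold, an assumption
f_hat = haar_idwt(approx, joint_hard_threshold(details, thr))

print(np.mean((w - f) ** 2), np.mean((f_hat - f) ** 2))
```

Thresholding on the norm of the whole coefficient vector means that a feature is either retained in all p coordinates or in none, which is one simple way to obtain an estimate that is itself a coherent p-dimensional curve rather than p unrelated smooths.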

Simulation studies

In this section, we illustrate, via a two-case simulation study, the good performance of the proposed wavelet fitting technique for multidimensional functional data, particularly when the true curves to be estimated are characterized by strongly localized features. In the implementation of the technique, we use here the Daubechies wavelet basis with 10 vanishing moments, because this basis is compactly supported and smooth enough to allow the estimation of second derivatives (see Daubechies,

Errors correlated in the p dimensions

The method proposed in Section 3 for the estimation of multidimensional wavelet coefficients assumes that the p components of the error in the p dimensions are uncorrelated, i.e., $\mathrm{Var}(\varepsilon_i) = \sigma^2 I_p$. However, in many applications it might be useful to allow for correlation between the components of the error in the various directions, since these may capture the same source of noise. This is the case, for instance, for the ECG data whose analysis has motivated our research. In fact, as it will be clearer
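One standard way to reduce correlated errors to the uncorrelated setting is to whiten the observations with a factor of the error covariance matrix. The sketch below illustrates this under explicit assumptions: the covariance matrix, called `Sigma` here, is a hypothetical known matrix, and a Cholesky factor is used for the transformation. This excerpt does not state how the paper handles the correlation, so this should be read as a generic illustration rather than as the authors' method.

```python
import numpy as np

rng = np.random.default_rng(1)
p, n = 3, 4096

# Hypothetical error covariance across the p coordinates (assumed known).
Sigma = np.array([[1.0, 0.6, 0.3],
                  [0.6, 1.0, 0.5],
                  [0.3, 0.5, 1.0]])

# Cholesky factor: Sigma = C @ C.T with C lower triangular.
C = np.linalg.cholesky(Sigma)

# Correlated errors: each row of eps has covariance Sigma.
eps = rng.standard_normal((n, p)) @ C.T

# Whitening: solve C z_i = eps_i for every row, so that Var(z_i) = I_p.
whitened = np.linalg.solve(C, eps.T).T

# The same map applied to the covariance gives C^{-1} Sigma C^{-T} = I_p.
check = np.linalg.solve(C, np.linalg.solve(C, Sigma).T)
emp_cov = np.cov(whitened, rowvar=False)

print(np.round(check, 6))
print(np.round(emp_cov, 2))
```

In such a scheme, the observations would be whitened in the same way, a method designed for uncorrelated errors would be applied to the transformed data, and the resulting estimate would be mapped back by multiplying each row by `C.T` on the right.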

Application to ECG data

In this section, we apply the proposed multidimensional wavelet fitting technique to the estimation of ECG records collected by the 118 Dispatch Center in Milano within the PROMETEO project; see Ieva et al. (2011) and Ieva and Paganoni (2011). These ECG traces have been tele-transmitted from ambulances during emergency rescue operations (in Italy most emergency rescue operations are connected to ischemic heart diseases, which alone cause more than 40% of the overall deaths in the country). One

Discussion

We have described a wavelet-based method for the accurate estimation of multidimensional curves and their derivatives; the method also allows for correlation of the components of error in the p dimensions. As illustrated by means of simulation studies, the proposed estimation technique is particularly attractive when the multidimensional functional data are characterized by strongly localized features. In particular, the motivating application for this research concerned the fitting of

Acknowledgments

We are very grateful to Piercesare Secchi and James O. Ramsay for a careful reading of the present manuscript and many helpful suggestions. We would also like to thank Marco Verani for constructive discussions. The data analyzed in Section 6 have been provided by 118 Milan Dispatch Center and Mortara Rangoni Europe s.r.l.; we wish to thank Anna Maria Paganoni, leader of the statistical group within the PROMETEO Project, for support on the analysis of these data. This work has been funded by the

References (34)

  • F. Ferraty et al. Curves discrimination: a nonparametric functional approach. Computational Statistics & Data Analysis (2003)
  • M.P. Frías et al. Computing functional estimators of spatiotemporal long-range dependence parameters in the spectral-wavelet domain. Journal of Statistical Planning and Inference (2011)
  • A. Aguilera et al. Estimation of functional regression models for functional responses by wavelet approximation
  • Antoniadis, A., Brossat, X., Cugliari, J., Poggi, J.M., 2010. Clustering functional data using wavelets. In: Electronic...
  • A. Antoniadis et al. Wavelet methods for curve estimation. Journal of the American Statistical Association (1994)
  • A. Berlinet et al. Functional supervised classification with wavelets. Annales de l’I.S.U.P. (2008)
  • G. Beylkin et al. Fast wavelet transforms and numerical algorithms I. Communications on Pure and Applied Mathematics (1991)
  • S. Boudaoud et al. Corrected integral shape averaging applied to obstructive sleep apnea detection from the electrocardiogram. EURASIP Journal on Advances in Signal Processing (2007)
  • Brown, C.L., Brcich, R.F., Debes, C., 2005. Adaptive M-estimators for use in structured and unstructured robust...
  • T.T. Cai et al. A data-driven block thresholding approach to wavelet estimation. Annals of Statistics (2009)
  • I. Daubechies. Orthonormal bases of compactly supported wavelets. Communications on Pure and Applied Mathematics (1988)
  • D.L. Donoho et al. Adapting to unknown smoothness via wavelet shrinkage. Journal of the American Statistical Association (1995)
  • D.L. Donoho et al. Wavelet shrinkage: asymptopia. Journal of the Royal Statistical Society. Series B (1995)
  • T.R. Downie et al. The discrete multiple wavelet transform and thresholding methods. IEEE Transactions on Signal Processing (1998)
  • F. Ferraty et al. Nonparametric regression on functional data: inference and practical aspects. Australian & New Zealand Journal of Statistics (2007)
  • F. Ferraty et al.
  • Grieco, N., Ieva, F., Paganoni, A.M., 2011. Performance assessment using mixed effects models: a case study on coronary...

Cited by (17)

  • Best estimation of functional linear models. Journal of Multivariate Analysis (2016). Citation excerpt: Recent developments in the estimation of derivatives are contained in Sangalli et al. [12] and in Pigoli and Sangalli [10]. See also Baraldo et al. [3],
