Wavelets in functional data analysis: Estimation of multidimensional curves and their derivatives
Introduction
Functional Data Analysis (FDA) is the branch of statistics which focuses on data that can be seen as the observed value of a functional random variable (see, e.g., Ramsay and Silverman, 2005, Ferraty and Vieu, 2006). However, from a practical point of view, all data are observed on a discrete grid and are affected by measurement error. A crucial step of the analysis thus consists in estimating the continuous functional data from its discrete observation. Here we are especially interested in the estimation of multidimensional curves and their derivatives. Sangalli et al. (2009) proposed a smoothing technique based on free-knot splines, which was shown to provide very accurate estimates of multidimensional curves and their derivatives, even when the curves are characterized by spatial inhomogeneities. In this work, we instead describe a technique based on wavelet basis expansion, which is also capable of accurately estimating multidimensional curves that present strongly localized features, such as peaks and oscillations.
Wavelet bases have already proved very useful in functional data analysis for one-dimensional curves, thanks largely to their natural local adaptivity, which allows them to accommodate a wide variety of functional forms. Besides the problem of curve estimation (see, e.g., Antoniadis et al., 1994), other settings where wavelets have been used include, for instance, the estimation of parameters of stochastic processes (see, e.g., Frías and Ruiz-Medina, 2011, and references therein), the framework of functional regression (see, e.g., Aguilera et al., 2008), as well as functional ANOVA (see, e.g., Yang and Nie, 2008) and functional classification (see, e.g., Wang et al., 2007, Berlinet et al., 2008, Antoniadis et al., 2010, Timmermans et al., 2011, and references therein). Given the strong increase in the recording of multidimensional functional data, it is of interest to extend the field of application of the above-mentioned methods to the multidimensional case, developing a wavelet-based estimation technique that can accurately handle multidimensional curves.
As in Sangalli et al. (2009), since our data are noisy and discrete observations of some multidimensional curve, we look for an estimate that is itself a proper multidimensional curve. This means that we discard the simplistic idea of obtaining an estimate by juxtaposing separate smoothings of the functional coordinates of the curve. In fact, if the curve has a significant feature at some point of the physical space, we expect this to be reflected to some extent on all coordinates concurrently; for instance, if the curve has at some point a discontinuity in some of its derivatives, this will be present on all coordinates. For this reason, we develop a novel estimation procedure which simultaneously takes into account all the space coordinates of the multidimensional curve. Moreover, we also consider the case where the components of error in the dimensions are correlated, and show how to deal efficiently with this issue. The proposed estimation technique also provides consistent estimates of the curve derivatives. It should be noted that wavelet bases have so far mainly been applied in problems where there was no interest in derivatives, because of the absence of closed analytical forms for smooth wavelet bases, an issue that has until now restricted their application to a confined part of the FDA field. To overcome this limitation, we resort here to a numerical method (see Strang and Nguyen, 1996) that allows one to obtain derivatives of wavelet-estimated data. An additional contribution of the paper is a scheme for aligning an orthogonal basis so that the sampled curve values are a better approximation to the scaling coefficients needed to initialize the discrete wavelet transform, thus obtaining a more accurate estimation of curve derivatives. An extended abstract of this work appeared in Pigoli and Sangalli (2011).
Our research has been stimulated by the analysis of electrocardiogram (ECG) records, collected by the 118 Dispatch Center (the medical operating emergency unit) in Milano, Italy, as part of the PROMETEO project “PROgetto sull’area Milanese Elettrocardiogrammi Teletrasferiti dall’Extra Ospedaliero”. See Ieva et al. (2011) for details. The aim of this project is to anticipate the diagnostic time in heart ischemia, in order to improve the prognosis of reperfusive treatments and reduce infarction complications. In particular, we consider a sample of multi-lead tele-transmitted ECG records, both physiological and pathological. The estimates derived here via the proposed multidimensional smoothing technique are then used in Ieva et al. (2011), where a semi-automatic diagnostic procedure is proposed, based on the ECG morphology, that is able to classify physiological and pathological traces.
ECG data have a multidimensional nature, because these records provide potential differences, named leads, between multiple electrodes; in fact, as will be described in Section 6, ECG traces can be seen as eight-dimensional functional data, whose eight coordinates, corresponding to eight leads, measure different projections of the same physical dynamics in different directions. Smoothing these data hence calls for a technique that simultaneously takes into account the eight coordinates of the functional data. Besides helping to detect significant features that are reflected on more than one lead, thus enhancing pattern recognition, such a procedure provides coherent estimates, in which the different projections of the heart dynamics are mutually consistent. Moreover, as will be clarified later, the components of error on the eight leads are correlated, an issue that can be appropriately taken into account within our technique, which works jointly on the coordinates. It should also be noted that wavelet bases are particularly well suited to capturing ECG shapes, which are characterized by strong localized peaks and oscillations.
As mentioned in the previous section, we devote particular attention to the computation of estimates that are accompanied by good estimates of their derivatives. This pays off in Ieva et al. (2011), where it is shown that, to better study ECG morphology and efficiently distinguish between physiological and pathological ECG traces, it is necessary to take into account both the ECG traces and their first derivatives.
The paper is organized as follows. In Section 2, we briefly recall wavelet bases, review a numerical method that allows one to compute pointwise values of a wavelet and its derivatives, and summarize wavelet smoothing for one-dimensional functional data; in this section we also derive an optimal translation of the orthogonal basis so that the sampled curve values are a better approximation to the scaling coefficients at the finer scale. Section 3 extends wavelet-based estimation techniques to the case of curves in more than one dimension. Section 4 illustrates the good performance of the proposed technique, especially in the case of multidimensional functional data characterized by strongly localized features. In Section 5, we consider the case where the components of error in the dimensions are correlated. Section 6 is devoted to the application to the multi-lead ECG data that have been the stimulus for this research. Finally, some concluding considerations are drawn in Section 7.
Section snippets
An overview on wavelets
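We briefly recall wavelet bases for $L^2(\mathbb{R})$. For a systematic introduction to wavelets, see, e.g., Mallat (1999) or Nason (2008). Wavelets are defined starting from an orthogonal multiresolution. Definition 2.1. Let $\{V_j\}_{j \in \mathbb{Z}}$ be a sequence of closed subspaces of $L^2(\mathbb{R})$ and let $\phi \in V_0$. An orthogonal multiresolution for $L^2(\mathbb{R})$ is a couple $(\{V_j\}_{j \in \mathbb{Z}}, \phi)$ such that: $V_j \subset V_{j+1}$ for all $j$; $\overline{\bigcup_j V_j} = L^2(\mathbb{R})$ and $\bigcap_j V_j = \{0\}$; $f(x) \in V_j$ if and only if $f(2x) \in V_{j+1}$; and $\{\phi(x - k)\}_{k \in \mathbb{Z}}$ is an orthonormal basis for $V_0$.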
Wavelet estimation for curves in more than one dimension
We now extend wavelet-based estimation techniques to the case of curves in more than one dimension. The function we want to estimate has the form $\mathbf{f}(t) = (f_1(t), \ldots, f_d(t))$, which describes parametric curves in $d$ dimensions. The observed values are generated by the model $\mathbf{y}_i = \mathbf{f}(t_i) + \boldsymbol{\varepsilon}_i$, $i = 1, \ldots, n$, where $\boldsymbol{\varepsilon}_i$ are i.i.d. multinormal errors with mean the null vector and variance–covariance matrix $\Sigma$. Our aim is to estimate the function $\mathbf{f}$ and its derivatives. As anticipated in the Introduction, we
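The idea of treating all coordinates jointly can be illustrated with a minimal sketch. It is not the estimator developed in the paper: it assumes a Haar basis as a stand-in for a smooth wavelet basis, a hard-thresholding rule applied to the Euclidean norm of each vector of detail coefficients, and an illustrative threshold of the form sigma * (sqrt(d) + sqrt(2 log n)). The function names (`haar_dwt`, `haar_idwt`, `joint_hard_threshold`) and the test curve with a jump shared by all coordinates are hypothetical choices made for the example.

```python
import numpy as np

def haar_dwt(x):
    """Orthonormal Haar transform along axis 0 of an array with 2**J rows.

    Returns the coarsest scaling coefficients and the detail
    coefficients ordered from the coarsest to the finest level.
    """
    approx = np.asarray(x, dtype=float)
    details = []
    while approx.shape[0] > 1:
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))
        approx = (even + odd) / np.sqrt(2.0)
    return approx, details[::-1]

def haar_idwt(approx, details):
    """Inverse of haar_dwt (details ordered from coarsest to finest)."""
    approx = np.asarray(approx, dtype=float)
    for det in details:
        out = np.empty((2 * approx.shape[0],) + approx.shape[1:])
        out[0::2] = (approx + det) / np.sqrt(2.0)
        out[1::2] = (approx - det) / np.sqrt(2.0)
        approx = out
    return approx

def joint_hard_threshold(y, threshold):
    """Keep or kill each d-vector of detail coefficients as a whole.

    y has shape (n, d): n sampling points, d coordinates of the curve.
    The decision is based on the Euclidean norm across the d coordinates,
    so a feature shared by all coordinates is retained on all of them.
    """
    approx, details = haar_dwt(y)
    kept = []
    for det in details:
        norms = np.linalg.norm(det, axis=1, keepdims=True)
        kept.append(np.where(norms > threshold, det, 0.0))
    return haar_idwt(approx, kept)

# Hypothetical test curve: a jump at the same abscissa on all coordinates.
rng = np.random.default_rng(0)
n, d, sigma = 256, 3, 0.3
t = np.arange(n) / n
jump = (t >= 100 / n).astype(float)
truth = np.column_stack([2.0 * jump, -1.0 * jump, 1.5 * jump])
noisy = truth + sigma * rng.standard_normal((n, d))

# Illustrative threshold (an assumption, not the rule used in the paper).
threshold = sigma * (np.sqrt(d) + np.sqrt(2.0 * np.log(n)))
estimate = joint_hard_threshold(noisy, threshold)

mse_noisy = np.mean((noisy - truth) ** 2)
mse_est = np.mean((estimate - truth) ** 2)
```

Comparing `mse_est` with `mse_noisy` gives a quick check of the denoising effect. Thresholding on the norm rather than coordinate by coordinate is what keeps the estimate a single curve instead of a juxtaposition of separate fits.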
Simulation studies
In this section, we illustrate, via a two-case simulation study, the good performance of the proposed wavelet fitting technique for multidimensional functional data, particularly when the true curves to be estimated are characterized by strongly localized features. In the implementation of the technique, we use here the Daubechies wavelet basis with 10 vanishing moments, because this basis is compactly supported and smooth enough to allow the estimation of second derivatives (see Daubechies,
Errors correlated in the dimensions
The method proposed in Section 3 for the estimation of multidimensional wavelet coefficients assumes that the components of the error in the dimensions are uncorrelated, i.e., that the variance–covariance matrix of the errors is diagonal. However, in many applications it might be useful to allow for correlation of the components of error in the various directions, since these may capture the same source of noise. This is the case, for instance, of the ECG data, whose analysis has motivated our research. In fact, as it will be clearer
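One generic way to reduce correlated errors to the uncorrelated case is to whiten the observations with a Cholesky factor of the error covariance matrix, work in the whitened coordinates, and map the result back. The sketch below shows only this decorrelation step and is an assumption made for illustration, not necessarily the procedure adopted in the paper; the covariance matrix `cov` and the helper names `whiten` and `unwhiten` are hypothetical, and in practice the covariance would have to be estimated.

```python
import numpy as np

def whiten(y, cov):
    """Map rows of y (shape (n, d)) to coordinates with identity error covariance.

    With cov = L L^T (Cholesky), each row y_i is replaced by L^{-1} y_i.
    """
    chol = np.linalg.cholesky(cov)
    return np.linalg.solve(chol, y.T).T

def unwhiten(z, cov):
    """Inverse of whiten: each row z_i is replaced by L z_i."""
    chol = np.linalg.cholesky(cov)
    return z @ chol.T

# Hypothetical covariance of the error components (assumed known here).
cov = np.array([[1.0, 0.8, 0.3],
                [0.8, 1.0, 0.5],
                [0.3, 0.5, 1.0]])

rng = np.random.default_rng(1)
n = 20000
# Correlated errors: rows have covariance cov.
errors = rng.standard_normal((n, 3)) @ np.linalg.cholesky(cov).T

whitened = whiten(errors, cov)
cov_before = np.cov(errors, rowvar=False)    # close to cov
cov_after = np.cov(whitened, rowvar=False)   # close to the identity
```

Because the whitening acts across coordinates and a wavelet transform acts along the abscissa, the two linear operations commute, so a thresholding rule designed for uncorrelated components can be applied to the whitened data before calling `unwhiten` on the result.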
Application to ECG data
In this section, we apply the proposed multidimensional wavelet fitting technique for the estimation of ECG records collected by the 118 Dispatch Center in Milano within the PROMETEO project; see Ieva et al. (2011) and Ieva and Paganoni (2011). These ECG traces have been tele-transmitted from ambulances during emergency rescue operations (in Italy most emergency rescue operations are connected to ischemic heart diseases, that alone cause more than 40% of the overall deaths in the country). One
Discussion
We have described a wavelet-based method for the accurate estimation of multidimensional curves and their derivatives; the method also allows for correlation of the components of error in the dimensions. As illustrated by means of simulation studies, the proposed estimation technique is particularly attractive when the multidimensional functional data are characterized by strongly localized features. In particular, the motivating application for this research concerned the fitting of
Acknowledgments
We are very grateful to Piercesare Secchi and James O. Ramsay for a careful reading of the present manuscript and many helpful suggestions. We would also like to thank Marco Verani for constructive discussions. The data analyzed in Section 6 have been provided by 118 Milan Dispatch Center and Mortara Rangoni Europe s.r.l.; we wish to thank Anna Maria Paganoni, leader of the statistical group within the PROMETEO Project, for support on the analysis of these data. This work has been funded by the
References (34)
- et al. Curves discrimination: a nonparametric functional approach. Computational Statistics & Data Analysis (2003)
- et al. Computing functional estimators of spatiotemporal long-range dependence parameters in the spectral-wavelet domain. Journal of Statistical Planning and Inference (2011)
- et al. Estimation of functional regression models for functional responses by wavelet approximation
- Antoniadis, A., Brossat, X., Cugliari, J., Poggi, J.M., 2010. Clustering functional data using wavelets. In: Electronic...
- et al. Wavelet methods for curve estimation. Journal of the American Statistical Association (1994)
- et al. Functional supervised classification with wavelets. Annales de l’I.S.U.P. (2008)
- et al. Fast wavelet transforms and numerical algorithms I. Communications on Pure and Applied Mathematics (1991)
- et al. Corrected integral shape averaging applied to obstructive sleep apnea detection from the electrocardiogram. EURASIP Journal on Advances in Signal Processing (2007)
- Brown, C.L., Brcich, R.F., Debes, C., 2005. Adaptive M-estimators for use in structured and unstructured robust...
- et al. A data-driven block thresholding approach to wavelet estimation. Annals of Statistics (2009)
- Orthonormal bases of compactly supported wavelets. Communications on Pure and Applied Mathematics
- Adapting to unknown smoothness via wavelet shrinkage. Journal of the American Statistical Association
- Wavelet shrinkage: asymptopia. Journal of the Royal Statistical Society. Series B
- The discrete multiple wavelet transform and thresholding methods. IEEE Transactions on Signal Processing
- Nonparametric regression on functional data: inference and practical aspects. Australian & New Zealand Journal of Statistics