Abstract
In this paper we introduce a flexible approach to approximating the regression function in the case of a functional predictor and a scalar response. Following the Projection Pursuit Regression principle, we derive an additive decomposition which exploits the most interesting projections of the prediction variable to explain the response. On the one hand, this approach allows us to avoid the well-known curse of dimensionality, and, on the other hand, it can be used as an exploratory tool for the analysis of functional datasets. The terms of the decomposition are estimated with a procedure that combines a spline approximation and the one-dimensional Nadaraya–Watson approach. The good behavior of our procedure is illustrated from both theoretical and practical points of view. Asymptotic results state that the terms in the additive decomposition can be estimated without suffering from the dimensionality problem, while applications to real and simulated data show the high predictive performance of our method.
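The building block of the estimation procedure described above is the one-dimensional Nadaraya–Watson smoother applied to projections of the functional predictor. The following is an illustrative sketch only: the discretization grid, the direction `theta`, the bandwidth `h`, and the toy data below are hypothetical choices, not the paper's data or tuning.

```python
import numpy as np

def nadaraya_watson(u_train, y_train, u_new, h):
    """One-dimensional Nadaraya-Watson estimator with a Gaussian kernel."""
    # Pairwise scaled distances between evaluation and training points
    d = (u_new[:, None] - u_train[None, :]) / h
    w = np.exp(-0.5 * d**2)            # Gaussian kernel weights
    return (w @ y_train) / w.sum(axis=1)

# Toy illustration: Y depends on the projection <theta, X> of a "curve" X
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 50)              # discretization grid of the curves
theta = np.sin(np.pi * t)              # hypothetical projection direction
X = rng.normal(size=(200, 50))         # 200 discretized functional predictors
u = X @ theta / len(t)                 # Riemann approximation of <theta, X>
y = np.sin(3 * u) + 0.1 * rng.normal(size=200)
y_hat = nadaraya_watson(u, y, u, h=0.1)
```

Once a direction is fixed, the functional regression problem reduces to this univariate smoothing step, which is the mechanism behind the dimension-free rates of Theorem 4.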
References
Ait-Saidi A, Ferraty F, Kassa R, Vieu P (2008) Cross-validated estimation in the single-functional index model. Statistics 42:475–494
Amato U, Antoniadis A, De Feis I (2006) Dimension reduction in functional regression with application. Comput Stat Data Anal 50:2422–2446
Aneiros-Pérez G, Vieu P (2006) Semi-functional partial linear regression. Stat Probab Lett 76:1102–1110
Burnham KP, Anderson DR (2002) Model selection and multimodel inference, 2nd edn. Springer, Berlin
Cardot H, Sarda P (2005) Estimation in generalized linear models for functional data via penalized likelihood. J Multivar Anal 92:24–41
Cardot H, Ferraty F, Sarda P (1999) Functional linear model. Stat Probab Lett 45:11–22
Cardot H, Ferraty F, Sarda P (2003) Spline estimators for the functional linear model. Stat Sin 13:571–591
Cardot H, Mas A, Sarda P (2007) CLT in functional linear regression models. Probab Theory Relat Fields 138:325–361
Chen H (1991) Estimation of a projection-pursuit type regression model. Ann Stat 19:142–157
Crambes C, Kneip A, Sarda P (2009) Smoothing splines estimators for functional linear regression. Ann Stat 37:35–72
De Boor C (2001) A practical guide to splines. Series in probability and statistics. Springer, Berlin
Eilers PHC, Li B, Marx BD (2009) Multivariate calibration with single-index signal regression. Chemom Intell Lab 96:196–202
Fan J, Gijbels I (2000) Local polynomial fitting. In: Schimek MG (ed) Smoothing and regression. Approaches, computation, and application. Wiley series in probability and statistics
Febrero-Bande M, González-Manteiga W (2011) Generalized additive models for functional data. In: Ferraty F (ed) Recent advances in functional data analysis and related topics. Physica-Verlag, Heidelberg
Ferraty F (ed) (2011) Recent advances in functional data analysis and related topics. Physica-Verlag, Heidelberg
Ferraty F, Romain Y (2010) The Oxford handbook of functional data analysis. Oxford University Press, Oxford
Ferraty F, Vieu P (2002) The functional nonparametric model and applications to spectrometric data. Comput Stat 17:545–564
Ferraty F, Vieu P (2006) Nonparametric functional data analysis. Springer, New York
Ferraty F, Vieu P (2009) Additive prediction and boosting for functional data. Comput Stat Data Anal 53:1400–1413
Ferraty F, Peuch A, Vieu P (2003) Modèle à indice fonctionnel simple. C R Acad Sci Paris 336:1025–1028
Ferraty F, Hall P, Vieu P (2010a) Most predictive design points for functional data predictor. Biometrika 97:807–824
Ferraty F, Laksaci A, Tadj A, Vieu P (2010b) Rate of uniform consistency for nonparametric estimates with functional variables. J Stat Plan Inference 140:235–260
Friedman JH, Stuetzle W (1981) Projection pursuit regression. J Am Stat Assoc 76:817–823
Gasser T, Kneip A, Koehler W (1991) A flexible and fast method for automatic smoothing. J Am Stat Assoc 86:643–652
Hall P (1989) On projection pursuit regression. Ann Stat 17:573–588
Huber PJ (1985) Projection pursuit. Ann Stat 13:435–475
James G (2002) Generalized linear models with functional predictors. J R Stat Soc B 64:411–432
James GM, Silverman BW (2005) Functional adaptive model estimation. J Am Stat Assoc 100:565–576
Jones MC, Sibson R (1987) What is projection pursuit? J R Stat Soc A 150:1–37
Martens H, Naes T (1991) Multivariate calibration. Wiley, New York
Müller HG, Yao F (2008) Functional additive models. J Am Stat Assoc 103:1534–1544
Nelder JA, Mead R (1965) A simplex method for function minimization. Comput J 7:308–313
Ramsay JO, Silverman BW (2005) Functional data analysis, 2nd edn. Springer, New York
Stone C (1982) Optimal global rates of convergence for nonparametric estimators. Ann Stat 10:1040–1053
Vieu P (1991) Quadratic errors for nonparametric estimates under dependence. J Multivar Anal 39:324–347
Vieu P (2002) Data-driven model choice in non parametric regression estimation. Statistics 36:231–246
Acknowledgements
This work is part of current research advances on functional statistics developed through the working group STAPH in Toulouse (http://www.math.univ-toulouse.fr/staph). The first and fourth authors would like to thank all the participants in the activities of this group for their continuous and fruitful support. All the authors wish to thank the two anonymous referees and the Associate Editor for their pertinent remarks, which have greatly improved our paper. The authors also wish to thank Prof. Juhyun Park (Lancaster) for her very fruitful comments and her proofreading of an earlier version of this work.
Additional information
Communicated by Domingo Morales.
Appendix: Proofs of theoretical results
A.1 Proof of Theorem 4
For both assertions (i) and (ii), we first give the proof for the first estimated component \(\widehat{g}_{1,\theta_{1}}\), and we then indicate briefly how the result can be iterated to the higher-order components \(\widehat{g}_{j,\theta_{j}}\).
(a) Proof of (i): Case j=1. The problem of estimating the function \(g_{1,\theta _{1}}\) is a standard problem of estimating a regression function between a scalar variable U and a functional variable V valued in some semi-metric space \(({\mathcal{V}},d)\). It corresponds here to U=Y, V=X and, for fixed θ_1, to the semi-metric d(v,v′)=|〈θ_1,v−v′〉|. The results in Theorem 2 of Ferraty et al. (2010b) cover this general situation and state that
where the functions \(\varPsi_{\mathcal{C}}\) and \(\phi_{\mathcal{C}}\) control the topological structure induced by the semi-metric d. More precisely, the function \(\phi_{\mathcal{C}}\) is the small ball probability function
which controls the concentration of the variable V (in the sense of the topology induced by d), and the function \(\varPsi_{\mathcal{C}}\) is the Kolmogorov entropy
which controls the complexity of the set \(\mathcal{C}\). The compactness condition (8) on \(\mathcal{C}\) allows us to write directly that, for the semi-metric d(v,v′)=|〈θ_1,v−v′〉|, we have
while the additional density condition (11) allows us to get
Finally, the claimed result (17) follows directly for j=1 from the bandwidth condition (13).
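For orientation, the two quantities appearing in this part of the proof can be sketched in standard functional-data notation; the exact uniform-over-\(\mathcal{C}\) versions used in Ferraty et al. (2010b) may differ in detail from this sketch.

```latex
% Small ball probability: concentration of V around a point v of C,
% for a radius h > 0, in the topology induced by the semi-metric d
\phi_{\mathcal{C}}(h) \;=\; \mathbb{P}\bigl( d(V, v) \le h \bigr),
\qquad v \in \mathcal{C},
% Kolmogorov (metric) entropy: logarithm of the minimal number
% N_\varepsilon(C) of d-balls of radius \varepsilon needed to cover C
\Psi_{\mathcal{C}}(\varepsilon) \;=\; \log N_{\varepsilon}(\mathcal{C}).
```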
(b) Proof of (i): Case j>1. We present the proof for j=2 only; by induction, it is then valid for all j=2,…,m. We have the following decomposition:
(29)

with
and
Because \(\mathbb{E} [ Y-g_{1}( \langle \theta_{1},X \rangle )| \langle \theta_{2},X \rangle ] =g_{2}( \langle \theta_{2},X \rangle )\), we obtain directly, by the same methodology as in point (a) above,
On the other hand, using condition (12) on the kernel and then the rate of convergence stated in part (a) of this proof, we directly have that
These last two results are enough to show that assertion (17) is true for j=2, and it extends in an obvious iterative way to any j≥2.
(c) Proof of (ii): Case j=1. This result has been obtained in various earlier papers on the one-dimensional nonparametric regression setting (see, for instance, Vieu 1991 for a general form of result (18)).
(d) Proof of (ii): Case j>1. We present the proof for j=2 only; by induction, it is then valid for all j=2,…,m. It suffices to use decomposition (29) again and to follow, step by step, the scheme of proof of (b) above. In this way:
– the term A can be treated by means of the usual results on univariate kernel regression (see again Vieu 1991, for instance), and the rate of convergence \(C^{st}n^{-2k_{j}/(2k_{j}+1)}\) appears naturally;
– the term B involves quantities like \(g_{1,\theta_{1}}-\widehat {g}_{1,\theta_{1}}\), which are of lower order than \(n^{-2k_{j}/(2k_{j}+1)}\) because of point (c) above and because of condition (16), which ensures that k_2<k_1.
This is enough to prove that (18) holds for j=2, and the same reasoning applies to any j≥2.
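This is precisely the sense in which the additive decomposition avoids the curse of dimensionality: by Stone (1982), the optimal rate for a k-times differentiable regression function degrades with the dimension p of the covariate, whereas each component here depends only on the scalar projection 〈θ_j,X〉. Schematically:

```latex
% Stone (1982): optimal convergence rate for estimating a k-times
% differentiable regression function of a p-dimensional covariate
n^{-2k/(2k+p)} \quad (\text{dimension } p)
\qquad\text{vs.}\qquad
n^{-2k_j/(2k_j+1)} \quad (\text{the univariate rate attained by } \widehat{g}_{j,\theta_j}).
```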
A.2 Proof of Theorem 5
We first give the proof for the first two estimated directions \(\widehat{\theta}_{1}\) and \(\widehat{\theta}_{2}\); it then extends by iteration to the higher-order directions \(\widehat{\theta}_{j}\). All these proofs are presented in a rather sketched way because they follow a standard route for cross-validation arguments.
(a) Case j=1. Because the Functional Single Index Model is a special case (with m=1) of the FPPR studied here, we can directly apply Theorem 1 in Ait-Saidi et al. (2008) to get result (24) for j=1. Because they will be helpful in what follows, we recall two intermediary results stated in Ait-Saidi et al. (2008). First, note that the main line of the proof in Ait-Saidi et al. (2008) consists in showing that, uniformly over Θ, we have
(30)

Second, note that one key tool used in Ait-Saidi et al. (2008) was the equivalence between various quadratic measures of error; in particular, it was stated that, uniformly over Θ, one has the equivalence
(31)

(b) Case j=2. The proof is based on the following decomposition:
(32)

with
We will now treat each of the four terms involved in this decomposition.
– The term A_2(θ_2) corresponds exactly to the cross-validation criterion studied in point (a), with the new response variable \(Y-g_{1}( \langle \theta_{1}^{\ast},X \rangle )\); without any additional computation, we can write directly from (30) that, uniformly for θ_2∈Θ,
– Although B_2 does not depend on θ_2 and has no influence on the minimization of the criterion CV_2, we give its asymptotic expansion because it will be helpful later. For that, note that Theorem 4(ii) (with j=1), combined with (31) and condition (22), gives directly that
Because k_2<k_1, one gets directly, by using Theorem 4(ii) again (this time with j=2),
(33)

It is worth noting that the small o(⋅) involved in this last equation is uniform over Θ (see the discussion at the end of Sect. 6.1).
– The term C_2 can be written as a linear combination
of the centered independent variables
and so it can be treated by means of standard higher-order moment inequalities. More precisely, the term C_2 has exactly the same structure as the one treated in Lemma 4 of Ait-Saidi et al. (2008), and the proof in that paper can be followed line by line, using, as often as needed, the Cauchy–Schwarz inequality and (33) to bound the terms involving the factors W_i. In the end, we easily get that, uniformly over Θ,
(34)

– The term D_2(θ_2) can be treated directly by using the Cauchy–Schwarz inequality together with result (33), and we have, uniformly over Θ,
(35)
Finally, from results (32)–(35) we arrive at the same expression as (30), namely,
(36)

Because this last result is uniform over θ_2∈Θ, and because the second term on the right-hand side of (36) does not depend on θ_2, the claimed result (24) is shown for j=2.
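The criterion manipulated throughout this proof can be sketched numerically. The following is a hedged illustration on toy data: the paper's actual CV_j also involves the previously estimated components and an optimization over the functional set Θ, none of which is reproduced here. It shows only that a leave-one-out score for the univariate Nadaraya–Watson smoother is smaller when the candidate projection actually carries the signal.

```python
import numpy as np

def loo_cv(u, y, h):
    """Leave-one-out cross-validation score of the one-dimensional
    Nadaraya-Watson smoother at the projected points u."""
    d = (u[:, None] - u[None, :]) / h
    w = np.exp(-0.5 * d**2)            # Gaussian kernel weights
    np.fill_diagonal(w, 0.0)           # drop observation i when predicting Y_i
    y_loo = (w @ y) / w.sum(axis=1)
    return float(np.mean((y - y_loo) ** 2))

# Toy comparison: the score discriminates a relevant projection from noise
rng = np.random.default_rng(1)
u_true = rng.uniform(-1, 1, 300)       # projection carrying the signal
u_noise = rng.uniform(-1, 1, 300)      # projection unrelated to the response
y = np.sin(3 * u_true) + 0.1 * rng.normal(size=300)
cv_true, cv_noise = loo_cv(u_true, y, h=0.1), loo_cv(u_noise, y, h=0.1)
```

Minimizing such a score over candidate directions is the mechanism that makes the cross-validated \(\widehat{\theta}_{j}\) consistent.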
(c) Case j>2. Result (24) can be shown for higher values of j in an obviously similar way.
This proof is finished, but it will be useful for the remainder of the paper to note that we have obtained precisely that, for any j≥1,
This result, being uniform for θ_j∈Θ, can be combined with Theorem 4(ii) to lead, for any j≥1, to
A.3 Proof of Theorem 6
Condition (25) ensures that result (37) holds for the penalized criterion PCV as well as for the standard criterion CV. That means that we have
and finally (27) ensures that, for n large enough,
This is enough to see that, for n large enough,
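The mechanism formalized by Theorem 6 can be illustrated schematically: the penalty inflates the cross-validated score of each additional term, so a term is retained only when it improves the fit by more than the penalty. The multiplicative penalty below is purely illustrative and is not the paper's penalty from conditions (25)–(27).

```python
import numpy as np

def select_order(cv_scores, penalty=0.05):
    """Pick the number m of additive terms by minimizing a penalized
    cross-validation score (illustrative penalty, not the paper's)."""
    pcv = [cv * (1 + penalty * (j + 1)) for j, cv in enumerate(cv_scores)]
    return int(np.argmin(pcv)) + 1

# The raw CV keeps decreasing slightly after the second term, but the
# penalized criterion stops at m = 2
cv_by_order = [1.00, 0.50, 0.48, 0.47]
m_hat = select_order(cv_by_order)
```

Without the penalty, the minimizer of the raw scores above would be m=4; the penalty rules out terms whose marginal gain is negligible.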
Ferraty, F., Goia, A., Salinelli, E. et al. Functional projection pursuit regression. TEST 22, 293–320 (2013). https://doi.org/10.1007/s11749-012-0306-2
Keywords
- Additive decomposition
- Consistency
- Functional predictor
- Functional regression
- Predictive directions
- Projection pursuit regression