Abstract
In this paper we introduce a flexible approach to approximating the regression function in the case of a functional predictor and a scalar response. Following the Projection Pursuit Regression principle, we derive an additive decomposition which exploits the most interesting projections of the prediction variable to explain the response. On the one hand, this approach allows us to avoid the well-known curse of dimensionality, and, on the other hand, it can be used as an exploratory tool for the analysis of functional datasets. The terms of the decomposition are estimated with a procedure that combines a spline approximation and the one-dimensional Nadaraya–Watson approach. The good behavior of our procedure is illustrated from both theoretical and practical points of view. Asymptotic results state that the terms in the additive decomposition can be estimated without suffering from the dimensionality problem, while applications to real and simulated data show the high predictive performance of our method.
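The building block of the estimation procedure described above is the one-dimensional Nadaraya–Watson smoother applied to projections of the functional predictor. The following is an illustrative sketch only: the discretization grid, the direction `theta`, the bandwidth `h`, and the toy data below are hypothetical choices, not the paper's data or tuning.

```python
import numpy as np

def nadaraya_watson(u_train, y_train, u_new, h):
    """One-dimensional Nadaraya-Watson estimator with a Gaussian kernel."""
    # Pairwise scaled distances between evaluation and training points
    d = (u_new[:, None] - u_train[None, :]) / h
    w = np.exp(-0.5 * d**2)            # Gaussian kernel weights
    return (w @ y_train) / w.sum(axis=1)

# Toy illustration: Y depends on the projection <theta, X> of a "curve" X
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 50)              # discretization grid of the curves
theta = np.sin(np.pi * t)              # hypothetical projection direction
X = rng.normal(size=(200, 50))         # 200 discretized functional predictors
u = X @ theta / len(t)                 # Riemann approximation of <theta, X>
y = np.sin(3 * u) + 0.1 * rng.normal(size=200)
y_hat = nadaraya_watson(u, y, u, h=0.1)
```

Once a direction is fixed, the functional regression problem reduces to this univariate smoothing step, which is the mechanism behind the dimension-free rates of Theorem 4.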
References
Ait-Saidi A, Ferraty F, Kassa R, Vieu P (2008) Cross-validated estimation in the single-functional index model. Statistics 42:475–494
Amato U, Antoniadis A, De Feis I (2006) Dimension reduction in functional regression with application. Comput Stat Data Anal 50:2422–2446
Aneiros-Pérez G, Vieu P (2006) Semi-functional partial linear regression. Stat Probab Lett 76:1102–1110
Burnham KP, Anderson DR (2002) Model selection and multimodel inference, 2nd edn. Springer, Berlin
Cardot H, Sarda P (2005) Estimation in generalized linear models for functional data via penalized likelihood. J Multivar Anal 92:24–41
Cardot H, Ferraty F, Sarda P (1999) Functional linear model. Stat Probab Lett 45:11–22
Cardot H, Ferraty F, Sarda P (2003) Spline estimators for the functional linear model. Stat Sin 13:571–591
Cardot H, Mas A, Sarda P (2007) CLT in functional linear regression models. Probab Theory Relat Fields 138:325–361
Chen H (1991) Estimation of a projection-pursuit type regression model. Ann Stat 19:142–157
Crambes C, Kneip A, Sarda P (2009) Smoothing splines estimators for functional linear regression. Ann Stat 37:35–72
De Boor C (2001) A practical guide to splines. Series in probability and statistics. Springer, Berlin
Eilers PHC, Li B, Marx BD (2009) Multivariate calibration with single-index signal regression. Chemom Intell Lab 96:196–202
Fan J, Gijbels I (2000) Local polynomial fitting. In: Schimek MG (ed) Smoothing and regression. Approaches, computation, and application. Wiley series in probability and statistics
Febrero-Bande M, González-Manteiga W (2011) Generalized additive models for functional data. In: Ferraty F (ed) Recent advances in functional data analysis and related topics. Physica-Verlag, Heidelberg
Ferraty F (ed) (2011) Recent advances in functional data analysis and related topics. Physica-Verlag, Heidelberg
Ferraty F, Romain Y (2010) The Oxford handbook of functional data analysis. Oxford University Press, Oxford
Ferraty F, Vieu P (2002) The functional nonparametric model and applications to spectrometric data. Comput Stat 17:545–564
Ferraty F, Vieu P (2006) Nonparametric functional data analysis. Springer, New York
Ferraty F, Vieu P (2009) Additive prediction and boosting for functional data. Comput Stat Data Anal 53:1400–1413
Ferraty F, Peuch A, Vieu P (2003) Modèle à indice fonctionnel simple. C R Acad Sci Paris 336:1025–1028
Ferraty F, Hall P, Vieu P (2010a) Most predictive design points for functional data predictor. Biometrika 97:807–824
Ferraty F, Laksaci A, Tadj A, Vieu P (2010b) Rate of uniform consistency for nonparametric estimates with functional variables. J Stat Plan Inference 140:235–260
Friedman JH, Stuetzle W (1981) Projection pursuit regression. J Am Stat Assoc 76:817–823
Gasser T, Kneip A, Koehler W (1991) A flexible and fast method for automatic smoothing. J Am Stat Assoc 86:643–652
Hall P (1989) On projection pursuit regression. Ann Stat 17:573–588
Huber PJ (1985) Projection pursuit. Ann Stat 13:435–475
James G (2002) Generalized linear models with functional predictors. J R Stat Soc B 64:411–432
James GM, Silverman BW (2005) Functional adaptive model estimation. J Am Stat Assoc 100:565–576
Jones MC, Sibson R (1987) What is projection pursuit? J R Stat Soc A 150:1–37
Martens H, Naes T (1991) Multivariate calibration. Wiley, New York
Müller HG, Yao F (2008) Functional additive models. J Am Stat Assoc 103:1534–1544
Nelder JA, Mead R (1965) A simplex method for function minimization. Comput J 7:308–313
Ramsay JO, Silverman BW (2005) Functional data analysis, 2nd edn. Springer, New York
Stone C (1982) Optimal global rates of convergence for nonparametric estimators. Ann Stat 10:1040–1053
Vieu P (1991) Quadratic errors for nonparametric estimates under dependence. J Multivar Anal 39:324–347
Vieu P (2002) Data-driven model choice in non parametric regression estimation. Statistics 36:231–246
Acknowledgements
This work is part of current research advances on functional statistics developed through the working group STAPH in Toulouse (http://www.math.univ-toulouse.fr/staph). The first and fourth authors would like to thank all the participants in the activities of this group for their continuous and fruitful support. All the authors wish to thank the two anonymous referees and the Associate Editor for their pertinent remarks, which have greatly improved our paper. The authors also wish to thank Prof. Juhyun Park (Lancaster) for her very fruitful comments and her proofreading of an earlier version of this work.
Additional information
Communicated by Domingo Morales.
Appendix: Proofs of theoretical results
A.1 Proof of Theorem 4
For both assertions (i) and (ii), we first give the proof for the first estimated component \(\widehat{g}_{1,\theta_{1}}\), and we then indicate briefly how the result can be iterated to the higher-order components \(\widehat{g}_{j,\theta_{j}}\).
(a) Proof of (i): Case j=1. The problem of estimating the function \(g_{1,\theta _{1}}\) is a standard problem of estimating a regression function between a scalar variable U and a functional variable V valued in some semi-metric space \(({\mathcal{V}},d)\). It corresponds here to U=Y, V=X and, for fixed θ_1, to the semi-metric d(v,v′)=|〈θ_1,v−v′〉|. The results in Theorem 2 of Ferraty et al. (2010b) cover this general situation and state that
where the functions \(\varPsi_{\mathcal{C}}\) and \(\phi_{\mathcal{C}}\) control the topological structure induced by the semi-metric d. More precisely, the function \(\phi_{\mathcal{C}}\) is the small ball probability function
which controls the concentration of the variable V (in the sense of the topology induced by d), and the function \(\varPsi_{\mathcal{C}}\) is the Kolmogorov entropy
which controls the complexity of the set \(\mathcal{C}\). The compactness condition (8) on \(\mathcal{C}\) allows us to write directly that, for the semi-metric d(v,v′)=|〈θ_1,v−v′〉|, we have
while the additional density condition (11) allows us to get
Finally, the claimed result (17) follows directly for j=1 from the bandwidth condition (13).
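For orientation, the two quantities appearing in this part of the proof can be sketched in standard functional-data notation; the exact uniform-over-\(\mathcal{C}\) versions used in Ferraty et al. (2010b) may differ in detail from this sketch.

```latex
% Small ball probability: concentration of V around a point v of C,
% for a radius h > 0, in the topology induced by the semi-metric d
\phi_{\mathcal{C}}(h) \;=\; \mathbb{P}\bigl( d(V, v) \le h \bigr),
\qquad v \in \mathcal{C},
% Kolmogorov (metric) entropy: logarithm of the minimal number
% N_\varepsilon(C) of d-balls of radius \varepsilon needed to cover C
\Psi_{\mathcal{C}}(\varepsilon) \;=\; \log N_{\varepsilon}(\mathcal{C}).
```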
(b) Proof of (i): Case j>1. We present the proof for j=2 only; by induction, it is then valid for all j=2,…,m. We have the following decomposition:
(29)

with
and
Because \(\mathbb{E} [ Y-g_{1}( \langle \theta_{1},X \rangle )| \langle \theta_{2},X \rangle ] =g_{2}( \langle \theta_{2},X \rangle )\), we obtain directly, by the same methodology as in point (a) above,
On the other hand, using condition (12) on the kernel and then the rate of convergence stated in part (a) of this proof, we directly have that
These last two results are enough to show that assertion (17) is true for j=2, and it extends in an obvious iterative way to any j≥2.
(c) Proof of (ii): Case j=1. This result has been obtained in various earlier papers on the one-dimensional nonparametric regression setting (see, for instance, Vieu 1991 for a general form of result (18)).
(d) Proof of (ii): Case j>1. We present the proof for j=2 only; by induction, it is then valid for all j=2,…,m. It suffices to use decomposition (29) again and to follow, step by step, the scheme of proof of (b) above. In this way:
– the term A can be treated by means of the usual results on univariate kernel regression (see again Vieu 1991, for instance), and the rate of convergence \(C^{st}n^{-2k_{j}/(2k_{j}+1)}\) appears naturally;
– the term B involves quantities like \(g_{1,\theta_{1}}-\widehat {g}_{1,\theta_{1}}\), which are of lower order than \(n^{-2k_{j}/(2k_{j}+1)}\) because of point (c) above and because of condition (16), which ensures that k_2<k_1.
This is enough to prove that (18) holds for j=2, and the same reasoning applies to any j≥2.
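This is precisely the sense in which the additive decomposition avoids the curse of dimensionality: by Stone (1982), the optimal rate for a k-times differentiable regression function degrades with the dimension p of the covariate, whereas each component here depends only on the scalar projection 〈θ_j,X〉. Schematically:

```latex
% Stone (1982): optimal convergence rate for estimating a k-times
% differentiable regression function of a p-dimensional covariate
n^{-2k/(2k+p)} \quad (\text{dimension } p)
\qquad\text{vs.}\qquad
n^{-2k_j/(2k_j+1)} \quad (\text{the univariate rate attained by } \widehat{g}_{j,\theta_j}).
```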
A.2 Proof of Theorem 5
We first give the proof for the first two estimated directions \(\widehat{\theta}_{1}\) and \(\widehat{\theta}_{2}\); it then extends by iteration to the higher-order directions \(\widehat{\theta}_{j}\). All these proofs are presented in a rather sketched way because they follow a standard route for cross-validation arguments.
(a) Case j=1. Because the Functional Single Index Model is a special case (with m=1) of the FPPR studied here, we can directly apply Theorem 1 in Ait-Saidi et al. (2008) to get result (24) for j=1. Because they will be helpful in what follows, we recall two intermediary results stated in Ait-Saidi et al. (2008). First, note that the main line of the proof in Ait-Saidi et al. (2008) consists in showing that, uniformly over Θ, we have
(30)

Second, note that one key tool used in Ait-Saidi et al. (2008) was the equivalence between various quadratic measures of error; in particular, it was stated that, uniformly over Θ, one has the equivalence
(31)

(b) Case j=2. The proof is based on the following decomposition:
(32)

with
We will now treat each of the four terms involved in this decomposition.
– The term A_2(θ_2) corresponds exactly to the cross-validation criterion studied in point (a), with the new response variable \(Y-g_{1}( \langle \theta_{1}^{\ast},X \rangle )\); without any additional computation, we can write directly from (30) that, uniformly for θ_2∈Θ,
– Although B_2 does not depend on θ_2 and has no influence on the minimization of the criterion CV_2, we give its asymptotic expansion because it will be helpful later. For that, note that Theorem 4(ii) (with j=1), combined with (31) and condition (22), gives directly that
Because k_2<k_1, one gets directly, by using Theorem 4(ii) again (this time with j=2),
(33)

It is worth noting that the small o(⋅) involved in this last equation is uniform over Θ (see the discussion at the end of Sect. 6.1).
– The term C_2 can be written as a linear combination
of the centered independent variables
and so it can be treated by means of standard higher-order moment inequalities. More precisely, the term C_2 has exactly the same structure as the one treated in Lemma 4 of Ait-Saidi et al. (2008), and the proof in that paper can be followed line by line, using, as often as needed, the Cauchy–Schwarz inequality and (33) to bound the terms involving the factors W_i. In the end, we easily get that, uniformly over Θ,
(34)

– The term D_2(θ_2) can be treated directly by using the Cauchy–Schwarz inequality together with result (33), and we have, uniformly over Θ,
(35)
Finally, from results (32)–(35) we arrive at the same expression as (30), namely,
(36)

Because this last result is uniform over θ_2∈Θ, and because the second term on the right-hand side of (36) does not depend on θ_2, the claimed result (24) is shown for j=2.
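The criterion manipulated throughout this proof can be sketched numerically. The following is a hedged illustration on toy data: the paper's actual CV_j also involves the previously estimated components and an optimization over the functional set Θ, none of which is reproduced here. It shows only that a leave-one-out score for the univariate Nadaraya–Watson smoother is smaller when the candidate projection actually carries the signal.

```python
import numpy as np

def loo_cv(u, y, h):
    """Leave-one-out cross-validation score of the one-dimensional
    Nadaraya-Watson smoother at the projected points u."""
    d = (u[:, None] - u[None, :]) / h
    w = np.exp(-0.5 * d**2)            # Gaussian kernel weights
    np.fill_diagonal(w, 0.0)           # drop observation i when predicting Y_i
    y_loo = (w @ y) / w.sum(axis=1)
    return float(np.mean((y - y_loo) ** 2))

# Toy comparison: the score discriminates a relevant projection from noise
rng = np.random.default_rng(1)
u_true = rng.uniform(-1, 1, 300)       # projection carrying the signal
u_noise = rng.uniform(-1, 1, 300)      # projection unrelated to the response
y = np.sin(3 * u_true) + 0.1 * rng.normal(size=300)
cv_true, cv_noise = loo_cv(u_true, y, h=0.1), loo_cv(u_noise, y, h=0.1)
```

Minimizing such a score over candidate directions is the mechanism that makes the cross-validated \(\widehat{\theta}_{j}\) consistent.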
(c) Case j>2. Result (24) can be shown for higher values of j in an obviously similar way.
This proof is finished, but it will be useful for the remainder of the paper to note that we have obtained precisely that, for any j≥1,
This result, being uniform for θ_j∈Θ, can be combined with Theorem 4(ii) to lead, for any j≥1, to
A.3 Proof of Theorem 6
Condition (25) ensures that result (37) holds for the penalized criterion PCV as well as for the standard criterion CV. That means that we have
and finally (27) ensures that, for n large enough,
This is enough to see that, for n large enough,
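The mechanism formalized by Theorem 6 can be illustrated schematically: the penalty inflates the cross-validated score of each additional term, so a term is retained only when it improves the fit by more than the penalty. The multiplicative penalty below is purely illustrative and is not the paper's penalty from conditions (25)–(27).

```python
import numpy as np

def select_order(cv_scores, penalty=0.05):
    """Pick the number m of additive terms by minimizing a penalized
    cross-validation score (illustrative penalty, not the paper's)."""
    pcv = [cv * (1 + penalty * (j + 1)) for j, cv in enumerate(cv_scores)]
    return int(np.argmin(pcv)) + 1

# The raw CV keeps decreasing slightly after the second term, but the
# penalized criterion stops at m = 2
cv_by_order = [1.00, 0.50, 0.48, 0.47]
m_hat = select_order(cv_by_order)
```

Without the penalty, the minimizer of the raw scores above would be m=4; the penalty rules out terms whose marginal gain is negligible.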
Ferraty, F., Goia, A., Salinelli, E. et al. Functional projection pursuit regression. TEST 22, 293–320 (2013). https://doi.org/10.1007/s11749-012-0306-2
Keywords
- Additive decomposition
- Consistency
- Functional predictor
- Functional regression
- Predictive directions
- Projection pursuit regression