
Functional projection pursuit regression


Abstract

In this paper, we introduce a flexible approach to approximating the regression function in the case of a functional predictor and a scalar response. Following the Projection Pursuit Regression principle, we derive an additive decomposition that exploits the most interesting projections of the predictor variable to explain the response. On the one hand, this approach allows us to avoid the well-known curse of dimensionality, and, on the other, it can be used as an exploratory tool for the analysis of functional datasets. The terms of the decomposition are estimated by a procedure that combines a spline approximation with the one-dimensional Nadaraya–Watson approach. The good behavior of our procedure is illustrated from both theoretical and practical points of view. Asymptotic results state that the terms in the additive decomposition can be estimated without suffering from the dimensionality problem, while applications to real and simulated data show the high predictive performance of our method.
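To make the fitting strategy described in the abstract concrete, here is a minimal illustrative sketch of a greedy projection-pursuit loop in Python, assuming curves observed on a common equispaced grid: each direction is expanded in a B-spline basis, its coefficients are chosen by a Nelder–Mead simplex search (the method of Nelder and Mead 1965, cited in the references) to minimize a leave-one-out criterion, and each additive term is fitted by a one-dimensional Nadaraya–Watson smoother. All names (fit_fppr, n_terms, n_basis, h) and the Gaussian kernel are illustrative choices, not the paper's exact algorithm; in the paper, the bandwidth and the spline dimension are themselves selected by cross-validation.

# Illustrative sketch only: a greedy FPPR-type loop for a functional
# predictor observed on a common equispaced grid t. Function and parameter
# names are hypothetical; the paper's actual tuning (bandwidths, spline
# dimension, number of terms) is done by cross-validation.
import numpy as np
from scipy.interpolate import BSpline
from scipy.optimize import minimize

def nw_1d(u_train, r_train, u_eval, h):
    """One-dimensional Nadaraya-Watson estimate with a Gaussian kernel."""
    w = np.exp(-0.5 * ((u_eval[:, None] - u_train[None, :]) / h) ** 2)
    return (w @ r_train) / np.maximum(w.sum(axis=1), 1e-12)

def fit_fppr(X, y, t, n_terms=2, n_basis=6, h=0.3):
    """Greedy FPPR-type fit: X is an (n, len(t)) matrix of discretized curves."""
    # Cubic B-spline design matrix on the grid t: theta(s) = B(s) @ coefs.
    knots = np.r_[[t[0]] * 4, np.linspace(t[0], t[-1], n_basis - 2)[1:-1], [t[-1]] * 4]
    B = BSpline.design_matrix(t, knots, 3).toarray()        # (len(t), n_basis)
    dt = t[1] - t[0]                                        # equispaced grid assumed
    resid = y.astype(float).copy()
    directions = []
    for _ in range(n_terms):
        def loo_cv(coefs):
            u = X @ (B @ coefs) * dt                        # projections <theta, X_i>
            err = 0.0
            for i in range(len(u)):                         # leave-one-out errors
                mask = np.arange(len(u)) != i
                pred = nw_1d(u[mask], resid[mask], u[i:i + 1], h)[0]
                err += (resid[i] - pred) ** 2
            return err
        # Nelder-Mead simplex search over the spline coefficients of theta
        # (normalization of theta is omitted in this sketch).
        coefs = minimize(loo_cv, np.ones(n_basis), method="Nelder-Mead").x
        u = X @ (B @ coefs) * dt
        resid = resid - nw_1d(u, resid, u, h)               # peel off the fitted term
        directions.append(B @ coefs)
    return directions, resid

For instance, with curves sampled on t = np.linspace(0, 1, 100), fit_fppr(X, y, t) returns the estimated directions (discretized on t) and the final residuals; the fitted regression is recovered by re-applying the smoother term by term. This is a sketch under the stated assumptions, not the authors' implementation.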


References

  • Ait-Saidi A, Ferraty F, Kassa R, Vieu P (2008) Cross-validated estimation in the single-functional index model. Statistics 42:475–494

  • Amato U, Antoniadis A, De Feis I (2006) Dimension reduction in functional regression with application. Comput Stat Data Anal 50:2422–2446

  • Aneiros-Pérez G, Vieu P (2006) Semi-functional partial linear regression. Stat Probab Lett 76:1102–1110

  • Burnham KP, Anderson DR (2002) Model selection and multimodel inference, 2nd edn. Springer, Berlin

  • Cardot H, Sarda P (2005) Estimation in generalized linear models for functional data via penalized likelihood. J Multivar Anal 92:24–41

  • Cardot H, Ferraty F, Sarda P (1999) Functional linear model. Stat Probab Lett 45:11–22

  • Cardot H, Ferraty F, Sarda P (2003) Spline estimators for the functional linear model. Stat Sin 13:571–591

  • Cardot H, Mas A, Sarda P (2007) CLT in functional linear regression models. Probab Theory Relat Fields 138:325–361

  • Chen H (1991) Estimation of a projection-pursuit type regression model. Ann Stat 19:142–157

  • Crambes C, Kneip A, Sarda P (2009) Smoothing splines estimators for functional linear regression. Ann Stat 37:35–72

  • De Boor C (2001) A practical guide to splines, revised edn. Applied mathematical sciences, vol 27. Springer, New York

  • Eilers PHC, Li B, Marx BD (2009) Multivariate calibration with single-index signal regression. Chemom Intell Lab Syst 96:196–202

  • Fan J, Gijbels I (2000) Local polynomial fitting. In: Schimek MG (ed) Smoothing and regression: approaches, computation, and application. Wiley series in probability and statistics. Wiley, New York

  • Febrero-Bande M, González-Manteiga W (2011) Generalized additive models for functional data. In: Ferraty F (ed) Recent advances in functional data analysis and related topics. Physica-Verlag, Heidelberg

  • Ferraty F (ed) (2011) Recent advances in functional data analysis and related topics. Physica-Verlag, Heidelberg

  • Ferraty F, Romain Y (eds) (2010) The Oxford handbook of functional data analysis. Oxford University Press, Oxford

  • Ferraty F, Vieu P (2002) The functional nonparametric model and applications to spectrometric data. Comput Stat 17:545–564

  • Ferraty F, Vieu P (2006) Nonparametric functional data analysis. Springer, New York

  • Ferraty F, Vieu P (2009) Additive prediction and boosting for functional data. Comput Stat Data Anal 53:1400–1413

  • Ferraty F, Peuch A, Vieu P (2003) Modèle à indice fonctionnel simple. C R Acad Sci Paris 336:1025–1028

  • Ferraty F, Hall P, Vieu P (2010a) Most-predictive design points for functional data predictors. Biometrika 97:807–824

  • Ferraty F, Laksaci A, Tadj A, Vieu P (2010b) Rate of uniform consistency for nonparametric estimates with functional variables. J Stat Plan Inference 140:235–260

  • Friedman JH, Stuetzle W (1981) Projection pursuit regression. J Am Stat Assoc 76:817–823

  • Gasser T, Kneip A, Koehler W (1991) A flexible and fast method for automatic smoothing. J Am Stat Assoc 86:643–652

  • Hall P (1989) On projection pursuit regression. Ann Stat 17:573–588

  • Huber PJ (1985) Projection pursuit. Ann Stat 13:435–475

  • James GM (2002) Generalized linear models with functional predictors. J R Stat Soc Ser B 64:411–432

  • James GM, Silverman BW (2005) Functional adaptive model estimation. J Am Stat Assoc 100:565–576

  • Jones MC, Sibson R (1987) What is projection pursuit? J R Stat Soc Ser A 150:1–37

  • Martens H, Naes T (1991) Multivariate calibration. Wiley, New York

  • Müller HG, Yao F (2008) Functional additive models. J Am Stat Assoc 103:1534–1544

  • Nelder JA, Mead R (1965) A simplex method for function minimization. Comput J 7:308–313

  • Ramsay JO, Silverman BW (2005) Functional data analysis, 2nd edn. Springer, New York

  • Stone C (1982) Optimal global rates of convergence for nonparametric regression. Ann Stat 10:1040–1053

  • Vieu P (1991) Quadratic errors for nonparametric estimates under dependence. J Multivar Anal 39:324–347

  • Vieu P (2002) Data-driven model choice in nonparametric regression estimation. Statistics 36:231–246


Acknowledgements

This work is part of the current research advances on functional statistics developed through the STAPH working group in Toulouse (http://www.math.univ-toulouse.fr/staph). The first and fourth authors would like to thank all the participants in the activities of this group for their continuous and fruitful support. All the authors thank the two anonymous referees and the Associate Editor for their pertinent remarks, which have greatly improved the paper. The authors also wish to thank Prof. Juhyun Park (Lancaster) for her very helpful comments on, and proofreading of, an earlier version of this work.

Author information


Corresponding author

Correspondence to A. Goia.

Additional information

Communicated by Domingo Morales.

Appendix: Proofs of theoretical results

A.1 Proof of Theorem 4

For both assertions (i) and (ii), we first give the proof for the first estimated component \(\widehat{g}_{1,\theta_{1}}\), and we then indicate briefly how the results can be iterated to the higher-order components \(\widehat{g}_{j,\theta_{j}}\).

  (a)

    Proof of (i): Case j=1. Estimating the function \(g_{1,\theta_{1}}\) is a standard problem of estimating a regression function between a scalar variable U and a functional variable V valued in some semi-metric space \(({\mathcal{V}},d)\). It corresponds here to U=Y, V=X and, for fixed \(\theta_{1}\), to the semi-metric \(d(v,v')=|\langle \theta_{1},v-v' \rangle|\). The results in Theorem 2 of Ferraty et al. (2010b) concern this general situation and provide a uniform rate of convergence in which the functions \(\varPsi_{\mathcal{C}}\) and \(\phi_{\mathcal{C}}\) control the topological structure induced by the semi-metric d. More precisely, the function \(\phi_{\mathcal{C}}\) is the small ball probability function, which controls the concentration of the variable V (in the sense of the topology induced by d), and the function \(\varPsi_{\mathcal{C}}\) is the Kolmogorov entropy, which controls the complexity of the set \(\mathcal{C}\); the usual forms of both quantities are recalled in the sketch following this proof. The compactness condition (8) on \(\mathcal{C}\) allows us to control \(\varPsi_{\mathcal{C}}\) directly for the semi-metric \(d(v,v')=|\langle \theta_{1},v-v' \rangle|\), while the additional density condition (11) allows us to control \(\phi_{\mathcal{C}}\). Finally, the claimed result (17) follows for j=1 from the bandwidth condition (13).

  (b)

    Proof of (i): Case j>1. We present the proof only for j=2; by induction, it is then valid for any j=2,…,m. We have the following decomposition:

    (29)

    with

    and

    Because \(\mathbb{E} [ Y-g_{1}( \langle \theta_{1},X \rangle )| \langle \theta_{2},X \rangle ] =g_{2}( \langle \theta_{2},X \rangle )\), using the same methodology as in point (a), we directly obtain

    On the other hand, using condition (12) on the kernel and then the rate of convergence stated in part (a) of this proof, we directly have that

    These last two results are enough to show that assertion (17) is true for j=2, and the argument extends in an obvious iterative way to any j≥2.

  (c)

    Proof of (ii): Case j=1. This result has been obtained in various earlier papers on the nonparametric one-dimensional regression setting (see, for instance, Vieu 1991 for a general form of the result (18)).

  (d)

    Proof of (ii): Case j>1. We present the proof only for j=2; by induction, it is then valid for any j=2,…,m. It suffices to use decomposition (29) again and to follow, step by step, the scheme of the proof of (b) above. In this way:

    • the term A can be treated by means of the usual results on univariate kernel regression (see again Vieu 1991, for instance), and the rate of convergence \(C^{st}n^{-2k_{j}/(2k_{j}+1)}\) appears naturally;

    • the term B involves quantities like \(g_{1,\theta_{1}}-\widehat {g}_{1,\theta_{1}}\), which are of lower order than \(n^{-2k_{j}/(2k_{j}+1)}\) because of point (c) above and because of condition (16), which ensures that \(k_{2}<k_{1}\).

This is enough to prove that (18) holds for j=2, and the same reasoning obviously applies to any j≥2.
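For convenience, we recall the usual forms of the two quantities invoked in part (a) of the proof above. This is a sketch of the standard conventions only; the precise definitions in Ferraty et al. (2010b) involve additional uniformity over \(\mathcal{C}\) and may differ in details.

% Usual conventions for the small ball probability and the Kolmogorov
% entropy; a sketch only, not necessarily the exact definitions used
% in Ferraty et al. (2010b).
\[
  \phi_{\mathcal{C}}(\epsilon) \;=\; P\bigl(d(V,v)\le \epsilon\bigr), \quad v\in\mathcal{C},
  \qquad
  \varPsi_{\mathcal{C}}(\epsilon) \;=\; \log N_{\epsilon}(\mathcal{C}),
\]
where \(N_{\epsilon}(\mathcal{C})\) denotes the minimal number of open balls of radius \(\epsilon\) (in the semi-metric \(d\)) needed to cover \(\mathcal{C}\).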

A.2 Proof of Theorem 5

We first give the proof for the first two estimated directions \(\widehat{\theta}_{1}\) and \(\widehat{\theta}_{2}\); it then extends quickly, just by iteration, to the higher-order directions \(\widehat{\theta}_{j}\). All these proofs are presented in outline only, because they follow a standard cross-validation route.

  (a)

    Case j=1. Because the Functional Single Index Model is a special case (with m=1) of the FPPR studied here, we can directly apply Theorem 1 of Ait-Saidi et al. (2008) to get the result (24) for j=1. Because they will be helpful in what follows, we recall two intermediary results stated in Ait-Saidi et al. (2008). First, note that the main line of the proof in Ait-Saidi et al. (2008) consists in showing that, uniformly over Θ, we have

    (30)

    Secondly, note that one key tool used in Ait-Saidi et al. (2008) is the equivalence between various quadratic measures of error; in particular, it was stated that, uniformly over Θ, one has the equivalence

    (31)
  (b)

    Case j=2. The proof is based on the following decomposition:

    (32)

    with

    We now treat each of the four terms involved in this decomposition (a generic leave-one-out form of the cross-validation criteria appearing below is sketched at the end of this proof).

    1. The term \(A_{2}(\theta_{2})\) corresponds exactly to the cross-validation criterion studied in point (a), with the new response variable \(Y-g_{1}( \langle \theta_{1}^{\ast},X \rangle )\); without any additional computation, we can write directly from (30) that, uniformly for \(\theta_{2}\in\Theta\),

    2. Although \(B_{2}\) does not depend on \(\theta_{2}\) and thus has no influence on the minimization of the criterion \(CV_{2}\), we give its asymptotic expansion because it will be helpful later. To that end, note that Theorem 4-(ii) (with j=1), combined with (31) and condition (22), directly gives

      Because \(k_{2}<k_{1}\), one gets directly, by using Theorem 4-(ii) again (but this time with j=2):

      (33)

      It is worth noting that the small o(⋅) involved in this last equation is uniform over Θ (see the discussion at the end of Sect. 6.1).

    3. The term \(C_{2}\) can be written as a linear combination

      of the centered independent variables

      and so it can be treated by means of standard higher-order moment inequalities. More precisely, the term \(C_{2}\) has exactly the same structure as the one treated in Lemma 4 of Ait-Saidi et al. (2008), and the proof in that paper can be followed line by line, using the Cauchy–Schwarz inequality and (33) as often as needed to bound the terms involving the factors \(W_{i}\). In the end, we easily get that, uniformly over Θ,

      (34)
    4.

      The term \(D_{2}(\theta_{2})\) can be treated directly by using the Cauchy–Schwarz inequality together with the result (33), and we have, uniformly over Θ,

      (35)

    In the end, from the results (32)–(35), we arrive at the same expression as (30), namely,

    (36)

    Because this last result is uniform over \(\theta_{2}\in\Theta\) and because the second term on the right-hand side of (36) does not depend on \(\theta_{2}\), the claimed result (24) is shown for j=2.

  (c)

    Case j>2. The result (24) can be shown for higher values of j in a similar, obvious way.

This proof is finished, but it will be useful for the remainder of the paper to note that we have obtained precisely that, for any j≥1,

This result, being uniform over \(\theta_{j}\in\Theta\), can be combined with Theorem 4(ii) to lead, for any j≥1, to

(37)
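For completeness, and since points (a) and (b) above manipulate several such criteria, we sketch the generic leave-one-out form that the cross-validation criterion for the j-th direction typically takes. This reconstructs the usual convention behind Ait-Saidi et al. (2008); the exact criterion \(CV_{j}\) defined in the body of the paper may differ in details (weights or trimming, for instance).

% Generic leave-one-out cross-validation criterion for the j-th direction;
% a sketch of the usual convention, not necessarily the paper's exact CV_j.
\[
  CV_{j}(\theta) \;=\; \frac{1}{n}\sum_{i=1}^{n}
  \Bigl( Y_{i}^{(j)} - \widehat{g}^{(-i)}_{j,\theta}\bigl(\langle \theta,X_{i}\rangle\bigr) \Bigr)^{2},
  \qquad
  Y_{i}^{(j)} \;=\; Y_{i} - \sum_{l<j}\widehat{g}_{l,\widehat{\theta}_{l}}\bigl(\langle \widehat{\theta}_{l},X_{i}\rangle\bigr),
\]
where \(\widehat{g}^{(-i)}_{j,\theta}\) denotes the Nadaraya–Watson estimate computed without the i-th observation, and \(\widehat{\theta}_{j}\) is taken as a minimizer of \(CV_{j}\) over \(\Theta\).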

A.3 Proof of Theorem 6

Condition (25) ensures that the result (37) holds for the penalized criterion PCV as well as for the standard one CV. That means that we have

and finally (27) ensures that, for n large enough,

This is enough to see that, for n large enough,


About this article

Cite this article

Ferraty, F., Goia, A., Salinelli, E. et al. Functional projection pursuit regression. TEST 22, 293–320 (2013). https://doi.org/10.1007/s11749-012-0306-2

