
Tuning selection for two-scale kernel density estimators


Abstract

Reducing the bias of kernel density estimators has been a classical topic in nonparametric statistics. Schucany and Sommers (1977) proposed a two-scale estimator that cancels the lower-order bias by subtracting an additional kernel density estimator built with a different scale of bandwidth. Unlike the existing literature, which treats the scale parameter in the two-scale estimator as a static global parameter, in this paper we consider an adaptive scale (i.e., one depending on the data point) so that the theoretical mean squared error can be further reduced. In practice, both the bandwidth and the scale parameter would require tuning, using, for example, cross validation. By minimizing the point-wise mean squared error, we derive an approximate equation for the optimal scale parameter and correspondingly propose to determine the scale parameter by solving an estimated version of this equation. As a result, the only parameter that requires tuning by cross validation is the bandwidth. Point-wise consistency of the proposed estimator of the optimal scale is established, with further discussion. The promising performance of the two-scale estimator based on the adaptive variable scale is illustrated via numerical studies on density functions with different shapes.
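
For concreteness, the two-scale estimator referred to above combines a kernel density estimate at bandwidth h with a second estimate at bandwidth ah, \(\hat{f}_{a,h}(x) = \{\hat{f}_{h}(x) - a^{-2}\hat{f}_{ah}(x)\}/(1 - a^{-2})\), where the scale a may be chosen separately at each evaluation point x. The following minimal sketch (Gaussian kernel; the function and variable names are ours and not part of the paper) illustrates this form:

```python
import numpy as np

def kde(x, data, h):
    """Ordinary kernel density estimate at x with a Gaussian kernel and bandwidth h."""
    u = (x - data) / h
    return np.mean(np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)) / h

def two_scale_kde(x, data, h, a):
    """Two-scale estimator (f_h(x) - a^{-2} f_{ah}(x)) / (1 - a^{-2});
    the scale a may be chosen point-wise in x."""
    return (kde(x, data, h) - a**-2 * kde(x, data, a * h)) / (1 - a**-2)

# Illustration with an arbitrary fixed scale; in the paper the scale is instead
# determined point-wise by solving an estimated equation (see the appendix).
rng = np.random.default_rng(0)
data = rng.normal(size=500)
estimate = [two_scale_kde(x, data, h=0.4, a=1.5) for x in np.linspace(-3, 3, 7)]
```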


References

  • Abramson IS (1982) On bandwidth variation in kernel estimates-a square root law. Ann Stat 10(4):1217–1223

  • Bowman AW (1984) An alternative method of cross-validation for the smoothing of density estimates. Biometrika 71(2):353–360

  • Breiman L, Meisel W, Purcell E (1977) Variable kernel estimates of multivariate densities. Technometrics 19(2):135–144

  • Chen Z, Leng C (2016) Dynamic covariance models. J Am Stat Assoc 111(515):1196–1207

  • Hall P (1990) On the bias of variable bandwidth curve estimators. Biometrika 77(3):529–535

  • Hall P (2013) The bootstrap and Edgeworth expansion. Springer Science & Business Media, New York

  • Hansen BE (2009) Lecture notes on nonparametrics. Lecture Notes 5:81

  • Igarashi G, Kakizawa Y (2015) Bias corrections for some asymmetric kernel estimators. J Stat Plan Inference 159:37–63

  • Jenkins MA, Traub JF (1972) Algorithm 419: zeros of a complex polynomial [c2]. Commun ACM 15(2):97–99

  • Jiang B, Chen Z, Leng C et al (2020) Dynamic linear discriminant analysis in high dimensional space. Bernoulli 26(2):1234–1268

  • Jones M, Foster P (1993) Generalized jackknifing and higher order kernels. J Nonparametr Stat 3(1):81–94

  • Jones M, Linton O, Nielsen J (1995) A simple bias reduction method for density estimation. Biometrika 82(2):327–338

  • Kolar M, Song L, Ahmed A, Xing EP (2010) Estimating time-varying networks. Ann Appl Stat 4(1):94–123

  • Loftsgaarden DO, Quesenberry CP (1965) A nonparametric estimate of a multivariate density function. Ann Math Stat 36(3):1049–1051

  • Mack Y, Rosenblatt M (1979) Multivariate k-nearest neighbor density estimates. J Multivar Anal 9(1):1–15

  • Rudemo M (1982) Empirical choice of histograms and kernel density estimators. Scand J Stat 3:65–78

  • Schucany W, Sommers JP (1977) Improvement of kernel type density estimators. J Am Stat Assoc 72(358):420–423

  • Schucany WR (1989) On nonparametric regression with higher-order kernels. J Stat Plan Inference 23(2):145–151

  • Silverman BW (1986) Density estimation for statistics and data analysis, vol 26. CRC Press, Boca Raton

  • Terrell GR, Scott DW (1992) Variable kernel density estimation. Ann Stat 6:1236–1265

  • Tsybakov AB (2008) Introduction to nonparametric estimation. Springer Science & Business Media, New York

  • Tukey P, Tukey JW (1981) Data driven view selection, agglomeration, and sharpening. Interpret Multivar Data 61:215–243

  • Wand MP, Schucany WR (1990) Gaussian-based kernels. Can J Stat 18(3):197–204

  • Yao W (2012) A bias corrected nonparametric regression estimator. Stat Prob Lett 82(2):274–282


Acknowledgements

We thank the Editor, an Associate Editor, and an anonymous reviewer for their insightful comments. Wang's research is supported by the National Natural Science Foundation of China (12031005) and the NSF of Shanghai (21ZR1432900). Jiang is partially supported by the National Natural Science Foundation of China (12001459) and HKPolyU Internal Grants.

Corresponding author

Correspondence to Binyan Jiang.


Appendix: Technical Lemmas and Proofs

Proof of Proposition 1

By Taylor expansion,

$$\begin{aligned}&\mathrm{E}\left[ \hat{f}_{h}(x) -f(x) \right] \\&= \int { K(w)f(wh+x)}dw-f(x)\\&= \int { K(w) f^{(1)}(x)wh }dw+ \int { K(w) \frac{f^{(2)}(x)}{2!}w^2 h^2}dw \nonumber \\&\quad + \int {K(w) \frac{f^{(3)}(x)}{3!}w^3 h^3}dw + \int {K(w) \frac{f^{(4)}(x)}{4!}w^4 h^4 }dw + o\left( h^4\right) \nonumber \\&= h^2 \int { K(w) \frac{f^{(2)}(x)}{2!}w^2 }dw +h^4\int {K(w) \frac{f^{(4)}(x)}{4!}w^4 }dw + o\left( h^4\right) .\nonumber \end{aligned}$$

Similarly, we have

$$\begin{aligned}&E\left[ \hat{f}_{ah}(x) -f(x) \right] \\&= a^2h^2 \int { K(w) \frac{f^{(2)}(x)}{2!}w^2 }dw +a^4h^4\int {K(w) \frac{f^{(4)}(x)}{4!}w^4 }dw + o\left( a^4h^4\right) . \end{aligned}$$

Consequently, we have

$$\begin{aligned}&\mathrm{bias}(\hat{f}_{a_x,h}(x)) = E \left[ \frac{\hat{f}_{h}(x) - a^{-2}\hat{f}_{ah}(x)}{1 - a^{-2}} \right] - f(x)\\&= \frac{\int {K(w) \frac{f^{(4)}(x)}{4!}w^4 h^4 }dw - a^{-2}\int {K(w) \frac{f^{(4)}(x)}{4!}w^4 (ah)^4 }dw}{1 - a^{-2}} + O\left( h^5\right) \\&= -a^2h^4 C_1+O\left( h^5\right) , \end{aligned}$$

where \( C_1 =\frac{f^{(4)}(x)}{4!} \int {K(w)w^4}dw\). Similarly, for the variance, we have

$$\begin{aligned}&\mathrm{Var}(\hat{f}_{a_x,h}(x))\\&= \left( 1 - a^{-2} \right) ^{-2} \left( \mathrm{Var}({\hat{f}_h}(x)) + a^{-4} \mathrm{Var}({\hat{f}_{ah}}(x)) - 2a^{-2} \text {Cov}\left( \hat{f}_h(x), \hat{f}_{ah}(x)\right) \right) \\&= \left( 1 - a^{-2} \right) ^{-2} \left( \frac{f(x)\int { K(w)^2}dw}{nh} + \frac{f(x)\int { K(w)^2}dw}{na^5h} \right. \\&\quad - \left. 2a^{-2} \sum _{i=1}^{n} \sum _{j=1}^{n} \frac{1}{n^2} \text {Cov}\left( \frac{1}{h}K\left( \frac{X_i -x}{h} \right) , \frac{1}{ah}K\left( \frac{X_j -x}{ah} \right) \right) + O\left( \frac{1}{n}\right) \right) . \end{aligned}$$

Notice that

$$\begin{aligned}&-2a^{-2} \sum _{i=1}^{n} \sum _{j=1}^{n} \frac{1}{n^2} \text {Cov}\left( \frac{1}{h}K\left( \frac{X_i -x}{h} \right) , \frac{1}{ah}K\left( \frac{X_j -x}{ah} \right) \right) \\&\qquad = -2a^{-2} \frac{1}{n} E \left[ \text {Cov}\left( \frac{1}{h}K\left( \frac{X_i -x}{h} \right) , \frac{1}{ah}K\left( \frac{X_j -x}{ah} \right) \right) \right] . \end{aligned}$$

Since

$$\begin{aligned}&\text {Cov}\left( \frac{1}{h}K\left( \frac{X_i -x}{h} \right) , \frac{1}{ah}K\left( \frac{X_j -x}{ah} \right) \right) \\&\qquad = \frac{1}{a} \int K(w) K\left( \frac{w}{a} \right) f(wh+x) dw + O(1), \end{aligned}$$

and K(u) is bounded, we have

$$\begin{aligned} \int K(w) K\left( \frac{w}{a} \right) f(wh+x) dw =O\Big (\int K(w) f(wh+x) dw \Big )=O(1). \end{aligned}$$

With \(C_2 = f(x)\int { K(w)^2}dw\), the variance of \(\hat{f}(x)\) can be simplified as:

$$\begin{aligned} \mathrm{Var}(\hat{f}(x)) = \left( 1 - a^{-2} \right) ^{-2} \left( \frac{C_2}{nh} + \frac{C_2}{na^5h} + O\left( \frac{1}{na^3}\right) + O\left( \frac{1}{n}\right) \right) . \end{aligned}$$

Consequently,

$$\begin{aligned}&\mathrm{MSE}(\hat{f}(x))\\&\qquad = \mathrm{bias}^2(\hat{f}(x)) + \mathrm{Var}(\hat{f}(x))\\&\qquad = a^4h^8 C_1^2 + O(h^9)+\left( 1 - a^{-2} \right) ^{-2} \left( \frac{C_2}{nh} + \frac{C_2}{na^5h} + O\left( \frac{1}{na^3}\right) + O\left( \frac{1}{n}\right) \right) . \end{aligned}$$

\(\square \)
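
As a rough numerical check on the leading bias term \(-a^2h^4C_1\) in Proposition 1, one can simulate from a standard normal density with a Gaussian kernel, for which \(\int K(w)w^4dw = 3\) and \(f^{(4)}(x) = (x^4-6x^2+3)f(x)\). The sketch below (our own illustration, not part of the paper; agreement is only up to the higher-order terms) compares the Monte Carlo bias of the two-scale estimator with the approximation:

```python
import numpy as np

def kde(x, data, h):
    u = (x - data) / h
    return np.mean(np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)) / h

def two_scale(x, data, h, a):
    return (kde(x, data, h) - a**-2 * kde(x, data, a * h)) / (1 - a**-2)

rng = np.random.default_rng(1)
x, h, a, n, reps = 0.5, 0.5, 1.5, 2000, 2000

# Monte Carlo bias of the two-scale estimator under N(0, 1) data.
est = np.array([two_scale(x, rng.normal(size=n), h, a) for _ in range(reps)])
f_x = np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)      # true density at x
mc_bias = est.mean() - f_x

# Leading-order bias -a^2 h^4 C_1, with C_1 = f^{(4)}(x) * mu_4(K) / 4!,
# mu_4(K) = 3 for the Gaussian kernel and f^{(4)}(x) = (x^4 - 6x^2 + 3) f(x).
C1 = (x**4 - 6 * x**2 + 3) * f_x * 3 / 24
print(mc_bias, -a**2 * h**4 * C1)
```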

Proof of Theorem 1

For simplicity, we take \(C_3 = \frac{C_2}{nh}\). Since the squared-bias term \(a^4h^8C_1^2\) is convex in a, and log-convexity implies convexity, it suffices to show that

$$\begin{aligned} \text {log}\left\{ \left( 1 - a^{-2} \right) ^{-2} \left( C_3 + \frac{C_3}{a^5} \right) \right\} , \end{aligned}$$

is strictly convex in \(a>1\). Note that

$$\begin{aligned} \frac{d}{da}\text {log}\left\{ \left( 1 - a^{-2} \right) ^{-2} \left( C_3 + \frac{C_3}{a^5} \right) \right\} = \frac{d}{da}\left\{ \text {log}\left( 1 - a^{-2} \right) ^{-2} + \text {log}\left\{ C_3 + \frac{C_3}{a^5}\right\} \right\} \end{aligned}$$

and for any \(a>1\),

$$\begin{aligned} \frac{d^2}{da^2}\text {log}\left\{ \left( 1 - a^{-2} \right) ^{-2}\right\} = \frac{12a^2 - 4}{a^2(a^2-1)^2}> 0, \quad \frac{d^2}{da^2} \text {log}\left\{ C_3 + \frac{C_3}{a^5} \right\} = \frac{5(6a^5+1)}{(a^6+a)^2}>0. \end{aligned}$$

We thus conclude that \(\widetilde{\mathrm{MSE}}(\hat{f}_{a_x,h}(x))\) is a convex function of a for \(a>1\). By setting \(\frac{d}{da} \widetilde{\mathrm{MSE}}(\hat{f}_{a_x,h}(x)) = 0\), we obtain the implicit form of the global minimizer:

$$\begin{aligned} \frac{4h^9nC_1^2}{C_2} = \frac{4a^5+5a^2-1}{(a^2-1)^3a^5}. \end{aligned}$$

\(\square \)
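
In practice, the implicit equation above can be solved numerically for \(a>1\) once \(C_1\), \(C_2\), n and h are given or estimated: its right-hand side decreases from \(+\infty \) to 0 as a increases over \((1,\infty )\), so a simple bisection locates the unique root. A minimal sketch (the function name and numerical inputs are ours and purely illustrative):

```python
import numpy as np

def optimal_scale(C1, C2, n, h, lo=1.0 + 1e-8, hi=50.0, iters=200):
    """Solve 4*h^9*n*C1^2/C2 = (4a^5 + 5a^2 - 1)/((a^2 - 1)^3 * a^5) for a > 1 by bisection.
    The right-hand side is decreasing in a, from +infinity near a = 1 down to 0."""
    target = 4 * h**9 * n * C1**2 / C2
    rhs = lambda a: (4 * a**5 + 5 * a**2 - 1) / ((a**2 - 1)**3 * a**5)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if rhs(mid) > target:   # still left of the root, since rhs is decreasing
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Purely illustrative plug-in values (not taken from the paper).
n = 2000
a_star = optimal_scale(C1=0.07, C2=0.10, n=n, h=n ** (-1 / 9))
```

With plug-in estimates of \(C_1\) and \(C_2\) such as those discussed in Theorem 2, the same routine would return a data-driven scale at each evaluation point.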

Proof of Theorem 2

For any given x, note that \(a^*_x\) is the root of

$$\begin{aligned} \frac{4a_x^5+5a_x^2-1}{(a_x^2-1)^3a_x^5}=\frac{4h^9nC_1^2}{C_2}, \end{aligned}$$

where \(h\simeq O\left( n^{-1/9} \right) \); it follows that \(a^*_x-1>c_0\) for some constant \(c_0>0\). By Hansen (2009),

$$\begin{aligned}&\mathrm{bias}\left( \hat{f}^{(4)}_{h_1}(x) \right) = \frac{f^{(6)}(x)h_{1}^2 \int {K(u) u^2}du }{2} + o\left( h_{1}^2\right) , \\&\mathrm{Var}\left( \hat{f}^{(4)}_{h_{1}}(x) \right) = \frac{f(x)\int {K^{(4)}(u)^2}du }{nh_{1}^{9}}+ O\left( \frac{1}{n}\right) ,\\&\mathrm{bias}\left( \hat{f}_{h_{2}}(x) \right) = \frac{f^{(2)}(x)h_{2}^2 \int {K(u) u^2}du }{2} + o\left( h_{2}^2\right) , \\&\mathrm{Var}\left( \hat{f}_{h_{2}}(x) \right) = \frac{f(x)\int {K(u)^2}du }{nh_{2}}+ O\left( \frac{1}{n}\right) . \end{aligned}$$

Let

$$\begin{aligned} l(a_x,C_1,C_2)= 4(1+a_x^5)C_2 + 5(a_x^2-1)C_2 - 4(a_x^2-1)^3a_x^5nh^9C_1^2, \end{aligned}$$

then

$$\begin{aligned}&\frac{\partial l(a_x,C_1,C_2)}{\partial a_x} = 20a_x^4C_2 + 10a_xC_2 - 4\left( 11a_x^2-5 \right) a_x^4\left( a_x^2-1\right) ^2 nh^9C_{1}^2,\\&\frac{\partial l(a_x,C_1,C_2)}{\partial C_1} = -8 \left( a_x^2-1 \right) ^{3}a_x^5 nh^9 C_1,\\&\frac{\partial l(a_x,C_1,C_2)}{\partial C_2} = 4a_x^5+5a_x^2-1. \end{aligned}$$

Under the assumptions of this theorem, there exist positive constants \(c_1\) and \(c_2\) such that \(\int {K(u) u^2}du\), \(\int {K(u) u^4}du\), \(\int {K^{(4)}(u)^2}du\) and \( \int {K(u)^2}du\) are all bounded above by \(c_1\), and \(\max \{f^{(6)}(x), f^{(2)}(x), f(x)\}<c_2\). Then we have

$$\begin{aligned}&\mathrm{bias}( \hat{C}_1 ) \simeq h_1^{2} + o\left( h_1^2\right) ,\quad \mathrm{Var}( \hat{C}_1 ) \le \frac{c^3_1c_2 }{nh_1^{9}}+ O\left( \frac{1}{n}\right) ,\\&\mathrm{bias}( \hat{C}_2 ) \simeq h_2^{2} + o\left( h_2^2\right) ,\quad \mathrm{Var}( \hat{C}_2 ) \le \frac{c^3_1c_2 }{nh_2}+ O\left( \frac{1}{nh_2}\right) . \end{aligned}$$

As \(n\rightarrow \infty \), we have

$$\begin{aligned} \left| \hat{C}_1 - \mathrm{E}( \hat{C}_1 ) \right| =O_{p}\left( \sqrt{\frac{1}{nh_1^{9}}} \right) ,\quad \left| \hat{C}_2 - \mathrm{E}( \hat{C}_2 ) \right| =O_{p}\left( \sqrt{\frac{1}{nh_2}}\right) , \end{aligned}$$

thus, with \(h_1\simeq O(n^{-1/13})\) and \(h_2\simeq O(n^{-1/5})\), we have that

$$\begin{aligned}&\left| \hat{C}_1 -C_1\right| \le \left| \hat{C}_1 - \mathrm{E}( \hat{C}_1 ) \right| + \left| \mathrm{E}( \hat{C}_1 ) - C_1\right| =O_{p}\left( \sqrt{\frac{1}{nh_1^{9}}}+h_1^2 \right) , \\&\left| \hat{C}_2 -C_2\right| \le \left| \hat{C}_2 - \mathrm{E}( \hat{C}_2 ) \right| + \left| \mathrm{E}( \hat{C}_2 ) - C_2\right| = O_{p}\left( \sqrt{\frac{1}{nh_2}}+ h_2^2 \right) . \end{aligned}$$

Consequently, as \(n\rightarrow \infty \), for any given x, \(\hat{C}_1\) and \(\hat{C}_2\) are of the same order as \(C_1\) and \(C_2\), respectively, in probability. Arguing as for \(a^*_x\), we then have \(\hat{a}_{x}-1>c_3\) for some constant \(c_3>0\).

By a Taylor expansion, there exists \((a_{\xi },C_{1,\xi } ,C_{2,\xi } )\) on the segment between \((\hat{a}_{x},\hat{C}_1,\hat{C}_2)\) and \((a_{x}^{*},C_1,C_2)\) such that

$$\begin{aligned}&l(\hat{a}_{x},\hat{C}_1,\hat{C}_2)- l(a_{x}^{*},C_1,C_2) \nonumber \\&= \left( \hat{C}_1 -C_1 \right) \frac{\partial l(a_{\xi },C_{1,\xi },C_{2,\xi }) }{\partial C_1}+ \left( \hat{C}_2 -C_2 \right) \frac{\partial l(a_{\xi },C_{1,\xi },C_{2,\xi } )}{\partial C_2}\nonumber \\&\qquad + \left( \hat{a}_{x} -a_{x}^{*} \right) \frac{\partial l(a_{\xi },C_{1,\xi },C_{2,\xi })}{\partial a_{\xi }}\nonumber \\&=0. \end{aligned}$$
(9)

Since \(a_{\xi }=O(1)\), \(C_{1,\xi }=O(1)\) and \(C_{2,\xi }=O(1) \), equation (9) yields

$$\begin{aligned} \left| \hat{a}_{x} -a_{x}^{*}\right|&=\left| \frac{(4a_{\xi }^5+5a_{\xi }^2-1)\left( \hat{C}_2 -C_2 \right) -8 \left( a_{\xi }^2-1 \right) ^{3}a_{\xi }^5 nh^9 C_{1,\xi }\left( \hat{C}_1 -C_1 \right) }{20a_{\xi }^4C_{2,\xi } + 10a_{\xi }C_{2,\xi } - 4\left( 11a_{\xi }^2-5 \right) a_{\xi }^4\left( a_{\xi }^2-1\right) ^2 nh^9C_{1,\xi }^2}\right| \nonumber \\&= O\left( \frac{\left| \hat{C}_2 -C_2 \right| }{\left( a_{\xi }-1\right) ^2 nh^9}\right) + O\left( \left( a_{\xi }-1\right) \left| \hat{C}_1 -C_1 \right| \right) \nonumber \\&=O_P\left( \frac{\sqrt{\frac{1}{nh_2}}+h_2^2 }{\left( a_{\xi }-1\right) ^2 nh^9}\right) + O_P\left( \left( a_{\xi }-1\right) \left( \sqrt{\frac{1}{nh_1^{9}}}+h_1^2\right) \right) \nonumber \\&=O_{p}\left( \sqrt{\frac{1}{nh_1^{9}}}+h_1^2 +\sqrt{\frac{1}{nh_2}}+ h_2^2\right) . \end{aligned}$$

\(\square \)
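
For completeness, the plug-in quantities \(\hat{C}_1\) and \(\hat{C}_2\) used above are built from kernel estimates of \(f^{(4)}(x)\) and f(x) with pilot bandwidths \(h_1\) and \(h_2\). The sketch below gives one natural construction for the Gaussian kernel, for which \(\int K(w)w^4dw=3\), \(\int K(w)^2dw = 1/(2\sqrt{\pi })\) and \(K^{(4)}(u) = (u^4-6u^2+3)K(u)\); the exact estimators are defined in the main text, so this is only an illustration with function names and pilot constants of our own choosing:

```python
import numpy as np

SQRT2PI = np.sqrt(2 * np.pi)

def gauss(u):
    return np.exp(-0.5 * u**2) / SQRT2PI

def gauss_d4(u):
    # Fourth derivative of the Gaussian kernel: K^{(4)}(u) = (u^4 - 6u^2 + 3) K(u).
    return (u**4 - 6 * u**2 + 3) * gauss(u)

def f_hat(x, data, h2):
    """Kernel density estimate f_hat_{h_2}(x)."""
    return np.mean(gauss((x - data) / h2)) / h2

def f4_hat(x, data, h1):
    """Kernel estimate of f^{(4)}(x): (n h_1^5)^{-1} * sum_i K^{(4)}((x - X_i)/h_1)."""
    return np.mean(gauss_d4((x - data) / h1)) / h1**5

def plug_in_constants(x, data, h1, h2):
    C1_hat = f4_hat(x, data, h1) * 3 / 24                # C_1 = f^{(4)}(x) mu_4(K) / 4!
    C2_hat = f_hat(x, data, h2) / (2 * np.sqrt(np.pi))   # C_2 = f(x) * int K(w)^2 dw
    return C1_hat, C2_hat

# Pilot bandwidths at the rates n^{-1/13} and n^{-1/5} used in Theorem 2 (constants arbitrary).
rng = np.random.default_rng(2)
data = rng.normal(size=2000)
n = data.size
C1_hat, C2_hat = plug_in_constants(0.5, data, h1=n ** (-1 / 13), h2=n ** (-1 / 5))
```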


Cite this article

Yu, X., Wang, C., Yang, Z. et al. Tuning selection for two-scale kernel density estimators. Comput Stat 37, 2231–2247 (2022). https://doi.org/10.1007/s00180-022-01196-6

