Abstract
Reducing the bias of kernel density estimators has been a classical topic in nonparametric statistics. Schucany and Sommers (1977) proposed a two-scale estimator that cancels the lower-order bias by subtracting an additional kernel density estimator with a different scale of bandwidth. Unlike the existing literature, which treats the scale parameter in the two-scale estimator as a static global parameter, in this paper we consider an adaptive scale (i.e., one dependent on the data point) so that the theoretical mean squared error can be further reduced. In practice, both the bandwidth and the scale parameter would require tuning, for example via cross-validation. By minimizing the point-wise mean squared error, we derive an approximate equation for the optimal scale parameter, and correspondingly propose to determine the scale parameter by solving an estimated equation. As a result, the only parameter that requires tuning by cross-validation is the bandwidth. Point-wise consistency of the proposed estimator for the optimal scale is established, with further discussions. The promising performance of the two-scale estimator based on the adaptive variable scale is illustrated via numerical studies on density functions with different shapes.
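To fix ideas, the two-scale construction of Schucany and Sommers (1977) can be sketched as follows. This is a minimal illustration, not the paper's adaptive-scale method: it uses a Gaussian kernel and a fixed global scale `a`, whereas the paper lets the scale depend on the evaluation point. The bandwidth choice in the usage note is an illustrative plug-in value, not one from the paper.

```python
import numpy as np

def kde(x, data, h):
    """Standard Gaussian-kernel density estimate at the points in x."""
    u = (x[:, None] - data[None, :]) / h
    return np.exp(-0.5 * u**2).sum(axis=1) / (len(data) * h * np.sqrt(2 * np.pi))

def two_scale_kde(x, data, h, a=2.0):
    """Two-scale estimator: combine bandwidths h and a*h so that the
    O(h^2) bias terms of the two estimates cancel, leaving O(h^4) bias.
    (Fixed global scale a; the paper's estimator makes a adaptive in x.)"""
    f_h = kde(x, data, h)
    f_ah = kde(x, data, a * h)
    return (a**2 * f_h - f_ah) / (a**2 - 1)
```

Since the leading bias of `kde` at bandwidth `h` is proportional to `h**2` and that at `a*h` to `(a*h)**2`, the weighted difference above removes the `h**2` term exactly while keeping the expectation of the combination equal to the target density to first order.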
References
Abramson IS (1982) On bandwidth variation in kernel estimates: a square root law. Annal Stat 10(4):1217–1223
Bowman AW (1984) An alternative method of cross-validation for the smoothing of density estimates. Biometrika 71(2):353–360
Breiman L, Meisel W, Purcell E (1977) Variable kernel estimates of multivariate densities. Technometrics 19(2):135–144
Chen Z, Leng C (2016) Dynamic covariance models. J Am Stat Assoc 111(515):1196–1207
Hall P (1990) On the bias of variable bandwidth curve estimators. Biometrika 77(3):529–535
Hall P (2013) The bootstrap and Edgeworth expansion. Springer Science & Business Media, New York
Hansen BE (2009) Lecture notes on nonparametrics. Lecture Notes 5:81
Igarashi G, Kakizawa Y (2015) Bias corrections for some asymmetric kernel estimators. J Stat Plan Inference 159:37–63
Jenkins MA, Traub JF (1972) Algorithm 419: zeros of a complex polynomial [C2]. Commun ACM 15(2):97–99
Jiang B, Chen Z, Leng C et al (2020) Dynamic linear discriminant analysis in high dimensional space. Bernoulli 26(2):1234–1268
Jones M, Foster P (1993) Generalized jackknifing and higher order kernels. J Nonparametr Stat 3(1):81–94
Jones M, Linton O, Nielsen J (1995) A simple bias reduction method for density estimation. Biometrika 82(2):327–338
Kolar M, Song L, Ahmed A, Xing EP et al (2010) Estimating time-varying networks. Annal Appl Stat 4(1):94–123
Loftsgaarden DO, Quesenberry CP et al (1965) A nonparametric estimate of a multivariate density function. Annal Math Stat 36(3):1049–1051
Mack Y, Rosenblatt M (1979) Multivariate k-nearest neighbor density estimates. J Multivar Anal 9(1):1–15
Rudemo M (1982) Empirical choice of histograms and kernel density estimators. Scand J Stat 9(2):65–78
Schucany W, Sommers JP (1977) Improvement of kernel type density estimators. J Am Stat Assoc 72(358):420–423
Schucany WR (1989) On nonparametric regression with higher-order kernels. J Stat Plan Inference 23(2):145–151
Silverman BW (1986) Density estimation for statistics and data analysis, vol 26. CRC Press, Boca Raton
Terrell GR, Scott DW (1992) Variable kernel density estimation. Annal Stat 20(3):1236–1265
Tsybakov AB (2008) Introduction to nonparametric estimation. Springer Science & Business Media, New York
Tukey P, Tukey JW (1981) Data driven view selection, agglomeration, and sharpening. Interpret Multivar Data 61:215–243
Wand MP, Schucany WR (1990) Gaussian-based kernels. Can J Stat 18(3):197–204
Yao W (2012) A bias corrected nonparametric regression estimator. Stat Prob Lett 82(2):274–282
Acknowledgements
We thank the Editor, an Associate Editor, and an anonymous reviewer for their insightful comments. Wang's research is supported by the National Natural Science Foundation of China (12031005) and the NSF of Shanghai (21ZR1432900). Jiang is partially supported by the National Natural Science Foundation of China (12001459) and HKPolyU Internal Grants.
Appendix: Technical Lemmas and Proofs
Proof of Proposition 1
By Taylor expansion,
Similarly, we have
Consequently, we have
where \( C_1 =\frac{f^{(4)}(x)}{4!} \int {K(w)w^4}dw\). Similarly, for the variance, we have
Notice that
Since
and K(u) is bounded, we have
With \(C_2 = f(x)\int { K(w)^2}dw\), the variance of \(\hat{f}(x)\) can be simplified as:
Consequently,
\(\square \)
Proof of Theorem 1
For simplicity, we take \(C_3 = \frac{C_2}{nh}\). It suffices to show that
is strictly convex in \(a>1\). Note that
and for any \(a>1\),
We thus conclude that \(\widetilde{\mathrm{MSE}}(\hat{f}_{a_x,h}(x))\) is a convex function of a for \(a>1\). Setting \(\frac{d}{da}\widetilde{\mathrm{MSE}}(\hat{f}_{a_x,h}(x)) = 0\), we obtain the implicit form of the global minimizer:
\(\square \)
Proof of Theorem 2
For any given x, note the fact that \(a^*_x\) is the root of
Since \(h\simeq O\left( n^{-1/9} \right) \), we have \(a^*_x-1>c_0\) for some constant \(c_0>0\). By Hansen (2009),
Let
then
Under the assumptions of this theorem, there exist positive constants \(c_1\) and \(c_2\) such that \(\int {K(u) u^2}du\), \(\int {K(u) u^4}du\), \(\int {K^{(4)}(u)^2}du\) and \( \int {K(u)^2}du\) are all bounded above by \(c_1\), and \(\max \{\) \(f^{(6)}(x)\), \(f^{(2)}(x)\), \(f(x)\}\) \(<c_2\). Then we have
As \(n\rightarrow \infty \), we have
thus, with \(h_1\simeq O(n^{-1/13})\) and \(h_2\simeq O(n^{-1/5})\), we have that
Consequently, as \(n\rightarrow \infty \), for any given x, \(\hat{C}_1\) and \(\hat{C}_2\) are of the same order as \(C_1\) and \(C_2\) in probability. As with \(a^*_x\), we obtain \(\hat{a}_{x}-1>c_3\) for some constant \(c_3>0\).
By Taylor expansion, there exist \((a_{\xi },C_{1,\xi } ,C_{2,\xi } )\) s.t.
Noting that \(a_{\xi }=O(1)\), \(C_{1,\xi }=O(1)\) and \(C_{2,\xi }=O(1) \), and combining this with equation (9), we have
\(\square \)
About this article
Cite this article
Yu, X., Wang, C., Yang, Z. et al. Tuning selection for two-scale kernel density estimators. Comput Stat 37, 2231–2247 (2022). https://doi.org/10.1007/s00180-022-01196-6