Abstract
Reducing the bias of kernel density estimators has been a classical topic in nonparametric statistics. Schucany and Sommers (1977) proposed a two-scale estimator that cancels the lower-order bias by subtracting an additional kernel density estimator with a different scale of bandwidth. Unlike the existing literature, which treats the scale parameter in the two-scale estimator as a static global parameter, in this paper we consider an adaptive scale (i.e., one dependent on the data point) so that the theoretical mean squared error can be further reduced. In practice, both the bandwidth and the scale parameter would require tuning, for example via cross-validation. By minimizing the point-wise mean squared error, we derive an approximate equation for the optimal scale parameter, and correspondingly propose to determine the scale parameter by solving an estimated equation. As a result, the only parameter that requires tuning by cross-validation is the bandwidth. Point-wise consistency of the proposed estimator for the optimal scale is established, with further discussions. The promising performance of the two-scale estimator based on the adaptive variable scale is illustrated via numerical studies on density functions with different shapes.
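To fix ideas, the two-scale construction of Schucany and Sommers (1977) can be sketched as follows. This is a minimal illustration, not the paper's adaptive-scale method: it uses a Gaussian kernel and a fixed global scale `a`, whereas the paper lets the scale depend on the evaluation point. The bandwidth choice in the usage note is an illustrative plug-in value, not one from the paper.

```python
import numpy as np

def kde(x, data, h):
    """Standard Gaussian-kernel density estimate at the points in x."""
    u = (x[:, None] - data[None, :]) / h
    return np.exp(-0.5 * u**2).sum(axis=1) / (len(data) * h * np.sqrt(2 * np.pi))

def two_scale_kde(x, data, h, a=2.0):
    """Two-scale estimator: combine bandwidths h and a*h so that the
    O(h^2) bias terms of the two estimates cancel, leaving O(h^4) bias.
    (Fixed global scale a; the paper's estimator makes a adaptive in x.)"""
    f_h = kde(x, data, h)
    f_ah = kde(x, data, a * h)
    return (a**2 * f_h - f_ah) / (a**2 - 1)
```

Since the leading bias of `kde` at bandwidth `h` is proportional to `h**2` and that at `a*h` to `(a*h)**2`, the weighted difference above removes the `h**2` term exactly while keeping the expectation of the combination equal to the target density to first order.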
References
Abramson IS (1982) On bandwidth variation in kernel estimates: a square root law. Annal Stat 10(4):1217–1223
Bowman AW (1984) An alternative method of cross-validation for the smoothing of density estimates. Biometrika 71(2):353–360
Breiman L, Meisel W, Purcell E (1977) Variable kernel estimates of multivariate densities. Technometrics 19(2):135–144
Chen Z, Leng C (2016) Dynamic covariance models. J Am Stat Assoc 111(515):1196–1207
Hall P (1990) On the bias of variable bandwidth curve estimators. Biometrika 77(3):529–535
Hall P (2013) The bootstrap and Edgeworth expansion. Springer Science & Business Media, New York
Hansen BE (2009) Lecture notes on nonparametrics. Lecture Notes 5:81
Igarashi G, Kakizawa Y (2015) Bias corrections for some asymmetric kernel estimators. J Stat Plan Inference 159:37–63
Jenkins MA, Traub JF (1972) Algorithm 419: zeros of a complex polynomial [C2]. Commun ACM 15(2):97–99
Jiang B, Chen Z, Leng C et al (2020) Dynamic linear discriminant analysis in high dimensional space. Bernoulli 26(2):1234–1268
Jones M, Foster P (1993) Generalized jackknifing and higher order kernels. J Nonparametr Stat 3(1):81–94
Jones M, Linton O, Nielsen J (1995) A simple bias reduction method for density estimation. Biometrika 82(2):327–338
Kolar M, Song L, Ahmed A, Xing EP et al (2010) Estimating time-varying networks. Annal Appl Stat 4(1):94–123
Loftsgaarden DO, Quesenberry CP et al (1965) A nonparametric estimate of a multivariate density function. Annal Math Stat 36(3):1049–1051
Mack Y, Rosenblatt M (1979) Multivariate k-nearest neighbor density estimates. J Multivar Anal 9(1):1–15
Rudemo M (1982) Empirical choice of histograms and kernel density estimators. Scand J Stat 9(2):65–78
Schucany W, Sommers JP (1977) Improvement of kernel type density estimators. J Am Stat Assoc 72(358):420–423
Schucany WR (1989) On nonparametric regression with higher-order kernels. J Stat Plan Inference 23(2):145–151
Silverman BW (1986) Density estimation for statistics and data analysis, vol 26. CRC Press, Boca Raton
Terrell GR, Scott DW (1992) Variable kernel density estimation. Annal Stat 20(3):1236–1265
Tsybakov AB (2008) Introduction to nonparametric estimation. Springer Science & Business Media, New York
Tukey P, Tukey JW (1981) Data driven view selection, agglomeration, and sharpening. Interpret Multivar Data 61:215–243
Wand MP, Schucany WR (1990) Gaussian-based kernels. Can J Stat 18(3):197–204
Yao W (2012) A bias corrected nonparametric regression estimator. Stat Prob Lett 82(2):274–282
Acknowledgements
We thank the Editor, an Associate Editor, and an anonymous reviewer for their insightful comments. Wang's research is supported by the National Natural Science Foundation of China (12031005) and the NSF of Shanghai (21ZR1432900). Jiang is partially supported by the National Natural Science Foundation of China (12001459) and HKPolyU Internal Grants.
Appendix: Technical Lemmas and Proofs
Proof of Proposition 1
By Taylor expansion,
Similarly, we have
Consequently, we have
where \( C_1 =\frac{f^{(4)}(x)}{4!} \int {K(w)w^4}dw\). Similarly, for the variance, we have
Notice that
Since
and K(u) is bounded, we have
With \(C_2 = f(x)\int { K(w)^2}dw\), the variance of \(\hat{f}(x)\) can be simplified as:
Consequently,
\(\square \)
Proof of Theorem 1
For simplicity, we take \(C_3 = \frac{C_2}{nh}\). It suffices to show that
is strictly convex in \(a>1\). Note that
and for any \(a>1\),
We thus conclude that \(\widetilde{\mathrm{MSE}}(\hat{f}_{a_x,h}(x))\) is a convex function of a for \(a>1\). Setting \(\frac{d}{da}\widetilde{\mathrm{MSE}}(\hat{f}_{a_x,h}(x)) = 0\), we obtain the implicit form of the global minimizer:
\(\square \)
Proof of Theorem 2
For any given x, note the fact that \(a^*_x\) is the root of
Since \(h\simeq O\left( n^{-1/9} \right) \), we have \(a^*_x-1>c_0\) for some constant \(c_0>0\). By Hansen (2009),
Let
then
Under the assumptions of this theorem, there exist positive constants \(c_1\) and \(c_2\) such that \(\int {K(u) u^2}du\), \(\int {K(u) u^4}du\), \(\int {K^{(4)}(u)^2}du\) and \( \int {K(u)^2}du\) are all bounded above by \(c_1\), and \(\max \{\) \(f^{(6)}(x)\), \(f^{(2)}(x)\), \(f(x)\}\) \(<c_2\). Then we have
As \(n\rightarrow \infty \), we have
thus, with \(h_1\simeq O(n^{-1/13})\) and \(h_2\simeq O(n^{-1/5})\), we have that
Consequently, as \(n\rightarrow \infty \), for any given x, \(\hat{C}_1\) and \(\hat{C}_2\) are of the same order as \(C_1\) and \(C_2\) in probability. As with \(a^*_x\), we obtain \(\hat{a}_{x}-1>c_3\) for some constant \(c_3>0\).
By Taylor expansion, there exist \((a_{\xi },C_{1,\xi } ,C_{2,\xi } )\) s.t.
Noting that \(a_{\xi }=O(1)\), \(C_{1,\xi }=O(1)\) and \(C_{2,\xi }=O(1) \), and combining this with equation (9), we have
\(\square \)
About this article
Cite this article
Yu, X., Wang, C., Yang, Z. et al. Tuning selection for two-scale kernel density estimators. Comput Stat 37, 2231–2247 (2022). https://doi.org/10.1007/s00180-022-01196-6