Abstract
Although estimation and testing are different statistical problems, when a test statistic based on the Parzen–Rosenblatt estimator is used to test the hypothesis that the underlying density function f belongs to a location-scale family of probability density functions, it is natural to choose the smoothing parameter so that the kernel density estimator is an effective estimator of f irrespective of whether the null or the alternative hypothesis is true. In this paper we address this question for the well-known Bickel–Rosenblatt test statistics, which are based on the quadratic distance between the nonparametric kernel estimator and two parametric estimators of f under the null hypothesis. For each of these test statistics we describe its asymptotic behaviour for a general data-dependent smoothing parameter, and we establish its limiting Gaussian null distribution and the consistency of the associated goodness-of-fit test for location-scale families. Finally, in order to compare the finite-sample power of the Bickel–Rosenblatt tests based on a null hypothesis-based bandwidth selector with that of tests based on other bandwidth selectors from the literature, a simulation study for the normal, logistic and Gumbel null location-scale models is included in this work.
References
Anderson TW, Darling DA (1954) A test of goodness of fit. J Am Stat Assoc 49:765–769
Bickel PJ, Rosenblatt M (1973) On some global measures of the deviations of density function estimates. Ann Stat 1:1071–1095
Bosq D, Lecoutre J-P (1987) Théorie de l’estimation fonctionnelle. Economica, Paris
Bowman AW (1992) Density based tests for goodness-of-fit normality. J Stat Comput Simul 40:1–13
Bowman AW, Foster PJ (1993) Adaptive smoothing and density-based tests of multivariate normality. J Am Stat Assoc 88:529–537
Cao R, Lugosi G (2005) Goodness-of-fit tests based on the kernel density estimator. Scand J Stat 32:599–616
Cao R, Van Keilegom I (2006) Empirical likelihood tests for two-sample problems via nonparametric density estimation. Can J Stat 34:61–77
Chacón JE, Tenreiro C (2013) Data-based choice of the number of pilot stages for plug-in bandwidth selection. Commun Stat Theory Methods 42:2200–2214
Chacón JE, Montanero J, Nogales AG, Pérez P (2007) On the existence and limit behavior of the optimal bandwidth in kernel density estimation. Stat Sin 17:289–300
Devroye L, Györfi L (1985) Nonparametric density estimation: the L\(_1\) view. Wiley, New York
Ebner B, Henze N (2020) Tests for multivariate normality—a critical review with emphasis on weighted \(L^2\)-statistics. TEST 29:845–892
Epps TW (2005) Tests for location-scale families based on the empirical characteristic function. Metrika 62:99–114
Epps TW, Pulley LB (1983) A test for normality based on the empirical characteristic function. Biometrika 70:723–726
Fan Y (1994) Testing the goodness of fit of a parametric density function by kernel method. Econom Theory 10:316–356
Fan Y (1995) Bootstrapping a consistent nonparametric goodness-of-fit test. Econom Rev 14:367–382
Fan Y (1998) Goodness-of-fit tests based on kernel density estimators with fixed smoothing parameters. Econom Theory 14:604–621
Ghosh BK, Huang W-M (1991) The power and optimal kernel of the Bickel–Rosenblatt test for goodness of fit. Ann Stat 19:999–1009
Gouriéroux C, Tenreiro C (2001) Local power properties of kernel based goodness of fit tests. J Multivar Anal 78:161–190
Gürtler N (2000) Asymptotic theorems for the class of BHEP-tests for multivariate normality with fixed and variable smoothing parameter (in German). Doctoral dissertation, University of Karlsruhe, Germany
Hall P (1984) Central limit theorem for integrated square error of multivariate nonparametric density estimators. J Multivar Anal 14:1–16
Hall P, Marron JS (1987) Extent to which least-squares cross-validation minimizes integrated square error in nonparametric density estimation. Probab Theory Rel Fields 74:567–581
Hall P, Marron JS (1991) Lower bounds for bandwidth selection in density estimation. Probab Theory Rel Fields 90:149–173
Hall P, Sheather SJ, Jones MC, Marron JS (1991) On optimal data-based bandwidth selection in kernel density estimation. Biometrika 78:263–269
Hall P, Marron JS, Park BU (1992) Smoothed cross-validation. Probab Theory Rel Fields 92:1–20
Henze N (2002) Invariant tests for multivariate normality: a critical review. Stat Pap 43:467–506
Henze N, Zirkler B (1990) A class of invariant consistent tests for multivariate normality. Commun Stat Theory Methods 19:3595–3617
Johnson NL, Kotz S, Balakrishnan N (1995) Continuous univariate distributions 2. Wiley, New York
Krzanowski WJ, Hand DJ (2009) ROC curves for continuous data. CRC Press, Boca Raton
Marron JS, Wand MP (1992) Exact mean integrated squared error. Ann Stat 20:712–736
Martínez-Camblor P, Uña-Álvarez J (2009) Nonparametric k-sample tests: density function vs. distribution function. Comput Stat Data Anal 53:3344–3357
Martínez-Camblor P, Uña-Álvarez J (2013) Density comparison for independent and right censored samples via kernel smoothing. Comput Stat 28:269–288
Martínez-Camblor P, Uña-Álvarez J, Corral N (2008) \(k\)-Sample test based on the common area of kernel density estimator. J Stat Plan Inference 138:4006–4020
Meintanis SG (2004) Goodness-of-fit tests for the logistic distribution based on empirical transforms. Sankhya Ser A 66:306–326
Parzen E (1962) On estimation of a probability density function and mode. Ann Math Stat 33:1065–1076
R Development Core Team (2019) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna. http://www.R-project.org
Romão X, Delgado R, Costa A (2010) An empirical power comparison of univariate goodness-of-fit tests for normality. J Stat Comput Simul 80:545–591
Rosenblatt M (1956) Remarks on some non-parametric estimates of a density function. Ann Math Stat 27:832–837
Rosenblatt M (1975) A quadratic measure of deviation of two-dimensional density estimates and a test of independence. Ann Stat 3:1–14
Scott DW, Terrell GR (1987) Biased and unbiased cross-validation in density estimation. J Am Stat Assoc 82:1131–1146
Shapiro SS, Wilk MB (1965) An analysis of variance test for normality (complete samples). Biometrika 52:591–611
Silverman BW (1986) Density estimation for statistics and data analysis. Chapman and Hall, London
Simonoff JS (1996) Smoothing methods in statistics. Springer, New York
Stephens MA (1986) Tests based on EDF statistics. In: D’Agostino RB, Stephens MA (eds) Goodness-of-fit techniques. Marcel Dekker, New York, pp 97–193
Tenreiro C (1997) Loi asymptotique des erreurs quadratiques intégrées des estimateurs à noyau de la densité et de la régression sous des conditions de dépendance. Port Math 54:187–213
Tenreiro C (2001) On the asymptotic behaviour of the integrated square error of kernel density estimators with data-dependent bandwidth. Stat Probab Lett 53:283–292
Tenreiro C (2003) On the asymptotic normality of multistage integrated density derivatives kernel estimators. Stat Probab Lett 64:311–322
Tenreiro C (2007) On the asymptotic behaviour of location-scale invariant Bickel–Rosenblatt tests. J Stat Plan Inference 137:103–116 (Erratum: 139, 2115, 2009)
Tenreiro C (2017) A weighted least-squares cross-validation bandwidth selector for kernel density estimation. Commun Stat Theory Methods 46:3438–3458
Tsybakov AB (2009) Introduction to nonparametric estimation. Springer, London
Wand MP, Jones MC (1995) Kernel smoothing. Chapman & Hall, New York
Acknowledgements
The author would like to thank the anonymous reviewers and associate editor for their constructive comments and suggestions that greatly helped to improve this work.
Funding
Research partially supported by the Centre for Mathematics of the University of Coimbra—UID/MAT/00324/2019, funded by the Portuguese Government through FCT/MEC and co-funded by the European Regional Development Fund through the Partnership Agreement PT2020.
Proofs
1.1 Proof of Theorem 1
Consider the expansion
In order to establish the asymptotic behaviour of each one of the previous terms, we use the approach of Tenreiro (2001), which is based on the Taylor expansion
where \(u\in \mathbb {R}\), \(h>0\),
and
Note that, from assumption (K), the functions \(K^{\partial (\ell )}\) are bounded and integrable on \(\mathbb {R}\), for \(\ell =1,\ldots ,\omega -1\), and there exists \(\eta \in \,]0,1[\) such that the function \(K^{\partial (\omega ),\eta }(u) := \sup _{|h-1|\le \eta } |K^{\partial (\omega )}(u,h)|\) is bounded and integrable on \(\mathbb {R}\). From the previous Taylor expansion we deduce the following expansions for \(f_{\hat{h}}\), \(K_{\hat{h}}*f\) and \(K_{\hat{h}}*g(\cdot ; \hat{\theta }_1, \hat{\theta }_2 )\), which play a crucial role in what follows. For \(x\in \mathbb {R}\), and denoting by h the deterministic bandwidth h(f) given in assumption (B), we have
and
where \(K^{\partial (\ell )}_h(u)=K^{\partial (\ell )}(u/h)/h\) and \(K^{\partial (\omega )}_h(u,\hat{h})=K^{\partial (\omega )}(u/h,\hat{h}/h)/h\). Moreover, for \(|\hat{h}/h-1| \le \eta \) we have \(| K^{\partial (\omega )}_h(u,\hat{h})| \le K^{\partial (\omega ),\eta }_h(u)\), for \(u\in \mathbb {R}\), where \(K^{\partial (\omega ),\eta }_h(u)=K^{\partial (\omega ),\eta }(u/h)/h\).
Each one of the terms in (21) is studied in the following propositions. We denote by h the deterministic sequence h(f) given in assumption (B).
Proposition 1
We have
where \(U_n\) given by (25) is asymptotically normal with zero mean and variance \(2R(K\!*\!K)R(f)\).
Proof
Using equalities (22) and (23), and assumptions (D), (K) and (B), from Proposition 2 of Tenreiro (2001, p. 290) we have
Moreover, using degenerate U-statistic techniques (see Hall 1984; Tenreiro 1997) we have
with
and \(U_{n}\) is asymptotically normal with zero mean and variance equal to \(2R(K*K) R(f)\).
\(\square \)
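The limiting variance \(2R(K\!*\!K)R(f)\) appearing in Proposition 1 can be checked numerically in a tractable case. Assuming a Gaussian kernel (so that \(K*K\) is the \(N(0,2)\) density) and a standard normal \(f\), both roughness constants have closed forms, namely \(R(K*K)=1/(2\sqrt{2\pi})\) and \(R(f)=1/(2\sqrt{\pi})\); the sketch below recovers them by numerical integration.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# Gaussian kernel: K*K is the N(0, 2) density, since the convolution of
# two standard normal densities is the density of the sum of the variables.
K_conv_K = norm(scale=np.sqrt(2)).pdf

# roughness functionals R(g) = integral of g^2
R_KK = quad(lambda u: K_conv_K(u) ** 2, -np.inf, np.inf)[0]   # = 1/(2*sqrt(2*pi))
R_f = quad(lambda x: norm.pdf(x) ** 2, -np.inf, np.inf)[0]    # = 1/(2*sqrt(pi))

# asymptotic variance 2 R(K*K) R(f) = 1/(2*sqrt(2)*pi), about 0.1125
variance = 2 * R_KK * R_f
print(R_KK, R_f, variance)
```

The closed forms used in the comments are standard Gaussian integrals, not results taken from the paper.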
Proposition 2
We have
Moreover, under the null hypothesis we have
Proof
where \(\hat{\delta }_n(x)=f(x) - g(x; \hat{\theta }_1, \hat{\theta }_2 )\). Moreover,
and for all \(\epsilon \in \,]0,\eta [\) and for \(|\hat{h}/h-1| \le \epsilon \) we have
and
Therefore, from assumption (B) we can write
On the other hand, from assumption (F) the function \((\theta _1,\theta _2) \mapsto g(x; \theta _1,\theta _2)\) has continuous first-order partial derivatives, and the functions \((\theta _1,\theta _2) \mapsto \big |\big | \frac{\partial g}{\partial \theta _k} (\cdot ; \theta _1,\theta _2)\big |\big |_2\) are locally bounded on \(\mathbb {R} \times ]0,+\infty [\) for \(k=1,2\). Therefore, for each \(x\in \mathbb {R}\), a Taylor expansion of \(g(x; \hat{\theta }_1, \hat{\theta }_2 )\) at the point \((\theta _1(f), \theta _2(f))\) leads to
where
The first part of the stated result follows now from (26) and the following convergence that can be established from standard arguments as h tends to zero, when n tends to infinity:
Finally, taking into account that \(\hat{\delta }_n = u_n\) under the null hypothesis, where \(|| u_n ||_2 = O_p(n^{-1/2})\) from assumption (P), we deduce that \(I_{n,2} = O_p(n^{-1})\) under the null hypothesis. \(\square \)
To establish the order of convergence of \(I_{n,3}\) we need the following lemma. Note that we are always assuming that \(\hat{h}\) satisfies assumption (B).
Lemma 1
Let \(\varphi \) be a real-valued function defined on \(\mathbb {R} \times ]0,+\infty [\), and assume that there exists \(\eta \in \,]0,1[\) such that the function \(\varphi ^\eta (u)= \sup _{|h-1|\le \eta } |\varphi (u,h)|\) is bounded and integrable.
(a) If \(\gamma _n : \mathbb {R} \rightarrow \mathbb {R}\) is such that \(||\gamma _n||_2 = O(1)\), then
$$\begin{aligned} \frac{1}{n} \sum _{i=1}^n \int \big \{ \varphi _h(x-X_i) - \varphi _h*f(x) \big \} \gamma _n(x) \mathrm{d}x = O_p\big ( n^{-1/2} \big ). \end{aligned}$$
(b) If \(\gamma _n : \mathbb {R} \rightarrow \mathbb {R}\) is such that \(||\gamma _n||_r = O(1)\), for some \(r\in [1,\infty ]\), then
$$\begin{aligned} \frac{1}{n} \sum _{i=1}^n \int | \varphi _h(x-X_i,\hat{h}) - \varphi _h(\cdot ,\hat{h})*f(x) | \gamma _n(x) \mathrm{d}x = O_p( 1 ). \end{aligned}$$
(c) If \(\tilde{\gamma }_n = \tilde{\gamma }_n(\cdot ;X_1,\ldots ,X_n): \mathbb {R} \rightarrow \mathbb {R}\) is such that \(||\tilde{\gamma }_n||_r = O_p(1)\), for some \(r\in [1,\infty ]\), then
$$\begin{aligned} \frac{1}{n} \sum _{i=1}^n \int | \varphi _h(x-X_i,\hat{h}) - \varphi _h(\cdot ,\hat{h})*f(x) | \tilde{\gamma }_n(x) \mathrm{d}x = O_p\big ( h^{-1/r} \big ). \end{aligned}$$
Proof
Write \(S_{n,a}\), \(S_{n,b}\) and \(S_{n,c}\) for the sums considered in parts (a), (b) and (c), respectively. The order of convergence stated in part (a) follows from the inequalities
In order to establish parts b) and c), it is enough to note that for all \(\epsilon \in \,]0,\eta [\) and for \(|\hat{h}/h-1| \le \epsilon \) we have
and
where
and
with \(1/r+1/s=1\). Therefore, \(S_{n,b}^\epsilon = O_p(1)\) and \(S_{n,c}^\epsilon = O_p\big ( h^{-1/r} \big )\) which implies the stated results as \(\hat{h}/h -1 = o_p(1)\). \(\square \)
Proposition 3
We have
Moreover, under the null hypothesis we have
where \(r \in \,]2,\infty ]\) is given in assumption (F).
Proof
The first statement follows from Propositions 1 and 2 since \(|I_{n,3}| \le I_{n,1}^{1/2} I_{n,2}^{1/2}\). On the other hand, from (22), (23) and (24) we have
where \(\hat{\delta }_n(x)=f(x) - g(x; \hat{\theta }_1, \hat{\theta }_2 )\). From assumption (F), the function \((\theta _1,\theta _2) \mapsto g(x; \theta _1,\theta _2)\) has continuous second-order partial derivatives, and for some \(r \in \,]2,\infty ]\) the functions \((\theta _1,\theta _2) \mapsto \big |\big | \frac{\partial ^2 g}{\partial \theta _k \partial \theta _l} (\cdot ; \theta _1,\theta _2)\big |\big |_r\) are locally bounded on \(\mathbb {R} \times ]0,+\infty [\), for \(k,l=1,2\). Therefore, under the null hypothesis a Taylor expansion of \(g(x; \hat{\theta }_1, \hat{\theta }_2 )\) at the point \((\theta _1(f), \theta _2(f))\) leads to
for \(x\in \mathbb {R}\), where from assumption (P)
Therefore, from Lemma 1 we get \(I_{n,3}^1 = O_p \big ( n^{-1} h^{-1/r} \big )\), \(I_{n,3}^2 = O_p \big ( (n^{-1/2} + n^{-1} h^{-1/r}) \xi _n^\omega \big )\), \(I_{n,3}^3 = O_p \big ( ( n^{-1/2} + n^{-1} h^{-1/r} ) \xi _n^\omega \big )\) and \(I_{n,3}^4 = O_p \big ( ( n^{-1/2} + n^{-1} h^{-1/r} ) \xi _n^{2\omega } \big )\), which completes the proof. \(\square \)
We can now conclude the proof of Theorem 1. As \(\xi _n = o_p(1)\) and \(h \rightarrow 0\), as \(n\rightarrow \infty \), from Proposition 1 we have
Therefore, from expansion (21) and Propositions 2 and 3, we get
which completes the proof of part b) as \(nh \rightarrow \infty \), when \(n\rightarrow \infty \). Moreover, under the null hypothesis from Propositions 1, 2 and 3 we also have
Taking into account hypothesis (13), this completes the proof of part a), as \(r>2\) and \(U_n\) is asymptotically normal with zero mean and variance equal to \(\nu _f^2 = 2R(K\!*\!K)R(f)\).
\(\square \)
1.2 Proof of Theorem 2
Let us consider the expansion
Each one of these terms will be studied in the following propositions. As before, we denote by h the deterministic sequence h(f) whose existence is ensured by assumption (B).
Proposition 4
We have
where \(U_n\) is defined in Proposition 1 and \(V_n\) given by (31) is asymptotically normal with zero mean and variance \(\mu _2(K)^2 \mathrm {Var}_f(f''(X_1))\).
Proof
Taking into account equality (22) and assumptions (D), (D’), (K), (K’) and (B), from Lemma 1 of Tenreiro (2001, p. 286) we have
Using degenerate U-statistic techniques (see Hall 1984), we know that
with \(U_n\) given by (25) and
with
is asymptotically normal with zero mean and variance equal to \(\mu _2(K)^2 \mathrm {Var}_f(f''(X_1))\).
\(\square \)
Proposition 5
We have
Moreover, under the null hypothesis we have
Proof
It follows straightforwardly from (27) and (28). \(\square \)
Proposition 6
We have
Moreover, under the null hypothesis we have
where \(W_n\) is given by (36).
Proof
The first statement follows from Propositions 4 and 5 because \(|J_{n,3}| \le J_{n,1}^{1/2} J_{n,2}^{1/2}\) and \(R(K_h*f - f)=O(h^4)\). Write
where \(\hat{\delta }_n(x)=f(x) - g(x; \hat{\theta }_1, \hat{\theta }_2 )\). From (22) and (23) we have
where from Lemma 1 we get
On the other hand, from (23) we have
where for all \(\epsilon \in \,]0,\eta [\) and for \(|\hat{h}/h-1| \le \epsilon \) we have
Moreover, as \(\int K^{\partial (\ell )}(u) \mathrm{d}u = \int uK^{\partial (\ell )}(u) \mathrm{d}u=0\) for \(\ell \ge 1\), a second-order Taylor expansion leads to
Therefore, for \(\ell \ge 1\) we have
Taking into account (27) and the fact that \(|| \hat{\delta }_n ||_2 = O_p(n^{-1/2})\) under the null hypothesis, from (35) we get
Finally, from (29) and (32), and assumptions (E) and (\(G'\)), we have
where
with
as
and
with \(1/r+1/s=1\). Thus
The proposition follows from (33), (34) and (37). \(\square \)
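The order \(R(K_h*f - f)=O(h^4)\) used in the proof above can be checked directly in a tractable case. Assuming a Gaussian kernel and a standard normal \(f\), \(K_h*f\) is the \(N(0,1+h^2)\) density, and the leading term \(\frac{1}{4}\mu _2(K)^2 R(f'') h^4\) reduces to \(3h^4/(32\sqrt{\pi })\) (a closed-form Gaussian computation, not a formula from the paper); the sketch compares the exact integral with this approximation.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def bias_roughness(h):
    # R(K_h*f - f) for a Gaussian kernel and f standard normal:
    # K_h*f is the N(0, 1 + h^2) density, so we can integrate directly.
    s = np.sqrt(1 + h**2)
    return quad(lambda x: (norm.pdf(x, scale=s) - norm.pdf(x)) ** 2,
                -np.inf, np.inf)[0]

def leading_term(h):
    # (mu_2(K)^2 / 4) R(f'') h^4 with mu_2(K) = 1, R(f'') = 3/(8*sqrt(pi))
    return 3 * h**4 / (32 * np.sqrt(np.pi))

for h in (0.2, 0.1):
    print(h, bias_roughness(h), leading_term(h))
```

Halving h should divide the exact roughness by roughly \(2^4 = 16\), confirming the \(O(h^4)\) rate.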
We can now conclude the proof of Theorem 2. From Proposition 4 and assumption (B’) we have
Therefore, from expansion (30) and Propositions 5 and 6, we get
which completes the proof of part b). Moreover, from Propositions 4, 5 and 6, under the null hypothesis we also have
Taking into account hypothesis (14), this completes the proof of part a) as \(r>2\) and, from the central limit theorem for degenerate U-statistics with variable kernels established in Tenreiro (1997, Theorem 1, p. 190), the sum \(U_n + ( nh^5 )^{1/2} (V_n - 2W_n)\) is asymptotically normal with zero mean and variance equal to \(\sigma _f^2 = 2R(K*K) R(f) + \lambda _f \mu _2(K)^2 \mathrm {Var}_f(\varphi _f(X))\). \(\square \)
1.3 Proof of Theorem 3
We consider only the case of the test based on the critical region \(\mathscr {C}(J_n(\hat{h}),\alpha )=\{J_n(\hat{h}) > q(J_n(\hat{h}),\alpha ) \}\) given in (15), where \(q(J_n(\hat{h}),\alpha )\) is the quantile of order \(1-\alpha \) of the null distribution of \(J_n(\hat{h})\), but similar arguments can be used to establish the consistency of the test based on \(\mathscr {C}(I_n(\hat{h}),\alpha )\). From Theorem 2.a) and for \(f\in \mathscr {F}_0\) we have \(\upsilon _{f}^{-1} h(f)^{-1/2} \big ( q(J_n(\hat{h}),\alpha ) - R(K) - c_n(f;K) \big ) \rightarrow \Phi ^{-1}(1-\alpha )\). Therefore,
because h(f) tends to zero, as \(n\rightarrow \infty \), and \(c_n(f;K)=c_n(f_0;K)=c(f_0;K) (1+o(1))\), where \(c(f;K)=\frac{1}{4} \lambda _{f} \mu _2(K)^2 R(f^{\prime \prime })\) (see Wand and Jones 1995, pp. 19–23, and Bosq and Lecoutre 1987, pp. 80–81). On the other hand, from Theorem 2.b) and for \(f\in \mathscr {F}{\setminus }\mathscr {F}_0\) we have \((nh(f))^{-1} \big ( J_n(\hat{h}) - R(K) - c_n(f;K) \big ) {\mathop {\longrightarrow }\limits ^{p}} R\big (f - g(\cdot ; \theta _1(f),\theta _2(f) )\big ) \ne 0,\) which enables us to conclude that
as \(nh(f) \rightarrow \infty \), and \(c_n(f;K)=c(f;K) (1+o(1))\). The consistency of the test based on \(\mathscr {C}(J_n(\hat{h}),\alpha )\) follows now from (38) and (39). \(\square \)
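The mechanism behind this consistency proof — under a fixed alternative the integrated squared distance converges to \(R\big (f - g(\cdot ; \theta _1(f),\theta _2(f))\big ) > 0\), while under the null it shrinks to zero — can be seen in a small simulation. The setup below (normal location-scale null, exponential alternative, fixed bandwidth, grid integration, small Monte Carlo size) is an illustrative sketch, not the simulation design of the paper.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def quadratic_distance(x, h):
    # unnormalized integrated squared difference between the kernel
    # density estimate and the fitted normal null density
    mu, sd = x.mean(), x.std(ddof=1)
    grid = np.linspace(mu - 5 * sd, mu + 5 * sd, 400)
    dx = grid[1] - grid[0]
    fhat = norm.pdf(grid[:, None], loc=x[None, :], scale=h).mean(axis=1)
    g = norm.pdf(grid, loc=mu, scale=sd)
    return ((fhat - g) ** 2).sum() * dx

n, h = 200, 0.4
null_vals = [quadratic_distance(rng.normal(size=n), h) for _ in range(50)]
alt_vals = [quadratic_distance(rng.exponential(size=n), h) for _ in range(50)]
print(np.mean(null_vals), np.mean(alt_vals))
```

Under the null the distance is driven only by estimation error and smoothing bias, while under the exponential alternative it stabilizes around a strictly positive limit, so the two groups of values separate clearly.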
Tenreiro, C. On automatic kernel density estimate-based tests for goodness-of-fit. TEST 31, 717–748 (2022). https://doi.org/10.1007/s11749-021-00799-3