Abstract
The traditional nonparametric entropy estimators based on the popular Rosenblatt–Parzen kernel density estimator use symmetric kernels and may therefore be inappropriate for non-negative data. We thus consider instead a density estimator based on Poisson weights, which is specifically tailored to non-negative data. The asymptotic properties of the new entropy estimators are established, and a simulation study compares their performance with that of estimators based on the Rosenblatt–Parzen kernel density estimator, demonstrating the better performance of the new estimators.
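To make the construction concrete, the sketch below implements one common form of the Poisson-weights (Chaubey–Sen type) density estimator for non-negative data and the corresponding mean-log entropy estimate \(-\frac{1}{n}\sum_i \log \hat{f}(X_i)\). The truncation rule for the Poisson sum, the choice \(k=n^{0.6}\), and all function names are illustrative assumptions, not the paper's exact specification.

```python
import bisect
import math
import random

def poisson_weights_density(sorted_data, x, k):
    """Poisson-weights (Chaubey-Sen type) density estimate at x >= 0:

        f_hat(x) = k * sum_j [F_n((j+1)/k) - F_n(j/k)] * p_j(k*x),

    where p_j is the Poisson pmf and F_n the empirical CDF.  The sum is
    truncated once the Poisson tail mass is negligible; for very large
    k*x, exp(-k*x) would underflow and a log-space recursion is needed."""
    n = len(sorted_data)
    lam = k * x
    j_max = int(lam + 10.0 * math.sqrt(lam + 1.0)) + 25
    p = math.exp(-lam)                      # p_0(lam)
    total = 0.0
    prev = bisect.bisect_right(sorted_data, 0.0)
    for j in range(j_max + 1):
        cur = bisect.bisect_right(sorted_data, (j + 1) / k)
        total += (cur - prev) / n * p       # [F_n((j+1)/k) - F_n(j/k)] * p_j
        prev = cur
        p *= lam / (j + 1)                  # Poisson pmf recursion
    return k * total

def meanlog_entropy(data, k):
    """Mean-log entropy estimate: -(1/n) * sum_i log f_hat(X_i)."""
    s = sorted(data)
    return -sum(math.log(max(poisson_weights_density(s, x, k), 1e-300))
                for x in data) / len(data)

if __name__ == "__main__":
    random.seed(1)
    sample = [random.expovariate(1.0) for _ in range(800)]  # Exp(1); true H_f = 1
    k = int(len(sample) ** 0.6)   # k = O(n^h) with h = 0.6 in (1/2, 3/4)
    print(meanlog_entropy(sample, k))       # should be close to 1
```

For an Exp(1) sample the true entropy is 1, and the estimate should land nearby for moderate sample sizes; the smoothing rate \(h=0.6\) sits inside the interval \((1/2, 3/4)\) required by the asymptotic theory in the appendix.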
References
Ahmad IA, Lin PE (1989) A non-parametric estimation of the entropy for absolutely continuous distributions. IEEE Trans Inf Theory 36:688–692
Beirlant J, Dudewicz EJ, Györfi L, van der Meulen EC (2001) Nonparametric entropy estimation: an overview. Int J Math Stat Sci 6:17–39
Benavides EM (2012) Advanced engineering design: an integrated approach. Woodhead Publishing, Cambridge
Berrett TB, Samworth RJ, Yuan M (2019) Efficient multivariate entropy estimation via \(\it k\)-nearest neighbour distances. Ann Stat 47(1):288–318
Bouezmarni T, Scaillet O (2005) Consistency of asymmetric kernel density estimators and smoothed histograms with application to income data. Econom Theory 21:390–412
Bracken P (2020) Entropy in quantum mechanics and applications to non-equilibrium thermodynamics. In: Bracken P (ed) Quantum mechanics. InTech Open, London. https://doi.org/10.5772/intechopen.87908
Chaubey YP, Mudholkar GS (2013) A rationale for maximizing the likelihood and related alternatives to maximum likelihood estimator. Investig Math Sci 3:1–15
Chaubey YP, Sen PK (1996) On smooth estimation of survival and density functions. Stat Decis 14:1–22
Chaubey YP, Sen PK (1999) On smooth estimation of mean residual life. J Stat Plan Inference 75(2):223–236
Chaubey YP, Sen PK (2009) On the selection of the smoothing parameter in Poisson smoothing of histogram estimator: computational aspects. Pak J Stat 25:385–401
Chaubey YP, Sen PK, Li J (2010) Smooth density estimation for length-biased data. J Indian Soc Agric Stat 64:145–155
Eggermont PPB, LaRiccia VN (1999) Best asymptotic normality of the kernel density entropy estimator for smooth densities. IEEE Trans Inf Theory 45:1321–1324
Györfi L, van der Meulen EC (1990) An entropy estimate based on a kernel density estimation. In: Berkes I, Csáki E, Révész P (eds) Limit Theorems Probab Stat. North-Holland, Amsterdam
Hall P, Morton SC (1993) On the estimation of entropy. Ann Inst Stat Math 45:69–88
Han Y, Jiao J, Weissman T, Wu Y (2019) Optimal rates of entropy estimation over Lipschitz balls. Preprint. arXiv:1711.02141v4 [math.ST]
Jaynes ET (1957a) Information theory and statistical mechanics-I. Phys Rev 106:620–630
Jaynes ET (1957b) Information theory and statistical mechanics-II. Phys Rev 108:171–190
Kandasamy K, Krishnamurthy A, Poczos B, Wasserman L, Robins J (2015) Nonparametric von Mises estimators for entropies, divergences and mutual informations. In: Cortes C, Lawrence N, Lee D, Sugiyama M, Garnett R (eds) Advances in neural information processing systems. Curran Associates, Inc., New York, pp 397–405
Karmeshu (2003) Entropy measures, maximum entropy principle and emerging applications. Springer, Berlin
Mallat S (2009) A wavelet tour of signal processing, 3rd edn. Elsevier, Amsterdam
Paninski L (2003) Estimation of entropy and mutual information. Neural Comput 15:1191–1253
Paninski L, Yajima M (2008) Undersmoothed kernel entropy estimators. IEEE Trans Inf Theory 54(9):4384–4388. https://doi.org/10.1109/TIT.2008.928251
Pires J (2001) Textural and surface chemistry characterization of zeolites via adsorption phenomena. In: Nalwa H (ed) Handbook of surfaces and interfaces of materials, 2nd edn. Academic Press, London, pp 481–507
Shannon CE (1948) A mathematical theory of communication. Bell Syst Tech J 27:379–423
Shwartz S, Zibulevsky M, Schechner YY (2005) Fast kernel entropy estimation and optimization. Signal Process 85:1045–1058
Silverman BW (1986) Density estimation for statistics and data analysis. Chapman & Hall/CRC, London
Singh VP (1998) Entropy-based parameter estimation in hydrology. Water science and technology library, vol 30. Springer, Dordrecht. https://doi.org/10.1007/978-94-017-1431-0_1
Singh VP (2003) The entropy theory as a decision-making tool in environmental and water resources. In: Karmeshu (ed) Entropy measures, maximum entropy principle and emerging applications. Springer, Berlin, pp 261–297
Acknowledgements
The authors are thankful to an anonymous reviewer for helpful comments that have improved the presentation of this paper. The first author would like to acknowledge partial support for this research from the Natural Sciences and Engineering Research Council of Canada through a Discovery Grant (Grant No. RGPIN/4794-2017).
Ethics declarations
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
This article is part of the topical collection “Celebrating the Centenary of Professor C. R. Rao” guest edited by Ravi Khattree, Sreenivasa Rao Jammalamadaka, and M. B. Rao.
Appendix 1: Proofs
1.1 Proof of Theorem 1
Writing \(\hat{H}_f^{{\mathrm{Meanlog-Pois}}}\) as an integral with respect to the empirical distribution function \(F_n(x)\), we have
$$\begin{aligned} \hat{H}_f^{{\mathrm{Meanlog-Pois}}} = -\frac{1}{n}\sum _{i=1}^{n}\log \hat{f}_n^{{\mathrm{Pois}}}(X_i) = -\int _{0}^{\infty }\log \hat{f}_n^{{\mathrm{Pois}}}(x)\,{\mathrm{d}}F_n(x) = -\frac{1}{n}\sum _{i=1}^{n}\log f(X_i) - I_n, \end{aligned}$$
where
$$\begin{aligned} I_n = \int _{0}^{\infty }\log \bigg (\frac{\hat{f}_n^{{\mathrm{Pois}}}(x)}{f(x)}\bigg )\,{\mathrm{d}}F_n(x). \end{aligned}$$
On the other hand, we know that \(-\frac{1}{n}\sum _{i=1}^{n}\log f(X_i)\) is an unbiased, strongly consistent estimator of \(H_f\) by the strong law of large numbers, and is \(\sqrt{n}\)-asymptotically normal by the central limit theorem provided \({\mathbb {E}}[(\log f(X))^2]<\infty\). That is,
$$\begin{aligned} -\frac{1}{n}\sum _{i=1}^{n}\log f(X_i) {\mathop {\rightarrow }\limits ^{{\mathrm{a.s.}}}} H_f \end{aligned}$$(A.1)
and
$$\begin{aligned} \sqrt{n}\bigg (-\frac{1}{n}\sum _{i=1}^{n}\log f(X_i) - H_f\bigg ) {\mathop {\rightarrow }\limits ^{d}} N\big (0,{\mathrm{Var}}[\log f(X)]\big ) \end{aligned}$$(A.2)
as \(n\rightarrow \infty .\) Therefore, it is sufficient to prove that
$$\begin{aligned} \sqrt{n}\, I_n {\mathop {\rightarrow }\limits ^{{\mathrm{a.s.}}}} 0. \end{aligned}$$(A.3)
In order to establish this, we decompose \(I_n\) into two parts as
$$\begin{aligned} I_n = I_{n,1} + I_{n,2}, \end{aligned}$$
where
$$\begin{aligned} I_{n,1} = \int _{0}^{\infty }\log \bigg (\frac{\hat{f}_n^{{\mathrm{Pois}}}(x)}{f(x)}\bigg )\,{\mathrm{d}}\big (F_n(x)-F(x)\big ) \quad \text {and}\quad I_{n,2} = \int _{0}^{\infty }\log \bigg (\frac{\hat{f}_n^{{\mathrm{Pois}}}(x)}{f(x)}\bigg )f(x)\,{\mathrm{d}}x, \end{aligned}$$
and analyze them separately.
-
Analysis of \(I_{n,2}\): Since the function \(\log z\) is continuous and differentiable for all \(z>0\), a Taylor expansion centered at \(a\) gives
$$\begin{aligned} \log z = \log a +\frac{z-a}{tz+(1-t)a}, \end{aligned}$$where \(t\in (0,1)\). Setting \(z=\hat{f}_n^{{\mathrm{Pois}}}(x)\) and \(a=f(x)\), we obtain
$$\begin{aligned} \log \hat{f}_n^{{\mathrm{Pois}}}(x)-\log f(x) = \frac{\hat{f}_n^{{\mathrm{Pois}}}(x)-f(x)}{t\hat{f}_n^{{\mathrm{Pois}}}(x)+(1-t)f(x)}. \end{aligned}$$(A.4)
Since \(\Vert \hat{f}_n^{{\mathrm{Pois}}}-f\Vert _{\infty }{\mathop {\rightarrow }\limits ^{{\mathrm{a.s.}}}}0\) when \(k=O(n^h)\) for \(0<h<3/4\), it follows that \(t\hat{f}_n^{{\mathrm{Pois}}}(x)+(1-t)f(x){\mathop {\rightarrow }\limits ^{{\mathrm{a.s.}}}}f(x)\). Meanwhile, recall from Chaubey et al. [11] that if \(f'(x)\) satisfies a Lipschitz condition of order \(\alpha >0\), i.e., there exists a finite positive constant \(C\) such that
$$\begin{aligned} |f'(s)-f'(t)|\le C|s-t|^{\alpha } \quad \forall s,t\in {\mathbb {R}}^+, \end{aligned}$$then for fixed \(x\in {\mathbb {R}}^+\) we have
$$\begin{aligned} \hat{f}_n^{{\mathrm{Pois}}}(x)-f(x)=\frac{1}{2k}f'(x)+O(k^{-1-\alpha }). \end{aligned}$$(A.5)Thus, by the dominated convergence theorem (DCT), from (A.4) we have
$$\begin{aligned} \sqrt{n}|I_{n,2}|&=\left| \int _{0}^{\infty }\sqrt{n}\bigg (\frac{\hat{f}_n^{{\mathrm{Pois}}} (x)-f(x)}{t\hat{f}_n^{{\mathrm{Pois}}}(x)+(1-t)f(x)}\bigg )f(x){\mathrm{d}}x\right| \\&=\left| \int _{0}^{\infty }\sqrt{n}\bigg (\frac{\frac{1}{2k}f'(x)+O(k^{-1-\alpha })}{t\hat{f}_n^{{\mathrm{Pois}}}(x)+(1-t)f(x)}\bigg )f(x){\mathrm{d}}x\right| \\&\approx \left| \int _{0}^{\infty }\sqrt{n}\bigg (\frac{\frac{1}{2k}f'(x)+O(k^{-1-\alpha })}{f(x)}\bigg )f(x){\mathrm{d}}x\right| \\&=\left| \int _{0}^{\infty }\sqrt{n}\left( \frac{1}{2k}f'(x)+O(k^{-1-\alpha })\right) {\mathrm{d}}x\right| \\&{\mathop {\rightarrow }\limits ^{{\mathrm{a.s.}}}} 0, \end{aligned}$$(A.6)under the condition that \(k=O(n^h)\) for \(1/2<h<3/4\) and \(\int _{0}^{\infty }|f'(x)|\,{\mathrm{d}}x<\infty .\)
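To spell out the final rate step of (A.6) (a short supplementary computation, not in the original): with \(k=O(n^h)\), the leading bias term scales as

$$\begin{aligned} \sqrt{n}\,\frac{1}{2k}\int _{0}^{\infty }|f'(x)|\,{\mathrm{d}}x = O\big (n^{1/2-h}\big ) \rightarrow 0 \quad \text {for } h>\tfrac{1}{2}, \end{aligned}$$

while the remainder contributes \(\sqrt{n}\,O(k^{-1-\alpha })=O\big (n^{1/2-h(1+\alpha )}\big )\rightarrow 0\) over the same range of \(h\). The upper restriction \(h<3/4\) is what guarantees \(\Vert \hat{f}_n^{{\mathrm{Pois}}}-f\Vert _{\infty }\rightarrow 0\) a.s., used above to replace the random denominator by \(f(x)\).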
-
Analysis of \(I_{n,1}\): By the law of the iterated logarithm, we have
$$\begin{aligned} \Vert F_n-F\Vert _{\infty }=O\big (n^{-1/2}(\log \log n)^{1/2}\big )\,\, {\text { a.s.}} \end{aligned}$$(A.7)Also, from (A.5), given that \(f'(x)/f(x)<\infty\) \(\forall x\in {\mathbb {R}}^+\), we get
$$\begin{aligned} \frac{\hat{f}_n^{{\mathrm{Pois}}}(x)}{f(x)}=1+\frac{1}{2k}\frac{f'(x)}{f(x)} +O(k^{-1-\alpha }) =1+O(k^{-1}). \end{aligned}$$(A.8)Thus, for \(k=O(n^h), 1/2<h<3/4,\) using integration by parts, we have
$$\begin{aligned} \sqrt{n}|I_{n,1}|&=\sqrt{n}\left| \bigg [\big (F_n(x)-F(x)\big )\log \bigg (\frac{\hat{f}_n^{{\mathrm{Pois}}}(x)}{f(x)}\bigg )\bigg ]^{\infty }_0\right. \\&\quad \left. -\int _{0}^{\infty }\frac{{\mathrm{d}}}{{\mathrm{d}}x}\bigg (\log \frac{\hat{f}_n^{{\mathrm{Pois}}}(x)}{f(x)} \bigg )\big ( F_n(x)-F(x) \big ){\mathrm{d}}x\right| \\&\le \sqrt{n}\Vert F_n-F\Vert _{\infty }\left| \log \bigg (\frac{\hat{f}_n^{{\mathrm{Pois}}}(x)}{f(x)} \bigg )\right| ^{\infty }_0\\&\quad +\sqrt{n}||F_n-F||_{\infty }\left| \int _{0}^{\infty }\frac{{\mathrm{d}}}{{\mathrm{d}}x} \bigg (\log \frac{\hat{f}_n^{{\mathrm{Pois}}}(x)}{f(x)} \bigg ){\mathrm{d}}x\right| \\&=2\sqrt{n}||F_n-F||_{\infty }\left| \log \bigg (\frac{\hat{f}_n^{{\mathrm{Pois}}}(x)}{f(x)}\bigg ) \right| ^{\infty }_0\\&{\mathop {\rightarrow }\limits ^{{\mathrm{a.s.}}}} 0, \end{aligned}$$(A.9)
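The \(\sqrt{n}\)-scaling of \(\Vert F_n-F\Vert _{\infty }\) used in (A.7) is easy to check numerically. The sketch below (illustrative only; uniform data are used so that \(F(x)=x\) and the sup is attained at a jump of \(F_n\)) computes the Kolmogorov–Smirnov distance and shows that \(\sqrt{n}\,\Vert F_n-F\Vert _{\infty }\) stays bounded as \(n\) grows.

```python
import random

def ks_distance(sample):
    """sup_x |F_n(x) - F(x)| for Uniform(0,1) data, where F(x) = x.
    The supremum is attained at a jump point of the empirical CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        # F_n jumps from i/n to (i+1)/n at x; check both sides of the jump.
        d = max(d, abs((i + 1) / n - x), abs(i / n - x))
    return d

if __name__ == "__main__":
    random.seed(0)
    for n in (100, 1000, 10000):
        u = [random.random() for _ in range(n)]
        # sqrt(n) * D_n remains O(1) (in fact O((log log n)^{1/2}) a.s. by the LIL)
        print(n, ks_distance(u) * n ** 0.5)
```

The printed values hover around the same order of magnitude for all three sample sizes, consistent with the \(n^{-1/2}(\log \log n)^{1/2}\) rate quoted in (A.7).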
Now, from Eqs. (A.2), (A.6), and (A.9), we get
$$\begin{aligned} \sqrt{n}\, I_n = \sqrt{n}\,(I_{n,1}+I_{n,2}) {\mathop {\rightarrow }\limits ^{{\mathrm{a.s.}}}} 0, \end{aligned}$$
which implies
$$\begin{aligned} \sqrt{n}\big (\hat{H}_f^{{\mathrm{Meanlog-Pois}}} - H_f\big ) {\mathop {\rightarrow }\limits ^{d}} N\big (0,{\mathrm{Var}}[\log f(X)]\big ). \end{aligned}$$
This completes the proof of the theorem. □
1.2 Proof of Theorem 2
Let \({\tilde{F}}_n(x)=\int _{0}^{x}\hat{f}_n^{{\mathrm{Pois}}}(t)\,{\mathrm{d}}t\), which is a smooth version of the empirical distribution function \(F_n(x).\) Similarly, let \({{\tilde{S}}_n}(x)=1-{\tilde{F}}_n(x)\) be the corresponding smooth version of the empirical survival function \(S_n(x)=1-F_n(x),\) and denote the survival function \(1-F(x)\) by \(S(x).\) We can now decompose \(\hat{H}_f^{{\mathrm{Plugin-Pois}}}\) as
$$\begin{aligned} \hat{H}_f^{{\mathrm{Plugin-Pois}}} = -\int _{0}^{\infty }\hat{f}_n^{{\mathrm{Pois}}}(x)\log \hat{f}_n^{{\mathrm{Pois}}}(x)\,{\mathrm{d}}x = \hat{H}_f^{{\mathrm{Meanlog-Pois}}} - U_{n,1} - U_{n,2}, \end{aligned}$$(A.10)
where
$$\begin{aligned} U_{n,1} = \int _{0}^{\infty }\log \bigg (\frac{\hat{f}_n^{{\mathrm{Pois}}}(x)}{f(x)}\bigg )\,{\mathrm{d}}\big (S_n(x)-{\tilde{S}}_n(x)\big ) \quad \text {and}\quad U_{n,2} = \int _{0}^{\infty }\log f(x)\,{\mathrm{d}}\big (S_n(x)-{\tilde{S}}_n(x)\big ). \end{aligned}$$
Therefore, given the asymptotic properties of \(\hat{H}_f^{{\mathrm{Meanlog-Pois}}}\) established in Theorem 1, it is sufficient to prove
$$\begin{aligned} \sqrt{n}\, U_{n,1} {\mathop {\rightarrow }\limits ^{{\mathrm{a.s.}}}} 0 \quad \text {and}\quad \sqrt{n}\, U_{n,2} {\mathop {\rightarrow }\limits ^{{\mathrm{a.s.}}}} 0. \end{aligned}$$
-
Analysis of \(U_{n,1}\): Recall that under the conditions \(k\rightarrow \infty\), \(n^{-1}k\rightarrow 0\), and \(f(x)\) being absolutely continuous with a bounded derivative \(f'(\cdot )\) a.e. on \({\mathbb {R}}^+\), Chaubey and Sen [8] showed (see their Theorem 3.2) that
$$\begin{aligned} \Vert {\tilde{S}}_n-S_n\Vert _{\infty }=\sup _{x\in {\mathbb {R}}^+}\big \vert {\tilde{S}}_n(x)-S_n(x) \big \vert =O(n^{-3/4}(\log n)^{1+\delta }) \ \ {\text{a.s.}} \, {\text { as }}\, n\rightarrow \infty , \end{aligned}$$(A.11)where \(\delta >0\) is arbitrary. Also, from (A.8), given that \(f'(x)/f(x)<\infty\) \(\forall x\in {\mathbb {R}}^+\), we get
$$\begin{aligned} \frac{\hat{f}_n^{{\mathrm{Pois}}}(x)}{f(x)}=1+\frac{1}{2k}\frac{f'(x)}{f(x)}+O(k^{-1-\alpha }) =1+O(k^{-1}). \end{aligned}$$As a result, using integration by parts, we get
$$\begin{aligned} \sqrt{n}|U_{n,1}|&=\sqrt{n}\left| \int _{0}^{\infty }\log \bigg (\frac{\hat{f}_n^{{\mathrm{Pois}}} (x)}{f(x)}\bigg ) d\big (S_n(x)-{\tilde{S}}_n(x)\big )\right| \\&=\sqrt{n}\left| \left[ \log \bigg (\frac{\hat{f}_n^{{\mathrm{Pois}}}(x)}{f(x)}\bigg )\big (S_n(x) -{\tilde{S}}_n(x)\big )\right] ^{\infty }_0\right. \\&\quad \left. -\int _{0}^{\infty }\frac{{\mathrm{d}}}{{\mathrm{d}}x}\bigg (\log \frac{\hat{f}_n^{{\mathrm{Pois}}}(x)}{f(x)}\bigg ) \big (S_n(x)-{\tilde{S}}_n(x)\big ){\mathrm{d}}x\right| \\&\le \sqrt{n}\Vert {\tilde{S}}_n-S_n\Vert _{\infty }\left| \log \bigg (\frac{\hat{f}_n^{{\mathrm{Pois}}}(x)}{f(x)}\bigg )\right| ^{\infty }_0 \\&\quad +\sqrt{n}||{\tilde{S}}_n-S_n||_{\infty }\left| \log \bigg (\frac{\hat{f}_n^{{\mathrm{Pois}}} (x)}{f(x)}\bigg )\right| ^{\infty }_0\\&=2\sqrt{n}\Vert {\tilde{S}}_n-S_n\Vert _{\infty }\left| \log \bigg (\frac{\hat{f}_n^{{\mathrm{Pois}}}(x)}{f(x)}\bigg )\right| ^{\infty }_0\\&{\mathop {\rightarrow }\limits ^{{\mathrm{a.s.}}}} 0. \end{aligned}$$(A.12) -
Analysis of \(U_{n,2}\): Similarly, we can use integration by parts on \(U_{n,2}\) to obtain
$$\begin{aligned} \sqrt{n}|U_{n,2}|&=\sqrt{n}\left| \int _{0}^{\infty }\log f(x)d\big (S_n(x) -{\tilde{S}}_n(x)\big )\right| \\&=\sqrt{n}\left| \left[ \log f(x)\big (S_n(x)-{\tilde{S}}_n(x)\big ) \right] ^{\infty }_0\right. \\&\quad \left. -\int _{0}^{\infty }\frac{{\mathrm{d}}}{{\mathrm{d}}x}(\log f(x)) \big (S_n(x) -{\tilde{S}}_n(x)\big ){\mathrm{d}}x\right| \\&\le \sqrt{n}||{\tilde{S}}_n-S_n||_{\infty }\left| \log f(x)\right| ^{\infty }_0 +\sqrt{n}||{\tilde{S}}_n-S_n||_{\infty }\left| \log f(x)\right| ^{\infty }_0\\&=2\sqrt{n}||{\tilde{S}}_n-S_n||_{\infty }\left| \log f(x)\right| ^{\infty }_0\\&{\mathop {\rightarrow }\limits ^{{\mathrm{a.s.}}}} 0, \end{aligned}$$(A.13)where the last equation above follows by the assumption that f(x) is bounded away from zero, which implies that \(\left| \log f(x)\right| <\infty\).
Therefore, using Eqs. (A.10), (A.12), and (A.13), we obtain
$$\begin{aligned} \sqrt{n}\big (\hat{H}_f^{{\mathrm{Plugin-Pois}}} - H_f\big ) {\mathop {\rightarrow }\limits ^{d}} N\big (0,{\mathrm{Var}}[\log f(X)]\big ), \end{aligned}$$
which completes the proof of the theorem. □
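As a numerical sanity check of the plug-in construction \(-\int \hat{f}\log \hat{f}\,{\mathrm{d}}x\), the following self-contained sketch evaluates it by the midpoint rule for an Exp(1) sample. The density estimator is the same Poisson-weights form used throughout; the integration limit `upper`, the grid size `m`, and all names are illustrative assumptions rather than the paper's exact implementation.

```python
import bisect
import math
import random

def f_hat(sorted_data, x, k):
    """Poisson-weights (Chaubey-Sen type) density estimate at x >= 0,
    with the Poisson sum truncated once its tail mass is negligible."""
    n, lam = len(sorted_data), k * x
    j_max = int(lam + 10.0 * math.sqrt(lam + 1.0)) + 25
    p, total, prev = math.exp(-lam), 0.0, 0
    for j in range(j_max + 1):
        cur = bisect.bisect_right(sorted_data, (j + 1) / k)
        total += (cur - prev) / n * p       # [F_n((j+1)/k) - F_n(j/k)] * p_j
        prev = cur
        p *= lam / (j + 1)                  # Poisson pmf recursion
    return k * total

def plugin_entropy(data, k, upper=None, m=1000):
    """Plug-in entropy estimate -int_0^upper f_hat log f_hat dx (midpoint rule)."""
    s = sorted(data)
    if upper is None:
        upper = max(s) + 5.0 / k            # integrate a little past the data range
    h = upper / m
    total = 0.0
    for i in range(m):
        fx = f_hat(s, (i + 0.5) * h, k)
        if fx > 0.0:
            total -= fx * math.log(fx) * h
    return total

if __name__ == "__main__":
    random.seed(2)
    data = [random.expovariate(1.0) for _ in range(500)]  # Exp(1); true H_f = 1
    k = int(len(data) ** 0.6)               # k = O(n^h), 1/2 < h < 3/4
    print(plugin_entropy(data, k))          # should be close to 1
```

Since the smoothed density integrates to (nearly) one over the truncated range, the plug-in value tracks the mean-log estimate closely, in line with Theorem 2's conclusion that the two share the same limiting distribution.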
Cite this article
Chaubey, Y.P., Vu, N.L. On the Estimation of Entropy for Non-negative Data. J Stat Theory Pract 15, 27 (2021). https://doi.org/10.1007/s42519-021-00165-4