Some properties of Rényi entropy and Rényi entropy rate
Introduction
In 1948, Shannon [30] used some axioms to introduce Shannon entropy. Subsequently, a number of sets of axioms were proposed by Khinchin [16] and others [10]. In 1961, Rényi [27] generalized Shannon entropy to a one-parameter family of entropies by defining an entropy of order α which is called the Rényi entropy. Some authors have modified the axioms associated with Rényi entropy [10]. Recently, Jizba and Arimitsu [14] used the extant axioms for Shannon and Rényi entropies to introduce a set of new axioms on the basis of which one can get both Shannon and Rényi entropies.
The concept of Rényi entropy has a number of applications in coding theory [5], [9], statistical mechanics [7], [17], statistics and related fields [1], [13], [22], [34] and other areas (see for instance [3] and the references therein).
We know that in the case of Shannon entropy, the conditional entropy can be derived for random variables. Furthermore, there is a relation between the conditional Shannon entropy and the joint Shannon entropy of random variables; this relation is called the chain rule [6]. For the conditional Rényi entropy of random variables, however, there is no established definition yet. Cachin [4] gave a definition on the basis of the conditional Shannon entropy; for this definition, however, the chain rule does not hold. One can instead use the axioms introduced by Jizba and Arimitsu [14] to derive the conditional Rényi entropy and thereby prove the validity of the chain rule.
Applications of the conditional Rényi entropy can be found in many areas, such as quantum systems [32], biomedical engineering [20], cryptography [4], fields related to statistics [15], economics [2] and other areas [8], [23].
Through the introduction of entropy into probability theory, entropy and stochastic processes became linked, and the entropy rate was defined for stochastic processes. For a discrete-time process X = (X_n)_{n⩾1}, the entropy at time n is defined as the entropy of the n-dimensional random vector (X_1, …, X_n), and the entropy rate is defined as the limit of the entropy at time n divided by n, when this limit exists. For a stationary stochastic process with finite state space, Shannon [30] proved that the Shannon entropy rate exists. He also obtained the entropy rate of an ergodic Markov chain in the form

H = -\sum_{i=1}^{n} \pi_i \sum_{j=1}^{n} p_{ij} \log p_{ij},    (1.1)

where p_{ij}, i, j = 1, 2, …, n, are the transition probabilities and Π = (π_i), i = 1, 2, …, n, is the stationary distribution of the chain. Here Π = ΠP, where P = (p_{ij}) and \sum_{i=1}^{n} \pi_i = 1.
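Formula (1.1) can be evaluated numerically. The sketch below uses an illustrative two-state transition matrix (the values are assumptions for the example, not from the paper) and computes the stationary distribution as the normalized left eigenvector of P for eigenvalue 1.

```python
import numpy as np

# Transition matrix of a small ergodic Markov chain (illustrative values).
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])

# Stationary distribution: the left eigenvector of P for eigenvalue 1,
# i.e. the solution of pi = pi P normalized so that sum(pi) = 1.
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmax(np.real(evals))])
pi = pi / pi.sum()

# Shannon entropy rate (1.1): H = -sum_i pi_i sum_j p_ij log p_ij
H = -sum(pi[i] * sum(P[i, j] * np.log(P[i, j]) for j in range(2))
         for i in range(2))
print(H)
```

For this chain the stationary distribution is (0.8, 0.2) and the rate comes out to about 0.395 nats per step.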
The existence of the Shannon entropy rate for an irreducible Markov chain with infinite state space was proved by Klimko and Sucheston [18]. It can be shown that (1.1) is valid for the rate of Shannon entropy of an irreducible Markov chain with infinite state space.
The Rényi entropy rate was first defined by Rached et al. [25] for an ergodic Markov chain with a finite state space. The entropy rate of this process is expressed as

H_\alpha = \frac{1}{1-\alpha} \log \lambda,

where λ is the largest positive eigenvalue of the matrix (p_{ij}^{\alpha}), i, j = 1, 2, …, n, and p_{ij} are the transition probabilities of the chain. Recently, it has been shown in Ref. [24] that the Rényi entropy rate for an irreducible-aperiodic Markov chain with infinite state space is

H_\alpha = \frac{1}{1-\alpha} \log \frac{1}{R},

where R is the convergence radius of the matrix (p_{ij}^{\alpha}), i, j = 1, 2, …, and p_{ij} are the transition probabilities of the chain.
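The finite-state formula of Rached et al. can be evaluated with a standard eigenvalue routine applied to the elementwise power of the transition matrix. A minimal sketch, with an illustrative transition matrix and an illustrative choice of α:

```python
import numpy as np

alpha = 2.0  # Rényi order (alpha > 0, alpha != 1); illustrative choice
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])

# Largest positive eigenvalue of the elementwise-power matrix [p_ij ** alpha].
lam = max(np.real(np.linalg.eigvals(P ** alpha)))

# Rényi entropy rate for a finite ergodic chain:
# H_alpha = (1 / (1 - alpha)) * log(lambda)
H_alpha = np.log(lam) / (1.0 - alpha)
print(H_alpha)
```

For α = 2 this gives roughly 0.206 nats per step, which (as the monotonicity of H_α in α suggests) lies below the Shannon rate of the same chain.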
The Rényi entropy rate has revealed several operational characteristics in coding theory and error exponents [5], [25], [26], [31]. The Rényi entropy rate for stochastic processes is given in [12]; for fields related to statistics, refer to [11].
This paper is organized as follows. In Section 2, the conditional Rényi entropy is obtained and some properties of the Rényi entropy are presented. In Section 3, we show that the chain rule holds for the Rényi entropy, and introduce a relation for obtaining the rate of Rényi entropy. Then, using this relation, the Rényi entropy rate for an irreducible-aperiodic Markov chain, with finite and infinite state spaces, is derived. Finally, in Section 4, we show that for an irreducible-aperiodic Markov chain the bound for the Rényi entropy rate is simply the Shannon entropy rate.
Section snippets
Rényi entropy
We know that the Rényi entropy is a generalization of the Shannon entropy. For the Shannon entropy, the conditional entropy has already been derived and its properties are known. In the case of the Rényi entropy, however, the conditional entropy does not yet have an established definition. Cachin [4] gave a definition similar to that of the conditional Shannon entropy, but the chain rule that holds for the Shannon entropy could not be derived for it. In this section, we present a suitable…
The rate of Rényi entropy
Let (X_n)_{n⩾1} be a discrete-time process and assume that the random vector (X_1, …, X_n) has the probability distribution

p(i_1, \dots, i_n) = P(X_1 = i_1, \dots, X_n = i_n).

Then, by the following theorem, we show that the chain rule holds for the Rényi entropy.

Theorem 3.1. Let (X_1, …, X_n) be a random vector with the probability distribution p(i_1, …, i_n) and let H_α(X_1, …, X_n) be its Rényi entropy. Then

H_\alpha(X_1, \dots, X_n) = \sum_{k=1}^{n} H_\alpha(X_k \mid X_1, \dots, X_{k-1}).

Proof. For the random vector (X_1, …, X_n), we have by (2.4), (3.1): …
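The chain rule of Theorem 3.1 can be checked numerically for the two-variable case. The sketch below assumes the conditional Rényi entropy takes the Jizba–Arimitsu form H_α(Y|X) = (1/(1−α)) log(∑_{x,y} p(x,y)^α / ∑_x p(x)^α) (an assumption standing in for the paper's equation (2.4)); with this definition the identity H_α(X, Y) = H_α(X) + H_α(Y|X) holds exactly. The joint distribution is an arbitrary illustrative choice.

```python
import numpy as np

alpha = 0.5  # illustrative Rényi order

# An arbitrary joint distribution p(x, y) on a 3x2 alphabet (sums to 1).
p_xy = np.array([[0.20, 0.10],
                 [0.05, 0.25],
                 [0.30, 0.10]])
p_x = p_xy.sum(axis=1)  # marginal distribution of X

def renyi(p, a):
    """Rényi entropy H_a of a distribution given as an array."""
    return np.log(np.sum(p ** a)) / (1.0 - a)

# Conditional Rényi entropy in the assumed Jizba-Arimitsu form:
# H_a(Y|X) = (1/(1-a)) * log( sum_{x,y} p(x,y)^a / sum_x p(x)^a )
H_cond = (np.log(np.sum(p_xy ** alpha))
          - np.log(np.sum(p_x ** alpha))) / (1.0 - alpha)

# Chain rule: H_a(X, Y) = H_a(X) + H_a(Y|X)
lhs = renyi(p_xy, alpha)
rhs = renyi(p_x, alpha) + H_cond
print(lhs, rhs)
```

With this form of the conditional entropy, the two sides agree to machine precision for any joint distribution and any α ≠ 1, which is what makes the chain rule hold where Cachin's definition fails.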
Bounds for the Rényi entropy rate
Using the fact that the Rényi entropy is a decreasing function of α (Remark 2.1), we have the following inequalities:

H_\alpha(X) \leqslant H_1(X), \quad \alpha > 1,    (4.1)

H_\alpha(X) \geqslant H_1(X), \quad 0 < \alpha < 1,    (4.2)

where H_1 is the Shannon entropy.
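The monotonicity behind these inequalities is easy to verify numerically. In the sketch below, the distribution p is an arbitrary illustrative choice; H_1 is computed as the Shannon limit of the Rényi family.

```python
import numpy as np

p = np.array([0.5, 0.25, 0.15, 0.10])  # illustrative distribution

def renyi(p, a):
    if a == 1.0:                        # Shannon limit of the Rényi family
        return -np.sum(p * np.log(p))
    return np.log(np.sum(p ** a)) / (1.0 - a)

# Rényi entropy is non-increasing in alpha, so H_2 <= H_1 <= H_{1/2}.
H_half, H_1, H_2 = renyi(p, 0.5), renyi(p, 1.0), renyi(p, 2.0)
print(H_half, H_1, H_2)
```

Here H_2 ≈ 1.064 ⩽ H_1 ≈ 1.208 ⩽ H_{1/2} ≈ 1.295, matching the two inequalities with H_1 sitting between the orders on either side of 1.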
Now, we obtain the bounds for the Rényi entropy rate of an irreducible-aperiodic Markov chain by using (4.1), (4.2).
For a random vector (X_1, …, X_n), inequality (4.1) becomes

H_\alpha(X_1, \dots, X_n) \leqslant H_1(X_1, \dots, X_n), \quad \alpha > 1,

and therefore

\frac{1}{n} H_\alpha(X_1, \dots, X_n) \leqslant \frac{1}{n} H_1(X_1, \dots, X_n).

Taking the limit n → ∞ of the entropy and considering that the rate of Rényi…
Conclusion
In this paper, we introduced a new definition of the conditional Rényi entropy, based on the axioms introduced by Jizba and Arimitsu, and demonstrated that the chain rule holds for this definition. We then derived a relation for obtaining the Rényi entropy rate and used it to obtain the Rényi entropy rate of an irreducible-aperiodic Markov chain. Furthermore, we showed that the Shannon entropy rate is a bound for the Rényi entropy rate of the aforementioned processes.
References (34)
On the geometry of generalized Gaussian distributions, Journal of Multivariate Analysis (2009)
Long memory and volatility clustering: is the empirical evidence consistent across stock markets?, Physica A (2008)
On some entropy functionals derived from Rényi information divergence, Information Sciences (2008)
On Rényi information for ergodic diffusion processes, Information Sciences (2009)
Gelfand–Yaglom–Perez theorem for generalized relative entropy functionals, Information Sciences (2007)
Robust coding for a class of sources: applications in control and reliable communication over limited capacity channels, Systems and Control Letters (2008)
On the entropy of hidden Markov process, Theoretical Computer Science (2008)
A new information theoretic analysis of sum-of-squared-error kernel clustering, Neurocomputing (2008)
The world according to Rényi: thermodynamics of multifractal systems, Annals of Physics (2004)
C. Cachin, Entropy measures and unconditional security in cryptography, PhD Thesis, Swiss Federal Institute of...
Generalized cutoff rates and Rényi's information measures, IEEE Transactions on Information Theory
The Elements of Information Theory
Information Theory with Applications
The asymptotics of posterior entropy and error probability for Bayesian estimation, IEEE Transactions on Information Theory
Mathematical Foundations of Information Theory
Using the Rényi entropy to describe quantum dissipative systems in statistical mechanics, Theoretical and Mathematical Physics