Information Sciences

Volume 179, Issue 14, 27 June 2009, Pages 2426–2433

Some properties of Rényi entropy and Rényi entropy rate

https://doi.org/10.1016/j.ins.2009.03.002

Abstract

In this paper, we define the conditional Rényi entropy and show that the so-called chain rule holds for the Rényi entropy. Then, we introduce a relation for the rate of Rényi entropy and use it to derive the rate of the Rényi entropy for an irreducible-aperiodic Markov chain. We also show that the bound for the Rényi entropy rate is simply the Shannon entropy rate.

Introduction

In 1948, Shannon [30] introduced what is now called Shannon entropy on the basis of a set of axioms. Subsequently, other axiom systems were proposed by Khinchin [16] and others [10]. In 1961, Rényi [27] generalized Shannon entropy to a one-parameter family of entropies by defining the entropy of order α, which is called the Rényi entropy. Some authors have modified the axioms associated with the Rényi entropy [10]. Recently, Jizba and Arimitsu [14] built on the existing axioms for the Shannon and Rényi entropies to introduce a new set of axioms from which both entropies can be derived.

The concept of Rényi entropy has a number of applications in coding theory [5], [9], statistical mechanics [7], [17], statistics and related fields [1], [13], [22], [34] and other areas (see for instance [3] and the references therein).

We know that in the case of Shannon entropy, a conditional entropy can be derived for random variables. Furthermore, there is a relation between the conditional Shannon entropy and the joint Shannon entropy of random variables; this relation is called the chain rule [6]. For the conditional Rényi entropy of random variables, however, there is no established definition yet. Cachin [4] has given a definition modeled on the conditional Shannon entropy, but the chain rule does not hold for it. One can, however, use the axioms introduced by Jizba and Arimitsu [14] to derive a conditional Rényi entropy and thereby prove the validity of the chain rule.

The application of the conditional Rényi entropy can be found in many areas such as quantum systems [32], biomedical engineering [20], cryptography [4], fields related to statistics [15], economics [2] and other areas [8], [23].

With the introduction of entropy into probability theory, entropy became linked with stochastic processes, and the entropy rate was defined for such processes. For a discrete-time process $X = (X_n)_{n \geqslant 1}$, the entropy at time $n$ is defined as the entropy of the $n$-dimensional random vector $(X_1, \dots, X_n)$, and the entropy rate is defined as the limit of the entropy at time $n$ divided by $n$, when the limit exists. For a stationary stochastic process with finite state space, Shannon [30] proved that the Shannon entropy rate exists. He also obtained the entropy rate for an ergodic Markov chain in the form
$$\bar{H}_1(X) = -\sum_{i,j} \pi_i \, p_{ij} \log p_{ij}, \tag{1.1}$$
where $p_{ij}$, $i, j = 1, 2, \dots, n$, are the transition probabilities and $\Pi = (\pi_i)$, $i = 1, 2, \dots, n$, is the stationary distribution of the chain. Here, $\Pi = \Pi P$, where $P = (p_{ij})$ and $\sum_{i=1}^{n} \pi_i = 1$.
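To make (1.1) concrete, here is a minimal numerical sketch (ours, not the paper's) in Python; it assumes NumPy and an arbitrary two-state transition matrix P chosen purely for illustration:

```python
# Illustrative only: a hypothetical two-state ergodic chain, not an
# example from the paper.
import numpy as np

P = np.array([[0.9, 0.1],
              [0.4, 0.6]])  # transition probabilities p_ij

# Stationary distribution: left eigenvector of P for eigenvalue 1,
# normalized so that sum_i pi_i = 1 (here pi = (0.8, 0.2)).
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmin(np.abs(evals - 1.0))])
pi = pi / pi.sum()

# Shannon entropy rate (1.1): H_1 = -sum_{i,j} pi_i p_ij log p_ij,
# with the convention 0 log 0 = 0.
terms = np.where(P > 0, P * np.log(np.where(P > 0, P, 1.0)), 0.0)
H1 = -np.sum(pi[:, None] * terms)
print(H1)  # ~0.3947 nats
```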

The existence of the Shannon entropy rate for an irreducible Markov chain with infinite state space was proved by Klimko and Sucheston [18]. It can be shown that (1.1) is valid for the rate of Shannon entropy of an irreducible Markov chain with infinite state space.

The Rényi entropy rate was first defined by Rached et al. [25] for an ergodic Markov chain with a finite state space. The entropy rate of this process is expressed as
$$\bar{H}_\alpha(X) = \frac{1}{1-\alpha} \log \lambda, \qquad \alpha > 0,\ \alpha \neq 1,$$
where $\lambda$ is the largest positive eigenvalue of the matrix $(p_{ij}^{\alpha})_{i,j=1,2,\dots,n}$ and $p_{ij}$ are the transition probabilities of the chain. Recently, it has been shown in Ref. [24] that the Rényi entropy rate for an irreducible-aperiodic Markov chain with infinite state space is
$$\bar{H}_\alpha(X) = \frac{1}{1-\alpha} \log R^{-1}, \qquad \alpha > 0,\ \alpha \neq 1,$$
where $R$ is the convergence radius of the matrix $(p_{ij}^{\alpha})$, $i, j = 1, 2, \dots$, and $p_{ij}$ are the transition probabilities of the chain.
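As an illustration of the finite-state formula (again ours, not the paper's; renyi_rate is a hypothetical helper, and P is the same arbitrary example as in the previous sketch), the following computes the Perron root of $(p_{ij}^{\alpha})$ with NumPy and returns $\frac{1}{1-\alpha}\log\lambda$:

```python
import numpy as np

def renyi_rate(P: np.ndarray, alpha: float) -> float:
    """Rényi entropy rate of a finite ergodic chain: (1/(1-alpha)) log lambda,
    where lambda is the Perron root of the elementwise power matrix (p_ij^alpha)."""
    assert alpha > 0 and alpha != 1
    lam = np.max(np.real(np.linalg.eigvals(P ** alpha)))
    return float(np.log(lam) / (1.0 - alpha))

P = np.array([[0.9, 0.1],
              [0.4, 0.6]])  # same illustrative chain as above
print(renyi_rate(P, 0.5))   # ~0.5511 nats
print(renyi_rate(P, 2.0))   # ~0.2064 nats
```

One can also check numerically that renyi_rate(P, alpha) approaches the Shannon rate of the previous sketch as alpha tends to 1.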

The Rényi entropy rate has several operational characterizations in coding theory and in the study of error exponents [5], [25], [26], [31]. The Rényi entropy rate for general stochastic processes is treated in [12]; for fields related to statistics, see [11].

This paper is organized as follows. In Section 2, the conditional Rényi entropy is obtained and some properties of the Rényi entropy are presented. In Section 3, we show that the chain rule holds for the Rényi entropy, and introduce a relation for obtaining the rate of Rényi entropy. Then, using this relation, the Rényi entropy rate for an irreducible-aperiodic Markov chain, with finite and infinite state spaces, is derived. Finally, in Section 4, we show that for an irreducible-aperiodic Markov chain the bound for the Rényi entropy rate is simply the Shannon entropy rate.


Rényi entropy

We know that the Rényi entropy is a generalization of the Shannon entropy. For the Shannon entropy, the conditional entropy has already been derived and its properties are known. In the case of the Rényi entropy, however, the conditional entropy does not yet have an established definition. Cachin [4] has given a definition similar to that of the conditional Shannon entropy, but the chain rule obtained for the Shannon entropy does not hold for it. In this section, we present a suitable …

The rate of Rényi entropy

Let $(X_n)_{n \geqslant 1}$ be a discrete-time process and assume that the random vector $(X_1, \dots, X_n)$ has the probability distribution
$$p(i_1, \dots, i_n) = P(X_1 = i_1, \dots, X_n = i_n). \tag{3.1}$$
Then, by the following theorem, we show that the chain rule holds for the Rényi entropy.

Theorem 3.1 (Chain rule)

Let $(X_1, \dots, X_n)$ be a random vector with the probability distribution $p(i_1, \dots, i_n)$ and let $H_\alpha(X_1, \dots, X_n)$ be its Rényi entropy. Then:
$$H_\alpha(X_1, \dots, X_n) = \sum_{i=1}^{n} H_\alpha(X_i \mid X_1, \dots, X_{i-1}).$$

Proof

For the random vector $(X_1, \dots, X_n)$, we have by (2.4) and (3.1):
$$H_\alpha(X_1, \dots, X_n) = \frac{1}{1-\alpha} \log \sum_{i_1, \dots, i_n} p^{\alpha}(i_1, \dots, i_n).$$
We can write $\sum_{i_1, \dots, i_n} p^{\alpha} \dots$
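As a numerical sanity check on Theorem 3.1 (this example is ours and not part of the paper), one can verify the chain rule for $n = 2$, assuming the conditional Rényi entropy takes the ratio form $H_\alpha(Y \mid X) = \frac{1}{1-\alpha} \log\big(\sum_{x,y} p^{\alpha}(x,y) \big/ \sum_{x} p^{\alpha}(x)\big)$, which is the form under which the proof telescopes; the joint distribution pxy below is arbitrary:

```python
import numpy as np

alpha = 0.7
pxy = np.array([[0.10, 0.30],
                [0.25, 0.35]])  # arbitrary joint distribution p(x, y)
px = pxy.sum(axis=1)            # marginal p(x)

H_joint = np.log(np.sum(pxy ** alpha)) / (1 - alpha)  # H_a(X, Y)
H_x = np.log(np.sum(px ** alpha)) / (1 - alpha)       # H_a(X)
# Assumed ratio form of the conditional Rényi entropy, under which
# the chain rule telescopes:
H_y_given_x = np.log(np.sum(pxy ** alpha) / np.sum(px ** alpha)) / (1 - alpha)

# Chain rule: H_a(X, Y) = H_a(X) + H_a(Y | X)
assert np.isclose(H_joint, H_x + H_y_given_x)
```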

Bounds for the Rényi entropy rate

Using the fact that the Rényi entropy is a decreasing function of $\alpha$ (Remark 2.1), we have the following inequalities:
$$\text{for } \alpha < 1: \quad H_1(\cdot) < H_\alpha(\cdot), \tag{4.1}$$
$$\text{for } \alpha > 1: \quad H_\alpha(\cdot) < H_1(\cdot), \tag{4.2}$$
where $H_1$ is the Shannon entropy.

Now, we obtain the bounds for the Rényi entropy rate of an irreducible-aperiodic Markov chain by using (4.1), (4.2).

For a random vector $(X_1, \dots, X_n)$, inequality (4.1) becomes
$$H_1(X_1, \dots, X_n) < H_\alpha(X_1, \dots, X_n),$$
and therefore
$$\frac{1}{n} H_1(X_1, \dots, X_n) < \frac{1}{n} H_\alpha(X_1, \dots, X_n).$$
Taking the limit $n \to \infty$ of the entropy and considering that the rate of Rényi …
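For the illustrative two-state chain used in the earlier sketches (reusing the hypothetical P, H1, and renyi_rate defined there), the bounds (4.1) and (4.2) can be checked numerically at the level of entropy rates:

```python
# Reuses P, H1 and renyi_rate from the earlier illustrative sketches.
assert H1 < renyi_rate(P, 0.5)   # alpha < 1: Shannon rate is a lower bound (4.1)
assert renyi_rate(P, 2.0) < H1   # alpha > 1: Shannon rate is an upper bound (4.2)
```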

Conclusion

In this paper, we introduced a new definition of the conditional Rényi entropy, based on the axioms introduced by Jizba and Arimitsu, and demonstrated that the chain rule holds for this definition. We then derived a relation for obtaining the rate of the Rényi entropy and used it to obtain the Rényi entropy rate for an irreducible-aperiodic Markov chain. Furthermore, we showed that the Shannon entropy rate is a bound for the Rényi entropy rate for the aforementioned processes.

References (34)

  • I. Csiszár, Generalized cutoff rates and Rényi’s information measures, IEEE Transactions on Information Theory (1995).
  • T.M. Cover et al., Elements of Information Theory (1991).
  • S. Guiasu, Information Theory with Applications (1977).
  • N.J.A. Harvey, K. Onak, J. Nelson, Streaming algorithms for estimating entropy, in: IEEE Information Theory Workshop, ...
  • F. Kanaya et al., The asymptotics of posterior entropy and error probability for Bayesian estimation, IEEE Transactions on Information Theory (1995).
  • A.I. Khinchin, Mathematical Foundations of Information Theory (1957).
  • V.S. Kirchanov, Using the Rényi entropy to describe quantum dissipative systems in statistical mechanics, Theoretical and Mathematical Physics (2008).