
On the symmetrized s-divergence

  • Slavko Simić, Sara Salem Alzaid and Hassen Aydi
From the journal Open Mathematics

Abstract

In this study, we work with the relative divergence of type s, s ∈ ℝ, which includes the Kullback-Leibler divergence and the Hellinger and χ² distances as particular cases. We study the symmetrized divergences in additive and multiplicative forms. Some basic properties such as symmetry, monotonicity and log-convexity are established. An important result from the convexity theory is also proved.

MSC 2010: 60E15

1 Introduction

Denote by

$$ \Omega_{+} = \Bigl\{ \mu = \{\mu_i\} \;\Big|\; \mu_i > 0,\ \sum \mu_i = 1 \Bigr\}, $$

the family of finite discrete probability distributions.

The famous Csiszár's f-divergence C_f(μ||ν) [1] is known as the most general divergence measure in information theory. It is given as follows.

Definition 1.1

For a convex function f : (0, ∞) → ℝ, the f-divergence measure is defined as

$$ C_f(\mu\|\nu) := \sum \nu_i\, f(\mu_i/\nu_i), $$

where μ, ν ∈ Ω₊.

Some important information measures are just particular cases of Csiszár’s f-divergence.

As examples, we state the following:

  1. for f(ξ) = ξ^α, α > 1, we get the α-order divergence given as

     $$ I_\alpha(\mu\|\nu) := \sum \mu_i^{\alpha}\,\nu_i^{1-\alpha}. $$

     This quantity appears as an argument in well-known divergence measures, such as the Rényi α-order divergence I_α^R(μ||ν) and the Tsallis divergence I_α^T(μ||ν), defined as

     $$ I_\alpha^R(\mu\|\nu) := \frac{1}{\alpha-1}\log I_\alpha(\mu\|\nu); \qquad I_\alpha^T(\mu\|\nu) := \frac{1}{\alpha-1}\bigl(I_\alpha(\mu\|\nu)-1\bigr); $$

  2. for f(ξ) = ξ log ξ, we get the Kullback-Leibler divergence [2] defined by

     $$ K(\mu\|\nu) := \sum \mu_i \log(\mu_i/\nu_i); $$

  3. for f(ξ) = (√ξ − 1)², we find the Hellinger distance

     $$ H^2(\mu,\nu) := \sum \bigl(\sqrt{\mu_i}-\sqrt{\nu_i}\bigr)^2; $$

  4. if we consider f(ξ) = (ξ − 1)², then we find the χ²-distance

     $$ \chi^2(\mu,\nu) := \sum (\mu_i-\nu_i)^2/\nu_i. $$
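To make these special cases concrete, here is a minimal numerical sketch (ours, not part of the original paper) that evaluates Csiszár's f-divergence for the generating functions listed above and cross-checks the results against the direct formulas. The function name `csiszar` and the two three-point distributions are arbitrary illustrative choices.

```python
import math

def csiszar(f, mu, nu):
    """Csiszar's f-divergence: C_f(mu||nu) = sum_i nu_i * f(mu_i / nu_i)."""
    return sum(n * f(m / n) for m, n in zip(mu, nu))

# Two arbitrary illustrative distributions on three points.
mu = [0.5, 0.3, 0.2]
nu = [0.4, 0.4, 0.2]

# f(x) = x log x         -> Kullback-Leibler divergence K(mu||nu)
kl = csiszar(lambda x: x * math.log(x), mu, nu)
# f(x) = (sqrt(x) - 1)^2 -> Hellinger distance H^2(mu, nu)
hel2 = csiszar(lambda x: (math.sqrt(x) - 1.0) ** 2, mu, nu)
# f(x) = (x - 1)^2       -> chi^2 distance
chi2 = csiszar(lambda x: (x - 1.0) ** 2, mu, nu)

# Cross-checks against the direct formulas.
assert abs(kl - sum(m * math.log(m / n) for m, n in zip(mu, nu))) < 1e-12
assert abs(hel2 - sum((math.sqrt(m) - math.sqrt(n)) ** 2 for m, n in zip(mu, nu))) < 1e-12
assert abs(chi2 - sum((m - n) ** 2 / n for m, n in zip(mu, nu))) < 1e-12
print(kl, hel2, chi2)
```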

The generalized measure K_s(μ||ν), known as the relative divergence of type s [3], or simply s-divergence, is defined by

$$ K_s(\mu\|\nu) := \begin{cases} \dfrac{\sum \mu_i^{s}\nu_i^{1-s}-1}{s(s-1)}, & s \in \mathbb{R}\setminus\{0,1\};\\[1mm] K(\nu\|\mu), & s=0;\\ K(\mu\|\nu), & s=1, \end{cases} $$

where {μ_i}_1^n, {ν_i}_1^n are given probability distributions and K(μ||ν) is the Kullback-Leibler divergence.

It includes the Hellinger and χ² distances as particular cases.

Indeed,

$$ K_{1/2}(\mu\|\nu) = 4\Bigl(1-\sum\sqrt{\mu_i\nu_i}\Bigr) = 2\sum\bigl(\mu_i+\nu_i-2\sqrt{\mu_i\nu_i}\bigr) = 2H^2(\mu,\nu); $$

$$ K_{2}(\mu\|\nu) = \frac12\Bigl(\sum \frac{\mu_i^2}{\nu_i}-1\Bigr) = \frac12\sum\frac{(\mu_i-\nu_i)^2}{\nu_i} = \frac12\chi^2(\mu,\nu). $$

The s-divergence represents an extension of the Tsallis divergence to the whole real line and is therefore of importance in information theory. The main properties of this measure are given in [3].
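The following sketch (again ours, written directly from the definition above) implements K_s, treats s = 0 and s = 1 through the Kullback-Leibler divergence, and verifies the two displayed identities as well as continuity at the exceptional points. All names and data are illustrative assumptions.

```python
import math

def kl(mu, nu):
    """Kullback-Leibler divergence K(mu||nu)."""
    return sum(m * math.log(m / n) for m, n in zip(mu, nu))

def ks(s, mu, nu):
    """Relative divergence of type s, with the limiting cases s = 0 and s = 1."""
    if s == 0:
        return kl(nu, mu)
    if s == 1:
        return kl(mu, nu)
    return (sum(m ** s * n ** (1 - s) for m, n in zip(mu, nu)) - 1.0) / (s * (s - 1.0))

mu = [0.5, 0.3, 0.2]
nu = [0.4, 0.4, 0.2]

hel2 = sum((math.sqrt(m) - math.sqrt(n)) ** 2 for m, n in zip(mu, nu))
chi2 = sum((m - n) ** 2 / n for m, n in zip(mu, nu))

assert abs(ks(0.5, mu, nu) - 2 * hel2) < 1e-12   # K_{1/2} = 2 H^2
assert abs(ks(2, mu, nu) - 0.5 * chi2) < 1e-12   # K_2 = (1/2) chi^2
# The values at s = 0 and s = 1 are the continuous extensions of the generic formula.
assert abs(ks(1e-6, mu, nu) - ks(0, mu, nu)) < 1e-5
assert abs(ks(1 + 1e-6, mu, nu) - ks(1, mu, nu)) < 1e-5
```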

Theorem 1.2

For fixed μ, ν ∈ Ω₊, μ ≠ ν, the s-divergence is a positive, continuous and convex function in s ∈ ℝ.

In this article we shall use the following stronger property.

Theorem 1.3

For fixed μ, ν ∈ Ω₊, μ ≠ ν, the s-divergence is a log-convex function in s ∈ ℝ.

Proof

This is a corollary of an assertion proved in [4]. It says that for an arbitrary positive sequence {ξ_i} and an associated weight sequence {ν_i} ∈ Q (see Appendix), the quantity λ_s defined by

$$ \lambda_s := \frac{\sum \nu_i\,\xi_i^{\,s} - \bigl(\sum \nu_i\,\xi_i\bigr)^{s}}{s(s-1)} $$

is logarithmically convex in s ∈ ℝ. By substituting ξ_i = μ_i/ν_i, we obtain that λ_s = K_s(μ||ν) is log-convex in s ∈ ℝ. Hence, for any real s, t we have that

$$ K_s(\mu\|\nu)\,K_t(\mu\|\nu) \ge K_{\frac{s+t}{2}}^{2}(\mu\|\nu). $$
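A quick numerical spot-check of this log-convexity inequality, with arbitrarily chosen distributions and a small grid of exponents (our illustration, not part of the proof):

```python
import math
from itertools import product

def ks(s, mu, nu):
    """s-divergence for s different from 0 and 1 (the only values needed here)."""
    return (sum(m ** s * n ** (1 - s) for m, n in zip(mu, nu)) - 1.0) / (s * (s - 1.0))

mu = [0.5, 0.3, 0.2]
nu = [0.4, 0.4, 0.2]

# Check K_s * K_t >= K_{(s+t)/2}^2 on a grid of exponents.
grid = [-2.0, -0.5, 0.3, 0.7, 1.8, 2.5, 3.2]
for s, t in product(grid, repeat=2):
    m = (s + t) / 2
    if m in (0.0, 1.0):   # skip midpoints where the generic formula is undefined
        continue
    assert ks(s, mu, nu) * ks(t, mu, nu) >= ks(m, mu, nu) ** 2 - 1e-15
```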

Among all the mentioned measures, only the Hellinger distance has the symmetry property H² = H²(μ, ν) = H²(ν, μ). Our aim in this study is to investigate some global properties of the symmetrized measures U_s = U_s(μ, ν) = U_s(ν, μ) ≔ K_s(μ||ν) + K_s(ν||μ) and V_s = V_s(μ, ν) = V_s(ν, μ) ≔ K_s(μ||ν)K_s(ν||μ). Since Kullback and Leibler themselves in their fundamental paper [2] (see also [5]) worked with the symmetrized variant

$$ J(\mu,\nu) := K(\mu\|\nu) + K(\nu\|\mu) = \sum (\mu_i-\nu_i)\log(\mu_i/\nu_i), $$

our results can be regarded as a continuation of their ideas.
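For reference, the symmetrized variant J can be computed either as the sum of the two one-sided Kullback-Leibler divergences or directly from the displayed formula; a short sketch of ours with illustrative data:

```python
import math

def kl(mu, nu):
    return sum(m * math.log(m / n) for m, n in zip(mu, nu))

mu = [0.5, 0.3, 0.2]
nu = [0.4, 0.4, 0.2]

j_sum = kl(mu, nu) + kl(nu, mu)                                    # K(mu||nu) + K(nu||mu)
j_direct = sum((m - n) * math.log(m / n) for m, n in zip(mu, nu))  # sum (mu_i - nu_i) log(mu_i/nu_i)
assert abs(j_sum - j_direct) < 1e-12
print(j_sum)  # J(mu, nu); symmetric in its two arguments
```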

2 Results and proofs

We first give some properties of the symmetrized divergence V_s = K_s(μ||ν)K_s(ν||μ).

Proposition 2.1

  1. For arbitrary, but fixed probability distributions μ, ν ∈ Ω₊, μ ≠ ν, the divergence V_s is a positive and continuous function in s ∈ ℝ.

  2. V_s is a log-convex (hence convex) function in s ∈ ℝ.

  3. The graph of V_s is symmetric with respect to the line s = 1/2, bounded from below by the universal constant 4H⁴ and unbounded from above.

  4. V_s is monotone decreasing for s ∈ (−∞, 1/2) and monotone increasing for s ∈ (1/2, +∞).

  5. The inequality

$$ V_s^{\,t-r} \le V_r^{\,t-s}\,V_t^{\,s-r} $$

holds for any r < s < t.

Proof

Part 1 is a simple consequence of Theorem 1.2.

The proof of Part 2 follows by using the result from Theorem 1.3.

Namely, for any s, t ∈ ℝ, we have

$$ V_s V_t = \bigl[K_s(\mu\|\nu)K_s(\nu\|\mu)\bigr]\bigl[K_t(\mu\|\nu)K_t(\nu\|\mu)\bigr] = \bigl[K_s(\mu\|\nu)K_t(\mu\|\nu)\bigr]\bigl[K_s(\nu\|\mu)K_t(\nu\|\mu)\bigr] \ge \Bigl[K_{\frac{s+t}{2}}(\mu\|\nu)\Bigr]^2\Bigl[K_{\frac{s+t}{2}}(\nu\|\mu)\Bigr]^2 = \Bigl[V_{\frac{s+t}{2}}\Bigr]^2. $$

3. Note that

$$ K_s(\mu\|\nu) = K_{1-s}(\nu\|\mu); \qquad K_s(\nu\|\mu) = K_{1-s}(\mu\|\nu). $$

Hence, V_s = V_{1−s}, that is, V_{1/2−s} = V_{1/2+s}, s ∈ ℝ.

Also,

$$ V_s = K_s(\mu\|\nu)\,K_s(\nu\|\mu) = K_s(\mu\|\nu)\,K_{1-s}(\mu\|\nu) \ge K_{1/2}^{2}(\mu\|\nu) = 4H^4. $$

4. We shall prove only the “increasing” assertion. The other part follows from graph symmetry.

Indeed, for any 1/2 < x < y, we have that

$$ 1-y < 1-x < x < y. $$

Applying Proposition 5.3 (see Appendix) with a = 1 − y, b = y, s = 1 − x, t = x and f(s) := log K_s(μ||ν), we get

$$ \log K_x(\mu\|\nu) + \log K_{1-x}(\mu\|\nu) \le \log K_y(\mu\|\nu) + \log K_{1-y}(\mu\|\nu), $$

that is, V_x ≤ V_y whenever 1/2 < x < y.

5. From Parts 1 and 2, it follows that log V_s is a continuous and convex function on ℝ. Therefore, we can apply the following well-known assertion [6].

Lemma 2.2

If ϕ(s) is continuous and convex on an open interval I and s₁ < s₂ < s₃ are points of I, then

$$ \phi(s_1)(s_3-s_2) + \phi(s_2)(s_1-s_3) + \phi(s_3)(s_2-s_1) \ge 0. $$

Hence, for r < s < t, we get

$$ (t-r)\log V_s \le (t-s)\log V_r + (s-r)\log V_t, $$

which is equivalent to the assertion of Part 5.□
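The properties collected in Proposition 2.1 are easy to probe numerically. The sketch below (ours, with arbitrary illustrative distributions and helper names) checks the symmetry about s = 1/2, the lower bound 4H⁴, the monotonicity on both sides of 1/2 and the inequality of Part 5:

```python
import math

def kl(mu, nu):
    return sum(m * math.log(m / n) for m, n in zip(mu, nu))

def ks(s, mu, nu):
    if s == 0:
        return kl(nu, mu)
    if s == 1:
        return kl(mu, nu)
    return (sum(m ** s * n ** (1 - s) for m, n in zip(mu, nu)) - 1.0) / (s * (s - 1.0))

def V(s, mu, nu):
    """Multiplicative symmetrized divergence V_s."""
    return ks(s, mu, nu) * ks(s, nu, mu)

mu = [0.5, 0.3, 0.2]
nu = [0.4, 0.4, 0.2]
H2 = sum((math.sqrt(m) - math.sqrt(n)) ** 2 for m, n in zip(mu, nu))

for s in [-1.5, -0.3, 0.2, 0.8, 1.7, 3.0]:
    assert abs(V(s, mu, nu) - V(1 - s, mu, nu)) < 1e-12   # symmetry about s = 1/2
    assert V(s, mu, nu) >= 4 * H2 ** 2 - 1e-12            # lower bound 4 H^4

# Monotone decreasing on (-inf, 1/2), increasing on (1/2, +inf).
assert V(-1.2, mu, nu) >= V(-0.4, mu, nu) >= V(0.4, mu, nu)
assert V(0.6, mu, nu) <= V(1.4, mu, nu) <= V(2.2, mu, nu)

# Part 5 in logarithmic form: (t-r) log V_s <= (t-s) log V_r + (s-r) log V_t.
r, s, t = 0.2, 1.3, 2.4
lhs = (t - r) * math.log(V(s, mu, nu))
rhs = (t - s) * math.log(V(r, mu, nu)) + (s - r) * math.log(V(t, mu, nu))
assert lhs <= rhs + 1e-12
```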

The properties of the symmetrized measure U s := K s (μ||ν) + K s (ν||μ) are very similar, for which reason some analogous proofs will be omitted.

Proposition 2.3

  1. The divergence U_s is a positive and continuous function in s ∈ ℝ.

  2. U_s is a log-convex function in s ∈ ℝ.

  3. The graph of U_s is symmetric with respect to the line s = 1/2, bounded from below by 4H² and unbounded from above.

  4. U_s is monotone decreasing for s ∈ (−∞, 1/2) and monotone increasing for s ∈ (1/2, +∞).

  5. The inequality

$$ U_s^{\,t-r} \le U_r^{\,t-s}\,U_t^{\,s-r} $$

holds for any r < s < t.

Proof

  1. Omitted.

  2. Since both K_s and V_s are log-convex functions, we get

$$ \begin{aligned} U_sU_t - U_{\frac{s+t}{2}}^{2} &= \bigl[K_s(\mu\|\nu)+K_s(\nu\|\mu)\bigr]\bigl[K_t(\mu\|\nu)+K_t(\nu\|\mu)\bigr] - \bigl[K_{\frac{s+t}{2}}(\mu\|\nu)+K_{\frac{s+t}{2}}(\nu\|\mu)\bigr]^2 \\ &= \bigl[K_s(\mu\|\nu)K_t(\mu\|\nu) - K_{\frac{s+t}{2}}^{2}(\mu\|\nu)\bigr] + \bigl[K_s(\nu\|\mu)K_t(\nu\|\mu) - K_{\frac{s+t}{2}}^{2}(\nu\|\mu)\bigr] \\ &\quad + \bigl[K_s(\mu\|\nu)K_t(\nu\|\mu) + K_s(\nu\|\mu)K_t(\mu\|\nu) - 2K_{\frac{s+t}{2}}(\mu\|\nu)K_{\frac{s+t}{2}}(\nu\|\mu)\bigr] \\ &\ge \bigl[K_s(\mu\|\nu)K_t(\mu\|\nu) - K_{\frac{s+t}{2}}^{2}(\mu\|\nu)\bigr] + \bigl[K_s(\nu\|\mu)K_t(\nu\|\mu) - K_{\frac{s+t}{2}}^{2}(\nu\|\mu)\bigr] + 2\bigl[\sqrt{V_sV_t} - V_{\frac{s+t}{2}}\bigr] \\ &\ge 0. \end{aligned} $$

  3. The graph symmetry follows from the fact that U_s = U_{1−s}, s ∈ ℝ.

    We also have, due to the arithmetic-geometric mean inequality, that

    $$ U_s \ge 2\sqrt{V_s} \ge 4H^2. $$

    Finally, since μ ≠ ν yields max{μ_i/ν_i} = μ_*/ν_* > 1, we get

    $$ K_s(\mu\|\nu) > \frac{\nu_*\,(\mu_*/\nu_*)^{s} - 1}{s(s-1)} \to \infty \qquad (s \to \infty). $$

    It follows that both U s and V s are unbounded from above.

  4. Omitted.

  5. The proof is obtained by another application of Lemma 2.2 with ϕ(s) = log U s .□

Remark 2.4

We worked here with the class Ω + for the sake of simplicity. Observe that all results hold, after suitable adjustments, for arbitrary probability distributions and in the continuous case as well.

Remark 2.5

It is not difficult to see that the same properties are valid for the normalized divergences

$$ \bar U_s = \tfrac12\bigl(K_s(\mu\|\nu)+K_s(\nu\|\mu)\bigr) \quad\text{and}\quad \bar V_s = \sqrt{K_s(\mu\|\nu)\,K_s(\nu\|\mu)}, $$

with

$$ 2H^2 \le \bar V_s \le \bar U_s. $$

3 Applications

As an illustration of our results, we provide comparisons for the symmetrized variants of the Kullback-Leibler divergence and the χ² and Hellinger distances. Characteristic inequalities between those measures are already given in the literature (see [7]):

$$ \chi^2(\mu,\nu) \ge H^2(\mu,\nu); \qquad K(\mu\|\nu) \ge H^2(\mu,\nu); \qquad \chi^2(\mu,\nu) \ge K(\mu\|\nu). $$

Theorem 3.1

The inequality χ²(μ, ν) ≥ H²(μ, ν) is improved by the following:

$$ \chi^2(\mu,\nu) + \chi^2(\nu,\mu) \ge 8H^2(\mu,\nu); \qquad \chi^2(\mu,\nu)\,\chi^2(\nu,\mu) \ge 16H^4(\mu,\nu). $$

Proof

By Part 3 of Propositions 2.1 and 2.3, the inequalities U_s ≥ 4H² and V_s ≥ 4H⁴ are valid for any s ∈ ℝ. In particular, for s = 2, we get

$$ U_2 = K_2(\mu\|\nu) + K_2(\nu\|\mu) \ge 4H^2; \qquad V_2 = K_2(\mu\|\nu)\,K_2(\nu\|\mu) \ge 4H^4. $$

Since K_2(μ||ν) = ½χ²(μ, ν), the proof readily follows.□

Theorem 3.2

The estimate K(μ||ν) ≥ H²(μ, ν) is improved to the following:

$$ K(\mu\|\nu) + K(\nu\|\mu) \ge 4H^2(\mu,\nu); \qquad K(\mu\|\nu)\,K(\nu\|\mu) \ge 4H^4(\mu,\nu). $$

Proof

Reasoning as in the previous proof, we obtain that

$$ U_1 = K(\mu\|\nu) + K(\nu\|\mu) \ge 4H^2(\mu,\nu), $$

and

$$ V_1 = K(\mu\|\nu)\,K(\nu\|\mu) \ge 4H^4(\mu,\nu). $$

Theorem 3.3

The inequality χ²(μ, ν) ≥ K(μ||ν) is improved to the following:

$$ \chi^2(\mu,\nu) + \chi^2(\nu,\mu) \ge 2\bigl(K(\mu\|\nu) + K(\nu\|\mu)\bigr); \qquad \chi^2(\mu,\nu)\,\chi^2(\nu,\mu) \ge 4K(\mu\|\nu)\,K(\nu\|\mu). $$

Proof

Applying Part 4 of Propositions 2.1 and 2.3, we obtain U₂ ≥ U₁ and V₂ ≥ V₁, that is,

$$ \tfrac12\chi^2(\mu,\nu) + \tfrac12\chi^2(\nu,\mu) \ge K(\mu\|\nu) + K(\nu\|\mu), $$

and

$$ \Bigl[\tfrac12\chi^2(\mu,\nu)\Bigr]\Bigl[\tfrac12\chi^2(\nu,\mu)\Bigr] \ge K(\mu\|\nu)\,K(\nu\|\mu), $$

as desired.□
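All three theorems of this section can be confirmed numerically on concrete data. The following sketch (our illustration, with the same kind of arbitrary distributions as before) checks each of the six stated inequalities:

```python
import math

def kl(mu, nu):
    return sum(m * math.log(m / n) for m, n in zip(mu, nu))

mu = [0.5, 0.3, 0.2]
nu = [0.4, 0.4, 0.2]

H2 = sum((math.sqrt(m) - math.sqrt(n)) ** 2 for m, n in zip(mu, nu))
chi2_mn = sum((m - n) ** 2 / n for m, n in zip(mu, nu))
chi2_nm = sum((n - m) ** 2 / m for m, n in zip(mu, nu))
K_mn, K_nm = kl(mu, nu), kl(nu, mu)

# Theorem 3.1
assert chi2_mn + chi2_nm >= 8 * H2
assert chi2_mn * chi2_nm >= 16 * H2 ** 2
# Theorem 3.2
assert K_mn + K_nm >= 4 * H2
assert K_mn * K_nm >= 4 * H2 ** 2
# Theorem 3.3
assert chi2_mn + chi2_nm >= 2 * (K_mn + K_nm)
assert chi2_mn * chi2_nm >= 4 * K_mn * K_nm
```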

The aforementioned results clearly show the usefulness and necessity of studying symmetrized variants of the s-divergence.

4 Conclusion

In this study, we considered symmetrized divergences

$$ U_s = U_s(\mu,\nu) = U_s(\nu,\mu) := K_s(\mu\|\nu) + K_s(\nu\|\mu), $$

and

$$ V_s = V_s(\mu,\nu) = V_s(\nu,\mu) := K_s(\mu\|\nu)\,K_s(\nu\|\mu), $$

where K_s(μ||ν) is the s-divergence given by

$$ K_s(\mu\|\nu) := \begin{cases} \dfrac{\sum \mu_i^{s}\nu_i^{1-s}-1}{s(s-1)}, & s \in \mathbb{R}\setminus\{0,1\};\\[1mm] K(\nu\|\mu), & s=0;\\ K(\mu\|\nu), & s=1. \end{cases} $$

The well-known Kullback-Leibler divergence K(μ||ν) is defined by

$$ K(\mu\|\nu) := \sum \mu_i \log(\mu_i/\nu_i). $$

It was proved here that both U_s and V_s are log-convex for s ∈ ℝ, monotone decreasing for s ∈ (−∞, 1/2) and monotone increasing for s ∈ (1/2, +∞).

Also, they are unbounded from above with U_s ≥ 4H² and V_s ≥ 4H⁴, where H denotes the well-known Hellinger distance.



  1. Conflicts of interest: The authors declare that they have no competing interests regarding the publication of this paper.

  2. Author contributions: All authors contributed equally and significantly in writing this article. All the authors read and approved the final manuscript.

  3. Funding: The authors extend their appreciation to the Deanship of Scientific Research at King Saud University for funding this work through research group number: RG-1441-420.

  4. Data availability: The data used to support the findings of this study are available from the corresponding author upon request.

Appendix

A convexity property

The class of convex functions is characterized by the inequality

$$ \frac{\phi(\xi)+\phi(\theta)}{2} \ge \phi\left(\frac{\xi+\theta}{2}\right). \tag{5.1} $$

A function verifying (5.1) in a certain closed interval I is said to be convex in I. Geometrically, this means that the midpoint of any chord of the curve σ = ϕ(ξ) lies above or on the curve.

Let Q be the set of weights, i.e., of positive real numbers summing to 1. If ϕ is continuous, then much more can be said, i.e., the inequality

$$ \mu\,\phi(\xi) + \nu\,\phi(\theta) \ge \phi(\mu\xi + \nu\theta) \tag{5.2} $$

is satisfied for all μ, ν ∈ Q. Moreover, the equality sign takes place only if ξ = θ or ϕ is linear (cf. [6]).

We shall prove here an important property for this class of convex functions.

Proposition 5.3

Let f(·) be a continuous convex function defined on the closed interval I := [a, b]. Consider the function

$$ F(s,t) := \frac{f(s)+f(t)}{2} - f\left(\frac{s+t}{2}\right). \tag{5.4} $$

Then,

$$ \max_{s,t \in I} F(s,t) = F(a,b). \tag{5.5} $$

Proof

We need to establish that

$$ F(s,t) \le F(a,b) $$

for all a < s < t < b.

The following assertion is needed in the sequel.

Lemma 5.5

Let f(·) be a continuous convex function on an interval I. If ξ₁, ξ₂, ξ₃ ∈ I and ξ₁ < ξ₂ < ξ₃, then

  1. $$ \frac{f(\xi_2)-f(\xi_1)}{2} \le f\left(\frac{\xi_2+\xi_3}{2}\right) - f\left(\frac{\xi_1+\xi_3}{2}\right); $$

  2. $$ \frac{f(\xi_3)-f(\xi_2)}{2} \ge f\left(\frac{\xi_1+\xi_3}{2}\right) - f\left(\frac{\xi_1+\xi_2}{2}\right). $$

Proof

It suffices to prove the first part. The proof of the second part can be obtained similarly.

Since ξ₁ < ξ₂ < (ξ₂ + ξ₃)/2 < ξ₃, there exist μ, ν with 0 < μ, ν < 1, μ + ν = 1, such that ξ₂ = μξ₁ + ν(ξ₂ + ξ₃)/2.

Hence,

$$ \begin{aligned} \frac{f(\xi_1)-f(\xi_2)}{2} + f\left(\frac{\xi_2+\xi_3}{2}\right) &\ge \frac12\left[f(\xi_1) - \left(\mu f(\xi_1) + \nu f\left(\frac{\xi_2+\xi_3}{2}\right)\right)\right] + f\left(\frac{\xi_2+\xi_3}{2}\right) \\ &= \frac{\nu}{2}\, f(\xi_1) + \frac{2-\nu}{2}\, f\left(\frac{\xi_2+\xi_3}{2}\right) \\ &\ge f\left(\frac{\nu}{2}\,\xi_1 + \frac{2-\nu}{2}\cdot\frac{\xi_2+\xi_3}{2}\right) = f\left(\frac{\xi_1+\xi_3}{2}\right). \end{aligned} $$

Now, applying part 1 with ξ₁ = a, ξ₂ = s, ξ₃ = b and part 2 with ξ₁ = s, ξ₂ = t, ξ₃ = b, we have

$$ \frac{f(s)-f(a)}{2} \le f\left(\frac{s+b}{2}\right) - f\left(\frac{a+b}{2}\right); \tag{5.6} $$

$$ \frac{f(b)-f(t)}{2} \ge f\left(\frac{s+b}{2}\right) - f\left(\frac{s+t}{2}\right), \tag{5.7} $$

respectively.

Subtracting (5.6) from (5.7), the desired inequality follows.□

Corollary 5.8

Under the conditions of Proposition 5.3, the double inequality

$$ 2f\left(\frac{a+b}{2}\right) \le f(t) + f(a+b-t) \le f(a) + f(b) \tag{5.9} $$

holds for each t ∈ I.

Proof

Since the condition t ∈ I is equivalent to a + b − t ∈ I, applying Proposition 5.3 with s = a + b − t we obtain the right-hand side of (5.9). The left-hand side is clear.□

Remark 5.10

The relation (5.9) is a kind of pre-Hermite–Hadamard inequality. In fact, integrating both sides of (5.9) over I, we get the famous HH inequality:

$$ f\left(\frac{a+b}{2}\right) \le \frac{1}{b-a}\int_a^b f(t)\,dt \le \frac{f(a)+f(b)}{2}. $$

We used here the fact that

$$ \int_a^b f(a+b-t)\,dt = \int_a^b f(t)\,dt. $$
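The statements of Proposition 5.3, Corollary 5.8 and the Hermite–Hadamard inequality are easy to confirm numerically for a concrete convex function; the sketch below (ours) uses f(t) = e^t on [0, 2] as an arbitrary illustrative choice, with a plain midpoint Riemann sum for the integral.

```python
import math

# Convex test function and interval; arbitrary illustrative choices.
f = math.exp
a, b = 0.0, 2.0

def F(s, t):
    """F(s, t) = (f(s) + f(t))/2 - f((s + t)/2), as in (5.4)."""
    return (f(s) + f(t)) / 2 - f((s + t) / 2)

n = 200
pts = [a + (b - a) * i / n for i in range(n + 1)]

# Proposition 5.3: F attains its maximum over [a, b]^2 at the endpoints.
assert all(F(s, t) <= F(a, b) + 1e-12 for s in pts for t in pts)

# Corollary 5.8, relation (5.9), checked on a grid of t.
for t in pts:
    lhs = 2 * f((a + b) / 2)
    mid = f(t) + f(a + b - t)
    rhs = f(a) + f(b)
    assert lhs - 1e-12 <= mid <= rhs + 1e-12

# Hermite-Hadamard inequality via a midpoint Riemann sum.
integral = sum(f(a + (b - a) * (i + 0.5) / n) for i in range(n)) * (b - a) / n
assert f((a + b) / 2) <= integral / (b - a) <= (f(a) + f(b)) / 2
```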

References

[1] I. Csiszár, Information-type measures of difference of probability functions and indirect observations, Studia Sci. Math. Hungar. 2 (1967), 299–318.

[2] S. Kullback, Information Theory and Statistics, John Wiley & Sons, New York, 1959.

[3] I. J. Taneja, New developments in generalized information measures, Adv. Imaging Electron. Phys. 91 (1995), 37–135, DOI: 10.1016/S1076-5670(08)70106-X.

[4] S. Simić, On logarithmic convexity for differences of power means, J. Inequal. Appl. 2007 (2007), 37359, DOI: 10.1155/2007/37359.

[5] H. Jeffreys, An invariant form for the prior probability in estimation problems, Proc. R. Soc. Lond. Ser. A 186 (1946), 453–461, DOI: 10.1098/rspa.1946.0056.

[6] G. H. Hardy, J. E. Littlewood, and G. Polya, Inequalities, Cambridge University Press, Cambridge, 1978.

[7] I. Vajda, Theory of Statistical Inference and Information, Kluwer Academic Press, London, 1989.

Received: 2019-11-19
Revised: 2020-02-17
Accepted: 2020-02-25
Published Online: 2020-05-26

© 2020 Slavko Simić et al., published by De Gruyter

This work is licensed under the Creative Commons Attribution 4.0 International License.
