
Understanding frequency distributions of path-dependent processes with non-multinomial maximum entropy approaches


Published 9 March 2017 © 2017 IOP Publishing Ltd and Deutsche Physikalische Gesellschaft
Citation: Rudolf Hanel et al 2017 New J. Phys. 19 033008. DOI: 10.1088/1367-2630/aa611d


Abstract

Path-dependent stochastic processes are often non-ergodic and observables can no longer be computed within the ensemble picture. The resulting mathematical difficulties pose severe limits to the analytical understanding of path-dependent processes. Their statistics are typically non-multinomial in the sense that the multiplicity of the occurrence of states is not given by a multinomial factor. The maximum entropy principle is tightly related to multinomial processes, non-interacting systems, and to the ensemble picture; it loses its meaning for path-dependent processes. Here we show that an equivalent of the ensemble picture exists for path-dependent processes, such that the non-multinomial statistics of the underlying dynamical process is, by construction, captured correctly in a functional that plays the role of a relative entropy. We demonstrate this for self-reinforcing Pólya urn processes, which explicitly generalize multinomial statistics. We demonstrate the adequacy of this constructive approach towards non-multinomial entropies by computing frequency and rank distributions of Pólya urn processes. We show how the microscopic update rules of a path-dependent process allow us to explicitly construct a non-multinomial entropy functional that, when maximized, predicts the time-dependent distribution function.


Original content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

1. Introduction

'It seems questionable whether the Boltzmann principle alone, meaning without a complete [...] mechanical description or some other complementary description of the process, can be given any meaning'. Einstein's famous critical comment on the completeness of Boltzmann entropy [1] is still thought provoking. For ergodic systems, e.g. [2], over a well defined set of states, this critique has turned out to be of minor relevance. Here we demonstrate how Einstein's observation becomes relevant again when dealing with non-ergodic, path-dependent systems or processes, i.e. processes where ensemble and time averages cease to yield identical results and the ensemble description of a process fails to capture the dynamics of a particular realization (e.g. compare [3]).

Moreover, for path-dependent systems we have to specify what we mean by 'entropy', since no unique generalization of entropy from equilibrium to non-equilibrium systems exists. However, Boltzmann's principle is grounded in the idea that in large systems the most likely sample we may draw from a process, the so-called maximum configuration, also characterizes the typical samples, while it becomes very unlikely to draw atypical samples. We will demonstrate that it is possible to construct 'entropic functionals' directly from the microscopic properties determining the dynamics of a large class of non-ergodic processes, using the maximum-configuration framework. In this approach we identify relative entropy (up to a multiplicative constant) with the logarithm of the probability to observe a particular macro state (typically represented by a histogram over a set of observable states), compare e.g. [4]. By construction, maximization of the resulting entropy functionals leads to adequate predictions of the statistical properties of non-ergodic processes in the maximum configuration.

For ergodic processes it is possible to replace time-averages of observables by their ensemble-averages, which leads to a tremendous simplification of computations. In particular, this is true for systems composed of independent particles or for Bernoulli processes, i.e. processes where samples are drawn independently, and the states of the independent components or observations collectively follow a multinomial statistics. The multinomial statistics of such a system with W observable states $i=1,\ldots ,W$ is captured by a functional that coincides with Shannon entropy [5], $H(p)=-{\sum }_{i=1}^{W}{p}_{i}\mathrm{log}\,{p}_{i}$. In this context $p=({p}_{1},\ldots ,{p}_{W})$ is the empirical relative frequency distribution of observing states i in an experiment of drawing from the process N times, i.e. p = k/N is the normalized histogram of the experiment in which state i has been drawn ki times. Clearly, ${\sum }_{i}{k}_{i}=N$. In this context H(p) can be understood as the logarithm of the multinomial factor, i.e. $-{\sum }_{i=1}^{W}{p}_{i}\mathrm{log}\,{p}_{i}\sim \tfrac{1}{N}\mathrm{log}\binom{N}{k}$, where $\binom{N}{k}=N!/{\prod }_{i=1}^{W}{k}_{i}!$ (e.g. compare [6]).
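As a numerical sanity check (our sketch, not part of the original analysis), one can verify that Shannon entropy approximates the rescaled logarithm of the multinomial factor for large N:

```python
import math

def log_multinomial(k):
    """log of N!/prod(k_i!) for histogram k, via lgamma for numerical stability."""
    n = sum(k)
    return math.lgamma(n + 1) - sum(math.lgamma(ki + 1) for ki in k)

def shannon(p):
    """Shannon entropy H(p) = -sum p_i log p_i (natural log)."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

N = 100000
k = [60000, 30000, 10000]       # histogram with sum(k) = N
p = [ki / N for ki in k]
# (1/N) log of the multinomial factor converges to H(p) as N grows
print(shannon(p), log_multinomial(k) / N)
```

The two numbers agree up to Stirling corrections of order $\mathrm{log}\,N/N$.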

Maximization of Shannon entropy under constraints is therefore a way of finding the most likely relative frequency distribution function (the normalized histogram of sampled events) one will observe when measuring a system, provided that it follows a multinomial statistics. Constraints represent knowledge about the system. Bernoulli processes with multinomial statistics are characterized by the prior probabilities $q=({q}_{1},\ldots ,{q}_{W})$. In general, we denote the set of parameters characterizing a process by θ; in the multinomial case $\theta \equiv q$.

Denoting the probability to measure a specific histogram by $P(k| \theta ,N)$, the most likely histogram $\hat{k}$, which maximizes $P(k| \theta ,N)$, is the optimal predictor or so-called maximum configuration. For a multinomial distribution function, $P(k| \theta ,N)=\binom{N}{k}{\prod }_{i=1}^{W}{q}_{i}^{{k}_{i}}$, where qi are the prior probabilities (or biases), the functional that is maximized is $\psi (p| \theta )=H(p)+{\sum }_{i}{p}_{i}\mathrm{log}\,{q}_{i}$, which is (up to a sign) called the relative entropy or Kullback–Leibler divergence [7]. The term H(p) coincides with Shannon entropy; the term that depends on q is called the cross-entropy and is a linear functional in p. By re-parametrizing ${q}_{i}=\exp (-\beta {\varepsilon }_{i})$, where $\beta \gt 0$ is a constant, one obtains the standard max-ent functional

Equation (1)

$\psi (p)=-{\sum }_{i=1}^{W}{p}_{i}\mathrm{log}\,{p}_{i}-\beta {\sum }_{i=1}^{W}{p}_{i}{\varepsilon }_{i}.$

In statistical physics, the constants ${\varepsilon }_{i}$ typically correspond to energies and β to the so called inverse temperature of a system. Maximization of this functional with respect to p yields the most likely empirical distribution function; this is sometimes called the maximum entropy principle.
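As an illustration (our sketch, with arbitrarily assumed energy levels and inverse temperature), one can check numerically that the Boltzmann distribution ${p}_{i}\propto \exp (-\beta {\varepsilon }_{i})$ maximizes this functional among all normalized distributions:

```python
import math, random

def psi(p, eps, beta):
    """Max-ent functional: Shannon entropy minus beta times the average energy."""
    H = -sum(pi * math.log(pi) for pi in p if pi > 0)
    return H - beta * sum(pi * e for pi, e in zip(p, eps))

eps = [0.0, 1.0, 2.0, 5.0]   # illustrative energy levels (assumed values)
beta = 0.7                   # assumed inverse temperature
Z = sum(math.exp(-beta * e) for e in eps)
p_star = [math.exp(-beta * e) / Z for e in eps]   # Boltzmann candidate

random.seed(1)
best = psi(p_star, eps, beta)
for _ in range(1000):
    w = [random.random() for _ in eps]            # random normalized competitor
    s = sum(w)
    q = [wi / s for wi in w]
    assert psi(q, eps, beta) <= best + 1e-12
print("Boltzmann distribution maximizes psi:", round(best, 4))
```

This is the Gibbs variational principle in miniature: no competitor distribution beats the Boltzmann form.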

Clearly, systems composed of independent components follow a multinomial statistics. Note that a multinomial statistics is also a direct consequence of working with ensembles of statistically independent systems. In this case the multinomial distribution function reflects the ensemble property and is not necessarily a property of the system itself. Therefore H(p) only has physical relevance for systems that consist of sufficiently independent elements. For path-dependent processes, where ensemble- and time-averages typically yield different results, H(p) remains the entropy of the ensemble picture, but ceases to be the 'physical' entropy that captures the time evolution of a path-dependent process. Obviously, it is nonsensical to assume that the entropy functional H, which is consistent with an underlying multinomial statistics, is in general also adequate for characterizing path-dependent processes that are inherently non-multinomial, i.e. that break the multinomial symmetry.

Surprisingly, the possibility that non-multinomial max-ent functionals can be constructed for path-dependent processes seems to have attracted little attention. In [4] it was noticed that a particular class of non-Markovian random walks with strongly correlated increments can be constructed, where the multiplicity of event sequences is no longer given by the multinomial factor, and where the max-ent entropy functional of the process class explicitly violates the composition axiom of Khinchin [8]. The general method of constructing a relative-entropy principle for a particular process class does not inherently depend on the validity of particular information theoretic axioms, which opens a way toward a general treatment of path-dependent and non-equilibrium processes. We demonstrate this by constructing the max-ent entropy of multi-state Pólya urn processes [9, 10].

In multi-state Pólya processes, once a ball of a given color is drawn from an urn, it is returned together with δ additional balls of the same color (see figure 1). They represent self-reinforcing, path-dependent processes that display the rich-get-richer and the winner-takes-all phenomena. Pólya urns are related to the beta-binomial distribution, Dirichlet processes, the Chinese restaurant problem, and models of population genetics. Their mathematical properties were studied in [11, 12]; extensions and generalizations of the concept are found in [13, 14], and applications to limit theorems in [15–17]. Pólya urns have been used in a wide range of practical applications, including response-adaptive clinical trials [18], tissue growth models [19], institutional development [20], computer data structures [21], resistance to reform in EU politics [22], the aging of alleles and Ewens's sampling formula [23, 24], image segmentation and labeling [25], and the emergence of novelties in evolutionary scenarios [26, 27]. A notion of Pólya divergence was recently defined in [28] in the context of Sanov's theorem [29]. That work characterizes Pólya urns in a regime of weak reinforcement. More precisely, the Pólya divergence is derived for situations where the ratio between N, the number of samples drawn from the Pólya urn, and A0, the number of balls initially contained in the urn, is asymptotically fixed by the parameter $\beta \equiv N/{A}_{0}$ with $\infty \gt \beta \gt 0$. As a consequence, in the limit $N\to \infty $ the reinforcement parameter $\gamma =\delta /{A}_{0}$ asymptotically approaches zero ($\gamma \sim \delta \beta /N\to 0$). So even if the number δ of balls added to the urn at each trial is large, the number of balls initially contained in the urn is much larger. In this regime of weak reinforcement Pólya urns behave similarly to Bernoulli processes.
Our constructive approach allows us to access strong reinforcement parameters $\gamma \gt 0$ and the transition of Pólya urn dynamics from Bernoulli-process like behavior to a winner-takes-all type of dynamics can be studied.
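This transition can be illustrated with a short simulation (our sketch; the parameter values are chosen for illustration only):

```python
import random

def polya_run(W, delta, N, rng):
    """Simulate a multi-state Pólya urn: start with one ball per color
    (assumed initial condition a_i = 1), add delta extra balls of the
    drawn color after every draw. Returns the histogram of draws."""
    balls = [1] * W
    k = [0] * W
    total = W
    for _ in range(N):
        x = rng.random() * total        # pick a ball uniformly
        i = 0
        while x >= balls[i]:
            x -= balls[i]
            i += 1
        k[i] += 1
        balls[i] += delta
        total += delta
    return k

rng = random.Random(42)
runs = 200
# delta = 0 is the plain Bernoulli/multinomial case (no reinforcement)
weak = sum(max(polya_run(2, 0, 500, rng)) / 500 for _ in range(runs)) / runs
# strong reinforcement: gamma = delta/A0 = 100/2 = 50, winner-takes-all regime
strong = sum(max(polya_run(2, 100, 500, rng)) / 500 for _ in range(runs)) / runs
print(weak, strong)   # the dominant color's share grows with reinforcement
```

For $\delta =0$ the dominant color's average share stays near 1/2; with strong reinforcement it approaches 1.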

2. Non-multinomial max-ent functionals

The general aim is to construct a max-ent functional for a path-dependent process, which allows us to infer the maximum configuration, i.e. the most likely sample we may draw from a process of interest. From a given class of processes X we select a particular process $X(\theta )$, specified by a set of parameters, θ. Running the process $X(\theta )$ for N consecutive iterations produces a sequence of observed states $x(\theta ,N)=[{x}_{1},\ldots ,{x}_{N}]$, where each xn takes a value from W possible states. As before, we assume the existence of a most likely histogram $\hat{k}$, that maximizes $P(k| \theta ,N)$. To construct a max-ent functional for X, one has to conveniently rescale $P(k| \theta ,N)$, which happens in two steps. First, we define ${\rm{\Psi }}(p| \theta ,N)\equiv \mathrm{log}P({Np}| \theta ,N)$. Note, if $\hat{k}$ maximizes $P(k| \theta ,N)$, then $\hat{p}=\hat{k}/N$ maximizes ${\rm{\Psi }}(p| \theta ,N)$. Second, a scaling factor $\phi (N)$ can be used to scale out the leading term of the N dependence of Ψ. Typically $\phi (N)={N}^{c}$, for some constant $1\geqslant c\gt 0$, compare [4]. $\phi (N)$ corresponds to the effective number of degrees of freedom of samples of size N. We identify the max-ent functional with $\psi (p| \theta ,N)\equiv {\rm{\Psi }}(p| \theta ,N)/\phi (N)$. Again, if $\hat{k}$ maximizes $P(k| \theta ,N)$ with ${\sum }_{i}{k}_{i}=N$, then $\hat{p}=\hat{k}/N$ maximizes $\psi (p| \theta ,N)$, with ${\sum }_{i}{p}_{i}=1$. In other words, $\psi (p| \theta ,N)$ represents (up to a sign) a functional providing us with a notion of relative entropy (information divergence) for the process-class X. If this process-class X is the class of Bernoulli-processes, such that $P(k| q,N)$ is the multinomial distribution, then asymptotically $-\psi (p| q,N)\sim {\sum }_{i}{p}_{i}(\mathrm{log}\,{p}_{i}-\mathrm{log}\,{q}_{i})$, the Kullback–Leibler divergence, with $\phi (N)=N$.
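For the Bernoulli case this construction can be checked numerically; the following sketch (our illustration, not from the paper) confirms that $-\tfrac{1}{N}\mathrm{log}P({pN}| q,N)$ approaches the Kullback–Leibler divergence as N grows:

```python
import math

def log_multinomial_prob(k, q):
    """log P(k|q,N) for a Bernoulli process with priors q (multinomial law)."""
    N = sum(k)
    out = math.lgamma(N + 1) - sum(math.lgamma(ki + 1) for ki in k)
    out += sum(ki * math.log(qi) for ki, qi in zip(k, q))
    return out

def kl(p, q):
    """Kullback-Leibler divergence sum_i p_i (log p_i - log q_i)."""
    return sum(pi * (math.log(pi) - math.log(qi))
               for pi, qi in zip(p, q) if pi > 0)

q = [0.5, 0.3, 0.2]
p = [0.4, 0.4, 0.2]
for N in (10 ** 3, 10 ** 5):
    k = [round(pi * N) for pi in p]
    # -psi(p|q,N) = -log P(pN|q,N)/N  ->  KL divergence as N grows
    print(N, -log_multinomial_prob(k, q) / N, kl(p, q))
```

The residual difference shrinks like $\mathrm{log}\,N/N$, i.e. the scaling factor $\phi (N)=N$ removes the leading N dependence.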
In the following we compute $\psi (p| \theta ,N)$ for Pólya urn processes.

3. Max-ent functional for Pólya urns

In urn models, observable states i are represented by the colors of the balls contained in the urn. The likelihood of drawing a ball of color i is determined by the number of balls of that color in the urn. Initially the urn contains ai balls of color $i=1,\ldots ,W$. The initial prior probability to draw a ball of color i is ${q}_{i}={a}_{i}/{A}_{0}$, where ${A}_{0}={\sum }_{i}{a}_{i}$ is the total number of balls initially in the urn. Balls are drawn sequentially from the urn. Whenever a ball of color i is drawn, it is put back into the urn and another δ balls of the same color are added. This defines the multi-state Pólya process [9]. A particular Pólya process is fully characterized by the parameters $\theta =({q}_{1},\ldots ,{q}_{W};{A}_{0},\delta )$. Drawing without replacement yields the hypergeometric process; drawing with replacement ($\delta =0$) yields the multinomial process.

If $\delta \gt 0$, after N trials there are ${a}_{i}(N)={a}_{i}+\delta {k}_{i}$ balls of color i in the urn (${a}_{i}={a}_{i}(0)$). The total number of balls is $A(N)=\sum {a}_{i}(N)={A}_{0}+N\delta $, and the probability to draw a ball of color i in the $(N+1)$ th step is

Equation (2)

$p(i| k,\theta )=\dfrac{{a}_{i}+\delta {k}_{i}}{{A}_{0}+\delta N},$

which depends on the history of the process in terms of the histogram k. With $x(0)=[\,]$ the empty sequence, the probability of sampling a sequence x can be computed as

Equation (3)

$P(x| \theta ,N)=\dfrac{{\prod }_{i=1}^{W}{a}_{i}^{(\delta ,{k}_{i})}}{{A}_{0}^{(\delta ,N)}},$

where the function ${m}^{(\delta ,r)}$ is defined as

Equation (4)

${m}^{(\delta ,r)}={\prod }_{j=0}^{r-1}(m+j\delta )=m(m+\delta )\cdots (m+(r-1)\delta ).$

Note that ${m}^{(\delta ,r)}$ generalizes the multinomial law,

Equation (5)

${\left({\sum }_{i=1}^{W}{m}_{i}\right)}^{(\delta ,r)}={\sum }_{\{{k}_{i}\geqslant 0| {\sum }_{i}{k}_{i}=r\}}\binom{r}{k}{\prod }_{i=1}^{W}{m}_{i}^{(\delta ,{k}_{i})},$

and forms a one-parameter generalization of powers $m^r$. For $\delta =0$, ${m}^{(0,r)}={m}^{r}$ and for $\delta =1$, ${m}^{(1,r)}=(m+r-1)!/(m-1)!$.
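A minimal implementation of the generalized power, assuming the product form ${m}^{(\delta ,r)}=m(m+\delta )\cdots (m+(r-1)\delta )$ that is consistent with both special cases above:

```python
import math

def gen_power(m, delta, r):
    """Generalized power m^(delta,r) = m (m+delta) (m+2 delta) ... (m+(r-1) delta)."""
    out = 1
    for j in range(r):
        out *= m + j * delta
    return out

# delta = 0 recovers the ordinary power m^r
assert gen_power(3, 0, 5) == 3 ** 5
# delta = 1 gives the rising factorial (m+r-1)!/(m-1)!
assert gen_power(3, 1, 5) == math.factorial(3 + 5 - 1) // math.factorial(3 - 1)
print("gen_power checks passed")
```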

The probability of observing a particular histogram k after N trials becomes

Equation (6)

$P(k| \theta ,N)=\binom{N}{k}\dfrac{{\prod }_{i=1}^{W}{a}_{i}^{(\delta ,{k}_{i})}}{{A}_{0}^{(\delta ,N)}},$

with ${\sum }_{\{{k}_{i}\geqslant 0| {\sum }_{i}{k}_{i}=N\}}P(k| \theta ,N)=1$. Note that $P(k| \theta ,N)$ is almost of multinomial form: it is a multinomial factor times a term depending on θ. One might conclude that the max-ent functional for Pólya processes is Shannon entropy in combination with a generalized cross-entropy term that depends on θ. However, this turns out to be wrong, since contributions from the generalized powers ${m}^{(\delta ,r)}$ in equation (6) cancel the multinomial factor almost completely. To see this we first rewrite

Equation (7)

${m}^{(\delta ,r)}={\delta }^{r}\,\dfrac{m}{\delta }{\prod }_{j=1}^{r-1}\left(j+\dfrac{m}{\delta }\right)\sim \dfrac{m}{\delta }\,{\delta }^{r}\,(r-1)!\;{r}^{m/\delta },$

where we use ${\sum }_{r=1}^{s}\tfrac{1}{r}\sim \mathrm{log}(s+1)$ and $1+y\sim \exp (y)$, which is valid for sufficiently small $y={a}_{i}/\delta $, i.e. for sufficiently large δ. With the notation $\gamma \equiv \delta /{A}_{0}$ we obtain

Equation (8)

$P({pN}| \theta ,N)\sim {\left(\dfrac{1}{\gamma N}\right)}^{W-1}\left({\prod }_{i=1}^{W}{q}_{i}\right){\prod }_{i=1}^{W}{p}_{i}^{\tfrac{{q}_{i}}{\gamma }-1},$

where k = pN. Following the construction discussed above, we identify ${\rm{\Psi }}(p| \theta ,N)=\mathrm{log}P({pN}| \theta ,N)$, which no longer scales explicitly with N; hence $\phi (N)=1$ (c = 0), so that $\psi ={\rm{\Psi }}$. Inserting equation (7) into (6) leads to the expression

Equation (9)

$\psi (p| \theta )={\sum }_{i=1}^{W}\left(\dfrac{{q}_{i}}{\gamma }-1\right)\mathrm{log}\,{p}_{i}+{\sum }_{i=1}^{W}\mathrm{log}\,{q}_{i}-(W-1)\mathrm{log}(\gamma N).$

More precisely, the finite size Pólya 'entropy' can be conveniently identified with the terms in $\psi (p| \theta )$ that do not depend on q,

Equation (10)

where $\lambda \gt 0$ can in principle be chosen freely. Up to a constant depending only on γ and N, the finite size cross-entropy can be identified with

Equation (11)

Convenient choices for λ are the following: $\lambda =1$ recovers ψ as given in equation (9); alternatively, one may choose $\lambda =1/(W\gamma )$, which is convenient if one considers a uniform initial distribution, ${q}_{i}=1/W$, of balls in the urn. The finite-size Pólya entropy, equation (10), yields a well defined entropy even if some states i have vanishing probability pi = 0.
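Returning to equation (6): the exact histogram probabilities can be checked numerically. The following sketch (our illustration, assuming the product-form definition of ${m}^{(\delta ,r)}$ discussed above) enumerates all histograms of a small urn and verifies that they sum to one:

```python
import math
from itertools import product

def gen_power(m, delta, r):
    """Generalized power m^(delta,r) = m (m+delta) ... (m+(r-1) delta)."""
    out = 1
    for j in range(r):
        out *= m + j * delta
    return out

def hist_prob(k, a, delta):
    """Exact P(k|theta,N): multinomial factor times generalized powers."""
    N = sum(k)
    A0 = sum(a)
    multinom = math.factorial(N)
    for ki in k:
        multinom //= math.factorial(ki)
    num = 1
    for ai, ki in zip(a, k):
        num *= gen_power(ai, delta, ki)
    return multinom * num / gen_power(A0, delta, N)

a, delta, N = [1, 2, 3], 2, 5    # tiny urn, chosen for illustration
total = sum(hist_prob(k, a, delta)
            for k in product(range(N + 1), repeat=3) if sum(k) == N)
print(total)   # normalization: the probabilities sum to 1
```

The normalization is exactly the generalized multinomial identity of equation (5).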

To simplify the following analysis we consider the limit $N\to \infty $ of this functional, where the notion of 'information divergence' for Pólya processes, essentially reduces to

Equation (12)

$-\psi (p| \theta )\sim {\sum }_{i=1}^{W}\left(1-\dfrac{{q}_{i}}{\gamma }\right)\mathrm{log}\,{p}_{i},$

up to terms of order $1/N$ and terms that do not explicitly depend on pi or qi. In this limit the asymptotic Pólya 'entropy' is given by,

Equation (13)

${H}^{\mathrm{Pólya}}(p)=-{\sum }_{i=1}^{W}\mathrm{log}\,{p}_{i}.$

We observe that one cannot derive ${H}^{{\rm{P}}\acute{{\rm{o}}}\mathrm{lya}}(p)$ from the multiplicity of the system, which gets canceled by counter terms, as we have seen above. In addition, we note that the q dependent terms, ${\sum }_{i}{q}_{i}\mathrm{log}{p}_{i}$, in equation (12) play the role of the Pólya 'cross-entropy', which is no longer linear in p.

Maximizing $\psi (p| \theta )$ with respect to p subject to ${\sum }_{i}{p}_{i}=1$ either leads to the solution

Equation (14)

${\hat{p}}_{i}=\dfrac{1}{\zeta }\left(\dfrac{{q}_{i}}{\gamma }-1\right),$

for $0\lt {p}_{i}\lt 1$, or, if this cannot be satisfied, to boundary solutions pi = 0. Here ζ is a normalization constant. Three scenarios exist:

  • (A)  
    For $\gamma \lt \min (q)$, equation (14) is the max-ent solution for all $i$ (no boundary-solutions). The limit $\gamma \to 0$ provides the correct multinomial limit ${p}_{i}\to {q}_{i}$.
  • (B)  
    If $\max (q)\gt \gamma \gt \min (q)$, ψ gets maximal for those $i$ with ${q}_{i}\gt \gamma $ and follows solution equation (14); those $i$ where ${q}_{i}\lt \gamma $ are boundary-solutions, ${p}_{i}=0$.
  • (C)  
    For $\gamma \gt \max (q)$ all states are boundary-solutions except one: a single winner $i$ takes all, ${p}_{i}=1$, while all other states have vanishing probability.

Since ${\partial }^{2}{\psi }_{\mathrm{Pólya}}/\partial {p}_{i}^{2}\lt 0$ if $\gamma \lt {q}_{i}$ for all $i$, case (A) applies. If ${q}_{i}\lt \gamma $, equation (14) becomes negative and unstable, and is replaced by a boundary solution: cases (B) and (C). The Pólya max-ent not only allows us to predict pi from the initial prior probabilities qi, it also identifies γ as the crucial parameter that distinguishes between the three regimes of Pólya urn dynamics (see footnote). For sufficiently large but finite N, the analysis above is more involved but solvable.
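The three scenarios can be turned into a small solver (our sketch, based on our reading of equation (14); the closed form ${\hat{p}}_{i}\propto {q}_{i}/\gamma -1$ for the non-boundary states is an assumption of this illustration):

```python
def polya_maxent(q, gamma):
    """Maximum-configuration distribution of a Pólya urn (sketch):
    states with q_i > gamma follow p_i proportional to (q_i/gamma - 1);
    the remaining states hit the boundary p_i = 0.  If gamma exceeds
    max(q), a single winner takes all (scenario C)."""
    active = [i for i, qi in enumerate(q) if qi > gamma]
    p = [0.0] * len(q)
    if not active:                       # scenario (C)
        p[q.index(max(q))] = 1.0
        return p
    zeta = sum(q[i] / gamma - 1 for i in active)
    for i in active:                     # scenarios (A) and (B)
        p[i] = (q[i] / gamma - 1) / zeta
    return p

q = [0.5, 0.3, 0.2]
print(polya_maxent(q, 0.001))  # scenario (A): close to q (multinomial limit)
print(polya_maxent(q, 0.25))   # scenario (B): the weakest state is frozen out
print(polya_maxent(q, 0.9))    # scenario (C): winner takes all
```

In the limit $\gamma \to 0$ the solver reproduces ${p}_{i}\to {q}_{i}$, as required.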


Figure 1. Schematic illustration of a Pólya process. When a ball of a certain color is drawn, it is replaced by $1+\delta $ balls of the same color. Then the next ball is drawn and the process is repeated for N iterations. Here $\delta =2$. This reinforcement process creates a history-dependent dynamics. The configurations obtained after successive iterations have non-multinomial structure.


Assuming uniformly distributed priors, ${q}_{i}=1/W$ for all i, the max-ent result equation (14) correctly predicts uniformly distributed ${\hat{p}}_{i}=1/W$, while individual observed distributions p may strongly deviate from this prediction. This reflects the fact that, although the Pólya urn process is inherently unstable (e.g. winner takes all) and there is little chance of predicting which color of balls will in fact dominate the others, upon repeating the experiment many times every color has the same chance to win (or a chance biased according to the priors q). This discrepancy between ensemble average and time average makes it impossible to predict who in particular will win or lose in the course of time. However, using detailed information about the process, one can predict how winners win. In particular one can (i) predict the onset of instability, i.e. the emergence of colors i that will effectively never be drawn, at ${\gamma }_{\mathrm{crit}}=\min (q)$ (compare figure 2), and (ii) construct a maximum entropy functional for predicting the time-dependent frequency distribution of a process, i.e. the number of states i that are observed n times. As a consequence, one can also derive the rank distribution of the process, i.e. the frequency of observing balls of some color after ranking those frequencies according to their magnitude.


Figure 2. The fraction of distinct colors contained in the Pólya urn that are sampled at least once within the first N = 500 steps of the process, for numbers of colors $W=2,3,\ldots ,10$ with uniformly distributed initial conditions ${q}_{i}=1/W$, $i=1,\ldots ,W$, evaluated from 250 runs for each $\gamma =0.01,0.02,\ldots ,1$. The onset of instability ${\gamma }_{\mathrm{crit}}(q)=\min (q)=1/W$ (circular markers) is reproduced very well experimentally.


3.1. Rank and frequency distributions of Pólya urns

With the presented max-ent approach we now compute frequency distribution functions. Given the histogram $k=({k}_{1},{k}_{2},\ldots ,{k}_{W})$ obtained after N iterations of the process, we define new variables,

Equation (15)

${n}_{z}(k)={\sum }_{i=1}^{W}\chi ({k}_{i}=z),$

where χ is the characteristic function that returns 1 if the argument is true and 0 if false. nz(k) is the number of colors i that occur z times after running the Pólya process for N iterations. nz is subject to the two constraints,

Equation (16)

${\sum }_{z}{n}_{z}=W,\qquad {\sum }_{z}z\,{n}_{z}=N,$

which can be included in the maximization procedure by introducing Lagrange multipliers α and β. The probability of observing some $n=({n}_{1},\ldots ,{n}_{N})$ is

Equation (17)

$\tilde{P}(n| \theta ,N)={\sum }_{\{k| n(k)=n\}}P(k| \theta ,N).$

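The mapping from a histogram k to the occupation numbers nz, together with the two constraints, can be checked directly (a minimal sketch with an arbitrary example histogram):

```python
def occupation_numbers(k, N):
    """n_z = number of states that occur exactly z times (z = 0..N)."""
    n = [0] * (N + 1)
    for ki in k:
        n[ki] += 1
    return n

k = [3, 0, 1, 3, 1, 2]          # example histogram: W = 6 states, N = 10 draws
N = sum(k)
n = occupation_numbers(k, N)
assert sum(n) == len(k)                              # sum_z n_z = W
assert sum(z * nz for z, nz in enumerate(n)) == N    # sum_z z * n_z = N
print(n[:5])   # -> [1, 2, 1, 2, 0]
```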
Defining the relative frequencies ${\pi }_{z}={n}_{z}/W$ and ${\bar{p}}_{z}=z/N$ we can construct the max-ent functional from $\tilde{P}(n| \theta ,N)$. We identify $\tilde{\psi }(\pi | \theta ,N)\equiv \mathrm{log}(\tilde{P}(n| \theta ,N))/W$.

For the multinomial $P(k| \theta ,N)=\left(\genfrac{}{}{0em}{}{N}{k}\right){\prod }_{i}{q}_{i}^{{k}_{i}}$, and uniform priors ${q}_{i}=1/W$ we find up to an additive constant,

Equation (18)

$\tilde{\psi }(\pi | \theta ,N)=-{\sum }_{z}{\pi }_{z}\mathrm{log}\,{\pi }_{z}-{\sum }_{z}{\pi }_{z}\mathrm{log}\,z!\,.$

$\tilde{\psi }(\pi | \theta ,N)$ has to be maximized subject to equation (16),

Equation (19)

${\sum }_{z}{\pi }_{z}=1,\qquad {\sum }_{z}z\,{\pi }_{z}=\dfrac{N}{W},$

so that we get the asymptotic solution for large W and large N, ($N\gg W\gg 1$),

Equation (20)

${\hat{\pi }}_{z}=\dfrac{1}{\zeta }\,\dfrac{{\phi }^{z}}{z!}.$

This is the Poisson distribution, exactly as expected for multinomial processes. Here $\phi =N\,{{\rm{e}}}^{-(1+\beta /N)}$, ζ is a normalization constant, and ${\pi }_{z}$ becomes maximal at $\hat{z}=\phi \sim N/W$.
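For the multinomial case this Poisson prediction is easy to test by simulation (our sketch; W, N and the number of runs are arbitrary choices):

```python
import math, random

def multinomial_pi(W, N, runs, rng):
    """Empirical distribution pi_z of occupation numbers for a Bernoulli
    process with W equally likely states, averaged over several runs."""
    counts = [0] * (N + 1)
    for _ in range(runs):
        k = [0] * W
        for _ in range(N):
            k[rng.randrange(W)] += 1
        for ki in k:
            counts[ki] += 1
    total = W * runs
    return [c / total for c in counts]

W, N = 1000, 5000
rng = random.Random(7)
pi = multinomial_pi(W, N, 20, rng)
lam = N / W                      # Poisson parameter, here lam = 5
poisson = [math.exp(-lam) * lam ** z / math.factorial(z) for z in range(11)]
for z in range(8):
    print(z, round(pi[z], 4), round(poisson[z], 4))
```

The empirical ${\pi }_{z}$ and the Poisson probabilities agree within sampling fluctuations.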

For the Pólya urn with uniform priors we get from equation (17)

Equation (21)

$\tilde{P}(n| \theta ,N)=\dfrac{1}{Z(\theta ,N)}\,\dfrac{W!}{{\prod }_{z}{n}_{z}!}\,{\prod }_{z}{\bar{p}}_{z}^{\left(\tfrac{1}{W\gamma }-1\right){n}_{z}}.$

$Z(\theta ,N)$ is the normalization. Up to a constant, the max-ent functional ${\tilde{\psi }}_{\mathrm{Pólya}}(\pi | \theta ,N)\equiv \mathrm{log}(\tilde{P}(n| \theta ,N))/W$ is

Equation (22)

${\tilde{\psi }}_{\mathrm{Pólya}}(\pi | \theta ,N)=-{\sum }_{z}{\pi }_{z}\mathrm{log}\,{\pi }_{z}+\left(\dfrac{1}{W\gamma }-1\right){\sum }_{z}{\pi }_{z}\mathrm{log}\,{\bar{p}}_{z};$

maximizing ${\tilde{\psi }}_{\mathrm{Pólya}}(\pi | \theta ,N)$ under the conditions of equation (19) provides the frequency distribution of the Pólya process for uniform priors,

Equation (23)

${\hat{\pi }}_{z}=\dfrac{1}{\zeta }\,{z}^{\tfrac{1}{W\gamma }-1}\,{\phi }^{z},$

with $\phi =\exp (-\beta )$, and normalization $\zeta =\exp (1+\alpha ){N}^{\tfrac{1}{W\gamma }-1}$.

The rank distribution of states, f(r), can now be obtained as follows. r = 1 is the state that occurs most frequently, r = W the least occupied state. For $r=1,\ldots ,W$ we define intervals $[{t}_{r+1},{t}_{r}]$ with ${t}_{1}=N$ and ${t}_{W+1}=0$, such that ${\sum }_{{t}_{r+1}\leqslant z\lt {t}_{r}}{\pi }_{z}\sim 1/W$. To find the ${t}_{r}$ we substitute sums by integrals and get

Equation (24)

${t}_{r}=N{\left(\dfrac{W+1-r}{W}\right)}^{W\gamma }.$

Results for the frequency distributions for ai = 1, W = 100, and $\delta =2$ are shown in figure 3, together with numerical simulations of the same process. The inset shows the rank distribution. The Pólya max-ent predicts both the frequency and the rank distribution extremely well.


Figure 3. Frequency distribution of a Pólya urn process with uniform initial conditions (red line), for W = 100, $\delta =2$, ${a}_{i}=1$ for all $i=1,\ldots ,W$, and $N={10}^{5}$ steps. Simulations are shown for 100 (green) and 5000 (blue) repetitions of the process. Inset: rank distributions of the max-ent result and the numerical realizations in semi-log scale.


The above results were all derived under the assumption that $\gamma \gt 0$ is sufficiently large. By numerical simulation we find that the solution equation (23) also works remarkably well for very small values of γ, provided the value of γ in equation (23) is appropriately renormalized, $\gamma \to {\gamma }_{0}$. In particular, for $\gamma =0$ (the multinomial process) we sample the Poisson distribution function, equation (20). The Pólya max-ent solution recovers the Poisson distribution extremely well if $\gamma =0\to {\gamma }_{0}(W,N)=1/(N+3W)$. In this sense the Pólya max-ent remains adequate in the limit of small γ.

4. Discussion

Pólya urns offer a transparent way to study self-reinforcing systems with explicit path-dependence. They behave similarly to Bernoulli processes if the reinforcement is weak, i.e. if the number of balls initially contained in the urn is large in comparison to the number of balls added to the urn at each trial. This weak reinforcement regime has been studied in [28].

If reinforcement gets stronger, Pólya urns start to behave differently and the Pólya divergence derived in [28] no longer applies. Based on the microscopic rules of the process, we constructively derive the generalized information divergence or relative entropy $-\psi $, for strongly reinforcing Pólya urns. The functional ψ acts as the corresponding non-multinomial max-ent functional. This provides us with an alternative to the ensemble approach for path-dependent processes that enables us to predict the statistics of the process. The maximization of the functional leads to an equivalent of the classical maximum configuration approach, which by definition predicts the most likely distribution function. In this sense maximum configuration predictions are optimal, and can be used to understand even details of the statistics of path-dependent processes, such as their frequency and rank distributions.

It is interesting to note that the functional playing the role of the entropy in Pólya processes violates at least two of the four classic information theoretic (Shannon–Khinchin) axioms that determine Shannon entropy [8]. For the finite-size Pólya entropy, even three of the four axioms are violated. This indicates that the classes of generalized entropy functionals that are useful for a max-ent approach may be even larger than expected [30, 31]. One might speculate that in this sense the classic information theoretic axioms are too restrictive when it comes to characterizing information flow and phase-space structure in non-stationary, path-dependent processes. The observation that each particular class of non-multinomial processes requires a matching max-ent functional, which can in principle be constructed from the generative rules of the process, opens the applicability of max-ent approaches to a wide range of complex systems in a meaningful way. In this sense the generalized max-ent approach responds naturally to Einstein's comment on Boltzmann's principle.

Finally, we note the implications for statistical inference from data generated by non-multinomial sources, which implicitly involves estimating the parameters θ of the process that generates the data. In a max-ent approach this is done by fitting classes of curves to the data that are consistent with the max-ent approach. For doing this, the nature of the process, i.e. its class, needs to be known. For path-dependent processes, which are non-multinomial by nature, the entropy will no longer be Shannon entropy H, and the information divergence will no longer be the Kullback–Leibler divergence.

Acknowledgments

This work was supported by the Austrian Science Fund FWF under P29252 'Generalized information theoretic approaches for history-dependent processes' and the FP7 projects LASAGNE no. 318132 and MULTIPLEX no. 318132.

Footnotes

  • Note that a Pólya urn U1 that initially contains A0 balls and has evolved for N steps with $\gamma =\delta /{A}_{0}$ can be regarded as another Pólya urn, U2, in its initial state, containing ${A}_{N}={A}_{0}+\delta N$ balls, that evolves with an effective reinforcement parameter $\gamma (N)=\delta /{A}_{N}$ and the initial distribution of balls $q(N)=p(i| k(N),\theta )$, where k(N) is the histogram of colors drawn in the first N steps of the original urn process U1. Obviously, the asymptotic behavior of Pólya urns is determined early on in the process, where the effective reinforcement parameter $\gamma (N)$ is largest. The probability that a Pólya urn enters a winner-takes-all dynamics, i.e. ends up in one of the scenarios A, B, or C, depends on the reinforcement parameter γ.
