1 Introduction

This paper studies the limiting free energy of a class of models with Boltzmann–Gibbs-type distributions on random walk paths. The energy of a path is defined through a coupling of the walk with a random environment. Our main interest is the directed polymer in an i.i.d. random environment, also called the polymer with bulk disorder. This model was introduced in the statistical physics literature by Huse and Henley in 1985 [19]. For recent surveys see [7, 18]. The free energy of these models is a central object of study. Its dependence on model parameters gives information about phase transitions. In quenched settings the fluctuations of the quenched free energy are closely related to the fluctuations of the path.

Some of the properties we develop can be proved in greater generality at little or no extra cost. The formulation then consists of a general walk in a potential that can depend both on an ergodic environment and on the steps of the walk. We call the model random walk in a random potential (RWRP).

This paper concentrates mainly on the point-to-point version of RWRP where the walk is fixed at two points and allowed to fluctuate in between. The point-to-line model was studied in the companion paper [34]. The motivation for both papers was that the free energy was known only as a subadditive limit, with no explicit formulas. We provide two variational formulas for the point-to-point free energy. One comes in terms of entropy and we develop it in detail after preliminary work on the regularity of the free energy. The other involves correctors (gradients of sorts) and can be deduced by combining a convex duality given in (4.3) below with Theorem 2.3 from [34].

Significant recent progress has taken place in the realm of 1+1 dimensional exactly solvable directed polymers (see review [10]). Work on general models is far behind. Here are three future directions opened up by our results in the present work and [34].

  1. (i)

    One goal is to use this theory to access properties of the limiting free energy, especially in regimes of strong disorder where the quenched model and annealed model deviate from each other.

  2. (ii)

    The variational formulas identify certain natural corrector functions and Markov processes whose investigation should shed light on the polymer models themselves. Understanding this picture for the exactly solvable log-gamma polymer [37] will be the first step.

  3. (iii)

    The zero-temperature limits of polymer models are last-passage percolation models. In this limit the free energy turns into the limit shape. Obtaining information about limit shapes of percolation models has been notoriously difficult. A future direction is to extend the variational formulas to the zero-temperature case.

In the remainder of the introduction we describe the model and some examples, give an overview of the paper, and describe some past literature.

1.1 The RWRP model and examples

Fix a dimension \(d\in \mathbb{N }\). Let \(\mathcal{R }\subset \mathbb{Z }^d\) be a finite subset of the square lattice and let \(P\) denote the distribution of the random walk on \(\mathbb{Z }^d\) started at \(0\) and whose transition probability is \(\hat{p}_z=1/|\mathcal{R }|\) for \(z\in \mathcal{R }\) and \(\hat{p}_z=0\) otherwise. In other words, the random walk picks its steps uniformly at random from \(\mathcal{R }\). \(E\) denotes expectation under \(P\). \(\mathcal{R }\) generates the additive group \(\mathcal{G }= \{\sum _{z\in \mathcal{R }}a_z z:a_z\in \mathbb{Z }\}\).

An environment \(\omega \) is a sample point from a probability space \((\Omega , \mathfrak{S }, \mathbb{P })\). \(\Omega \) comes equipped with a group \(\{T_z:{z\in \mathcal{G }}\}\) of measurable commuting transformations that satisfy \(T_{x+y}=T_xT_y\) and \(T_0\) is the identity. \(\mathbb{P }\) is a \(\{T_z:z\in \mathcal{G }\}\)-invariant probability measure on \((\Omega ,\mathfrak{S })\). This is summarized by the statement that \((\Omega ,\mathfrak{S },\mathbb{P },\{T_z:z\in \mathcal{G }\})\) is a measurable dynamical system. As usual \(\mathbb{P }\) is ergodic if \(T_z^{-1}A=A\) for all \(z\in \mathcal{R }\) implies \(\mathbb{P }(A)=0\) or \(1\), for events \(A\in \mathfrak{S }\). A stronger assumption of total ergodicity says that \(\mathbb{P }(A)=0\) or \(1\) whenever \(T_z^{-1}A=A\) for some extreme point \(z\) of the convex hull of \(\mathcal{R }\). \(\mathbb{E }\) will denote expectation relative to \(\mathbb{P }\).

A potential is a measurable function \(g:\Omega \times \mathcal{R }^\ell \rightarrow \mathbb{R }\) for some integer \(\ell \ge 0\). The case \(\ell =0\) means that \(g=g(\omega )\), a function of \(\omega \) alone. Given an environment \(\omega \) and an integer \(n\ge 1\) define the quenched polymer measure

$$\begin{aligned} Q^{g,\omega }_{n}(A) =\frac{1}{Z_{n}^{g,\omega }}E\bigl [e^{\sum _{k=0}^{n-1}g(T_{X_k}\omega , \,Z_{k+1,k+\ell })}{\small 1}\!\!1_A(\omega , X_{0,\infty })\bigr ], \end{aligned}$$
(1.1)

where \(A\) is an event on environments and paths and

$$\begin{aligned} Z_{n}^{g,\omega }=E\big [e^{\sum _{k=0}^{n-1}g(T_{X_k}\omega ,\, Z_{k+1,k+\ell })}\big ] \end{aligned}$$

is the normalizing constant called the quenched partition function. This model we call random walk in a random potential (RWRP). Above \(Z_k=X_k-X_{k-1}\) is a random walk step and \(Z_{i,j}=(Z_i,\cdots ,Z_j)\) a vector of steps. Similar notation will be used for all finite and infinite vectors and path segments, including \(X_{k,\infty }=(X_k, X_{k+1}, \cdots )\) and \(z_{1,\ell }=(z_1,\cdots , z_\ell )\) used above. Note that in general the measures \(Q^{g,\omega }_{n}\) defined in (1.1) are not consistent as \(n\) varies. Here are some key examples of the setting.

Example 1.1

(I.I.D. environment) A natural setting is the one where \(\Omega =\Gamma ^{\mathbb{Z }^d}\) is a product space with generic points \(\omega =(\omega _x)_{x\in \mathbb{Z }^d}\) and translations \((T_x\omega )_y=\omega _{x+y}\), the coordinates \(\omega _x\) are i.i.d. under \(\mathbb{P }\), and \(g(\omega ,z_{1,\ell })\) is a local function of \(\omega \), which means that \(g\) depends on only finitely many coordinates \(\omega _x\). This is a totally ergodic case. In this setting \(g\) has the \(r_0\)-separated i.i.d. property for some positive integer \(r_0\). By this we mean that if \(x_1,\cdots , x_m\in \mathcal{G }\) satisfy \(\left| x_i-x_j\right| \ge r_0\) for \(i\ne j\), then the \(\mathbb{R }^{\mathcal{R }^\ell }\)-valued random vectors \(\{ \bigl (g(T_{x_i}\omega , z_{1,\ell })\bigr )_{z_{1,\ell }\in \mathcal{R }^\ell }: 1\le i\le m\}\) are i.i.d. under \(\mathbb{P }\).

Example 1.2

(Strictly directed walk and local potential in i.i.d. environment) A specialization of Example 1.1 where \(0\) lies outside the convex hull of \(\mathcal{R }\). This is equivalent to the existence of \({\hat{u}}\in \mathbb{Z }^d\) such that \({\hat{u}}\cdot z>0\) for all \(z\in \mathcal{R }\).

Example 1.3

(Stretched polymer) A stretched polymer has an external field \(h\in \mathbb{R }^d\) that biases the walk, so the potential is \(g(\omega ,z)=\Psi (\omega )+h\cdot z\). See the survey paper [20] and its references for the state of the art on stretched polymers in a product potential.

Example 1.4

(Random walk in random environment) To cover RWRE take \(\ell =1\) and \(g(\omega ,z)=\log p_z(\omega )\) where \((p_z)_{z\in \mathcal{R }}\) is a measurable mapping from \(\Omega \) into \(\mathcal{P }=\{(\rho _z)_{z\in \mathcal{R }} \in [0,1]^\mathcal{R }:\sum _z \rho _z=1\}\), the space of probability distributions on \(\mathcal{R }\). The quenched path measure \(Q^\omega _0\) of RWRE started at \(0\) is the probability measure on the path space \((\mathbb{Z }^d)^{\mathbb{Z }_+}\) defined by the initial condition \(Q^\omega _0(X_0=0)=1\) and the transition probability \(Q^\omega _0(X_{n+1}=y\vert X_n=x) =p_{y-x}(T_x\omega )\). The \((X_0,\cdots ,X_n)\)-marginal of the polymer measure \(Q^{g,\omega }_n\) in (1.1) is the marginal of the quenched path measure \(Q^\omega _0\).
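As a sanity check on Example 1.4, the following minimal sketch computes the partition function by dynamic programming in a toy setting that is our own assumption, not taken from the paper: \(d=1\), \(\mathcal{R }=\{-1,+1\}\), and jump probabilities drawn uniformly from \((0.1,0.9)\). With \(g=\log p_z\) the Boltzmann weight of a path is its RWRE probability, while the reference walk contributes \(|\mathcal{R }|^{-n}\), so one expects \(Z_n^{g,\omega }=|\mathcal{R }|^{-n}\) and the normalized endpoint weights to form the quenched law of \(X_n\). The names `rwre_endpoint_weights` and `env` are hypothetical.

```python
import random

def rwre_endpoint_weights(n, env, steps=(-1, 1)):
    """Return {x: E[exp(sum_{k<n} log p_{Z_{k+1}}(T_{X_k} omega)) ; X_n = x]}.

    env[x] is the probability of a +1 step at site x (toy assumption);
    the reference walk picks steps uniformly from `steps`.
    """
    weights = {0: 1.0}
    for _ in range(n):
        nxt = {}
        for x, w in weights.items():
            for z in steps:
                p = env[x] if z == 1 else 1.0 - env[x]
                # factor 1/|R| from the reference walk, factor p = exp(g)
                nxt[x + z] = nxt.get(x + z, 0.0) + w * p / len(steps)
        weights = nxt
    return weights

n = 12
rng = random.Random(0)
env = {x: rng.uniform(0.1, 0.9) for x in range(-n, n + 1)}

weights = rwre_endpoint_weights(n, env)
Z = sum(weights.values())                            # quenched partition function
law = {x: w * 2 ** n for x, w in weights.items()}    # candidate quenched law of X_n
```

In this toy case the free energy (1.2) is simply \(-\log |\mathcal{R }|\), which is why the interesting object for RWRE is the point-to-point version rather than \(\Lambda _\ell (g)\) itself.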

1.2 Overview of the paper

Under some assumptions article [34] proved the \(\mathbb{P }\)-almost sure existence of the limit

$$\begin{aligned} \Lambda _\ell (g) = \lim _{n\rightarrow \infty } n^{-1} \log E\big [e^{\sum _{k=0}^{n-1}g(T_{X_k}\omega ,Z_{k+1,k+\ell })}\big ] . \end{aligned}$$
(1.2)

In different contexts this is called the limiting logarithmic moment generating function, the pressure, or the free energy. One of the main results of [34] was the variational characterization

$$\begin{aligned} \Lambda _\ell (g)=\sup _{\begin{array}{c} \mu \in \mathcal{M }_1(\varvec{\Omega }_\ell ), c>0 \end{array}} \bigl \{E^\mu [\min (g,c)]-H_{\ell }(\mu )\bigr \}. \end{aligned}$$
(1.3)

\(\mathcal{M }_1(\varvec{\Omega }_\ell )\) is the space of probability measures on \(\varvec{\Omega }_\ell =\Omega \times \mathcal{R }^\ell \) and \(H_{\ell }(\mu )\) is an entropy, defined in (5.2) below.

In the present paper we study the quenched point-to-point free energy

$$\begin{aligned} \Lambda _\ell (g,\zeta )=\lim _{n\rightarrow \infty } n^{-1} \log E\big [e^{\sum _{k=0}^{n-1}g(T_{X_k}\omega ,Z_{k+1,k+\ell })}{\small 1}\!\!1\{X_n= {\hat{x}}_n(\zeta )\}\big ] \end{aligned}$$
(1.4)

where \(\zeta \in \mathbb{R }^d\) and \({\hat{x}}_n(\zeta )\) is a lattice point that approximates \(n\zeta \). Our main result is a variational characterization of \(\Lambda _\ell (g,\zeta )\) which is identical to (1.3), except that now the supremum is over distributions \(\mu \) on \(\varvec{\Omega }_\ell \) whose mean velocity for the path is \(\zeta \). For directed walks in i.i.d. environments this is Theorem 5.3 in Sect. 5.
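For intuition, the prelimit quantity in (1.4) can be computed exactly at finite \(n\) by a transfer-matrix recursion. The sketch below does this in log space, under assumptions of our own choosing: \(d=2\), \(\mathcal{R }=\{e_1,e_2\}\), \(\ell =0\), and a potential \(g(\omega )=\beta \omega _0\) with i.i.d. standard normal weights. The function name `log_point_to_point` and the value of `beta` are hypothetical.

```python
import math
import random

def log_point_to_point(n, g):
    """Return {x: log E[exp(sum_{k<n} g(X_k)) ; X_n = x]} for steps e1, e2 in Z^2.

    g maps a site (i, j) to the potential seen there (case ell = 0).
    """
    steps = ((1, 0), (0, 1))
    logw = {(0, 0): 0.0}
    for _ in range(n):
        nxt = {}
        for x, lw in logw.items():
            # weight picked up at the current site, times step probability 1/|R|
            contrib = lw + g(x) - math.log(len(steps))
            for z in steps:
                y = (x[0] + z[0], x[1] + z[1])
                if y in nxt:
                    a, b = max(nxt[y], contrib), min(nxt[y], contrib)
                    nxt[y] = a + math.log1p(math.exp(b - a))  # stable log-sum-exp
                else:
                    nxt[y] = contrib
        logw = nxt
    return logw

n = 60
beta = 0.5  # hypothetical inverse temperature
rng = random.Random(0)
omega = {(i, j): rng.gauss(0.0, 1.0) for i in range(n + 1) for j in range(n + 1)}

logw = log_point_to_point(n, lambda x: beta * omega[x])
# finite-n proxy for Lambda_0(g, zeta) at zeta = (1/2, 1/2)
estimate = logw[(n // 2, n // 2)] / n
```

With \(g\equiv 0\) the recursion returns \(\log \bigl (\binom{n}{k}2^{-n}\bigr )\) at the endpoint \((k,n-k)\), which recovers the entropy of the reference walk on the exponential scale.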

We begin in Sect. 2 with the existence of \(\Lambda _\ell (g,\zeta )\) and regularity in \(\zeta \). A by-product is an independent proof of the limit (1.2). We relate \( \Lambda _\ell (g)\) and \(\Lambda _\ell (g,\zeta )\) to each other in a couple of different ways. This relationship yields a second variational formula for \(\Lambda _\ell (g,\zeta )\). Combining convex duality (4.3) with Theorem 2.3 from [34] gives a variational formula for \(\Lambda _\ell (g,\zeta )\) that involves tilts and corrector functions rather than measures.

Section 3 proves further regularity properties for the i.i.d. strictly directed case: continuity of \(\Lambda _\ell (g,\zeta )\) in \(\zeta \) and \(L^p\) continuity (\(p>d\)) in \(g\).

Section 4 is for large deviations. Limits (1.2) and (1.4) give a quenched large deviation principle for the distributions \(Q^{g,\omega }_{n}\{X_n/n\in \cdot \,\}\), with rate function \(I^g(\zeta ) = \Lambda _\ell (g)-\Lambda ^{\mathrm{usc}(\zeta )}_\ell (g,\zeta )\) where \(\Lambda ^{\mathrm{usc}(\zeta )}_\ell (g,\zeta )\) is the upper semicontinuous regularization. This rate function is continuous on the convex hull of \(\mathcal{R }\). We specialize the LDP to RWRE and give an overview of past work on quenched large deviations for RWRE.

Section 5 develops the entropy representation of \(\Lambda _\ell (g,\zeta )\) for the i.i.d. strictly directed case. The general case can be found in the preprint version [33]. The LDP is the key, through a contraction principle.

Our results are valid for unbounded potentials, provided we have control of the mixing of the environment. When shifts of the potential are strongly mixing, \(g\in L^p\) for \(p\) large enough suffices. In particular, for an i.i.d. environment and strictly directed walks, the assumption is that \(g\) is local in its dependence on \(\omega \) and \(g(\cdot \,,z_{1,\ell })\in L^p(\mathbb{P })\) for some \(p>d\).

Section 6 illustrates the theory for a directed polymer in an i.i.d. environment in the \(L^2\) region (weak disorder, dimension \(d\ge 3\)). The variational formula is solved by an RWRE in a correlated environment, and a tilt (or “stretch” as in Example 1.3) appears as the dual variable of the velocity \(\zeta \).

1.3 Literature and past results

Standard references for RWRE are [2, 40, 44], and for RWRP [7, 18, 39]. RWRE large deviations literature is recounted in Sect. 4 after Theorem 4.3. Early forms of our variational formulas appeared in position-level large deviations for RWRE in [36].

A notion related to the free energy is the Lyapunov exponent defined by

$$\begin{aligned} \lim _{n\rightarrow \infty } n^{-1}\log E\Big [e^{\sum _{k=0}^{\tau ({\hat{x}}_n(\zeta ))-1}g(T_{X_k} \omega ,Z_{k+1,k+\ell })}{\small 1}\!\!1\{\tau ({\hat{x}}_n(\zeta ))<\infty \}\Big ] \end{aligned}$$

where \(\tau (x)=\inf \{k\ge 0:X_k=x\}\). Results on Lyapunov exponents and the quenched level 1 LDP for nearest-neighbor polymers in i.i.d. random potentials have been proved by Carmona and Hu [5], Mourrat [28] and Zerner [45]. Some of the ideas originate in Sznitman [38] and Varadhan [41].

Our treatment resolves some regularity issues of the level 1 rate function raised by Carmona and Hu [5, Remark 1.3]. We require \(g\) to be finite, so for example walks on percolation clusters are ruled out. Mourrat [28] proved a level 1 LDP for simple random walk in an i.i.d. potential \(g(\omega _0)\le 0\) that permits \(g=-\infty \) as long as \(g(\omega _x)>-\infty \) percolates.

The directed i.i.d. case of Example 1.2 in dimension \(d=2\), with a potential \(g(\omega _0)\) subject to some moment assumptions, is expected to be a member of the KPZ universality class (Kardar–Parisi–Zhang). The universality conjecture is that the centered and normalized point-to-point free energy should converge to the Airy\(_2\) process. At present such universality remains unattained. Piza [29] proved in some generality that fluctuations of the point-to-point free energy diverge at least logarithmically. Among the lattice models studied in this paper one is known to be exactly solvable, namely the log-gamma polymer introduced in [37] and further studied in [11, 16]. For that model the KPZ conjecture is partially proved: correct fluctuation exponents were verified in some cases in [37], and the Tracy–Widom GUE limit proved in some cases in [3]. KPZ universality results are further along for zero temperature polymers (oriented percolation or last-passage percolation type models). Article [10] is a recent survey of these developments.

1.4 Notation and conventions

On a product space \(\Omega =\Gamma ^{\mathbb{Z }^d}\) with generic points \(\omega =(\omega _x)_{x\in \mathbb{Z }^d}\), a local function \(g(\omega )\) is a function of only finitely many coordinates \(\omega _x\). \(\mathbb{E }\) and \(\mathbb{P }\) refer to the background measure on the environments \(\omega \). For the set \(\mathcal{R }\subset \mathbb{Z }^d\) of admissible steps we define \(M=\max \{\left| z\right| :z\in \mathcal{R }\}\), and denote its convex hull in \(\mathbb{R }^d\) by \(\mathcal{U }=\{ \sum _{z\in \mathcal{R }} a_z z: 0\le a_z\in \mathbb{R },\,\sum _z a_z=1\}\). The steps of an admissible path \((x_k)\) are \(z_k=x_k-x_{k-1}\in \mathcal{R }\).

In general, the convex hull of a set \(\mathcal I \) is \(\mathrm{co}\,\mathcal I \). A convex set \(\mathcal C \) has its relative interior \(\mathrm{ri}\,\,\mathcal C \), its set of extreme points \(\mathrm{ex}\,\mathcal C \), and its affine hull \(\mathrm{aff}\,\mathcal C \). The upper semicontinuous regularization of a function \(f\) is denoted by \(f^{\text{ usc }}(x)=\inf _{\text{ open }\,B\ni x} \sup _{y\in B} f(y)\) with an analogous definition for \(f^{\text{ lsc }}\). \(E^\mu [f]=\int f\,d\mu \) denotes expectation under the measure \(\mu \). As usual, \(\mathbb{N }=\{1,2, 3,\cdots \}\) and \(\mathbb{Z }_+=\{0,1,2,\cdots \}\). \(x\vee y=\max (x,y)\) and \(x\wedge y=\min (x,y)\).

2 Existence and regularity of the quenched point-to-point free energy

Standing assumptions for this section are that \((\Omega ,\mathfrak{S },\mathbb{P },\{T_z:z\in \mathcal{G }\})\) is a measurable dynamical system and \(\mathcal{R }\) is finite. This will not be repeated in the statements of the theorems. When ergodicity is assumed it is mentioned. For the rest of this section we fix the integer \(\ell \ge 0\). Define the space \(\varvec{\Omega }_\ell =\Omega \times \mathcal{R }^\ell \). If \(\ell =0\) then \(\varvec{\Omega }_\ell =\Omega \). Convex analysis will be important throughout the paper. The convex hull of \(\mathcal{R }\) is denoted by \(\mathcal{U }\), the set of extreme points of \(\mathcal{U }\) is \(\mathrm{ex}\,\mathcal{U }\subset \mathcal{R }\), and \(\mathrm{ri}\,\,\mathcal{U }\) is the relative interior of \(\mathcal{U }\).

The following is our key assumption.

Definition 2.1

Let \(\ell \in \mathbb{Z }_+\). A function \(g:\varvec{\Omega }_\ell \rightarrow \mathbb{R }\) is in class \(\mathcal{L }\) if for each \(\tilde{z}_{1,\ell }\in \mathcal{R }^\ell \) these properties hold: \(g(\cdot \,,\tilde{z}_{1,\ell })\in L^1(\mathbb{P })\) and for any nonzero \(z\in \mathcal{R }\)

$$\begin{aligned} {\mathop {\overline{\text{ lim }}}\limits _{\varepsilon \searrow 0\,\,}}{\mathop {\overline{\text{ lim }}}\limits _{n\rightarrow \infty }} \max _{x\in \mathcal{G }:\left| x\right| \le n}\frac{1}{n} \sum _{0\le k\le \varepsilon n} \left| g(T_{x+kz}\omega , \tilde{z}_{1,\ell })\right| =0\quad \text{ for } \mathbb{P }\text{-a.e. } \omega \text{. } \end{aligned}$$

Membership \(g\in \mathcal{L }\) depends on a combination of mixing of \(\mathbb{P }\) and moments of \(g\). If \(\mathbb{P }\) is an arbitrary ergodic measure then in general we must assume \(g\) bounded to guarantee \(g\in \mathcal{L }\), except that if \(d=1\) then \(g\in L^1(\mathbb{P })\) is enough. Strong mixing of the process \(\{g\circ T_x:x\in \mathcal{G }\}\) and \(g\in L^p(\mathbb{P })\) for some large enough \(p\) also guarantee \(g\in \mathcal{L }\). For example, with exponential mixing \(p>d\) is enough. This is the case in particular if \(g\) has the \(r_0\)-separated i.i.d. property mentioned in Example 1.1. Lemma A.4 of [34] gives a precise statement.

We now define the lattice points \({\hat{x}}_n(\zeta )\) that appear in the point-to-point free energy (1.4). For each point \(\zeta \in \mathcal{U }\) fix weights \(\beta _z(\zeta )\in [0,1]\) such that \(\sum _{z\in \mathcal{R }}\beta _z(\zeta ) =1\) and \(\zeta =\sum _{z\in \mathcal{R }}\beta _z(\zeta ) z\). Then define a path

$$\begin{aligned} {\hat{x}}_n(\zeta )=\sum _{z\in \mathcal{R }}\bigl (\lfloor n\beta _z(\zeta )\rfloor +b_z^{(n)}(\zeta )\bigr ) z, \quad n\in \mathbb{Z }_+, \end{aligned}$$
(2.1)

where \(b_z^{(n)}(\zeta )\in \{0,1\}\) are arbitrary but subject to these constraints: if \(\beta _z(\zeta )=0\) then \(b_z^{(n)}(\zeta )=0\), and \(\sum _z b_z^{(n)}(\zeta ) = n-\sum _{z\in \mathcal{R }}\lfloor n\beta _z(\zeta )\rfloor \). In other words, \({\hat{x}}_n(\zeta )\) is a lattice point that approximates \(n\zeta \), is precisely \(n\) \(\mathcal{R }\)-steps away from the origin, and uses only those steps that appear in the particular convex representation \(\zeta =\sum _z \beta _z z\) that was picked. When \(\zeta \in \mathcal{U }\cap \mathbb{Q }^d\) we require that \(\beta _z(\zeta )\) be rational. This is possible by Lemma A.1 of [34]. If we only cared about \(\Lambda _\ell (g,\zeta )\) for rational \(\zeta \) we could allow much more general paths, see Theorem 2.8 below.
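The construction (2.1) can be sketched in a few lines. The code below uses exact rational arithmetic and one particular tie-breaking rule for the bits \(b_z^{(n)}\) (largest fractional part first). That rule is our own hypothetical choice, since the text allows any choice satisfying the stated constraints. The function name `lattice_point` is likewise hypothetical.

```python
from fractions import Fraction
from math import floor

def lattice_point(n, beta):
    """Build x_hat_n(zeta) as in (2.1).

    beta: dict mapping each step z (a tuple) to a rational weight beta_z,
          with the weights summing to 1.
    Returns (counts, x_hat) where counts[z] = floor(n * beta_z) + b_z.
    """
    counts = {z: floor(n * b) for z, b in beta.items()}
    deficit = n - sum(counts.values())  # number of extra steps still needed
    # Only steps with beta_z > 0 may receive an extra unit (b_z = 1).
    # Hypothetical rule: prefer the largest fractional part of n * beta_z.
    eligible = sorted((z for z, b in beta.items() if b > 0),
                      key=lambda z: n * beta[z] - counts[z], reverse=True)
    for z in eligible[:deficit]:
        counts[z] += 1
    dim = len(next(iter(beta)))
    x_hat = tuple(sum(c * z[i] for z, c in counts.items()) for i in range(dim))
    return counts, x_hat

beta = {(1, 0): Fraction(1, 3), (0, 1): Fraction(2, 3)}  # zeta = (1/3, 2/3)
counts, x_hat = lattice_point(10, beta)
```

The deficit \(n-\sum _z\lfloor n\beta _z\rfloor \) is a sum of fractional parts over the steps with \(\beta _z>0\), hence strictly smaller than the number of such steps, so there are always enough eligible steps to absorb it.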

The next theorem establishes the existence of the quenched point-to-point free energy in part (a) and of the quenched free energy in part (b). Introduce the empirical measure \(R_n^{\ell }\) by

$$\begin{aligned} R_n^{\ell }(g)=n^{-1}\sum _{k=0}^{n-1}g(T_{X_k}\omega ,Z_{k+1,k+\ell }). \end{aligned}$$
(2.2)

Theorem 2.2

Fix \(g\in \mathcal{L }\).

  1. (a)

    For \(\mathbb{P }\)-a.e. \(\omega \) and simultaneously for all \(\zeta \in \mathcal{U }\) the limit

    $$\begin{aligned} \Lambda _\ell (g,\zeta ;\omega )= \lim _{n\rightarrow \infty }n^{-1}\log E\big [e^{n R_n^{\ell }(g)}{\small 1}\!\!1\{X_n={\hat{x}}_n(\zeta )\}\big ] \end{aligned}$$
    (2.3)

    exists in \((-\infty ,\infty ]\). For a particular \(\zeta \) the limit is independent of the choice of convex representation \(\zeta =\sum _z\beta _z z\) and the numbers \(b^{(n)}_z\) that define \({\hat{x}}_n(\zeta )\) in (2.1). When \(\zeta \not \in \mathcal{U }\) it is natural to set \(\Lambda _\ell (g,\zeta )=-\infty \).

  2. (b)

    The limit

    $$\begin{aligned} \Lambda _\ell (g;\omega )=\lim _{n\rightarrow \infty }n^{-1}\log E\big [e^{\sum _{k=0}^{n-1}g(T_{X_k}\omega ,Z_{k+1,k+\ell })}\big ] \end{aligned}$$
    (2.4)

    exists \(\mathbb{P }\)-a.s. in \((-\infty ,\infty ]\) and satisfies

    $$\begin{aligned} \Lambda _\ell (g) =\sup _{\xi \in \mathbb{Q }^d\cap \mathcal{U }}\Lambda _\ell (g,\xi ) =\sup _{\zeta \in \mathcal{U }}\Lambda _\ell (g,\zeta ). \end{aligned}$$
    (2.5)

Formula (4.3) in Sect. 4 shows how to recover \(\Lambda _\ell (g,\zeta )\) from knowing \(\Lambda _\ell (h)\) for a broad enough class of functions \(h\).
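Identity (2.5) has an elementary finite-\(n\) counterpart: the partition function is the sum of the point-to-point partition functions over the endpoints, so it lies between their maximum and the maximum times the number of endpoints. The sketch below checks this numerically in a toy setting assumed for illustration only (\(d=2\), \(\mathcal{R }=\{e_1,e_2\}\), \(\ell =0\), Gaussian weights, and a hypothetical inverse temperature `beta`).

```python
import math
import random

def endpoint_weights(n, omega, beta=1.0):
    """Z_n(x) = E[exp(beta * sum_{k<n} omega[X_k]) ; X_n = x], steps e1 and e2."""
    w = {(0, 0): 1.0}
    for _ in range(n):
        nxt = {}
        for (i, j), v in w.items():
            v = 0.5 * v * math.exp(beta * omega[i, j])
            for y in ((i + 1, j), (i, j + 1)):
                nxt[y] = nxt.get(y, 0.0) + v
        w = nxt
    return w

n = 40
rng = random.Random(1)
omega = {(i, j): rng.gauss(0.0, 1.0) for i in range(n + 1) for j in range(n + 1)}

w = endpoint_weights(n, omega, beta=0.5)
log_Z = math.log(sum(w.values()))       # point-to-line quantity
log_max = math.log(max(w.values()))     # best point-to-point quantity
lower = log_max / n
upper = (log_max + math.log(len(w))) / n
```

Here `lower <= log_Z / n <= upper`, and the gap `upper - lower` equals \(n^{-1}\log (n+1)\) in this toy case, which vanishes as \(n\rightarrow \infty \).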

Remark 2.3

(Conditions for finiteness) In general, we need to assume that \(g\) is bounded from above to prevent the possibility that \(\Lambda _\ell (g,\zeta )\) takes the value \(+\infty \). When \(g\) has the \(r_0\)-separated i.i.d. property and \(0\notin \mathcal{U }\) as in Example 1.2, the assumption \(\mathbb{E }[\left| g\right| ^p]<\infty \) for some \(p>d\) guarantees that \(\Lambda _\ell (g,\zeta )\) and \(\Lambda _\ell (g)\) are a.s. finite (Lemma 3.1). In fact \(\Lambda _\ell (g,\cdot \,)\) is either bounded or identically \(+\infty \) on \(\mathrm{ri}\,\,\mathcal{U }\) (Theorem 2.6).

Let us recall facts about convex sets. A face of a convex set \(\mathcal{U }\) is a convex subset \(\mathcal{U }_0\) such that every (closed) line segment in \(\mathcal{U }\) with a relative interior point in \(\mathcal{U }_0\) has both endpoints in \(\mathcal{U }_0\). \(\mathcal{U }\) itself is a face. By Corollary 18.1.3 of [35] any other face of \(\mathcal{U }\) is entirely contained in the relative boundary of \(\mathcal{U }\). Extreme points of \(\mathcal{U }\) are the zero-dimensional faces. By Theorem 18.2 of [35] each point \(\zeta \in \mathcal{U }\) has a unique face \(\mathcal{U }_0\) such that \(\zeta \in \mathrm{ri}\,\,\mathcal{U }_0\). (An extreme case of this is \(\zeta \in \mathrm{ex}\,\mathcal{U }\) in which case \(\{\zeta \}=\mathcal{U }_0=\mathrm{ri}\,\,\mathcal{U }_0\). Note that the relative interior of a nonempty convex set is never empty.) By Theorem 18.1 of [35] if \(\zeta \in \mathcal{U }\) belongs to a face \(\mathcal{U }_0\) then any representation of \(\zeta \) as a convex combination of elements of \(\mathcal{U }\) involves only elements of \(\mathcal{U }_0\). Lastly, Theorem 18.3 in [35] says that a face \(\mathcal{U }_0\) is the convex hull of \(\mathcal{R }_0=\mathcal{R }\cap \mathcal{U }_0\).

We address basic properties of \(\Lambda _\ell (g,\zeta ;\omega )\). The first issue is whether it is random (genuinely a function of \(\omega \)) or deterministic (there is a value \(\Lambda _\ell (g,\zeta )\) such that \(\Lambda _\ell (g,\zeta ;\omega )=\Lambda _\ell (g,\zeta )\) for \(\mathbb{P }\)-almost every \(\omega \)). This will depend on the setting. If \(0\in \mathrm{ex}\,\mathcal{U }\) then the condition \(X_n=0\) does not permit the walk to move and \(\Lambda _\ell (g,0;\omega )=-\log |\mathcal{R }|+g(\omega ,(0,\cdots ,0))\). But even if the origin does not cause problems, \(\Lambda _\ell (g,\zeta ;\omega )\) is not necessarily deterministic on all of \(\mathcal{U }\) if \(\mathbb{P }\) is not totally ergodic. For example, if \(0\ne z\in \mathrm{ex}\,\mathcal{U }\) then \(X_n=nz\) is possible only by repetition of step \(z\) and \(\Lambda _\ell (g,z;\omega )=-\log |\mathcal{R }| +\mathbb{E }[g(\omega , (z,\cdots ,z))\,|\,\mathfrak I _z]\), where \(\mathfrak I _z\) is the \(\sigma \)-algebra invariant under \(T_z\).
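The extreme-point formula can be checked by brute force at finite \(n\). In the sketch below (toy assumptions of ours: \(d=2\), \(\mathcal{R }=\{e_1,e_2\}\), \(\ell =0\), \(g(\omega )=\omega _0\) with Gaussian weights) the only path reaching \(ne_1\) repeats the step \(e_1\), so the prelimit equals \(-n\log |\mathcal{R }|+\sum _{k<n}g(T_{ke_1}\omega )\) exactly. The ergodic theorem then produces the conditional expectation in the limit.

```python
import itertools
import math
import random

steps = ((1, 0), (0, 1))
n = 8
rng = random.Random(2)
omega = {(i, j): rng.gauss(0.0, 1.0) for i in range(n + 1) for j in range(n + 1)}

def brute_force_log_Z(n, target):
    """log E[exp(sum_{k<n} omega[X_k]) ; X_n = target] by enumerating all paths."""
    total = 0.0
    for path_steps in itertools.product(steps, repeat=n):
        x, energy = (0, 0), 0.0
        for z in path_steps:
            energy += omega[x]                  # g(T_{X_k} omega), case ell = 0
            x = (x[0] + z[0], x[1] + z[1])
        if x == target:
            total += math.exp(energy) / len(steps) ** n
    return math.log(total)

lhs = brute_force_log_Z(n, (n, 0))
rhs = -n * math.log(len(steps)) + sum(omega[(k, 0)] for k in range(n))
```

The two values `lhs` and `rhs` agree up to floating-point error.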

Theorem 2.4

Fix \(g\in \mathcal{L }\). Let \(\mathcal{U }_0\) be any face of \(\mathcal{U }\), possibly \(\mathcal{U }\) itself. Suppose \(\mathbb{P }\) is ergodic under \(\{T_z:z\in \mathcal{R }\cap \mathcal{U }_0\}\). Then there exist a nonrandom function \(\Lambda _\ell (g,\zeta )\) of \(\zeta \in \mathrm{ri}\,\,\mathcal{U }_0\) and an event \(\Omega _0\) such that (i) \(\mathbb{P }(\Omega _0)=1\) and (ii) for all \(\omega \in \Omega _0\) and \(\zeta \in \mathrm{ri}\,\,\mathcal{U }_0\) the limit in (2.3) equals \(\Lambda _\ell (g,\zeta )\).

Remark 2.5

  1. (i)

    For an ergodic \(\mathbb{P }\) we get a deterministic function \(\Lambda _\ell (g,\zeta )\) of \(\zeta \in \mathrm{ri}\,\,\mathcal{U }\). We write \(\Lambda _\ell (g,\zeta ;\omega )=\Lambda _\ell (g,\zeta )\) in this case.

  2. (ii)

    If \(\mathbb{P }\) is nondegenerate the assumption rules out the case \(\mathcal{U }_0=\{0\}\) because \(T_0\) is the identity mapping. \(\{0\}\) is a face if \(0\in \mathrm{ex}\,\mathcal{U }\).

  3. (iii)

    An important special case is the totally ergodic \(\mathbb{P }\). Then the theorem above applies to each face except \(\{0\}\). Since there are only finitely many faces, we get a single deterministic function \(\Lambda _\ell (g,\zeta )\) and a single event \(\Omega _0\) of full \(\mathbb{P }\)-probability such that \(\Lambda _\ell (g,\zeta )\) is the limit in (2.3) for all \(\omega \in \Omega _0\) and \(\zeta \in \mathcal{U }\backslash \{0\}\). The point \(\zeta =0\) is included in this statement if \(0\) is a non-extreme point of \(\mathcal{U }\).

Convexity of \(\Lambda _\ell (g,\zeta )\) in \(g\) follows from Hölder’s inequality. The next theorem establishes some regularity in \(\zeta \) for the a.e. defined function \(\Lambda _\ell (g,\zeta ;\omega )\).

The case of an infinite free energy needs to be treated separately.

Theorem 2.6

Let \(g\in \mathcal{L }\) and assume \(\mathbb{P }\) is ergodic. Then \(\Lambda _\ell (g)\) is deterministic. The following properties hold for \(\mathbb{P }\)-a.e. \(\omega \).

  1. (a)

    If \(\Lambda _\ell (g)=\infty \) then \(\Lambda _\ell (g,\zeta )\) is identically \(+\infty \) for \(\zeta \in \mathrm{ri}\,\,\mathcal{U }\).

  2. (b)

    Suppose \(\Lambda _\ell (g)<\infty \). Then \(\Lambda _\ell (g,\cdot \,;\omega )\) is lower semicontinuous and bounded on \(\mathcal{U }\) and concave and continuous on \(\mathrm{ri}\,\,\mathcal{U }\). The upper semicontinuous regularization of \(\Lambda _\ell (g,\cdot \,;\omega )\) and its unique continuous extension from \(\mathrm{ri}\,\,\mathcal{U }\) to \(\mathcal{U }\) are equal and deterministic.

Remark 2.7

Suppose \(\mathbb{P }\) is totally ergodic and we are in the finite case of Theorem 2.6(b). Then concavity in \(\zeta \) extends to all of \(\mathcal{U }\) (see Remark 2.10 below for the argument). This is true despite the possibility of a random value \(\Lambda _\ell (g,0;\omega )\) at \(\zeta =0\) (this happens in the case \(0\in \mathrm{ex}\,\mathcal{U }\)). In other words, concavity and lower semicontinuity are both valid even with the random value at \(\zeta =0\). However, continuity must fail because on \(\mathcal{U }\backslash \{0\}\) the function \(\Lambda _\ell (g, \zeta )\) is deterministic. This issue of extending continuity from \(\mathrm{ri}\,\,\mathcal{U }\) to the boundary is tricky. We address this issue in the i.i.d. case in Theorem 3.2.

We turn to the proofs of the theorems in this section. Recall \(M=\max \{\left| z\right| :z\in \mathcal{R }\}\). Let

$$\begin{aligned} D_n=\{z_1+\cdots +z_n:z_{1,n}\in \mathcal{R }^n\} \end{aligned}$$
(2.6)

denote the set of endpoints of admissible paths of length \(n\). To prove Theorem 2.2 we first treat rational points \(\xi \in \mathcal{U }\). In this case we can be more liberal with the function \(g\) and with the paths.
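As a small illustration of the endpoint sets \(D_n\) in (2.6), the following sketch builds them iteratively for a given step set. The function name `endpoints` is hypothetical. Since \(|D_n|\le (n+1)^{|\mathcal{R }|}\), the number of endpoints grows only polynomially in \(n\).

```python
def endpoints(n, steps):
    """Return D_n = {z_1 + ... + z_n : each z_i in steps} as a set of tuples."""
    current = {tuple(0 for _ in steps[0])}   # D_0 = {origin}
    for _ in range(n):
        current = {tuple(a + b for a, b in zip(x, z))
                   for x in current for z in steps}
    return current

directed = ((1, 0), (0, 1))
nearest_neighbor = ((1, 0), (-1, 0), (0, 1), (0, -1))

size_directed = len(endpoints(5, directed))      # n + 1 points on the antidiagonal
size_nn = len(endpoints(4, nearest_neighbor))    # lattice points with |x|_1 <= n, parity n
```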

Theorem 2.8

Let \(g(\cdot \,,z_{1,\ell })\in L^1(\mathbb{P })\) for each \(z_{1,\ell }\in \mathcal{R }^\ell \). Then for \(\mathbb{P }\)-a.e. \(\omega \) and simultaneously for all \(\xi \in \mathcal{U }\cap \mathbb{Q }^d\) the following holds: for any path \(\{y_n(\xi )\}_{n\in \mathbb{Z }_+}\) such that \(y_n(\xi )-y_{n-1}(\xi )\in \mathcal{R }\) and for some \(k\in \mathbb{N }\), \(y_{mk}(\xi )=mk\xi \) for all \(m\in \mathbb{Z }_+\), the limit

$$\begin{aligned} \Lambda _\ell (g,\xi ;\omega )= \lim _{n\rightarrow \infty }n^{-1}\log E\big [e^{n R_n^{\ell }(g)}{\small 1}\!\!1\{X_n=y_n(\xi )\}\big ] \end{aligned}$$
(2.7)

exists in \((-\infty ,\infty ]\). For a given \(\xi \in \mathcal{U }\cap \mathbb{Q }^d\) the limit is independent of the choice of the path \(\{y_n(\xi )\}\) subject to the condition above.

Proof of Theorem 2.8

Fix \(\xi \in \mathbb{Q }^d\cap \mathcal{U }\), the path \(y_n(\xi )\), and \(k\) so that \(y_{mk}(\xi )=mk\xi \) for all \(m\in \mathbb{Z }_+\). By the Markov property

$$\begin{aligned}&\log E\big [e^{(m+n)k R^{\ell }_{(m+n)k}(g)}, X_{(m+n)k}=(m+n)k\xi \big ]-2A_\ell (\omega )\nonumber \\&\qquad \qquad \ge \log E\big [e^{mk R_{mk}^{\ell }(g)}, X_{mk}=mk\xi \big ]-2A_\ell (\omega )\nonumber \\&\qquad \qquad \qquad +\log E\big [e^{nk R_{nk}^{\ell }(g\circ T_{mk\xi })}, X_{nk}=nk\xi \big ]-2A_\ell (T_{mk\xi }\omega ), \end{aligned}$$
(2.8)

where \(T_x\) acts by \(g\circ T_x (\omega ,z_{1,\ell })=g(T_x\omega ,z_{1,\ell })\) and the errors are covered by defining

$$\begin{aligned} A_\ell (\omega )=\ell \max _{y\in \mathcal{G }: \left| y\right| \le M\ell } \max _{z_{1,\ell }\in \mathcal{R }^\ell }\max _{1\le i\le \ell } \left| g(T_{-{\tilde{x}}_i}\omega ,z_{1,\ell })\right| \in L^1(\mathbb{P }). \end{aligned}$$

Since \(g\in L^1(\mathbb{P })\) the random variable \(-\log E[e^{nk R_{nk}^{\ell }(g)}, X_{nk}=nk\xi ]+2A_\ell (\omega )\) is \(\mathbb{P }\)-integrable for each \(n\). By Kingman’s subadditive ergodic theorem (for example in the form in [24, Theorem 2.6, p. 277])

$$\begin{aligned} \Lambda _\ell (g,\xi ;\omega )=\lim _{m\rightarrow \infty }\frac{1}{mk}\log E\big [e^{mk R_{mk}^{\ell }(g)}, X_{mk}=mk\xi \big ] \end{aligned}$$
(2.9)

exists in \((-\infty ,\infty ]\) \(\mathbb{P }\)-almost surely. This limit is independent of \(k\) because if \(k_1\) and \(k_2\) both work and give distinct limits, then the limit along the subsequence of multiples of \(k_1k_2\) would not be defined. Let \(\Omega _0\) be the full probability event on which limit (2.9) holds for all \(\xi \in \mathbb{Q }^d\cap \mathcal{U }\) and \(k\in \mathbb{N }\) such that \(k\xi \in \mathbb{Z }^d\).

Next we extend limit (2.9) to the full sequence. Given \(n\) choose \(m\) so that \( mk\le n< (m+1)k \). By assumption we have admissible paths from \(mk\xi \) to \(y_n(\xi )\) and from \(y_n(\xi )\) to \((m+1)k\xi \), so we can create inequalities by restricting the expectations to follow these path segments. For convenience let us take \(k>\ell \) so that \(R^{\ell }_{(m-1)k}(g)\) does not depend on the walk beyond time \(mk\). Then, for all \(\omega \)

$$\begin{aligned}&\log E\big [e^{n R_{n}^{\ell }(g)}, X_{n}=y_n(\xi )\big ]\nonumber \\&\quad \ge \log E\big [e^{(m-1)k R^{\ell }_{(m-1)k}(g)}, X_{mk}=mk\xi ,\, X_{n}=y_n(\xi )\big ]-A_{2k}(T_{mk\xi }\omega )\nonumber \\&\quad \ge \log E\big [e^{(m-1)k R^{\ell }_{(m-1)k}(g)}, X_{mk}=mk\xi \big ]-(n-mk) \log |\mathcal{R }| -A_{2k}(T_{mk\xi }\omega )\nonumber \\&\quad \ge \log E\big [e^{mk R^{\ell }_{mk}(g)}, X_{mk}=mk\xi \big ] - k\log |\mathcal{R }| -2A_{2k}(T_{mk\xi }\omega ) \end{aligned}$$
(2.10)

and similarly

$$\begin{aligned}&\log E\big [e^{(m+1)k R^{\ell }_{(m+1)k}(g)}, X_{(m+1)k}=(m+1)k\xi \big ]\\&\quad \ge \log E\big [e^{n R^{\ell }_{n}(g)}, X_{n}=y_n(\xi )\big ] -k\log |\mathcal{R }| -2A_{2k}(T_{mk\xi }\omega ). \end{aligned}$$

Divide by \(n\) and take \(n\rightarrow \infty \) in the bounds developed above. Since in general \(m^{-1}Y_m\rightarrow 0\) a.s. for identically distributed integrable \(\{Y_m\}\), the error terms vanish in the limit. The limit holds on the full probability subset of \(\Omega _0\) where the errors \(n^{-1}A_{2k}(T_{mk\xi }\omega )\rightarrow 0\) for all \(\xi \) and \(k\). We also conclude that the limit is independent of the choice of the path \(y_n(\xi )\). Theorem 2.8 is proved. \(\square \)
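The superadditivity behind (2.8) can be observed numerically. The sketch below works under simplifying assumptions of our own: \(d=2\), \(\mathcal{R }=\{e_1,e_2\}\), \(\xi =(1/2,1/2)\), \(k=2\), and \(\ell =0\), in which case the correction \(A_\ell \) vanishes because of the factor \(\ell \) in its definition. The helper name `log_Z_point` and its `origin` argument (which plays the role of the shift \(T_{mk\xi }\)) are hypothetical.

```python
import math
import random

def log_Z_point(n, target, omega, origin=(0, 0)):
    """log E[exp(sum_{k<n} omega[origin + X_k]) ; X_n = target], steps e1 and e2."""
    w = {(0, 0): 1.0}
    for _ in range(n):
        nxt = {}
        for (i, j), v in w.items():
            v = 0.5 * v * math.exp(omega[origin[0] + i, origin[1] + j])
            for y in ((i + 1, j), (i, j + 1)):
                nxt[y] = nxt.get(y, 0.0) + v
        w = nxt
    return math.log(w[target])

k, m, n = 2, 3, 4                 # k * xi = (1, 1) is a lattice point
N = (m + n) * k
rng = random.Random(3)
omega = {(i, j): rng.gauss(0.0, 1.0) for i in range(N + 1) for j in range(N + 1)}

mid = (m * k // 2, m * k // 2)    # the point mk * xi
whole = log_Z_point(N, (N // 2, N // 2), omega)
first = log_Z_point(m * k, mid, omega)
second = log_Z_point(n * k, (n * k // 2, n * k // 2), omega, origin=mid)
```

Restricting the walk to pass through \(mk\xi \) at time \(mk\) and applying the Markov property gives `whole >= first + second`, which is the \(\ell =0\) instance of (2.8).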

The next lemma will help in the proof of Theorem 2.2 and the LDP in Theorem 4.1.

Lemma 2.9

Let \(g\in \mathcal{L }\). Define the paths \(\{y_n(\xi )\}\) for \(\xi \in \mathbb{Q }^d\cap \mathcal{U }\) as in Theorem 2.8. Then for \(\mathbb{P }\)-a.e. \(\omega \), we have the following bound for all compact \(K\subset \mathbb{R }^d\) and \(\delta >0:\)

$$\begin{aligned}&{\mathop {\overline{\text{ lim }}}\limits _{n\rightarrow \infty }}n^{-1}\log E\big [e^{n R_n^{\ell }(g)}{\small 1}\!\!1\{X_n/n\in K\}\big ]\end{aligned}$$
(2.11)
$$\begin{aligned}&\qquad \qquad&\le \sup \limits _{\xi \in \mathbb{Q }^d\cap K_\delta \cap \mathcal{U }}\,{\mathop {\overline{\text{ lim }}}\limits _{n\rightarrow \infty }}n^{-1}\log E\big [e^{n R_n^{\ell }(g)}{\small 1}\!\!1\{X_n=y_n(\xi )\}\big ] \end{aligned}$$
(2.12)

where \(K_\delta =\{\zeta \in \mathbb{R }^d:\exists \zeta ^{\prime }\in K \text{ with } |\zeta -\zeta ^{\prime }|<\delta \}\).

Proof

Fix a nonzero \({\hat{z}}\in \mathcal{R }\). Fix \(\varepsilon \in (0,\delta /(4M))\) and an integer \(k\ge |\mathcal{R }|(1+2\varepsilon )/\varepsilon \). There are finitely many points in \(k^{-1}D_k\) so we can fix a single integer \(b\) such that \(y_{mb}(\xi )=mb\xi \) for all \(m\in \mathbb{Z }_+\) and \(\xi \in k^{-1}D_k\).

We construct a path from each \(x\in D_n\cap nK\) to a multiple of a point \(\xi (n,x)\in K_\delta \cap k^{-1}D_k\). Begin by writing \(x=\sum _{z\in \mathcal{R }} a_z z\) with \(a_z\in \mathbb{Z }_+\) and \(\sum _{z\in \mathcal{R }}a_z=n\). Let \(m_n=\left\lceil {(1+2\varepsilon )n/k}\right\rceil \) and \(s_z^{(n)}=\left\lceil {k a_z/((1+2\varepsilon )n)}\right\rceil \). Then, for each \(z\in \mathcal{R }\),

$$\begin{aligned} (1-\tfrac{1}{1+2\varepsilon })n^{-1}a_z-\tfrac{1}{k}\le n^{-1}a_z - k^{-1}s_z^{(n)}\le (1-\tfrac{1}{1+2\varepsilon })n^{-1}a_z. \end{aligned}$$

This implies that

$$\begin{aligned} \tfrac{\varepsilon }{1+2\varepsilon }\le 1-k^{-1}\sum _z s^{(n)}_z\le 1-\tfrac{1}{1+2\varepsilon }<\tfrac{\delta }{2M} \end{aligned}$$

and

$$\begin{aligned} \Big | k^{-1}\sum _{z\in \mathcal{R }} s^{(n)}_z z -n^{-1}x\Big |\le M \sum _{z\in \mathcal{R }} |k^{-1}s^{(n)}_z-n^{-1}a_z|\le M(1-\tfrac{1}{1+2\varepsilon })<\tfrac{\delta }{2}. \end{aligned}$$

Define a point \(\xi (n,x)\in K_\delta \cap k^{-1}D_k\) by

$$\begin{aligned} \xi (n,x)=k^{-1}\sum _{z\in \mathcal{R }} s^{(n)}_z z+\Big (1-k^{-1}\sum _{z\in \mathcal{R }} s^{(n)}_z\Big ){\hat{z}}. \end{aligned}$$
(2.13)

Since \(m_n s^{(n)}_z\ge a_z\) for each \(z\in \mathcal{R }\), the sum above describes an admissible path of \(m_n k-n\) steps from \(x\) to \(m_n k\xi (n,x)\). For each \(x\in D_n\) and each \(n\), the number of \({\hat{z}}\) steps in this path is at least

$$\begin{aligned} m_n(k-\sum _{z\in \mathcal{R }} s^{(n)}_z)\ge m_n k\varepsilon /(1+2\varepsilon )\ge n\varepsilon . \end{aligned}$$
(2.14)
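The rounding construction above can be checked numerically. The following Python sketch is an illustration only: the function names, the two-dimensional step set and the values of \(a_z\), \(\varepsilon \) and \(k\) in the usage note are hypothetical choices, not taken from the paper. It computes \(m_n\) and \(s_z^{(n)}\) in exact rational arithmetic and counts the extra steps that lead from \(x\) to \(m_nk\xi (n,x)\).

```python
from fractions import Fraction
from math import ceil


def rounding_counts(a, eps, k):
    """Return (n, m_n, s) for step counts a = {z: a_z}, tolerance eps, integer k.

    Mirrors m_n = ceil((1+2*eps)*n/k) and s_z = ceil(k*a_z/((1+2*eps)*n)).
    The dict `a` must list every step of R, with a_z = 0 for unused steps.
    """
    eps = Fraction(eps)
    n = sum(a.values())
    # The text requires k >= |R|(1+2*eps)/eps; here |R| = len(a).
    if k * eps < len(a) * (1 + 2 * eps):
        raise ValueError("k is too small for this eps")
    scale = (1 + 2 * eps) * n  # exact rational arithmetic
    m_n = ceil(scale / k)
    s = {z: ceil(Fraction(k * az) / scale) for z, az in a.items()}
    return n, m_n, s


def extra_steps(a, eps, k):
    """Added steps from x to m_n*k*xi(n,x): (extras per step z, extra zhat steps)."""
    n, m_n, s = rounding_counts(a, eps, k)
    # Nonnegative because m_n * s_z >= a_z.
    per_step = {z: m_n * s[z] - a[z] for z in a}
    # zhat steps coming from the last term of (2.13).
    zhat_extra = m_n * (k - sum(s.values()))
    return per_step, zhat_extra
```

For instance, with the hypothetical input `a = {(1, 0): 40, (0, 1): 50, (0, 0): 10}`, `eps = Fraction(1, 10)` and `k = 36`, the extra steps add up to \(m_nk-n\) and the number of \({\hat{z}}\) steps is at least \(n\varepsilon \), in agreement with (2.14).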

Next, let \(\ell _n\) be an integer such that \((\ell _n-1)b<m_n\le \ell _n b\). Repeat the steps of \(k\xi (n,x)\) in (2.13) \(\ell _n b-m_n\le b\) times to go from \(m_n k\xi (n,x)\) to \(\ell _nkb\xi (n,x)=y_{\ell _n kb}(\xi (n,x))\). Thus, the total number of steps to go from \(x\) to \(\ell _n k b\xi (n,x)\) is \(r_n=\ell _n k b-n\). Recall that \(b\) is a function of \(k\) alone. So \(r_n \le 3\varepsilon n\) for \(n\) large enough, depending on \(k, \varepsilon \). Denote this sequence of steps by \(\mathbf{u}(n,x)=(u_1,\cdots ,u_{r_n})\).

We develop an estimate. Abbreviate \({\bar{g}}(\omega )=\) \(\max _{z_{1,\ell }\in \mathcal{R }^\ell } |g(\omega ,z_{1,\ell })|\).

$$\begin{aligned}&\frac{1}{n}\log E\big [e^{n R^{\ell }_n(g)}{\small 1}\!\!1\{X_n/n\in K\}\big ]\nonumber \\&\qquad = \frac{1}{n}\log \sum _{x\in D_n\cap nK} E\big [e^{n R^{\ell }_n(g)}, X_n=x\big ]\nonumber \\&\qquad \le \max _{x\in D_n\cap nK} \frac{1}{n}\log E\big [e^{(n-\ell ) R^{\ell }_{n-\ell }(g)}, X_n=x\big ]\nonumber \\&\qquad \qquad + \max _{x\in D_n\cap nK} \max _{y\in \cup _{s=0}^\ell D_s}\frac{\ell }{n} {\bar{g}}(T_{x-y}\omega )+ \frac{C\log n}{n}\nonumber \\&\qquad \le \max _{x\in D_n\cap nK} \frac{1}{n}\log E\big [e^{\ell _n k b R^{\ell }_{\ell _n kb}(g)}, X_{\ell _n kb}=\ell _nkb\xi (n,x)\big ]\nonumber \\&\qquad \qquad +\max _{x\in D_n\cap nK} \frac{1}{n} \sum _{i=1}^{r_n} {\bar{g}}(T_{x+u_1+\cdots +u_i}\omega ) +\frac{r_n}{n}\log |\mathcal{R }|\nonumber \\&\qquad \qquad +\max _{x\in D_n\cap nK}\max _{y\in \cup _{s=0}^\ell D_s} \frac{2\ell }{n} {\bar{g}}(T_{x-y}\omega )+ \frac{C\log n}{n}. \end{aligned}$$
(2.15)

As \(n\rightarrow \infty \) the limsup of the term in the third-to-last line of the above display is bounded above, for all \(\omega \), by

$$\begin{aligned} (1+3\varepsilon )\sup _{\xi \in \mathbb{Q }^d\cap K_\delta \cap \mathcal{U }}\,{\mathop {\overline{\text{ lim }}}\limits _{n\rightarrow \infty }}n^{-1}\log E\big [e^{n R_n^{\ell }(g)}{\small 1}\!\!1\{X_n=y_n(\xi )\}\big ]. \end{aligned}$$

The proof of (2.11) is complete once we show that a.s.

$$\begin{aligned} \begin{aligned}&{\mathop {\overline{\text{ lim }}}\limits _{\varepsilon \rightarrow 0}}{\mathop {\overline{\text{ lim }}}\limits _{n\rightarrow \infty }} \max _{x\in D_n} \frac{1}{n}\sum _{i=1}^{r_n} {\bar{g}}(T_{x+u_1+\cdots +u_i}\omega )=0\\ \text{ and } \quad&{\mathop {\overline{\text{ lim }}}\limits _{\varepsilon \rightarrow 0}}{\mathop {\overline{\text{ lim }}}\limits _{n\rightarrow \infty }} \max _{x\in D_n}\max _{y\in \cup _{s=0}^\ell D_s} \frac{1}{n} {\bar{g}}(T_{x-y}\omega )=0. \end{aligned} \end{aligned}$$
(2.16)

To this end, observe that the order in which the steps in \(\mathbf{u}(n,x)\) are arranged was so far immaterial. From (2.14) the ratio of zero steps to \({\hat{z}}\) steps is at most \(r_n/(n\varepsilon )\le 3\). Start path \(\mathbf{u}(n,x)\) by alternating \({\hat{z}}\) steps with blocks of at most 3 zero steps, until \({\hat{z}}\) steps and zero steps are exhausted. After that fix an ordering \(\mathcal{R }\setminus \{0, {\hat{z}}\}=\{z_1,z_2,\cdots \}\) and arrange the rest of the path \(\mathbf{u}(n,x)\) to take first all its \(z_1\) steps, then all its \(z_2\) steps, and so on. This leads to the bound

$$\begin{aligned} \sum _{i=1}^{r_n} {\bar{g}}(T_{x+u_1+\cdots +u_i}\omega )\le 4\left| \mathcal{R }\right| \max _{y\in x+\mathbf{u}(n,x)}\max _{z\in \mathcal{R }\setminus \{0\}}\sum _{i=0}^{r_n} {\bar{g}}(T_{y+iz}\omega ). \end{aligned}$$
(2.17)

The factor 4 is for repetitions of the same \({\bar{g}}\)-value due to zero steps. By \(y\in x+\mathbf{u}(n,x)\) we mean that \(y\) is on the path starting from \(x\) and taking steps in \(\mathbf{u}(n,x)\). A similar bound develops for the second line of (2.16). Then the limits in (2.16) follow from membership in \(\mathcal{L }\). The lemma is proved. \(\square \)
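The ordering of the steps used for (2.17) can be illustrated with a small Python sketch. It is an assumption-laden toy, not the paper's construction verbatim: the function names, the default zero step `(0, 0)` and the sample counts in the usage note are hypothetical. The path alternates single \({\hat{z}}\) steps with blocks of at most three zero steps and then takes the remaining step types one type at a time.

```python
from collections import Counter


def arrange_steps(counts, zhat, zero=(0, 0), block=3):
    """Order steps: alternate zhat with blocks of <= `block` zero steps, then the rest."""
    counts = dict(counts)
    n_zero = counts.pop(zero, 0)
    n_hat = counts.pop(zhat, 0)
    if n_zero > block * n_hat:
        raise ValueError("too many zero steps for the available zhat steps")
    path = []
    while n_hat > 0:
        path.append(zhat)
        n_hat -= 1
        take = min(block, n_zero)
        path.extend([zero] * take)
        n_zero -= take
    # Remaining step types in a fixed order, each type taken consecutively.
    for z in sorted(counts):
        path.extend([z] * counts[z])
    return path


def max_site_multiplicity(start, path):
    """Largest number of times a single site appears among the partial sums."""
    pos = start
    visits = Counter()
    for step in path:
        pos = tuple(p + u for p, u in zip(pos, step))
        visits[pos] += 1
    return max(visits.values())
```

With the hypothetical counts `{(0, 0): 5, (1, 0): 2, (0, 1): 3}` and `zhat = (1, 0)`, a site is visited at most four times (one arrival by a \({\hat{z}}\) step plus three zero steps), which is the repetition accounted for by the factor 4.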

Proof of Theorem 2.2

Part (a). Having proved Theorem 2.8, the next step is to deduce the existence of \(\Lambda _\ell (g,\zeta )\) as the limit (2.3) for irrational velocities \(\zeta \), on the event of full \(\mathbb{P }\)-probability where \(\Lambda _\ell (g,\xi )\) exists for all rational \(\xi \in \mathcal{U }\).

Let \(\zeta \in \mathcal{U }\). It comes with a convex representation \(\zeta =\sum _{z\in \mathcal{R }_0}\beta _z z\) with \(\beta _z>0\) for \(z\in \mathcal{R }_0\subset \mathcal{R }\), and its path is defined as in (2.1). Let \(\delta =\delta (\zeta ) =\min _{z\in \mathcal{R }_0}\beta _z>0\).

We approximate \(\zeta \) with rational points from \(\mathrm{co}\,\mathcal{R }_0\). Let \(\varepsilon >0\) and choose \(\xi =\sum _{z\in \mathcal{R }_0}\alpha _z z\) with \(\alpha _z\in [\delta /2,1]\cap \mathbb{Q }\), \(\sum _z\alpha _z=1\), and \(|\alpha _z-\beta _z|<\varepsilon \) for all \(z\in \mathcal{R }_0\).

Let \(k\in \mathbb{N }\) be such that \(k\alpha _z\in \mathbb{N }\) for all \(z\in \mathcal{R }_0\). Let \(m_n=\left\lfloor {k^{-1}(1+4\varepsilon /\delta )n}\right\rfloor \) and \(s_z^{(n)}=km_n\alpha _z-\lfloor n\beta _z\rfloor -b_z^{(n)}\). Then,

$$\begin{aligned} s_z^{(n)}/n\rightarrow (1+4\varepsilon /\delta )\alpha _z-\beta _z\ge \varepsilon >0. \end{aligned}$$
(2.18)

Thus \(s_z^{(n)}\ge 0\) for large enough \(n\).
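The inequality in (2.18) uses only \(\alpha _z\ge \delta /2\) and \(|\alpha _z-\beta _z|<\varepsilon \). As a quick check, spelled out here for convenience:

$$\begin{aligned} (1+4\varepsilon /\delta )\alpha _z-\beta _z=(\alpha _z-\beta _z)+(4\varepsilon /\delta )\alpha _z\ge -\varepsilon +(4\varepsilon /\delta )(\delta /2)=\varepsilon . \end{aligned}$$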

Now, starting at \({\hat{x}}_n(\zeta )\) and taking each step \(z\in \mathcal{R }_0\) exactly \(s_z^{(n)}\) times arrives at \(km_n\xi \). Denote this sequence of steps by \(\{u_i\}_{i=1}^{r_n}\), with \(r_n=km_n-n\le (4\varepsilon /\delta ) n\). We wish to develop an estimate similar to those in (2.10) and (2.15), using again \({\bar{g}}(\omega )=\) \(\max _{z_{1,\ell }\in \mathcal{R }^\ell } |g(\omega ,z_{1,\ell })|\). Define

$$\begin{aligned} B(\omega ,n,\varepsilon , \kappa )&=\kappa \left| \mathcal{R }\right| \cdot \max \limits _{\left| x\right| \le \kappa n} \max \limits _{z\in \mathcal{R }\backslash \{0\}} \sum \limits _{i=0}^{\kappa \varepsilon n} {\bar{g}}(T_{x+iz}\omega ) \\&\qquad \qquad \qquad +\max \limits _{x\in D_n}\max \limits _{y\in \cup _{s=0}^\ell D_s} {2\ell } {\bar{g}}(T_{x-y}\omega ). \end{aligned}$$

Then develop an upper bound on the point-to-point quantity at \({\hat{x}}_n(\zeta )\):

$$\begin{aligned}&\log E\big [e^{km_n R_{k m_n}^{\ell }(g)}{\small 1}\!\!1\{X_{km_n}\!=\!km_n\xi \}\big ]\nonumber \\&\ge \log E\big [e^{n R_n^{\ell }(g)}{\small 1}\!\!1\{X_n={\hat{x}}_n(\zeta )\}\big ] \!-\! \sum _{i=0}^{r_n-1} {\bar{g}}(T_{{\hat{x}}_n(\zeta )\!+\!u_1\!+\!\cdots \!+\!u_i}\omega )\nonumber \\&\qquad - \max _{y\in \cup _{s=0}^\ell D_s} {2\ell } {\bar{g}}(T_{{\hat{x}}_n(\zeta )\!-\!y}\omega ) \!-\! (4\varepsilon /\delta ) n\log |\mathcal{R }| \nonumber \\&\ge \log E\big [e^{n R_n^{\ell }(g)}{\small 1}\!\!1\{X_n={\hat{x}}_n(\zeta )\}\big ] \!-\! B(\omega ,n,\varepsilon , \kappa ) \!-\! (4\varepsilon /\delta ) n\log |\mathcal{R }|. \end{aligned}$$
(2.19)

To get the last inequality above, first order the steps of the \(\{u_i\}\) path as was done above to go from (2.16) to (2.17). In particular, the number of zero steps needs to be controlled. If \(0\in \mathcal{R }_0\), pick a step \( {\hat{z}}\in \mathcal{R }_0\backslash \{0\}\), and from (2.18) obtain that, for large enough \(n\),

$$\begin{aligned} \frac{s^{(n)}_0}{s^{(n)}_{\hat{z}}}\le \frac{2n\bigl ((1+4\varepsilon /\delta )\alpha _0-\beta _0\bigr )}{n\varepsilon /2} \le 4\Bigl (1+\frac{4}{\delta }\Bigr ). \end{aligned}$$

Thus we can exhaust the zero steps by alternating blocks of \(\left\lceil {4(1+4/\delta )}\right\rceil \) zero steps with individual \({\hat{z}}\) steps. Consequently in the sum on the second line of (2.19) we have a bound \(c(\delta )\) on the number of repetitions of individual \({\bar{g}}\)-values. To realize the domination by \(B(\omega ,n,\varepsilon , \kappa )\) on the last line of (2.19), pick \(\kappa >c(\delta )\) and large enough so that \(\kappa \varepsilon n\ge r_n\) and so that \(\{\left| x\right| \le \kappa n\}\) covers \(\{{\hat{x}}_n(\zeta )+u_1+\cdots +u_i: 0\le i\le r_n\}\).

The point of formulating the error \(B(\omega ,n,\varepsilon , \kappa )\) with the parameter \(\kappa \) is to control all the errors in (2.19) on a single event of \(\mathbb{P }\)-measure 1, simultaneously for all \(\zeta \in \mathcal{U }\) and countably many \(\varepsilon \searrow 0\), with a choice of rational \(\xi \) for each pair \((\zeta , \varepsilon )\). From \(g\in \mathcal{L }\) it follows that \(\mathbb{P }\)-a.s.

$$\begin{aligned} {\mathop {\overline{\text{ lim }}}\limits _{\varepsilon \searrow 0\,\,}}{\mathop {\overline{\text{ lim }}}\limits _{n\rightarrow \infty }} n^{-1}B(\omega ,n,\varepsilon , \kappa )= 0 \quad \text{ simultaneously } \text{ for } \text{ all } \kappa \in \mathbb{N }\text{. } \end{aligned}$$

A similar argument, with \(\bar{m}_n\!=\!\lfloor k^{-1}(1 - 4\varepsilon /\delta )n\rfloor \) and \(\bar{s}_z^{(n)}\!=\!\lfloor n\beta _z\rfloor + b_z^{(n)}(\zeta ) - k\bar{m}_n\alpha _z\), gives

$$\begin{aligned}&\log E\big [e^{k\bar{m}_n R_{k \bar{m}_n}^{\ell }(g)}{\small 1}\!\!1\{X_{k\bar{m}_n}=k\bar{m}_n\xi \}\big ]\nonumber \\&\quad \le \log E\big [e^{n R_n^{\ell }(g)}{\small 1}\!\!1\{X_n={\hat{x}}_n(\zeta )\}\big ] + C\varepsilon n\log |\mathcal{R }| +B(\omega ,n,\varepsilon ,\kappa ). \end{aligned}$$
(2.20)

Now in (2.19) and (2.20) divide by \(n\), let \(n\rightarrow \infty \) and use the existence of the limit \(\Lambda _\ell (g,\xi )\). Since \(\varepsilon >0\) can be taken to zero, we have obtained the following. \(\Lambda _\ell (g,\zeta )\) exists as the limit (2.3) for all \(\zeta \in \mathcal{U }\) on an event of \(\mathbb{P }\)-probability \(1\), and

$$\begin{aligned} \Lambda _\ell (g,\zeta )=\lim _{\xi _j\rightarrow \zeta }\Lambda _\ell (g,\xi _j), \end{aligned}$$
(2.21)

whenever \(\xi _j\) is a sequence of rational convex combinations of \(\mathcal{R }_0\) whose coefficients converge to the coefficients \(\beta _z\) of \(\zeta \).

At this point the value \(\Lambda _\ell (g,\zeta )\) appears to depend on the choice of the convex representation \(\zeta =\sum _{z\in \mathcal{R }_0}\beta _z z\). We show that each choice gives the same value \(\Lambda _\ell (g,\zeta )\) as a particular fixed representation. Let \(\bar{\mathcal{U }}\) be the unique face containing \(\zeta \) in its relative interior and \(\bar{\mathcal{R }}=\mathcal{R }\cap \bar{\mathcal{U }}\). Then we can fix a convex representation \(\zeta =\sum _{z\in \bar{\mathcal{R }}}\bar{\beta }_z z\) with \(\bar{\beta }_z>0\) for all \(z\in \bar{\mathcal{R }}\). As above, let \(\xi _n\) be rational points from \(\mathrm{co}\,\mathcal{R }_0\) such that \(\xi _n\rightarrow \zeta \). The fact that \(\zeta \) can be expressed as a convex combination of \(\mathcal{R }_0\) forces \(\mathcal{R }_0\subset \bar{\mathcal{U }}\), and consequently \(\xi _n\in \bar{\mathcal{U }}\). By Lemma 7.1, there are two rational convex representations \(\xi _n=\sum _{z\in \mathcal{R }_0}\alpha ^n_z z=\sum _{z\in \bar{\mathcal{R }}}\bar{\alpha }^n_z z\) with \(\alpha _z^n\rightarrow \beta _z\) and \(\bar{\alpha }_z^n\rightarrow \bar{\beta }_z\). By Theorem 2.8 the value \(\Lambda _\ell (g,\xi _n)\) is independent of the convex representation of \(\xi _n\). Hence the limit in (2.21) shows that representations in terms of \(\mathcal{R }_0\) and in terms of \(\bar{\mathcal{R }}\) lead to the same value \(\Lambda _\ell (g,\zeta )\).

Part (b). With the limit (2.3) in hand, limit (2.4) and the variational formula (2.5) follow from Lemma 2.9 with \(K=\mathcal{U }\). Theorem 2.2 is proved. \(\square \)

Proofs of the remaining theorems of the section follow.

Proof of Theorem 2.4

Fix a face \(\mathcal{U }_0\) and \(\mathcal{R }_0=\mathcal{R }\cap \mathcal{U }_0\). If \(\xi \) is a rational point in \(\mathrm{ri}\,\,\mathcal{U }_0\), then write \(\xi =\sum _{z\in \mathcal{R }_0}\alpha _z z\) with rational \(\alpha _z>0\) (a consequence of Lemma A.1 of [34]). Let \(k\in \mathbb{N }\) be such that \(k\alpha _z\in \mathbb{Z }\) for each \(z\). Fix \(z\in \mathcal{R }_0\). There is a path of \(k-1\) steps from \((m-1)k\xi +z\) to \(mk\xi \). Proceed as in (2.10) to reach

$$\begin{aligned} \Lambda _\ell (g,\xi )&\ge {\mathop {\underline{\text{ lim }}}\limits _{m\rightarrow \infty }}\frac{1}{mk}\log E\Big [e^{mk R_{mk}^{\ell }(g)}, X_{mk}=mk\xi \,\Big |\,X_1=z\Big ]\\&\ge {\mathop {\underline{\text{ lim }}}\limits _{m\rightarrow \infty }}\frac{1}{mk}\log E\Big [e^{((m-1)k+1) R_{(m-1)k+1}^{\ell }(g)},\\&\qquad \qquad \qquad \quad X_{(m-1)k+1}=(m-1)k\xi +z\,\Big |\,X_1=z\Big ]\\&= \Lambda _\ell (g,\xi )\circ T_z. \end{aligned}$$

Thus \(\Lambda _\ell (g,\xi )\) is \(T_z\)-invariant for each \(z\in \mathcal{R }_0\) so by ergodicity \(\Lambda _\ell (g,\xi )\) is deterministic. This holds for \(\mathbb{P }\)-a.e. \(\omega \) simultaneously for all rational \(\xi \in \mathrm{ri}\,\,\mathcal{U }_0\). Since \(\Lambda _\ell (g,\cdot )\) at irrational points of \(\mathrm{ri}\,\,\mathcal{U }_0\) can be obtained through (2.21) from its values at rational points, the claim follows for all \(\zeta \in \mathrm{ri}\,\,\mathcal{U }_0\). \(\square \)

Proof of Theorem 2.6

The logical order of the proof is not the same as the ordering of the statements in the theorem. First we establish concavity for rational points in \(\mathrm{ri}\,\,\mathcal{U }\) via the Markov property. For \(t\in \mathbb{Q }\cap [0,1]\) and \(\xi ^{\prime },\xi ^{\prime \prime }\in \mathbb{Q }^d\cap \mathrm{ri}\,\,\mathcal{U }\) choose \(k\) so that \(kt\in \mathbb{Z }_+\), \(kt\xi ^{\prime }\in \mathbb{Z }^d\), and \(k(1-t)\xi ^{\prime \prime }\in \mathbb{Z }^d\). Then, as in (2.8),

$$\begin{aligned}&\log E\Big [e^{mk R_{mk}^{\ell }(g)}, X_{mk}=mk(t\xi ^{\prime }+(1-t)\xi ^{\prime \prime })\Big ]\nonumber \\&\quad \ge \log E\Big [e^{mkt R_{mkt}^{\ell }(g)}, X_{mkt}=mkt\xi ^{\prime }\Big ]\nonumber \\&\qquad \quad + \log E\Big [e^{mk(1-t) R_{mk(1-t)}^{\ell }(g\circ T_{mkt\xi ^{\prime }})}, X_{mk(1-t)}=mk(1-t)\xi ^{\prime \prime }\Big ]\nonumber \\&\qquad \quad -2A_\ell (T_{mkt\xi ^{\prime }}\omega ). \end{aligned}$$
(2.22)

Divide by \(mk\) and let \(m\rightarrow \infty \). On \(\mathrm{ri}\,\,\mathcal{U }\) the function \(\Lambda _\ell (g,\cdot )\) is deterministic (Theorem 2.4), so the second (shifted) logarithmic moment generating function on the right of (2.22) converges to its limit at least in probability, and hence a.s. along a subsequence. In the limit we get

$$\begin{aligned} \Lambda _\ell (g,t\xi ^{\prime }+(1-t)\xi ^{\prime \prime })\ge t\Lambda _\ell (g,\xi ^{\prime })+(1-t)\Lambda _\ell (g,\xi ^{\prime \prime }). \end{aligned}$$
(2.23)

To get concavity on all of \(\mathrm{ri}\,\,\mathcal{U }\), approximate arbitrary points of \(\mathrm{ri}\,\,\mathcal{U }\) with rational convex combinations so that limit (2.21) can be used to pass along the concavity.

Remark 2.10

In the totally ergodic case Theorem 2.4 implies that \(\Lambda _\ell (g,\zeta )\) is deterministic on all of \(\mathcal{U }\), except possibly at \(\zeta =0\in \mathrm{ex}\,\mathcal{U }\). If \(0\) is among \(\{\xi ^{\prime },\xi ^{\prime \prime }\}\) then take \(\xi ^{\prime }=0\) in (2.22), so that, as the limit is taken to go from (2.22) to (2.23), we can take advantage of the deterministic limit \(\Lambda _\ell (g,\xi ^{\prime \prime })\) for the shifted term on the right of (2.22). Thus, (2.23) holds for all rational \(\xi ^{\prime },\xi ^{\prime \prime }\in \mathcal{U }\). The subsequent limit to non-rational points proceeds as above.

Next we address lower semicontinuity of \(\Lambda _\ell (g,\zeta )\) in \(\zeta \in \mathcal{U }\). Fix \(\zeta \) and pick \(\mathcal{U }\ni \zeta _j\rightarrow \zeta \) that achieves the liminf of \(\Lambda _\ell (g,\cdot )\) at \(\zeta \). Since \(\mathcal{R }\) is finite, one can find a further subsequence that always stays inside the convex hull \(\mathcal{U }_0\) of some set \(\mathcal{R }_0\subset \mathcal{R }\) of at most \(d+1\) affinely independent vectors. Then, \(\zeta \in \mathcal{U }_0\) and we can write the convex combinations \(\zeta =\sum _{z\in \mathcal{R }_0}\beta _z z\) and \(\zeta _j=\sum _{z\in \mathcal{R }_0}\beta _z^{(j)} z\). Furthermore, as before, \(\beta _z^{(j)}\rightarrow \beta _z\) as \(j\rightarrow \infty \). Let \(\hat{\mathcal{R }}_0=\{z\in \mathcal{R }_0:\beta _z>0\}\) and define \(\delta =\min _{z\in \hat{\mathcal{R }}_0}\beta _z>0\).

Fix \(\varepsilon \in (0,\delta /2)\) and take \(j\) large enough so that \(|\beta _z^{(j)}-\beta _z|<\varepsilon \) for all \(z\in \mathcal{R }_0\). Let \(m_n=\lceil (1+4\varepsilon /\delta )n\rceil \) and \(s_z^{(n)}=\lfloor m_n\beta _z^{(j)}\rfloor + b_z^{(n)}(\zeta _j)-\lfloor n\beta _z\rfloor -b_z^{(n)}(\zeta )\) for \(z\in \mathcal{R }_0\). (If \(\beta _z=\beta _z^{(j)}=0\), then simply set \(s_z^{(n)}=0\).) Then, for \(n\) large enough, \(s_z^{(n)}\ge 0\) for each \(z\in \mathcal{R }_0\). Now, proceed as in the proof of (2.21), by finding a path from \({\hat{x}}_n(\zeta )\) to \({\hat{x}}_{m_n}(\zeta _j)\). After taking \(n\rightarrow \infty \), \(j\rightarrow \infty \), then \(\varepsilon \rightarrow 0\), we arrive at

$$\begin{aligned} {\mathop {\underline{\text{ lim }}}\limits _{\mathcal{U }\ni \zeta ^{\prime }\rightarrow \zeta }}\Lambda _\ell (g,\zeta ^{\prime })\ge \Lambda _\ell (g,\zeta ). \end{aligned}$$

Note that here random limit values are perfectly acceptable.

Remark 2.11

We can see here why upper semicontinuity (and hence continuity to the boundary) may in principle not hold: constructing a path from \(\zeta _j\) to \(\zeta \) is not necessarily possible since \(\zeta _j\) may have non-zero components on \(\mathcal{R }_0\backslash \hat{\mathcal{R }}_0\).

By lower semicontinuity the supremum in (2.5) can be restricted to \(\zeta \in \mathrm{ri}\,\,\mathcal{U }\). By Theorem 2.4 \(\Lambda _\ell (g,\zeta )\) is deterministic on \(\mathrm{ri}\,\,\mathcal{U }\) under an ergodic \(\mathbb{P }\), and consequently \(\Lambda _\ell (g)\) is deterministic.

Combining Theorems 2.2 and 2.4 and the paragraphs above, we now know that under an ergodic \(\mathbb{P }\), we have the function \(-\infty < \Lambda _\ell (g,\zeta ,\omega )\le \infty \), \(\mathbb{P }\)-a.e. defined, lower semicontinuous for \(\zeta \in \mathcal{U }\) and concave and deterministic for \(\zeta \in \mathrm{ri}\,\,\mathcal{U }\). Lower semicontinuity and compactness of \(\mathcal{U }\) imply that \(\Lambda _\ell (g,\cdot \,,\omega )\) is uniformly bounded below with a bound that can depend on \(\omega \).

Assume now that \(\Lambda _\ell (g)<\infty \). Then upper boundedness of \(\Lambda _\ell (g,\cdot \,,\omega )\) comes from (2.5). As a finite concave function, \(\Lambda _\ell (g,\cdot )\) is continuous on the relatively open convex set \(\mathrm{ri}\,\,\mathcal{U }\). Since it is bounded below, by [35, Theorem 10.3] \(\Lambda _\ell (g,\cdot )\) has a unique continuous extension from the relative interior to the whole of \(\mathcal{U }\). This extension is deterministic since it comes from a deterministic function on \(\mathrm{ri}\,\,\mathcal{U }\). To see that this extension agrees with the upper semicontinuous regularization, consider this general situation.

Let \(f\) be a bounded lower semicontinuous function on \(\mathcal{U }\) that is concave on \(\mathrm{ri}\,\,\mathcal{U }\). Let \(g\) be the continuous extension of \(f\vert _{\mathrm{ri}\,\,\mathcal{U }}\) and \(h\) the upper semicontinuous regularization of \(f\) on \(\mathcal{U }\). For \(x\) on the relative boundary find \(\mathrm{ri}\,\,\mathcal{U }\ni x_n\rightarrow x\). Then \(g(x) = \lim g(x_n) = \lim f(x_n) \ge f(x)\) and so \(f \le g\) and consequently \(h \le g\). Also \(g(x)= \lim g(x_n) = \lim f(x_n) = \lim h(x_n) \le h(x)\) and so \(g \le h\).

Finally we check part (a) of the theorem. If \(\Lambda _\ell (g)=\infty \) then there exists a sequence \(\zeta _n\in \mathrm{ri}\,\,\mathcal{U }\) such that \(\Lambda _\ell (g,\zeta _n)\rightarrow \infty \). One can assume \(\zeta _n\rightarrow \zeta \in \mathcal{U }\). Let \(\zeta ^{\prime }\) be any point in \(\mathrm{ri}\,\,\mathcal{U }\). Pick \(t\in (0,1)\) small enough for \(\zeta ^{\prime \prime }_n=(\zeta ^{\prime }-t\zeta _n)/(1-t)\) to be in \(\mathrm{ri}\,\,\mathcal{U }\) for \(n\) large enough. Then,

$$\begin{aligned} \Lambda _\ell (g,\zeta ^{\prime })\ge t\Lambda _\ell (g,\zeta _n)+(1-t)\Lambda _\ell (g,\zeta ^{\prime \prime }_n). \end{aligned}$$

Since \(\Lambda _\ell (g,\cdot )\) is bounded below on \(\mathrm{ri}\,\,\mathcal{U }\), taking \(n\rightarrow \infty \) in the above display implies that \(\Lambda _\ell (g,\zeta ^{\prime })=\infty \). \(\square \)

3 Continuity in the i.i.d. case

We begin with \(L^p\) continuity of the free energy in the potential \(g\).

Lemma 3.1

Let \(\mathcal{U }_0\) be a face of  \(\mathcal{U }\) (the choice \(\mathcal{U }_0=\mathcal{U }\) is allowed), and let \(\mathcal{R }_0=\mathcal{R }\cap \mathcal{U }_0\) so that \(\mathcal{U }_0=\mathrm{co}\,\mathcal{R }_0\). Assume \(0\not \in \mathcal{U }_0\). Then an admissible \(n\)-step path from \(0\) to a point in \(n\mathcal{U }_0\) cannot visit the same point twice.

  1. (a)

    Let \(h\ge 0\) be a measurable function on \(\Omega \) with the \(r_0\)-separated i.i.d. property. Then there is a constant \(C=C(r_0,d, M)\) such that, \(\mathbb{P }\)-almost surely,

    $$\begin{aligned} {\mathop {\overline{\text{ lim }}}\limits _{n\rightarrow \infty }} \max _{\begin{array}{c} x_{0,n-1}: \\ x_k-x_{k-1}\in \mathcal{R }_0 \end{array}} n^{-1} \sum _{k=0}^{n-1} h(T_{x_k}\omega ) \le C \int _0^\infty \mathbb{P }\{ h\ge s\}^{1/d}\,ds . \end{aligned}$$
    (3.1)

    If \(h\in L^p(\mathbb{P })\) for some \(p>d\) then the right-hand side of (3.1) is finite by Chebyshev’s inequality.

  2. (b)

    Let \(f, g:\varvec{\Omega }_\ell \rightarrow \mathbb{R }\) be measurable functions with the \(r_0\)-separated i.i.d. property. Then with the same constant \(C\) as in (3.1)

    $$\begin{aligned}&{\mathop {\overline{\text{ lim }}}\limits _{n\rightarrow \infty }} \sup _{\zeta \in \mathcal{U }_0}\, \Bigl |n^{-1}\log E\big [e^{nR_{n}^{\ell }(f)}{\small 1}\!\!1\{X_{n}={\hat{x}}_n(\zeta )\}\big ]\nonumber \\&\qquad \qquad \qquad \qquad \;-\; n^{-1}\log E\big [e^{nR_{n}^{\ell }(g)}{\small 1}\!\!1\{X_{n}={\hat{x}}_n(\zeta )\}\big ] \Bigr |\nonumber \\&\qquad \le C \int _0^\infty \mathbb{P }\Bigl \{\omega : \max _{z_{1,\ell }\in \mathcal{R }^\ell }\left| f(\omega ,z_{1,\ell })-g(\omega ,z_{1,\ell })\right| \ge s\Bigl \}^{1/d}\,ds . \end{aligned}$$
    (3.2)

Assume additionally that \(f(\cdot \,,z_{1,\ell })\), \(g(\cdot \,,z_{1,\ell })\in L^p(\mathbb{P })\) \(\forall z_{1,\ell }\in \mathcal{R }^\ell \) for some \(p>d\). Then \(f,g\in \mathcal{L }\) and for \(\zeta \in \mathcal{U }_0\) the limits \(\Lambda _\ell (f,\zeta )\) and \(\Lambda _\ell (g,\zeta )\) are finite and deterministic and satisfy

$$\begin{aligned} \sup _{\zeta \in \mathcal{U }_0}\, \left| \Lambda _\ell (f,\zeta )-\Lambda _\ell (g,\zeta )\right| \, \le \, C\,\mathbb{E }\Bigl [\; \max _{z_{1,\ell }\in \mathcal{R }^\ell }\left| f(\omega ,z_{1,\ell })-g(\omega ,z_{1,\ell })\right| ^p\Bigr ]^{1/p}.\quad \end{aligned}$$
(3.3)

Strengthen the assumptions further with \(0\notin \mathcal{U }\). Then \(\Lambda _\ell (f)\) and \(\Lambda _\ell (g)\) are finite and deterministic and satisfy

$$\begin{aligned} \left| \Lambda _\ell (f)-\Lambda _\ell (g)\right| \, \le \, C\,\mathbb{E }\Bigl [\; \max _{z_{1,\ell }\in \mathcal{R }^\ell }\left| f(\omega ,z_{1,\ell })-g(\omega ,z_{1,\ell })\right| ^p\Bigr ]^{1/p}. \end{aligned}$$
(3.4)

Proof

If \(x\in n\mathcal{U }_0\) and \(x=\sum _{i=1}^n z_i\) gives an admissible path to \(x\), then \(n^{-1}x=n^{-1} \sum _{i=1}^n z_i\) gives a convex representation of \(n^{-1}x\in \mathcal{U }_0\) which then cannot use points \(z\in \mathcal{R }\backslash \mathcal{R }_0\). By the assumption \(0\notin \mathcal{U }_0\), points from \(\mathcal{R }_0\) cannot sum to \(0\) and consequently a loop in an \(\mathcal{R }_0\)-path is impossible.
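The no-loop claim can be spelled out in one line. Suppose, for contradiction, that an \(\mathcal{R }_0\)-path returns to a site after \(j\ge 1\) steps \(z_1,\cdots ,z_j\in \mathcal{R }_0\) (the index \(j\) is introduced here for illustration). Then

$$\begin{aligned} 0=\frac{1}{j}\sum _{i=1}^{j}z_i\in \mathrm{co}\,\mathcal{R }_0=\mathcal{U }_0, \end{aligned}$$

which contradicts \(0\notin \mathcal{U }_0\).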

Part (a) We can assume that \(r_0>M=\max \{\left| z\right| :z\in \mathcal{R }\}\). We bound the quantity on the left of (3.1) with a greedy lattice animal [12, 14, 26] after a suitable coarse graining of the lattice. Let \(B=\{0,1,\cdots , r_0-1\}^d\) be the cube whose copies \(\{r_0y+B: y\in \mathbb{Z }^d\}\) tile the lattice. Let \(\mathcal{A }_n\) denote the set of connected subsets \(\xi \) of \(\mathbb{Z }^d\) of size \(n\) that contain the origin (lattice animals). Since the \(x_k\)’s are distinct,

$$\begin{aligned} \sum _{k=0}^{n-1} h(T_{x_k}\omega )&= \sum _{u\in B}\sum _{y\in \mathbb{Z }^d} \sum _{k=0}^{n-1} {\small 1}\!\!1_{\{x_k=r_0y+u\}} h(T_{r_0y+u}\omega ) \\&\le \sum _{u\in B}\sum _{y\in \mathbb{Z }^d} {\small 1}\!\!1_{\{x_{0,n-1}\cap (r_0y+B)\ne \emptyset \}} h(T_{u+r_0y}\omega ) \\&\le \sum _{u\in B} \max _{\xi \in \mathcal{A }_{nd}} \sum _{y\in \xi } h(T_{u+r_0y}\omega ). \end{aligned}$$

The last step works as follows. Define first a vector \(y_{0,n-1}\in (\mathbb{Z }^{d})^n\) from the conditions \(x_i\in r_0y_i+B\), \(0\le i<n\). Since \(r_0\) is larger than the maximal step size \(M\), \(\left| y_{i+1}-y_i\right| _\infty \le 1\). Points \(y_i\) and \(y_{i+1}\) may fail to be nearest neighbors, but by filling in at most \(d-1\) intermediate points we get a nearest-neighbor sequence. The filled-in sequence has at most \(n+(n-1)(d-1)\le nd\) entries. It can have repetitions and can have fewer than \(nd\) entries, but it is contained in some lattice animal \(\xi \) of \(nd\) lattice points.

We can assume that the right-hand side of (3.1) is finite. This and the fact that \(\{ h(T_{u+r_0y}\omega ) : y\in \mathbb{Z }^d\}\) are i.i.d. allow us to apply limit (1.7) of Theorem 1.1 in [26]: for a finite constant \(c\) and \(\mathbb{P }\)-a.s.

$$\begin{aligned} {\mathop {\overline{\text{ lim }}}\limits _{n\rightarrow \infty }} \max _{\begin{array}{c} x_{0,n-1}: \\ x_k-x_{k-1}\in \mathcal{R }_0 \end{array}} n^{-1} \sum _{k=0}^{n-1} h(T_{x_k}\omega ) \le \left| B\right| d\, c \int _0^\infty \mathbb{P }\{ h\ge s\}^{1/d}\,ds . \end{aligned}$$

With the volume \(\left| B\right| =r_0^d\) this gives (3.1).
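The coarse-graining step in part (a) can be illustrated by a short Python sketch. It is a toy under stated assumptions: the function names, the two-dimensional sample path and the value `r0 = 3` in the usage note are hypothetical, and the steps are assumed to have sup-norm smaller than `r0` so that consecutive cubes are sup-norm neighbors. Each site \(x_i\) is mapped to the index \(y_i\) of the cube \(r_0y_i+B\) that contains it, and intermediate points are filled in one coordinate at a time.

```python
def coarse_grain(path, r0):
    """Map each lattice point x to the index y of the cube r0*y + B containing it."""
    return [tuple(c // r0 for c in x) for x in path]


def fill_nearest_neighbor(ys):
    """Turn a sequence of sup-norm neighbors into a nearest-neighbor sequence.

    Repeated cubes are dropped; a move that changes j coordinates is split
    into j unit moves, i.e. at most d - 1 intermediate points are added.
    """
    out = [ys[0]]
    for y in ys[1:]:
        cur = list(out[-1])
        if max(abs(a - b) for a, b in zip(cur, y)) > 1:
            raise ValueError("consecutive cubes must be sup-norm neighbors")
        for i, target in enumerate(y):
            if cur[i] != target:
                cur[i] = target  # a unit move in coordinate i
                out.append(tuple(cur))
    return out
```

For the hypothetical path `[(0, 0), (2, 2), (4, 4), (5, 6), (7, 7)]` with `r0 = 3`, the filled-in cube sequence is connected and has no more than \(n+(n-1)(d-1)\) distinct points, so it fits inside a lattice animal of that size.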

Part (b) Write \(f=g+(f-g)\) in the exponent to get an estimate, uniformly in \(\zeta \in \mathcal{U }_0\):

$$\begin{aligned}&n^{-1}\log E\big [e^{nR_{n}^{\ell }(f)}{\small 1}\!\!1\{X_{n}\!=\!{\hat{x}}_n(\zeta )\}\big ]\nonumber \\&\qquad \le n^{-1}\log E\big [e^{nR_{n}^{\ell }(g)}{\small 1}\!\!1\{X_{n}\!=\!{\hat{x}}_n(\zeta )\}\big ]\nonumber \\&\qquad \qquad + \max _{\begin{array}{c} x_{0,n+\ell -1}: x_k\!-\!x_{k-1}\in \mathcal{R }_0 \end{array}} n^{-1} \sum _{k=0}^{n-1} \left| f(T_{x_k}\omega , z_{k+1,k+\ell }) \!-\! g(T_{x_k}\omega , z_{k+1,k+\ell }) \right| .\nonumber \\ \end{aligned}$$
(3.5)

Switch the roles of \(f\) and \(g\) to get a bound on the absolute difference. Apply part (a) to get (3.2).

By Lemma A.4 of [34] the \(L^p\) assumption with \(p>d\) implies that \(f, g\in \mathcal{L }\). Finiteness of \(\Lambda _\ell (f,\zeta )\) comes from (3.2) with \(g=0\). Chebyshev’s inequality bounds the right-hand side of (3.2) with the right-hand side of (3.3).
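The Chebyshev step can be made explicit. The following computation is a sketch for a generic nonnegative \(h\in L^p(\mathbb{P })\) with \(p>d\); the shorthand \(A=\mathbb{E }[h^p]^{1/p}\) is introduced here for illustration only. Since \(\mathbb{P }\{h\ge s\}\le \min (1,A^ps^{-p})\),

$$\begin{aligned} \int _0^\infty \mathbb{P }\{h\ge s\}^{1/d}\,ds\le \int _0^A ds+A^{p/d}\int _A^\infty s^{-p/d}\,ds=A+\frac{d}{p-d}\,A=\frac{p}{p-d}\,\mathbb{E }[h^p]^{1/p}. \end{aligned}$$

The condition \(p>d\) is exactly what makes \(s^{-p/d}\) integrable at infinity. Applying this with \(h=\max _{z_{1,\ell }\in \mathcal{R }^\ell }|f(\cdot \,,z_{1,\ell })-g(\cdot \,,z_{1,\ell })|\) bounds the right-hand side of (3.2) by a constant multiple of the \(L^p\) norm of the difference.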

To get (3.4) start with (3.5) without the indicators inside the expectations and with \(\mathcal{R }_0\) replaced by \(\mathcal{R }\). \(\square \)

Next we turn to the continuity of \(\Lambda _\ell (g,\zeta )\) as a function of \(\zeta \) all the way to the relative boundary in the i.i.d. case. The main result is part (a) below. Parts (b) and (c) come without extra work.

Theorem 3.2

Let \(\mathbb{P }\) be an i.i.d. product measure as described in Example 1.1 and \(p>d\). Let \(g:\varvec{\Omega }_\ell \rightarrow \mathbb{R }\) be a function such that for each \(z_{1,\ell }\in \mathcal{R }^\ell \), \(g(\cdot ,z_{1,\ell }) \) is a local function of \(\omega \) and a member of \(L^p(\mathbb{P })\).

  1. (a)

    If \(0\not \in \mathcal{U }\), then \(\Lambda _\ell (g,\zeta )\) is continuous on \(\mathcal{U }\).

  2. (b)

    If \(0\in \mathrm{ri}\,\,\mathcal{U }\) and \(g\) is bounded above, then \(\Lambda _\ell (g,\zeta )\) is continuous on \(\mathcal{U }\).

  3. (c)

    If \(0\) is on the relative boundary of \(\mathcal{U }\) and if \(g\) is bounded above, then \(\Lambda _\ell (g,\zeta )\) is continuous on \(\mathrm{ri}\,\,\mathcal{U }\), at nonzero extreme points of \(\mathcal{U }\), and at any point \(\zeta \) such that the face \(\mathcal{U }_0\) satisfying \(\zeta \in \mathrm{ri}\,\,\mathcal{U }_0\) does not contain \(\{0\}\).

In (b) and (c) we assume \(g\) bounded above because otherwise \(\Lambda _\ell (g)=\infty \) is possible. If \(g\) is unbounded above and a function of \(\omega \) alone and if admissible paths can form loops, then \(\Lambda _\ell (g)=\infty \) because the walk can look for arbitrarily high values of \(g(T_x\omega )\) and keep returning to \(x\) forever. Then by Theorem 2.6(a) also \(\Lambda _\ell (g,\zeta )=\infty \) for all \(\zeta \in \mathrm{ri}\,\,\mathcal{U }\).

In certain situations our proof technique can be pushed up to faces that include \(0\). For example, for \(\mathcal{R }=\{(1,0),(0,1),(0,0)\}\) \(\Lambda _\ell (g,\zeta )\) is continuous in \(\zeta \in \mathcal{U }\backslash \{0\}\).

Proof of Theorem 3.2

This continuity argument was inspired by the treatment of the case \(\mathcal{R }=\{e_1,\cdots , e_d\}\) in [15, 27].

By Lemma A.4 of [34] the \(L^p\) assumption with \(p>d\) implies that \(g\in \mathcal{L }\). By Lemma 3.1 in case (a), and by the upper bound assumption in the other cases, \(\Lambda _\ell (g)<\infty \). Thereby \(\Lambda _\ell (g,\cdot )\) is bounded on \(\mathcal{U }\) and continuous on \(\mathrm{ri}\,\,\mathcal{U }\) (Theorem 2.6). Since \(\Lambda _\ell (g,\cdot )\) is lower semicontinuous, it suffices to prove upper semicontinuity at the relative boundary of \(\mathcal{U }\). Let \(\zeta \) be a point on the relative boundary of \(\mathcal{U }\).

We begin by reducing the proof to the case of a bounded \(g\). We can approximate \(g\) in \(L^p\) with a bounded function. In part (a) we can apply (3.3) to \(\mathcal{U }_0=\mathcal{U }\). Then the uniformity in \(\zeta \) of (3.3) implies that it suffices to prove upper semicontinuity in the case of bounded \(g\). In parts (b) and (c) \(g\) is bounded above to begin with. Assume that upper semicontinuity has been proved for the bounded truncation \(g_c=g\vee c\). Then

$$\begin{aligned} {\mathop {\overline{\text{ lim }}}\limits _{\zeta ^{\prime }\rightarrow \zeta }}\Lambda _\ell (g,\zeta ^{\prime })\le {\mathop {\overline{\text{ lim }}}\limits _{\zeta ^{\prime }\rightarrow \zeta }}\Lambda _\ell (g_c,\zeta ^{\prime }) \le \Lambda _\ell (g_c,\zeta ). \end{aligned}$$

In cases (b) and (c) the unique face \(\mathcal{U }_0\) that contains \(\zeta \) in its relative interior does not contain \(0\), and we can apply (3.3) to show that \(\Lambda _\ell (g_c,\zeta )\) decreases to \(\Lambda _\ell (g,\zeta )\) which proves upper semicontinuity for \(g\). We can now assume \(g\) is bounded, and by subtracting a constant we can assume \(g\le 0\).

We only prove upper semicontinuity away from the extreme points of \(\mathcal{U }\). The argument for the extreme points of \(\mathcal{U }\) is an easier version of the proof.

Assume thus that the point \(\zeta \) on the boundary of  \(\mathcal{U }\) is not an extreme point. Let  \(\mathcal{U }_0\) be the unique face of \(\mathcal{U }\) such that \(\zeta \in \mathrm{ri}\,\,\mathcal{U }_0\). Let \(\mathcal{R }_0=\mathcal{R }\cap \mathcal{U }_0\). Then \(\mathcal{U }_0=\mathrm{co}\,\mathcal{R }_0\) and any convex representation \(\zeta =\sum _{z\in \mathcal{R }}\beta _zz\) of \(\zeta \) can only use \(z\in \mathcal{R }_0\) [35, Theorems 18.1 and 18.3].

The theorem follows if we show that for any fixed \(\delta >0\) and \(\xi \in \mathbb{Q }^d\cap \mathcal{U }\) close enough to \(\zeta \) and for \(k\in \mathbb{N }\) such that \(k\xi \in \mathbb{Z }^d\),

$$\begin{aligned} \lim _{m\rightarrow \infty }\mathbb{P }\bigg \{\sum _{x_{0,mk+\ell }\in \Pi _{mk,mk\xi }}\!\!\!\!\!\!\! e^{mkR_{mk}^{\ell }(g)}\ge e^{mk(\Lambda _\ell (g,\zeta )+\log |\mathcal{R }|)+6mk\delta }\bigg \}=0. \end{aligned}$$
(3.6)

Here we used the approximation by rational points (2.21). \(\Pi _{mk,mk\xi }\) is the set of admissible paths \(x_{0,mk+\ell }\) such that \(x_0=0\) and \(x_{mk}=mk\xi \). It is enough to approach \(\zeta \) from outside \(\mathcal{U }_0\) because continuity on \(\mathrm{ri}\,\,\mathcal{U }_0\) is guaranteed by concavity. Fix \(\delta >0\).

Since \(0\notin \mathcal{U }_0\) we can find a vector \({\hat{u}}\in \mathbb{Z }^d\) such that \(z\cdot {\hat{u}}>0\) for \(z\in \mathcal{R }_0\).
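The existence of \({\hat{u}}\) is a standard separation argument, which we sketch here for completeness. Since \(\mathcal{U }_0=\mathrm{co}\,\mathcal{R }_0\) is compact and convex and \(0\notin \mathcal{U }_0\), strict separation gives a vector \(u\in \mathbb{R }^d\) and a constant \(c\) (our notation) with

$$\begin{aligned} z\cdot u\ge c>0\qquad \text{for all } z\in \mathcal{R }_0. \end{aligned}$$

Because \(\mathcal{R }_0\) is finite, these strict inequalities survive a small perturbation of \(u\), so \(u\) can be replaced by a rational vector and then multiplied by a common denominator to obtain \({\hat{u}}\in \mathbb{Z }^d\). For instance, for \(\mathcal{R }=\{(1,0),(0,1),(0,0)\}\) and \(\mathcal{U }_0=\mathrm{co}\,\{(1,0),(0,1)\}\), the choice \({\hat{u}}=(1,1)\) gives \(z\cdot {\hat{u}}=1\) for both \(z\in \mathcal{R }_0\), while the remaining step \((0,0)\) has \(z\cdot {\hat{u}}=0\).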

Given a path \(x_{0,mk+\ell }\) let \(s_0=0\) and, if it exists, let \(s^{\prime }_0\ge 0\) be its first regeneration time: this is the first time \(i\in [0,mk]\) such that \(x_j\cdot {\hat{u}}\le x_i\cdot {\hat{u}}\) for \(j\le i\), \(z_{i+1,i+\ell }\in \mathcal{R }_0^\ell \), and \(x_j\cdot {\hat{u}}>x_{i+\ell }\cdot {\hat{u}}\) for \(j\in \{i+\ell +1,\cdots , mk+\ell \}\). If \(s^{\prime }_0\) does not exist then we set \(s^{\prime }_0=mk+\ell \) and stop at that. Otherwise, if \(s^{\prime }_0\) exists, then let

$$\begin{aligned} s_1&= \min \{j\in (s^{\prime }_0,mk+\ell ):z_{j+1}\not \in \mathcal{R }_0\\&\text{ or } \exists i\in (j+1,mk+\ell ] \text{ such } \text{ that } x_i\cdot {\hat{u}}\le x_{j+1}\cdot {\hat{u}}\}. \end{aligned}$$

If such a time does not exist, then we set \(s_1=s^{\prime }_1=mk+\ell \) and stop. Otherwise, define \(s_1<s^{\prime }_1<s_2<s^{\prime }_2<\cdots \) inductively. Path segments \(x_{s^{\prime }_i, s_{i+1}}\) are good and segments \(x_{s_i,s_i^{\prime }}\) are bad (the paths in the gray blocks in Fig.  1). Good segments have length at least \(\ell \) and consist of only \(\mathcal{R }_0\)-steps, and distinct good segments lie in disjoint slabs (a slab is a portion of \(\mathbb{Z }^d\) between two hyperplanes perpendicular to \({\hat{u}}\)).

Fig. 1. Path segments in shaded regions are bad, the other segments are good. \(v_i=X_{s_i}\) and \(v_i^{\prime }=X_{s_i^{\prime }}\). Steps going up and to the right represent steps in \(\mathcal{R }_0\).

Time \(mk+\ell \) may belong to an incomplete bad segment. In that case the last time defined by the above procedure was \(s_N<mk+\ell \) for some \(N\ge 0\), and we set \(s^{\prime }_{N}=mk+\ell \). Otherwise time \(mk+\ell \) belongs to a good segment, the last time defined was \(s^{\prime }_{N-1}\le mk\) for some \(N\ge 1\), and we set \(s_{N}=s^{\prime }_{N}=mk+\ell \). There are \(N\) good segments and \(N+1\) bad segments, when we admit possibly degenerate first and last bad segments \(x_{s_0,s^{\prime }_0}\) and \(x_{s_N,s^{\prime }_N}\) (a degenerate segment has no steps). Except possibly for \(x_{s_0,s^{\prime }_0}\) and \(x_{s_N,s^{\prime }_N}\), each bad segment has at least one \((\mathcal{R }\backslash \mathcal{R }_0)\)-step.

Lemma 3.3

Given \(\varepsilon >0\), we can choose \(\varepsilon _0\in (0,\varepsilon )\) such that if \(|\xi -\zeta |<\varepsilon _0\), then the total number of steps in the bad segments in any path in \(\Pi _{mk,mk\xi }\) is at most \(C\varepsilon mk\) for a constant \(C\). In particular, \(N\le C\varepsilon mk\).

Proof

Given \(\varepsilon >0\) we can find \(\varepsilon _0>0\) such that if \(|\xi -\zeta |<\varepsilon _0\), then any convex representation \(\xi =\sum _{z\in \mathcal{R }}\alpha _z z\) of \(\xi \) satisfies \(\sum _{z\not \in \mathcal{R }_0}\alpha _z\le \varepsilon \). (Otherwise we can let \(\xi \rightarrow \zeta \) and in the limit \(\zeta \) would possess a convex representation with positive weight on \(\mathcal{R }\backslash \mathcal{R }_0\).) Consequently, if \(x_{0,mk+\ell }\in \Pi _{mk,mk\xi }\) and \(|\xi -\zeta |<\varepsilon _0\) the number of \((\mathcal{R }\backslash \mathcal{R }_0)\)-steps in \(x_{0,mk+\ell }\) is bounded by \(\varepsilon mk+\ell \).
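The last bound can be checked by counting steps. As a sketch, write \(n_z\) (our notation) for the number of indices \(1\le i\le mk\) with \(z_i=z\) along a path in \(\Pi _{mk,mk\xi }\). Since \(x_0=0\) and \(x_{mk}=mk\xi \),

$$\begin{aligned} mk\xi =\sum _{z\in \mathcal{R }}n_z z,\qquad \sum _{z\in \mathcal{R }}n_z=mk, \end{aligned}$$

so \(\alpha _z=n_z/(mk)\) is a convex representation of \(\xi \), and therefore

$$\begin{aligned} \sum _{z\notin \mathcal{R }_0}n_z=mk\sum _{z\notin \mathcal{R }_0}\alpha _z\le \varepsilon mk. \end{aligned}$$

The remaining \(\ell \) steps \(z_{mk+1,mk+\ell }\) contribute at most \(\ell \) further \((\mathcal{R }\backslash \mathcal{R }_0)\)-steps, which gives the bound \(\varepsilon mk+\ell \).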

Hence it is enough to show that in each bad segment, the number of \(\mathcal{R }_0\)-steps is at most a constant multiple of \((\mathcal{R }\backslash \mathcal{R }_0)\)-steps. So consider a bad segment \(x_{s_i,s_i^{\prime }}\). If \(s^{\prime }_i=mk+\ell \) it can happen that \( x_{s^{\prime }_i}\cdot {\hat{u}}<\max _{s_i\le j\le s_i^{\prime }}x_j\cdot {\hat{u}}. \) In this case we add more steps from \(\mathcal{R }_0\) and increase \(s^{\prime }_i\) so that

$$\begin{aligned} x_{s^{\prime }_i}\cdot {\hat{u}}=\max _{s_i\le j\le s_i^{\prime }}x_j\cdot {\hat{u}}. \end{aligned}$$
(3.7)

This only makes things worse by increasing the number of \(\mathcal{R }_0\)-steps. We proceed now by assuming (3.7).

Start with \(\gamma _0=s_i\). Let

$$\begin{aligned} \alpha _1=\, s_i^{\prime }\,\wedge \, \inf \{ n\ge \gamma _0: \exists j>n \text{ such } \text{ that } x_j\cdot {\hat{u}}\le x_n\cdot {\hat{u}}\}. \end{aligned}$$

We first control the number of \(\mathcal{R }_0\)-steps in the segment \(z_{\gamma _0+1,\alpha _1}\). The segment \(z_{\gamma _0+1,\alpha _1-1}\) cannot contain more than \(\ell -1\) \(\mathcal{R }_0\)-steps in a row because any \(\ell \)-string of \(\mathcal{R }_0\)-steps would have begun the next good segment. Thus, the number of \(\mathcal{R }_0\)-steps in \(z_{\gamma _0+1,\alpha _1}\) is bounded by \((\ell -1)\) \(\times \) (the number of \((\mathcal{R }\backslash \mathcal{R }_0)\)-steps) \(+\) \(\ell \). Suppose \(\alpha _1=s_i^{\prime }\), in other words, we already exhausted the entire bad segment. Since a bad segment contains at least one \((\mathcal{R }\backslash \mathcal{R }_0)\)-step we are done: the number of \(\mathcal{R }_0\)-steps is bounded by \(2\ell \) times the number of \((\mathcal{R }\backslash \mathcal{R }_0)\)-steps. So let us suppose \(\alpha _1<s_i^{\prime }\) and continue with the segment \(x_{\alpha _1,s_i^{\prime }}\).
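The two counts in the preceding paragraph can be made explicit. As a sketch, assume \(\ell \ge 1\) and let \(b\) (our notation) be the number of \((\mathcal{R }\backslash \mathcal{R }_0)\)-steps in \(z_{\gamma _0+1,\alpha _1}\). These steps split the \(\mathcal{R }_0\)-steps of \(z_{\gamma _0+1,\alpha _1-1}\) into at most \(b+1\) runs, each of length at most \(\ell -1\), and \(z_{\alpha _1}\) may be one more \(\mathcal{R }_0\)-step. Hence

$$\begin{aligned} \#\{\mathcal{R }_0\text{-steps in } z_{\gamma _0+1,\alpha _1}\}\le (\ell -1)(b+1)+1=(\ell -1)b+\ell , \end{aligned}$$

and if \(b\ge 1\) the right-hand side is at most \((2\ell -1)b\le 2\ell b\).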

Let

$$\begin{aligned} \beta _1=\inf \{ n>\alpha _1: x_n\cdot {\hat{u}}\le x_{\alpha _1}\cdot {\hat{u}}\}\le s_i^{\prime } \end{aligned}$$

be the time of the first backtrack after \(\alpha _1\) and

$$\begin{aligned} \gamma _1=\inf \Bigl \{ n>\beta _1: x_n\cdot {\hat{u}}\ge \max _{\alpha _1\le j\le \beta _1} x_j\cdot {\hat{u}}\Bigr \} \end{aligned}$$

the time when the path gets at or above the previous maximum. Due to (3.7), \(\gamma _1\le s_i^{\prime }\).

We claim that in the segment \(x_{\alpha _1,\gamma _1}\) the number of positive steps (in the \({\hat{u}}\)-direction) is at most a constant times the number of nonpositive steps. Since \(\mathcal{R }_0\)-steps are positive steps while all nonpositive steps are \((\mathcal{R }\backslash \mathcal{R }_0)\)-steps, this claim gives the dominance (number of \(\mathcal{R }_0\)-steps) \(\le \) \(C\) \(\times \) (number of \((\mathcal{R }\backslash \mathcal{R }_0)\)-steps).

The claim is proved by counting. Project all steps \(z\) onto the \({\hat{u}}\) direction by considering \(z\cdot {\hat{u}}\), so that we can think of a path on the 1 dimensional lattice. Then, instead of the original steps that come in various sizes, count increments of \(\pm 1\). Up to constant multiples, counting unit increments is the same as counting steps. By the definition of the stopping times, at time \(\beta _1\) the segment \(x_{\alpha _1,\gamma _1}\) visits a point at or below its starting level, but ends up at a new maximum level at time \(\gamma _1\). Ignore the part of the last step \(z_{\gamma _1}\) that takes the path above the previous maximum \(\max _{\alpha _1\le j\le \beta _1} x_j\cdot {\hat{u}}\). Then each negative unit increment in the \({\hat{u}}\)-direction is matched by at most two positive unit increments. (Project the right-hand picture in Fig. 2 onto the vertical \({\hat{u}}\) direction.)

Fig. 2. Illustration of the stopping times \(\alpha _i\), \(\beta _i\), and \(\gamma _i\). Note how the immediate backtracking at \(\gamma _1\) makes \(\alpha _2=\gamma _1\) and \(\beta _2=\alpha _2+1\).
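The matching of unit increments in the claim can be written out as follows. This is a sketch in our own notation: \(a=x_{\alpha _1}\cdot {\hat{u}}\) is the starting level, \(M=\max _{\alpha _1\le j\le \beta _1}x_j\cdot {\hat{u}}\) is the maximum reached before the backtrack, and \(U\) and \(D\) are the numbers of positive and negative unit increments of the projected segment \(x_{\alpha _1,\gamma _1}\), with the last step truncated at level \(M\). The truncated segment starts at level \(a\) and ends at level \(M\), so

$$\begin{aligned} U-D=M-a. \end{aligned}$$

Between times \(\alpha _1\) and \(\beta _1\) the path reaches level \(M\) and then returns to a level at most \(a\), which requires at least \(M-a\) negative unit increments. Thus \(M-a\le D\), and

$$\begin{aligned} U=D+(M-a)\le 2D. \end{aligned}$$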

Since the segment \(x_{\alpha _1,\gamma _1}\) must have at least one \((\mathcal{R }\backslash \mathcal{R }_0)\)-step, we have shown that the number of \(\mathcal{R }_0\)-steps in the segment \(x_{\gamma _0,\gamma _1}\) is bounded above by \(2(C\vee \ell )\) \(\times \) (number of \((\mathcal{R }\backslash \mathcal{R }_0)\)-steps). Now repeat the previous argument, beginning at \(\gamma _1\). Eventually the bad segment \(x_{s_i,s_i^{\prime }}\) is exhausted. \(\square \)

Let \(\mathbf{v}\) denote the collection of times \(0=s_0\le s^{\prime }_0<s_1<s^{\prime }_1<s_2<s^{\prime }_2<\cdots <s_{N-1}<s^{\prime }_{N-1}<s_{N}\le s^{\prime }_{N}=mk+\ell \), positions \(v_i=x_{s_i}\), \(v^{\prime }_i=x_{s^{\prime }_i}\), and the steps in bad path segments \(u^{(i)}_{s_i,s^{\prime }_i}=z_{s_i+1,s^{\prime }_i}\). \(s_0=s^{\prime }_0\) means \(u^{(0)}\) is empty.

We use the following simple fact below. Using Stirling’s formula one can find a function \(h(\varepsilon )\searrow 0\) such that, for all \(\varepsilon >0\) and \(n\ge \varepsilon ^{-1}\), \(\genfrac(){0.0pt}{}{n}{n\varepsilon }\le e^{nh(\varepsilon )}\).
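One admissible choice of \(h\) can be exhibited without Stirling's formula. The following sketch assumes \(0<\varepsilon <1\) and that \(j=n\varepsilon \) is an integer; the symbol \(H\) for the binary entropy is our notation. Expanding \(1=(p+(1-p))^n\) with \(p=j/n\) and keeping only the \(j\)th term gives

$$\begin{aligned} 1\ge \genfrac(){0.0pt}{}{n}{j}p^{j}(1-p)^{n-j}=\genfrac(){0.0pt}{}{n}{j}e^{-nH(p)},\qquad H(p)=-p\log p-(1-p)\log (1-p). \end{aligned}$$

With \(p=\varepsilon \) this is the stated bound with \(h(\varepsilon )=H(\varepsilon )\). Since \(H\) is increasing on \((0,1/2]\) and \(H(\varepsilon )\rightarrow 0\) as \(\varepsilon \searrow 0\), this choice is monotone for small \(\varepsilon \).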

Lemma 3.4

With \(\varepsilon >0\) fixed in Lemma 3.3, and with \(m\) large enough, the number of vectors \(\mathbf{v}\) is at most \(C(mk)^{c_1}e^{mk h(\varepsilon )}\), where the function \(h\) satisfies \(h(\varepsilon )\rightarrow 0\) as \(\varepsilon \rightarrow 0\).

Proof

Recall \(N\le C\varepsilon mk\) for a constant \(C\) coming from Lemma 3.3. We take \(\varepsilon >0\) small enough so that \(C\varepsilon <1/2\). A vector \(\mathbf{v}\) is determined by the following choices.

  1. (i)

    The times \(\{s_i, s^{\prime }_i\}_{0\le i\le N}\) can be chosen in at most

    $$\begin{aligned} \sum _{N=1}^{C\varepsilon mk}\genfrac(){0.0pt}{}{mk}{2N}\le Cmk\genfrac(){0.0pt}{}{mk}{C\varepsilon mk}\le C mk e^{mk h(\varepsilon )}\qquad \text{ ways. } \end{aligned}$$
  2. (ii)

    The steps in the bad segments, in a total of at most \(\left| \mathcal{R }\right| ^{C\varepsilon mk}\le e^{mk h(\varepsilon )}\) ways.

  3. (iii)

    The path increments \(\{v_{i}-v^{\prime }_{i-1}\}_{1\le i\le N}\) across the good segments. Their number is also bounded by \(C(mk)^{c_1}e^{mk h(\varepsilon )}\).

The argument for (iii) is as follows. For each finite \(\mathcal{R }_0\)-increment \(y\in \{ z_1+\cdots +z_k: k\in \mathbb{N }, \, z_1,\cdots , z_k\in \mathcal{R }_0\} \), fix a particular representation \(y=\sum _{z\in \mathcal{R }_0} a_z(y)z\), identified by the vector \(a(y)= (a_z(y))\in \mathbb{Z }_+^{\mathcal{R }_0}\). The number of possible endpoints \(\eta =\sum _{i=1}^N(v_{i}-v^{\prime }_{i-1})\) is at most \(C(\varepsilon mk)^d\) because \(|mk\xi -mk\zeta |<mk\varepsilon \) and the total number of steps in all bad segments is at most \(C\varepsilon mk\). Each possible endpoint \(\eta \) has at most \(C(mk)^{\left| \mathcal{R }_0\right| }\) representations \(\eta =\sum _{z\in \mathcal{R }_0} b_zz\) with \((b_z)\in \mathbb{Z }_+^{\mathcal{R }_0}\) because projecting to \({\hat{u}}\) shows that each \(b_z\) is bounded by \(Cmk\). Thus there are at most \(C(mk)^{c_1}\) vectors \((b_z)\in \mathbb{Z }_+^{\mathcal{R }_0}\) that can represent possible endpoints of the sequence of increments. Each such vector \(b=(b_z)\) can be decomposed into a sum of increments \(b=\sum _{i=1}^{N} a^{(i)}\) in at most

$$\begin{aligned} \prod _{z\in \mathcal{R }_0}\genfrac(){0.0pt}{}{b_z+N}{N} \le {\genfrac(){0.0pt}{}{Cmk+C\varepsilon mk}{C\varepsilon mk}}^{\left| \mathcal{R }_0\right| } \le e^{mk h(\varepsilon )} \end{aligned}$$

ways. (Note that \(\genfrac(){0.0pt}{}{a+b}{b}\) is increasing in both \(a\) and \(b\).) So all in all there are \(C(mk)^{c_1} e^{mk h(\varepsilon )} \) possible sequences \(\{a^{(i)}\}_{1\le i\le N}\) of increments in the space \(\mathbb{Z }_+^{\mathcal{R }_0}\) that satisfy

$$\begin{aligned} \sum _{z\in \mathcal{R }_0} \sum _{i=1}^{N} a_z^{(i)}z =\eta \qquad \text{ for } \text{ a } \text{ possible } \text{ endpoint } \eta \text{. } \end{aligned}$$

Map \(\{v_{i}-v^{\prime }_{i-1}\}_{1\le i\le N}\) to \(\{a(v_{i}-v^{\prime }_{i-1})\}_{1\le i\le N}\). This mapping is 1-1. The image is one of the previously counted sequences \(\{a^{(i)}\}_{1\le i\le N}\) because

$$\begin{aligned} \sum _{z\in \mathcal{R }_0} \sum _{i=1}^{N} a_z(v_{i}-v^{\prime }_{i-1}) z= \sum _{i=1}^{N} \sum _{z\in \mathcal{R }_0}a_z(v_{i}-v^{\prime }_{i-1}) z= \sum _{i=1}^{N} (v_{i}-v^{\prime }_{i-1}) = \eta . \end{aligned}$$

We conclude that there are at most \(C(mk)^{c_1} e^{mk h(\varepsilon )} \) sequences \(\{v_{i}-v^{\prime }_{i-1}\}_{1\le i\le N}\) of increments across the good segments. Point (iii) has been verified.

Multiplying counts (i)–(iii) proves the lemma. \(\square \)
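Spelled out as a sketch, with constants tracked only loosely, count (ii) uses \(\left| \mathcal{R }\right| ^{C\varepsilon mk}=e^{mkC\varepsilon \log \left| \mathcal{R }\right| }\), and the product of the three counts is

$$\begin{aligned} Cmk\,e^{mkh(\varepsilon )}\cdot e^{mkh(\varepsilon )}\cdot C(mk)^{c_1}e^{mkh(\varepsilon )}=C^2(mk)^{c_1+1}e^{3mkh(\varepsilon )}. \end{aligned}$$

This has the form claimed in the lemma after renaming \(C\) and \(c_1\) and replacing \(h\) by \(3h\), which still tends to \(0\) as \(\varepsilon \rightarrow 0\).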

Let \(\Pi _{mk,mk\xi }^{\mathbf{v}}\) denote the paths in \(\Pi _{mk,mk\xi }\) that are compatible with \(\mathbf{v}\), that is, paths that go through space-time points \((x_{s_i}, s_i)\), \((x_{s^{\prime }_i}, s^{\prime }_i)\) and take the specified steps in the bad segments. The remaining unspecified good segments connect \((x_{s^{\prime }_{i-1}}, s^{\prime }_{i-1})\) to \((x_{s_i}, s_i)\) with \(\mathcal{R }_0\)-steps, for \(1\le i\le N\).

Fix \(\varepsilon >0\) small enough so that for large \(m\), \(C(mk)^{c_1} e^{mkh(\varepsilon )}\le e^{mk\delta }\). Then our goal (3.6) follows if we show

$$\begin{aligned} \lim _{m\rightarrow \infty }\sum _{\mathbf{v}}\mathbb{P }\Bigl \{\sum _{x_{0,mk+\ell }\in \Pi _{mk,mk\xi }^{\mathbf{v}}}\!\!\!\! e^{mkR_{mk}^{\ell }(g)}\ge e^{mk(\Lambda _\ell (g,\zeta )+\log |\mathcal{R }|)+5mk\delta }\Bigr \}=0. \end{aligned}$$
(3.8)
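The step from (3.6) to (3.8) is a union bound over \(\mathbf{v}\). As a sketch, write \(A=mk(\Lambda _\ell (g,\zeta )+\log |\mathcal{R }|)\) and let \(S^{\mathbf{v}}\) denote the sum of \(e^{mkR_{mk}^{\ell }(g)}\) over the paths compatible with \(\mathbf{v}\) (both symbols are our shorthand). Every path in \(\Pi _{mk,mk\xi }\) determines its vector \(\mathbf{v}\), so the sum in (3.6) equals \(\sum _{\mathbf{v}}S^{\mathbf{v}}\). Since there are at most \(e^{mk\delta }\) vectors, if this sum is at least \(e^{A+6mk\delta }\) then some \(S^{\mathbf{v}}\) is at least \(e^{A+5mk\delta }\). Hence

$$\begin{aligned} \mathbb{P }\Bigl \{\sum _{\mathbf{v}}S^{\mathbf{v}}\ge e^{A+6mk\delta }\Bigr \}\le \sum _{\mathbf{v}}\mathbb{P }\bigl \{S^{\mathbf{v}}\ge e^{A+5mk\delta }\bigr \}. \end{aligned}$$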

Given a vector \(\mathbf{v}\) and an environment \(\omega \) define a new environment \(\omega ^\mathbf{v}\) by deleting the bad slabs and shifting the good slabs so that the good path increments \(\{v_{i}-v^{\prime }_{i-1}\}_{1\le i\le N}\) become connected. Here is a precise construction. First, for \(x\) such that \(x\cdot {\hat{u}}<0\) or \(x\cdot {\hat{u}}\ge \sum _{j=0}^{N-1}(v_{j+1}-v^{\prime }_j)\cdot {\hat{u}}\), sample \(\omega ^\mathbf{v}_x\) fresh (this part of space is irrelevant). For a point \(x\) in between, pick \(i\ge 0\) such that

$$\begin{aligned} \sum _{j=1}^{i}(v_{j}-v^{\prime }_{j-1})\cdot {\hat{u}}\le x\cdot {\hat{u}}<\sum _{j=1}^{i+1}(v_{j}-v^{\prime }_{j-1})\cdot {\hat{u}}\end{aligned}$$

and put \(y=\sum _{j=1}^{i}(v_j-v^{\prime }_{j-1})\). Then set \(\omega ^\mathbf{v}_x=\omega _{v^{\prime }_i+x-y}\).

For a fixed \(\mathbf{v}\), each path \(x_{0,mk+\ell }\in \Pi _{mk,mk\xi }^\mathbf{v}\) is mapped in a 1-1 fashion to a new path \(x^{\mathbf{v}}_{0,\tau (\mathbf{v})+\ell -1}\) as follows. Set

$$\begin{aligned} \tau (\mathbf{v})=\sum _{j=1}^{N}(s_{j}-s^{\prime }_{j-1})-\ell . \end{aligned}$$

Given time point \(t\in \{0,\cdots , \tau (\mathbf{v})+\ell -1\}\) pick \(i\ge 0\) such that

$$\begin{aligned} \sum _{j=1}^{i}(s_{j}-s^{\prime }_{j-1})\le t<\sum _{j=1}^{i+1}(s_{j}-s^{\prime }_{j-1}). \end{aligned}$$

Then with \(s=\sum _{j=0}^{i}(s^{\prime }_j-s_j)\) and \(u=\sum _{j=0}^{i}(v^{\prime }_j-v_j)\) set \(x_t^{\mathbf{v}}=x_{t+s}-u\). This mapping of \(\omega \) and \(x_{0,mk+\ell }\) moves the good slabs of environments together with the good path segments so that \(\omega ^\mathbf{v}_{x^{\mathbf{v}}_t}=\omega _{x_{t+s}}\). (See Fig. 3.) The sum of the good increments that appeared in Lemma 3.4 is now

$$\begin{aligned} x_{\tau (\mathbf{v})+\ell }^{\mathbf{v}}=x_{s_{N}}-\sum _{j=0}^{N-1}(v^{\prime }_j-v_j) =v_{N}-\sum _{j=0}^{N-1}(v^{\prime }_j-v_j)=\sum _{j=1}^N(v_j-v^{\prime }_{j-1}). \end{aligned}$$

Define \(\eta (\mathbf{v})\in \mathcal{U }_0\) by

$$\begin{aligned} x^\mathbf{v}_{\tau (\mathbf{v})}=\tau (\mathbf{v})\eta (\mathbf{v}). \end{aligned}$$

Fig. 3. Illustration of the construction. The shaded bad slabs of environments are deleted. The white good slabs are joined together and shifted so that the good path segments connect. So for example points \(v_1\) and \(v^{\prime }_1\) on the left are identified as \(v^{\prime \prime }_1\) on the right.

Observe that \(|\tau (\mathbf{v})-mk|\) and \(|x^{\mathbf{v}}_{\tau (\mathbf{v})}-mk\xi |\) are (essentially) bounded by the total length of the bad segments and hence by \(C\varepsilon mk\). Moreover, due to total ergodicity \(\Lambda _\ell (g,\cdot )\) is concave on \(\mathcal{U }_0\) and hence continuous on its relative interior. Thus, we can choose \(\varepsilon >0\) small enough so that

$$\begin{aligned} mk\Lambda _\ell (g,\zeta )+mk\delta >\tau (\mathbf{v})\Lambda _\ell (g,\eta (\mathbf{v})). \end{aligned}$$

(3.8) would then follow if we show

$$\begin{aligned} \lim _{m\rightarrow \infty }\sum _{\mathbf{v}}\mathbb{P }\bigg \{\sum _{x_{0,mk+\ell }\in \Pi _{mk,mk\xi }^{\mathbf{v}}} \!\!\!\!\!\! e^{mkR_{mk}^{\ell }(g)}\ge e^{\tau (\mathbf{v})(\Lambda _\ell (g,\eta (\mathbf{v}))+\log |\mathcal{R }|)+3mk\delta }\bigg \}=0. \end{aligned}$$
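To see why the limit just displayed implies (3.8), one compares the two thresholds. As a sketch, suppose \(\varepsilon \) is also small enough that \((\tau (\mathbf{v})-mk)\log |\mathcal{R }|\le mk\delta \), which is possible because \(|\tau (\mathbf{v})-mk|\le C\varepsilon mk\). Together with the preceding inequality between \(mk\Lambda _\ell (g,\zeta )\) and \(\tau (\mathbf{v})\Lambda _\ell (g,\eta (\mathbf{v}))\) this gives

$$\begin{aligned} mk(\Lambda _\ell (g,\zeta )+\log |\mathcal{R }|)+5mk\delta&\ge \bigl (\tau (\mathbf{v})\Lambda _\ell (g,\eta (\mathbf{v}))-mk\delta \bigr )+\bigl (\tau (\mathbf{v})\log |\mathcal{R }|-mk\delta \bigr )+5mk\delta \\&=\tau (\mathbf{v})\bigl (\Lambda _\ell (g,\eta (\mathbf{v}))+\log |\mathcal{R }|\bigr )+3mk\delta , \end{aligned}$$

so each probability in (3.8) is at most the corresponding probability in that limit.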

This, in turn, follows from showing

$$\begin{aligned}&\lim _{m\rightarrow \infty }\sum _{\mathbf{v}}\mathbb{P }\Bigg \{\sum _{x_{0,mk+\ell }\in \Pi _{mk,mk\xi }^{\mathbf{v}}} \!\!\!\!\!\! e^{\tau (\mathbf{v})R_{\tau (\mathbf{v})}^{\ell }(g)(\omega ^{\mathbf{v}},x^{\mathbf{v}}_{0,\tau (\mathbf{v})+\ell })} \\&\qquad \qquad \qquad \quad \ge \; e^{\tau (\mathbf{v})(\Lambda _\ell (g,\eta (\mathbf{v}))+\log |\mathcal{R }|)+2mk\delta }\;\Bigg \}=0. \end{aligned}$$
(3.9)

To justify the step to (3.9), first delete all terms from

$$\begin{aligned} mkR_{mk}^{\ell }(g)=\sum _{i=0}^{mk-1}g(T_{x_i}\omega , z_{i+1,i+\ell }) \end{aligned}$$

that depend on \(\omega \) or \((z_i)\) outside of good slabs. Since \(g\le 0\) this goes in the right direction. The remaining terms can be written as \(\sum _i g(T_{x^{\mathbf{v}}_i}\omega ^{\mathbf{v}}, z^{\mathbf{v}}_{i+1,i+\ell })\) for a certain subset of indices \(i\in \{0,\cdots , \tau (\mathbf{v})-1\}\). Then add in the terms for the remaining indices to capture the entire sum

$$\begin{aligned} \tau (\mathbf{v})R_{\tau (\mathbf{v})}^{\ell }(g)(\omega ^{\mathbf{v}},x^{\mathbf{v}}_{0,\tau (\mathbf{v})+\ell }) =\sum _{i=0}^{\tau (\mathbf{v})-1} g(T_{x^{\mathbf{v}}_i}\omega ^{\mathbf{v}}, z^{\mathbf{v}}_{i+1,i+\ell }). \end{aligned}$$

The terms added correspond to terms that originally straddled good and bad segments. Hence since \(g\) is local in its dependence on both \(\omega \) and \(z_{1,\infty }\) there are at most \(C\varepsilon mk\) such terms. Since \(g\) is bounded, choosing \(\varepsilon \) small enough allows us to absorb all such terms into one \(mk\delta \) error.

Observing that \(\omega ^\mathbf{v}\) has the same distribution as \(\omega \), adding more paths in the sum inside the probability, and recalling that \(|\tau (\mathbf{v})-mk|\le Cmk\varepsilon \), we see that it is enough to prove

$$\begin{aligned} \lim _{m\rightarrow \infty }\sum _{\mathbf{v}}\mathbb{P }\Bigl \{\sum _{x_{0,\tau (\mathbf{v})+\ell }\in \Pi _{\tau (\mathbf{v}),\tau (\mathbf{v})\eta (\mathbf{v})}} \!\!\!\!\!\!\!\!\!\!\! e^{\tau (\mathbf{v})R_{\tau (\mathbf{v})}^{\ell }(g)}\ge e^{\tau (\mathbf{v})(\Lambda _\ell (g,\eta (\mathbf{v}))+\log |\mathcal{R }|)+\tau (\mathbf{v})\delta }\Bigr \}=0. \end{aligned}$$

By Lemma 3.4, concentration inequality Lemma 8.1, and \(\tau (\mathbf{v})\ge mk/2\), the sum of probabilities above is bounded by \(C(mk)^{c_1} e^{mkh(\varepsilon )-B\delta ^2mk/2}\le C(mk)^{c_1} e^{-(\delta _1-h(\varepsilon ))km}\) for another small positive constant \(\delta _1\). Choosing \(\varepsilon \) small enough shows convergence to \(0\) exponentially fast in \(m\).
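The bookkeeping in this last step can be sketched as follows. We assume here, as the exponent above suggests, that Lemma 8.1 bounds each individual probability by \(e^{-B\delta ^2\tau (\mathbf{v})}\) for a constant \(B>0\); this reading of Lemma 8.1 is an assumption of the sketch. Using \(\tau (\mathbf{v})\ge mk/2\) and the count of vectors from Lemma 3.4,

$$\begin{aligned} \sum _{\mathbf{v}}e^{-B\delta ^2\tau (\mathbf{v})}\le C(mk)^{c_1}e^{mkh(\varepsilon )}\,e^{-B\delta ^2mk/2}=C(mk)^{c_1}e^{-mk(B\delta ^2/2-h(\varepsilon ))}, \end{aligned}$$

which tends to \(0\) as \(m\rightarrow \infty \) once \(\varepsilon \) is small enough that \(h(\varepsilon )<B\delta ^2/2\).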

We have verified the original goal (3.6) and thereby completed the proof of Theorem 3.2. \(\square \)

4 Quenched large deviations for the walk

Standing assumptions for this section are that \(\mathcal{R }\subset \mathbb{Z }^d\) is finite and that \((\Omega ,\mathfrak{S },\mathbb{P },\{T_z:z\in \mathcal{G }\})\) is a measurable ergodic dynamical system. The theorem below assumes \(\Lambda _\ell (g)\) finite; recall Remark 2.3 for conditions that guarantee this. We employ the following notation for lower semicontinuous regularization of a function of several variables:

$$\begin{aligned} F^{\text{ lsc }(x)}(x,y)= \lim _{r\searrow 0} \inf _{z: \left| z-x\right| <r} F(z,y), \end{aligned}$$

and analogously for upper semicontinuous regularization.
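For concreteness, the upper semicontinuous regularization reads as follows, and a one-variable example (ours, for illustration only) shows the effect of both operations:

$$\begin{aligned} F^{\text{ usc }(x)}(x,y)&=\lim _{r\searrow 0}\sup _{z: \left| z-x\right| <r}F(z,y),\\ F(x)=\mathbf{1}\{x=0\}:\qquad F^{\text{ lsc }}(0)&=0,\quad F^{\text{ usc }}(0)=1,\quad F^{\text{ lsc }}(x)=F^{\text{ usc }}(x)=0 \text{ for } x\ne 0. \end{aligned}$$

Since the infimum and supremum include the point \(z=x\), one always has \(F^{\text{ lsc }}\le F\le F^{\text{ usc }}\).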

Theorem 4.1

Let \(\ell \ge 0\) and let \(g:\Omega \times \mathcal{R }^\ell \rightarrow \mathbb{R }\). Assume \(g\in \mathcal{L }\) and that \(\Lambda _\ell (g)\) is finite. Then for \(\mathbb{P }\)-a.e. \(\omega \), the distributions \(Q_n^{g,\omega }\{X_n/n\in \cdot \}\) on \(\mathbb{R }^d\) satisfy an LDP with deterministic rate function

$$\begin{aligned} I^g(\zeta ) = \Lambda _\ell (g)-\Lambda ^{\mathrm{usc}(\zeta )}_\ell (g,\zeta ). \end{aligned}$$
(4.1)

This means that the following bounds hold:

$$\begin{aligned}&{\mathop {\overline{\mathrm{lim}}}\limits _{n\rightarrow \infty }}n^{-1}\log Q_{n}^{g,\omega }\{X_n/n\in A\} \le -\inf _{\zeta \in A} I^g(\zeta )\quad \text{for closed } A\subset \mathbb{R }^d\\&\text{and}\quad {\mathop {\underline{\mathrm{lim}}}\limits _{n\rightarrow \infty }}n^{-1}\log Q_{n}^{g,\omega }\{X_n/n\in O\} \ge -\inf _{\zeta \in O} I^g(\zeta )\quad \text{for open } O\subset \mathbb{R }^d. \end{aligned}$$
(4.2)

Rate function \(I^g:\mathbb{R }^d\rightarrow [0,\infty ]\) is convex, and on \(\mathcal{U }\) finite and continuous.

Proof of Theorem 4.1

Let \(O\subset \mathbb{R }^d\) be open, and \(\zeta \in \mathcal{U }\cap O\). Then \({\hat{x}}_n(\zeta )\in nO\) for large \(n\).

$$\begin{aligned}&{\mathop {\underline{\text{ lim }}}\limits _{n\rightarrow \infty }}n^{-1}\log Q_{n}^{g,\omega }\{X_n/n\in O\} \\&\quad \ge {\mathop {\underline{\text{ lim }}}\limits _{n\rightarrow \infty }} \Bigl \{ n^{-1}\log E\bigl [e^{n R_n^{\ell }(g)}\mathbf{1}\{X_n={\hat{x}}_n(\zeta )\}\bigr ] - n^{-1}\log E\bigl [e^{n R_n^{\ell }(g)} \bigr ] \Bigr \} \\&\quad = \Lambda _\ell (g,\zeta ) - \Lambda _\ell (g). \end{aligned}$$

A supremum over an open set does not feel the difference between a function and its upper semicontinuous regularization, and so we get the lower large deviation bound:

$$\begin{aligned} {\mathop {\underline{\text{ lim }}}\limits _{n\rightarrow \infty }}n^{-1}\log Q_{n}^{g,\omega }\{X_n/n\in O\} \ge - \inf _{\zeta \in O} \{ \Lambda _\ell (g) - \Lambda ^{\text{ usc }}_\ell (g,\zeta ) \}. \end{aligned}$$

For a closed set \(K\subset \mathbb{R }^d\) and \(\delta >0\) Lemma 2.9 implies

$$\begin{aligned} {\mathop {\overline{\text{ lim }}}\limits _{n\rightarrow \infty }}n^{-1}\log Q_{n}^{g,\omega }\{X_n/n\in K\}&\le -\lim \limits _{\delta \searrow 0}\inf \limits _{\zeta \in K_\delta }\{\Lambda _\ell (g)-\Lambda _\ell (g,\zeta )\}\\&\le -\lim \limits _{\delta \searrow 0}\inf \limits _{\zeta \in K_\delta }\{\Lambda _\ell (g)-\Lambda ^{\mathrm{usc}}_\ell (g,\zeta )\}\\&= -\inf \limits _{\zeta \in K}\{\Lambda _\ell (g)-\Lambda ^{\mathrm{usc}}_\ell (g,\zeta )\}. \end{aligned}$$

The last limit \(\delta \searrow 0\) follows from the compactness of \(\mathcal{U }\). Properties of \(I^g\) follow from Theorem 2.6. \(\square \)

Remark 4.2

Since the rate function \(I^g\) is convex, it is the convex dual of the limiting logarithmic moment generating function

$$\begin{aligned} \sigma (t)= \lim _{n\rightarrow \infty } n^{-1}\log E^{Q_n^{g,\omega }}(e^{t\cdot X_n}) = \Lambda _\ell (g+t\cdot z_1) - \Lambda _\ell (g) \end{aligned}$$

on \(\mathbb{R }^d\). This gives the identity

$$\begin{aligned} -\Lambda _\ell ^{\mathrm{usc}}(g,\zeta )=\sup _{t\in \mathbb{R }^d}\{\zeta \cdot t-\Lambda _\ell (g+t\cdot z_1)\}. \end{aligned}$$
(4.3)

This identity can be combined with the variational representation for \(\Lambda _\ell (g+t\cdot z_1)\) in Theorem 2.3 of [34] to produce a representation for \(\Lambda _\ell ^{\mathrm{usc}}(g,\zeta )\).
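Both displays in this remark can be traced back to the definitions. The following sketch assumes \(\ell \ge 1\), so that \(g+t\cdot z_1\) is again a function on \(\Omega \times \mathcal{R }^\ell \), takes \(Q_n^{g,\omega }\) to be the normalized weight used in the proof of Theorem 4.1, and uses \(X_n=\sum _{i=0}^{n-1}z_{i+1}\):

$$\begin{aligned} E^{Q_n^{g,\omega }}(e^{t\cdot X_n})&=\frac{E\bigl [e^{nR_n^{\ell }(g)+t\cdot X_n}\bigr ]}{E\bigl [e^{nR_n^{\ell }(g)}\bigr ]}=\frac{E\bigl [e^{nR_n^{\ell }(g+t\cdot z_1)}\bigr ]}{E\bigl [e^{nR_n^{\ell }(g)}\bigr ]},\\ I^g(\zeta )&=\sup _{t\in \mathbb{R }^d}\{\zeta \cdot t-\sigma (t)\}=\Lambda _\ell (g)+\sup _{t\in \mathbb{R }^d}\{\zeta \cdot t-\Lambda _\ell (g+t\cdot z_1)\}. \end{aligned}$$

Applying \(n^{-1}\log \) to the first line and letting \(n\rightarrow \infty \) gives the formula for \(\sigma (t)\), and comparing the second line with (4.1) gives (4.3).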

As a corollary we state a level 1 LDP for RWRE (see Example 1.4).

Theorem 4.3

Let \(d\ge 1\). Consider RWRE on \(\mathbb{Z }^d\) in an ergodic environment with a finite set \(\mathcal{R }\subset \mathbb{Z }^d\) of admissible steps. Assume that \(g(\omega ,z)=\log p_{z}(\omega )\) is a member of \(\mathcal{L }\). Then there exists a continuous, convex rate function \(I:\mathcal{U }\rightarrow [0,\infty )\) such that, for \(\mathbb{P }\)-a.e. \(\omega \), the distributions \(Q^{\omega }\{X_n/n\in \cdot \,\}\) on \(\mathcal{U }\) satisfy an LDP with rate \(I\). For \(\zeta \in \mathrm{ri}\,\,\mathcal{U }\), \(I(\zeta )\) is the limit of point probabilities:

$$\begin{aligned} I(\zeta )=-\lim _{n\rightarrow \infty } n^{-1}\log Q^\omega _0\{ X_n={\hat{x}}_n(\zeta )\} \quad \text{ a.s. } \end{aligned}$$
(4.4)

This theorem complements our level 3 quenched LDPs in [32, 34] with formula (4.4) and the continuity of the rate function, in particular in the case where \(0\not \in \mathcal{U }\) and \(g\) is unbounded (e.g. if \(\mathbb{P }\) has enough mixing and \(g\) enough moments). To put the theorem in perspective we give a quick tour of the history of quenched large deviation theory of RWRE.

The development began with the quenched level 1 LDP of Greven and den Hollander [17] for the one-dimensional elliptic nearest-neighbor i.i.d. case (\(d=1\), \(\mathcal{R }=\{-1,+1\}\), and \(g\) bounded). Their proof utilized an auxiliary branching process. The LDP was extended to the ergodic case by Comets et al. [6], using hitting times. Both results relied on the possibility of explicit computations in the one-dimensional nearest-neighbor case (which in particular implies \(0\in \mathcal{U }\)). When \(d\ge 2\) Zerner [46] used a subadditivity argument for certain passage times to prove the level 1 LDP in the nearest-neighbor i.i.d. nestling case with \(g\in L^d\). The nestling assumption (\(0\) belongs to the convex hull of the support of \(\sum _z zp_z(\omega )\), and thus in particular \(0\in \mathcal{U }\)) was crucial for Zerner’s argument. Later, Varadhan [41] used subadditivity directly to get the result for a general ergodic environment with finite step size, \(0\in \mathcal{U }\), and bounded \(g\).

Subadditivity methods often fail to provide formulas for rate functions. Rosenbluth [36] used the point of view of the particle, following ideas of Kosygina et al. [22] for diffusions with random drift, and gave an alternative proof of the quenched level 1 LDP along with two variational formulas for the rate function. The assumptions were that the walk is nearest-neighbor, \(\mathbb{P }\) is ergodic, and \(g\in L^p\) for some \(p>d\). That the walk is nearest-neighbor in [36] is certainly not a serious obstacle and can be replaced with a finite \(\mathcal{R }\) as long as \(0\in \mathcal{U }\). Yılmaz [43] extended the quenched LDP and rate function formulas to a univariate level 2 quenched LDP and Rassoul-Agha and Seppäläinen [32] extended further to level 3 results.

All the past results mentioned above are for cases with \(0\in \mathcal{U }\). This restriction eliminates natural important models such as the space-time case. When \(0\not \in \mathcal{U }\), a crucial uniform integrability estimate fails and the method of [22, 32, 36, 43] breaks down. For diffusions in time-dependent but bounded random potentials this issue was resolved by Kosygina and Varadhan [23]. For random polymers and RWRE the way around this problem was found by Rassoul-Agha, Seppäläinen, and Yılmaz [34] who proved a quenched level 3 LDP with potential \(g\in \mathcal{L }\) even when \(0\not \in \mathcal{U }\).

For the precise location of the difficulty see step 5 on page 833 of [23] and the proof of Lemma 2.13 of [34]. In a separate work [4] we showed that the method of [41] works also in the space-time case \(\mathcal{R }\subset \{z:z\cdot e_1=1\}\), but with \(g\) assumed bounded.

Limit (4.4) has been previously shown for various restricted cases: in [17] (\(d=1\), \(\mathbb{P }\) i.i.d., \(\mathcal{R }=\{-1,1\}\), \(g\) bounded), [46] (\(\mathbb{P }\) i.i.d., nestling, \(g\in L^d\)), [41] (\(\mathbb{P }\) ergodic, \(0\in \mathcal{U }\), \(g\) bounded), and [4] (\(\mathbb{P }\) ergodic, \(g\) bounded, and \(\mathcal{R }\subset \{z:z\cdot e_1=1\}\)). [4, 17] also proved continuity of the rate function.

Let us finally point out that [1] obtains homogenization results similar to [23] for unbounded potentials, but has to compensate with a mixing assumption. This is the same spirit in which our assumption \(g\in \mathcal{L }\) works.

5 Entropy representation of the point-to-point free energy

With either a compact \(\Omega \) or an i.i.d. directed setting, the LDP of Theorem 4.1 can be obtained by contraction from the higher level LDPs of [34]. This is the route to linking \(\Lambda _\ell (g,\zeta )\) with entropy. First we define the entropy.

The joint evolution of the environment and the walk gives a Markov chain \((T_{X_n}\omega , Z_{n+1,n+\ell })\) on the state space \(\varvec{\Omega }_\ell =\Omega \times \mathcal{R }^\ell \). Elements of \(\varvec{\Omega }_\ell \) are denoted by \(\eta =(\omega ,\,z_{1,\ell })\). The transition kernel is

$$\begin{aligned}&\hat{p}_\ell (\eta ,S^+_z\eta )=\tfrac{1}{|\mathcal{R }|} \, \quad \text{ for } z\in \mathcal{R } \text{ and } \eta =(\omega ,z_{1,\ell })\in \varvec{\Omega }_\ell \end{aligned}$$
(5.1)

where the transformations \(S^+_z\) are defined by \(S^+_z(\omega ,z_{1,\ell })=(T_{z_1}\omega , (z_{2,\ell },z))\). An entropy \(H_{\ell }\) that is naturally associated to this Markov chain and reflects the role of the background measure is defined as follows.
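It is straightforward to check that these transformations reproduce the dynamics of the chain. As a sketch, assume \(\ell \ge 1\), that the shifts compose as \(T_{z}T_{x}=T_{x+z}\), and that \(X_{n+1}=X_n+Z_{n+1}\). Applying \(S^+_z\) with \(z=Z_{n+\ell +1}\) to the current state gives

$$\begin{aligned} S^+_z(T_{X_n}\omega , Z_{n+1,n+\ell })=(T_{Z_{n+1}}T_{X_n}\omega ,\,(Z_{n+2,n+\ell },z))=(T_{X_{n+1}}\omega ,\,Z_{n+2,n+\ell +1}), \end{aligned}$$

which is the next state of the chain. Under \(\hat{p}_\ell \) the newly appended step \(z\) is uniform on \(\mathcal{R }\).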

Let \(\mu _0\) denote the \(\Omega \)-marginal of a probability measure \(\mu \in \mathcal{M }_1(\varvec{\Omega }_\ell )\). Define

$$\begin{aligned} H_{\ell }(\mu )= {\left\{ \begin{array}{ll} \inf \{H(\mu \times q\,|\,\mu \times \hat{p}_\ell ):q\in \mathcal{Q }(\varvec{\Omega }_\ell ) \text{ with } \mu q=\mu \}&{}\text{ if } \mu _0\ll \mathbb{P },\\ \infty &{}\text{ otherwise. } \end{array}\right. } \end{aligned}$$
(5.2)

The infimum is over Markov kernels \(q\) on \(\varvec{\Omega }_\ell \) that fix \(\mu \). Inside the braces the familiar relative entropy is

$$\begin{aligned} H(\mu \times q\,|\,\mu \times \hat{p}_\ell ) = \int _{\varvec{\Omega }_\ell } \sum _{z\in \mathcal{R }}q(\eta ,S^+_z\eta )\,\log \frac{q(\eta ,S^+_z\eta )}{\hat{p}_\ell (\eta ,S^+_z\eta )}\,\mu (d\eta ). \end{aligned}$$
(5.3)

Obviously \(q(\eta ,S^+_z\eta )\) is not the most general Markov kernel on \(\varvec{\Omega }_\ell \). But the entropy cannot be finite unless the kernel is supported on shifts \(S^+_z\eta \), so we might as well restrict to this case. \(H_{\ell }: \mathcal{M }_1(\varvec{\Omega }_\ell )\rightarrow [0,\infty ]\) is convex. (The argument for this can be found at the end of Section 4 in [32].)
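As a quick sanity check on (5.3), added here for illustration and not part of the original argument: since \(\hat{p}_\ell (\eta ,S^+_z\eta )=|\mathcal{R}|^{-1}\) by (5.1), the integrand in (5.3) can be written in terms of the Shannon entropy of the step distribution \(z\mapsto q(\eta ,S^+_z\eta )\). With the convention \(0\log 0=0\),

$$\begin{aligned} \sum _{z\in \mathcal{R}}q(\eta ,S^+_z\eta )\,\log \frac{q(\eta ,S^+_z\eta )}{\hat{p}_\ell (\eta ,S^+_z\eta )} = \log |\mathcal{R}| + \sum _{z\in \mathcal{R}}q(\eta ,S^+_z\eta )\log q(\eta ,S^+_z\eta ) \;\in \;[0,\log |\mathcal{R}|]. \end{aligned}$$

In particular, for a kernel supported on shifts, \(H(\mu \times q\,|\,\mu \times \hat{p}_\ell )=0\) exactly when \(q(\eta ,\cdot )=\hat{p}_\ell (\eta ,\cdot )\) for \(\mu \)-a.e. \(\eta \), and \(H_{\ell }(\mu )\le \log |\mathcal{R}|\) whenever \(\mu _0\ll \mathbb{P}\) and at least one such kernel fixes \(\mu \).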

The quenched free energy has this variational characterization for \(g\in \mathcal L \) (Theorem 2.3 in [34]):

$$\begin{aligned} \Lambda _\ell (g)=\sup _{\begin{array}{c} \mu \in \mathcal{M }_1(\varvec{\Omega }_\ell ), c>0 \end{array}} \bigl \{E^\mu [\min (g,c)]-H_{\ell }(\mu )\bigr \}. \end{aligned}$$
(5.4)

Our goal is to find such characterizations for the point-to-point free energy. We develop the formula in the i.i.d. directed setting. Such a formula is also valid in the more general setting of this paper if \(\Omega \) is a compact metric space. Details can be found in the preprint version [33].

Let \(\Omega =\Gamma ^{\mathbb{Z }^d}\) be a product space with shifts \(\{T_x\}\) and \(\mathbb{P }\) an i.i.d. product measure as in Example 1.1. Assume \(0\notin \mathcal{U }\). Then the free energies \(\Lambda _\ell (g)\) and \(\Lambda _\ell (g, \zeta )\) are deterministic (that is, the \(\mathbb{P }\)-a.s. limits are independent of the environment \(\omega \)) and \(\Lambda _\ell (g, \zeta )\) is a continuous, concave function of \(\zeta \in \mathcal{U }\). Assume also that \(\Gamma \) is a separable metric space, and that \(\mathfrak{S }\) is the product of Borel \(\sigma \)-algebras, thereby also the Borel \(\sigma \)-algebra of \(\Omega \).

To utilize convex analysis we put the space \(\mathcal M \) of finite Borel measures on \(\varvec{\Omega }_\ell \) in duality with \(C_b(\varvec{\Omega }_\ell )\), the space of bounded continuous functions on \(\varvec{\Omega }_\ell \), via integration: \(\langle f,\mu \rangle =\int f\,d\mu \). Give \(\mathcal M \) the weak topology generated by \(C_b(\varvec{\Omega }_\ell )\). Metrize \(C_b(\varvec{\Omega }_\ell )\) with the supremum norm. The limit definition (2.3) shows that \(\Lambda _\ell (g)\) and \(\Lambda _\ell (g,\zeta )\) are Lipschitz in \(g\), uniformly in \(\zeta \). \(H_{\ell }\) is extended to \(\mathcal M \) by setting \(H_{\ell }(\mu )=\infty \) for measures \(\mu \) that are not probability measures.

For \(g\in C_b(\varvec{\Omega }_\ell )\) (5.4) says that \(\Lambda _\ell (g)=H_{\ell }^*(g)\), the convex conjugate of \(H_{\ell }\). The double convex conjugate

$$\begin{aligned} H_{\ell }^{**}(\mu ) = \Lambda ^*_\ell (\mu ) = \sup _{f\in C_b(\varvec{\Omega }_\ell )}\{E^\mu [f]- \Lambda _\ell (f)\}, \quad \mu \in \mathcal{M }_1(\varvec{\Omega }_\ell ), \end{aligned}$$
(5.5)

is equal to the lower semicontinuous regularization \(H_{\ell }^\mathrm{{lsc}}\) of \(H_{\ell }\) (Propositions 3.3 and 4.1 in [13] or Theorem 5.18 in [31]). Since relative entropy is lower semicontinuous, (5.2) implies that

$$\begin{aligned} H_{\ell }^{**}(\mu )=H_{\ell }(\mu ) \quad \text{ for } \mu \in \mathcal{M }_1(\varvec{\Omega }_\ell ) \text{ such } \text{ that } \mu _0\ll \mathbb{P }\text{. } \end{aligned}$$
(5.6)

There is a quenched LDP for the distributions \(Q_n^{g,\omega }\{R_n^{\ell }\in \cdot \}\), where \(R_n^{\ell }\) is the empirical measure defined in (2.2). The rate function of this LDP is \(H_{\ell }^{**}\) (Theorems 3.1 and 3.3 of [34]).

The reader may be concerned about considering the \(\mathbb{P }\)-a.s. defined functionals \(\Lambda _\ell (g)\) or \(\Lambda _\ell (g, \zeta )\) on the possibly non-separable function space \(C_b(\varvec{\Omega }_\ell )\). However, for bounded functions we can integrate over the limits (2.3) and (2.4) and define the free energies without any “a.s. ambiguity”, so for example

$$\begin{aligned} \Lambda _\ell (g,\zeta )= \lim _{n\rightarrow \infty }n^{-1} \mathbb{E }\Bigl ( \log E\big [e^{n R_n^{\ell }(g)}{\small 1}\!\!1\{X_n={\hat{x}}_n(\zeta )\}\big ]\Bigr ). \end{aligned}$$

We extend the duality set-up to involve the point-to-point free energy.

Theorem 5.1

Let \(\Omega =\Gamma ^{\mathbb{Z }^d}\) be a product of separable metric spaces with Borel \(\sigma \)-algebra \(\mathfrak{S }\), shifts \(\{T_x\}\), and an i.i.d. product measure \(\mathbb{P }\). Assume \(0\notin \mathcal{U }\). With \(\ell \ge 1\), let \(\mu \in \mathcal{M }_1(\varvec{\Omega }_\ell )\) and \(\zeta =E^\mu [Z_1]\). Then

$$\begin{aligned} H_{\ell }^{**}(\mu ) = \sup _{g\in C_b(\varvec{\Omega }_\ell )}\{E^\mu [g]- \Lambda _\ell (g, \zeta )\}. \end{aligned}$$
(5.7)

On the other hand, for \(f\in C_b(\varvec{\Omega }_\ell )\) and \(\zeta \in \mathcal{U }\),

$$\begin{aligned} \Lambda _\ell (f,\zeta )= \sup _{\mu \in \mathcal{M }_1(\varvec{\Omega }_\ell ) :\,E^\mu [Z_1]=\zeta } \{E^\mu [f]- H_{\ell }^{**}(\mu ) \}. \end{aligned}$$
(5.8)

Equation (5.8) is valid also when \(H_{\ell }^{**}(\mu )\) is replaced with \(H_{\ell }(\mu )\):

$$\begin{aligned} \Lambda _\ell (f,\zeta )= \sup _{\mu \in \mathcal{M }_1(\varvec{\Omega }_\ell ) :\,E^\mu [Z_1]=\zeta } \{E^\mu [f]- H_{\ell }(\mu ) \}. \end{aligned}$$
(5.9)

Proof

With fixed \(\zeta \), introduce the convex conjugate of \(\Lambda _\ell (g,\zeta )\) by

$$\begin{aligned} \Lambda _\ell ^*(\mu ,\zeta )=\sup _{g\in C_b(\varvec{\Omega }_\ell )}\{E^\mu [g]-\Lambda _\ell (g,\zeta )\}. \end{aligned}$$
(5.10)

Taking \(g(\omega ,z_{1,\ell })=a\cdot z_1\) gives \(\Lambda _\ell ^*(\mu ,\zeta )\ge a\cdot (E^\mu [Z_1]-\zeta )-\log \left| \mathcal{R }_0\right| .\) Thus \(\Lambda _\ell ^*(\mu ,\zeta )=\infty \) unless \(E^\mu [Z_1]=\zeta \).
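To spell out the last implication (a routine step, sketched here using only the bound just displayed): suppose \(v=E^\mu [Z_1]-\zeta \ne 0\) and take \(a=tv\) with \(t>0\). Then

$$\begin{aligned} \Lambda _\ell ^*(\mu ,\zeta )\;\ge \; t\,|v|^2-\log \left| \mathcal{R}_0\right| \;\longrightarrow \;\infty \quad \text{as } t\rightarrow \infty . \end{aligned}$$

Since the left-hand side does not depend on \(t\), it must equal \(\infty \).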

From Theorems 2.6 and 3.2, \(E^\mu [g]-\Lambda _\ell (g,\zeta )\) is concave in \(g\), convex in \(\zeta \), and continuous in both over \(C_b(\varvec{\Omega }_\ell )\times \mathcal{U }\). Since \(\mathcal{U }\) is compact we can apply a minimax theorem such as König’s theorem [21, 31]. Utilizing (2.5),

$$\begin{aligned} \Lambda _\ell ^*(\mu )&= \sup _{g\in C_b(\varvec{\Omega }_\ell )}\{E^\mu [g]-\Lambda _\ell (g)\}\\&= \sup _{g\in C_b(\varvec{\Omega }_\ell )}\inf _{\zeta \in \mathcal{U }}\{E^\mu [g]-\Lambda _\ell (g,\zeta )\} =\inf _{\zeta \in \mathcal{U }}\Lambda _\ell ^*(\mu ,\zeta ). \end{aligned}$$

Thus, if \(E^\mu [Z_1]=\zeta \), then \(\Lambda _\ell ^*(\mu )=\Lambda _\ell ^*(\mu ,\zeta )\). Since \(H_{\ell }^{**}(\mu )=\Lambda _\ell ^*(\mu )\), (5.7) follows from (5.10).

By double convex duality (Fenchel-Moreau theorem, see e.g. [31]), for \(f\in C_b(\varvec{\Omega }_\ell )\),

$$\begin{aligned} \Lambda _\ell (f,\zeta )=\sup _\mu \{E^\mu [f]-\Lambda _\ell ^*(\mu ,\zeta )\}= \sup _{\mu :\,E^\mu [Z_1]=\zeta } \{E^\mu [f]-\Lambda _\ell ^*(\mu )\} \end{aligned}$$

and (5.8) follows.

To replace \(H_{\ell }^{**}(\mu )\) with \(H_{\ell }(\mu )\) in (5.8), first consider the case \(\zeta \in \mathrm{ri}\,\,\mathcal{U }\).

$$\begin{aligned}&\sup \limits _{\mu \in \mathcal{M }_1(\varvec{\Omega }_\ell ) :\,E^\mu [Z_1]=\zeta } \{E^\mu [f]- H_{\ell }^{**}(\mu ) \}\\&\quad = \sup \limits _{\mu \in \mathcal{M }_1(\varvec{\Omega }_\ell ) :\,E^\mu [Z_1]=\zeta } \{E^\mu [f]- H_{\ell }(\mu ) \}^{\mathrm{usc}(\mu )}\\&\quad = \Bigl ( \; \, \sup \limits _{\mu \in \mathcal{M }_1(\varvec{\Omega }_\ell ) :\,E^\mu [Z_1]=\zeta } \{E^\mu [f]- H_{\ell }(\mu ) \}\Bigr )^{\mathrm{usc}(\zeta )}\\&\quad = \sup \limits _{\mu \in \mathcal{M }_1(\varvec{\Omega }_\ell ) :\,E^\mu [Z_1]=\zeta } \{E^\mu [f]- H_{\ell }(\mu ) \}. \end{aligned}$$

The first equality holds because \(\mu \mapsto E^\mu [f]\) is continuous. The second is a consequence of the compactness of the sublevel sets \(\{\mu : H_{\ell }^{**}(\mu )\le c\}\). This compactness follows from the exponential tightness in the LDP with rate function \(H_{\ell }^{**}\), given by Theorem 3.3 in [34]. The last equality follows because concavity gives continuity on \(\mathrm{ri}\,\,\mathcal{U }\).

For \(\zeta \in \mathcal{U }\backslash \mathrm{ri}\,\,\mathcal{U }\), let \(\mathcal{U }_0\) be the unique face such that \(\zeta \in \mathrm{ri}\,\,\mathcal{U }_0\). Then \(\mathcal{U }_0=\mathrm{co}\,\mathcal{R }_0\) where \(\mathcal{R }_0=\mathcal{U }_0\cap \mathcal{R }\), and any path to \({\hat{x}}_n(\zeta )\) will use only \(\mathcal{R }_0\)-steps. This case reduces to the one already proved, because all the quantities in (5.9) are the same as those in a new model where \(\mathcal{R }\) is replaced by \(\mathcal{R }_0\) and \(\mathcal{U }\) by \(\mathcal{U }_0\), except for the extra terms coming from renormalizing the restricted jump kernel \(\{\hat{p}_z\}_{z\in \mathcal{R }_0}\). In particular, \(E^\mu [Z_1]=\zeta \) forces \(\mu \) to be supported on \(\Omega \times \mathcal{R }_0^\ell \), and consequently any kernel \(q(\eta , S_z^+\eta )\) that fixes \(\mu \) is supported on shifts by \(z\in \mathcal{R }_0\). \(\square \)

Next we extend the duality to certain \(L^p\) functions.

Corollary 5.2

Make the same assumptions on \(\Omega \), \(\mathbb{P }\) and \(\mathcal{R }\) as in Theorem 5.1. Let \(\mu \in \mathcal{M }_1(\varvec{\Omega }_\ell )\) and \(\zeta =E^\mu [Z_1]\). Then the inequalities

$$\begin{aligned} E^\mu [g]-\Lambda _\ell (g) \le H_{\ell }^{**}(\mu ) \end{aligned}$$
(5.11)

and

$$\begin{aligned} E^\mu [g]-\Lambda _\ell (g, \zeta ) \le H_{\ell }^{**}(\mu ) \end{aligned}$$
(5.12)

are valid for all functions \(g\) such that \(g(\cdot , z_{1,\ell })\) is local and in \(L^p(\mathbb{P })\) for all \(z_{1,\ell }\) and some \(p>d\), and \(g\) is either bounded above or bounded below.

Proof

Since \(\Lambda _\ell (g, \zeta )\le \Lambda _\ell (g)\), (5.11) is a consequence of (5.12). Let \(\mathcal H \) denote the class of functions \(g\) that satisfy (5.12). \(\mathcal H \) contains bounded continuous local functions by (5.7).

Bounded pointwise convergence implies \(L^p\) convergence. So by the \(L^p\) continuity of \(\Lambda _\ell (g, \zeta )\) [Lemma 3.1(b)], \(\mathcal H \) is closed under bounded pointwise convergence of local functions with common support. General principles now imply that \(\mathcal H \) contains all bounded local Borel functions. To reach the last generalization to functions bounded from only one side, observe that their truncations converge both monotonically and in \(L^p\), thereby making both \(E^\mu [g]\) and \(\Lambda _\ell (g, \zeta )\) converge. \(\square \)

Equation (5.8) gives us a variational representation for \(\Lambda _\ell (g,\zeta )\) but only for bounded continuous \(g\). We come finally to one of our main results, the variational representation for general potentials \(g\).

Theorem 5.3

Let \(\Omega =\Gamma ^{\mathbb{Z }^d}\) be a product of separable metric spaces with Borel \(\sigma \)-algebra \(\mathfrak{S }\), shifts \(\{T_x\}\), and an i.i.d. product measure \(\mathbb{P }\). Assume \(0\notin \mathcal{U }\). Let \(g:\varvec{\Omega }_\ell \rightarrow \mathbb{R }\) be a function such that for each \(z_{1,\ell }\in \mathcal{R }^\ell \), \(g(\cdot ,z_{1,\ell }) \) is a local function of \(\omega \) and a member of \(L^p(\mathbb{P })\) for some \(p>d\). Then for all \(\zeta \in \mathcal{U }\),

$$\begin{aligned} \Lambda _\ell (g,\zeta )=\sup _{\begin{array}{c} \mu \in \mathcal{M }_1(\varvec{\Omega }_\ell ): E^\mu [Z_1]=\zeta \\ c>0 \end{array}} \bigl \{E^\mu [g\wedge c]-H_{\ell }^{**}(\mu )\bigr \}. \end{aligned}$$
(5.13)

Equation (5.13) is valid also when \(H_{\ell }^{**}(\mu )\) is replaced with \(H_{\ell }(\mu )\).

Proof

From (5.12),

$$\begin{aligned} \Lambda _\ell (g,\zeta )\ge \Lambda _\ell (g\wedge c,\zeta )\ge E^\mu [g\wedge c]-H_{\ell }^{**}(\mu ). \end{aligned}$$

Taking the supremum on the right over \(c\) and \(\mu \) gives

$$\begin{aligned} \Lambda _\ell (g,\zeta )\ge \sup _{\begin{array}{c} \mu \in \mathcal{M }_1(\varvec{\Omega }_\ell ): E^\mu [Z_1]=\zeta \\ c>0 \end{array}} \bigl \{E^\mu [\min (g,c)]-H_{\ell }^{**}(\mu )\bigr \}. \end{aligned}$$
(5.14)

For the other direction, let \(c<\infty \) and abbreviate \(g^c=g\wedge c\). Let \(g_m\in C_b(\varvec{\Omega }_\ell )\) be a sequence converging to \(g^c\) in \(L^p(\mathbb{P })\).

Let \(\varepsilon >0\). By (5.8) we can find \(\mu _m\) such that \(E^{\mu _m}[Z_1]=\zeta \), \(H_{\ell }^{**}(\mu _m)<\infty \) and

$$\begin{aligned} \Lambda _\ell (g_m,\zeta )\le \varepsilon +E^{\mu _m}[g_m]-H_{\ell }^{**}(\mu _m). \end{aligned}$$
(5.15)

Take \(\beta >0\) and write

$$\begin{aligned}&\Lambda _\ell (g_m,\zeta )\le \varepsilon +E^{\mu _m}[g^c]-H_{\ell }^{**}(\mu _m)+\beta ^{-1}E^{\mu _m}[\beta (g_m-g^{c})]\\&\quad \le \varepsilon +\sup \bigl \{E^\mu [g^c]-H_{\ell }^{**}(\mu ):c>0,\ E^\mu [Z_1]=\zeta \bigr \}\\&\qquad + \beta ^{-1}\Lambda _\ell \bigl (\beta (g_m-g^{c})\bigr ) +\beta ^{-1}H_{\ell }^{**}(\mu _m) \\&\quad \le \varepsilon + \text{[right-hand side of (5.13)]} \\&\qquad + \varlimsup _{n\rightarrow \infty }\,\max \limits _{x_k-x_{k-1}\in \mathcal{R }}\,n^{-1} \sum \limits _{k=0}^{n-1}\left| g_m(T_{x_k}\omega , z_{1,\ell })-g^{c}(T_{x_k}\omega , z_{1,\ell })\right| +\beta ^{-1}H_{\ell }^{**}(\mu _m) \\&\quad \le \varepsilon + \text{[right-hand side of (5.13)]} \\&\qquad +C\,\mathbb{E }\bigl [\; \max \limits _{z_{1,\ell }\in \mathcal{R }^\ell }\left| g_m-g^{c} \right| ^p\,\bigr ] +\beta ^{-1}H_{\ell }^{**}(\mu _m). \end{aligned}$$

The second inequality above used (5.11), and the last inequality used (3.1) and Chebyshev’s inequality. Take first \(\beta \rightarrow \infty \), then \(m\rightarrow \infty \), and last \(c\nearrow \infty \) and \(\varepsilon \searrow 0\). Combined with (5.14), we have arrived at (5.13).
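The order of limits can be made explicit as follows. This is only a sketch: it assumes, as in the proof of Corollary 5.2, that the \(L^p\) continuity of Lemma 3.1(b) applies both along \(g_m\rightarrow g^c\) and along the truncations \(g^c\rightarrow g\). Starting from the final line of the display above and using that \(\mu _m\) was chosen before \(\beta \) with \(H_{\ell }^{**}(\mu _m)<\infty \),

$$\begin{aligned} \beta \rightarrow \infty :&\quad \Lambda _\ell (g_m,\zeta )\le \varepsilon + \text{[right-hand side of (5.13)]} + C\,\mathbb{E}\bigl [\,\max _{z_{1,\ell }\in \mathcal{R}^\ell }\left| g_m-g^c\right| ^p\,\bigr ],\\ m\rightarrow \infty :&\quad \Lambda _\ell (g^c,\zeta )\le \varepsilon + \text{[right-hand side of (5.13)]},\\ c\nearrow \infty ,\ \varepsilon \searrow 0:&\quad \Lambda _\ell (g,\zeta )\le \text{[right-hand side of (5.13)]}. \end{aligned}$$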

Dropping \(^{**}\) requires no extra work. Since \(H_{\ell }\ge H_{\ell }^{**}\), (5.14) comes for free. For the complementary inequality simply replace \(H_{\ell }^{**}(\mu _m)\) with \(H_{\ell }(\mu _m)\) in (5.15), as justified by the last line of Theorem 5.1. \(\square \)

6 Example: directed polymer in the \(L^2\) regime

We illustrate the variational formula of the previous section with a directed polymer in the \(L^2\) regime. The maximizing processes are basically the Markov chains constructed by Comets and Yoshida [8] and Yilmaz [42]. We restrict to \(\zeta \in \mathrm{ri}\,\,\mathcal{U }\). The closer \(\zeta \) is to the relative boundary, the smaller we need to take the inverse temperature \(\beta \).

The setting is that of Example 1.2 with some simplifications. \(\Omega =\mathbb{R }^{\mathbb{Z }^{d+1}}\) is a product space indexed by the space-time lattice where \(d\) is the spatial dimension and the last coordinate direction is reserved for time. The environment is \(\omega =(\omega _x)_{x\in \mathbb{Z }^{d+1}}\) and translations are \((T_x\omega )_y=\omega _{x+y}\). The coordinates \(\omega _x\) are i.i.d. under \(\mathbb{P }\). The set of admissible steps is of the form \(\mathcal{R }=\{(z^{\prime }, 1): z^{\prime }\in \mathcal{R }^{\prime }\}\) for a finite set \(\mathcal{R }^{\prime }\subset \mathbb{Z }^d\).

To be in the weak disorder regime we assume that the difference of two \(\mathcal{R }\)-walks is at least \(3\)-dimensional. Precisely speaking, the additive subgroup of \(\mathbb{Z }^{d+1}\) generated by \(\mathcal{R }-\mathcal{R }=\{x-y: x,y\in \mathcal{R }\}\) is linearly isomorphic to some \(\mathbb{Z }^m\), and we

$$\begin{aligned} \text{ assume } \text{ that } \text{ the } \text{ dimension } m\ge 3\text{. } \end{aligned}$$
(6.1)

For example, \(d\ge 3\) and \(\mathcal{R }^{\prime }=\{\pm e_i: 1\le i\le d\}\) given by simple random walk qualifies.
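To see why simple random walk satisfies (6.1), one can compute the group generated by \(\mathcal{R}-\mathcal{R}\) directly. The following short check is added for illustration. With \(\mathcal{R}^{\prime }=\{\pm e_i: 1\le i\le d\}\), every element of \(\mathcal{R}-\mathcal{R}\) has last coordinate \(0\), and

$$\begin{aligned} \mathcal{R}-\mathcal{R}=\{(z^{\prime }-w^{\prime },0): z^{\prime },w^{\prime }\in \mathcal{R}^{\prime }\}, \qquad \langle \mathcal{R}-\mathcal{R}\rangle =\bigl \{(x^{\prime },0): x^{\prime }\in \mathbb{Z}^d,\ x^{\prime }_1+\cdots +x^{\prime }_d \text{ is even}\bigr \}. \end{aligned}$$

The generated group is a sublattice of index \(2\) in \(\mathbb{Z}^d\times \{0\}\), hence isomorphic to \(\mathbb{Z}^d\), so \(m=d\) and (6.1) holds exactly when \(d\ge 3\).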

The \(P\)-random walk has a kernel \((p_z)_{z\in \mathcal{R }}\). Earlier we assumed \(p_z=\left| \mathcal{R }\right| ^{-1}\), but this is not necessary for the results; any fixed kernel will do. We do assume \(p_z>0\) for each \(z\in \mathcal{R }\).

The potential is \(\beta g(\omega _0,z)\) where \(\beta \in (0,\infty )\) is the inverse temperature parameter. Assume

$$\begin{aligned} \mathbb{E }[e^{c\left| g(\omega ,z)\right| }] <\infty \quad \text{ for } \text{ some } c>0 \text{ and } \text{ all } z\in \mathcal{R }\text{. } \end{aligned}$$
(6.2)

Now \(\Lambda _1(\beta g,\cdot \,)\) is well-defined and continuous on \(\mathcal{U }\).

Define an averaged logarithmic moment generating function

$$\begin{aligned} \lambda (\beta ,\theta )=\log \sum _{z\in \mathcal{R }} p_z\,\mathbb{E }[e^{\beta g(\omega _0,z)+\theta \cdot z}] \quad \text{ for } \beta \in [-c,c] \text{ and } \theta \in \mathbb{R }^{d+1}\text{. } \end{aligned}$$

Under a fixed \(\beta \), define the convex dual in the \(\theta \)-variable by

$$\begin{aligned} \lambda ^*(\beta ,\zeta )=\sup _{\theta \in \mathbb{R }^{d+1}}\{\zeta \cdot \theta -\lambda (\beta ,\theta )\} , \qquad \zeta \in \mathcal{U }. \end{aligned}$$
(6.3)

For each \(\beta \in [-c,c]\) and \(\zeta \in \mathrm{ri}\,\,\mathcal{U }\) there exists \(\theta \in \mathbb{R }^{d+1}\) such that \(\nabla _\theta \lambda (\beta ,\theta )=\zeta \) and this \(\theta \) maximizes in (6.3). A point \(\eta \in \mathbb{R }^{d+1}\) also maximizes if and only if

$$\begin{aligned} (\theta -\eta )\cdot z \text{ is } \text{ constant } \text{ over } z\in \mathcal{R }. \end{aligned}$$
(6.4)

Maximizers cannot be unique now because the last coordinate \(\theta _{d+1}\) can vary freely without altering the expression in braces in (6.3). The spatial part \(\theta ^{\prime }=(\theta _1,\cdots ,\theta _d)\) of a maximizer is unique if and only if \(\mathcal{U }\) has nonempty \(d\)-dimensional interior.
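The following computation, a sketch added for illustration, makes both statements concrete. The weights \(\pi _z^{\beta ,\theta }\) and the constant \(\kappa \) are notation introduced only here. Differentiating under the finite sum shows that the gradient is the mean of a tilted step distribution:

$$\begin{aligned} \nabla _\theta \lambda (\beta ,\theta )=\sum _{z\in \mathcal{R}} z\,\pi _z^{\beta ,\theta }, \qquad \pi _z^{\beta ,\theta }=\frac{p_z\,\mathbb{E}[e^{\beta g(\omega _0,z)}]\,e^{\theta \cdot z}}{\sum _{y\in \mathcal{R}}p_y\,\mathbb{E}[e^{\beta g(\omega _0,y)}]\,e^{\theta \cdot y}}. \end{aligned}$$

If \((\theta -\eta )\cdot z=\kappa \) for all \(z\in \mathcal{R}\), then \(\lambda (\beta ,\eta )=\lambda (\beta ,\theta )-\kappa \), and \((\theta -\eta )\cdot \zeta =\kappa \) because \(\zeta \in \mathcal{U}\) is a convex combination of points of \(\mathcal{R}\). Hence

$$\begin{aligned} \zeta \cdot \eta -\lambda (\beta ,\eta )=(\zeta \cdot \theta -\kappa )-(\lambda (\beta ,\theta )-\kappa )=\zeta \cdot \theta -\lambda (\beta ,\theta ). \end{aligned}$$

Taking \(\eta =\theta +t e_{d+1}\) gives \(\kappa =-t\), since every \(z\in \mathcal{R}\) has last coordinate \(1\).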

Extend the random walk distribution \(P\) to a two-sided walk \((X_k)_{k\in \mathbb{Z }}\) that satisfies \(X_0=0\) and \(Z_i=X_i-X_{i-1}\) for all \(i\in \mathbb{Z }\), where the steps \((Z_i)_{i\in \mathbb{Z }}\) are i.i.d. \((p_z)\)-distributed. For \(n\in \mathbb{N }\) define forward and backward partition functions

$$\begin{aligned} Z_n^+=E\bigl [ e^{\beta \sum _{k=0}^{n-1}g(\omega _{X_k},Z_{k+1})+\theta \cdot X_n}\bigr ] \ \quad \text{ and }\ \quad Z_n^-=E\bigl [ e^{\beta \sum _{k=-n}^{-1}g(\omega _{X_k},Z_{k+1})-\theta \cdot X_{-n}}\bigr ] \end{aligned}$$

and martingales \(W_n^\pm =e^{-n\lambda (\beta ,\theta )} Z_n^\pm \) with \( \mathbb{E }W_n^\pm = 1 .\)
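The normalization \(\mathbb{E}W_n^+=1\) can be verified directly. The sketch below uses Fubini's theorem together with the fact that every step has last coordinate \(1\), so the sites \(X_0,\ldots ,X_{n-1}\) are distinct and the variables \(\omega _{X_k}\) are i.i.d. under \(\mathbb{P}\) for a fixed path. The function \(\varphi \) is shorthand introduced here.

$$\begin{aligned} \mathbb{E}Z_n^+ = E\Bigl [\,\prod _{k=0}^{n-1}\varphi (Z_{k+1})\,e^{\theta \cdot Z_{k+1}}\Bigr ] = \Bigl (\sum _{z\in \mathcal{R}}p_z\,\varphi (z)\,e^{\theta \cdot z}\Bigr )^{n} = e^{n\lambda (\beta ,\theta )}, \qquad \varphi (z)=\mathbb{E}[e^{\beta g(\omega _0,z)}]. \end{aligned}$$

The same computation applies to \(Z_n^-\), because \(-\theta \cdot X_{-n}=\theta \cdot (Z_{-n+1}+\cdots +Z_0)\) and the sites \(X_{-n},\ldots ,X_{-1}\) are again distinct.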

Suppose we have the \(L^1\) convergence \( W_n^\pm \rightarrow W_\infty ^\pm \) for some \((\beta ,\theta )\). Then \(\mathbb{E }W_\infty ^\pm = 1\), and by Kolmogorov’s 0-1 law \(\mathbb{P }(W_\infty ^\pm >0)=1\). Define a probability measure \(\mu ^\theta _0\) on \(\Omega \) by

$$\begin{aligned} \int _\Omega f(\omega )\,\mu ^\theta _0(d\omega ) = \mathbb{E }[ W_\infty ^-W_\infty ^+ f]. \end{aligned}$$

Define a stochastic kernel from \(\Omega \) to \(\mathcal{R }\) by

$$\begin{aligned} q^\theta _0(\omega , z)= p_z e^{\beta g(\omega _0,z)-\lambda (\beta ,\theta )+\theta \cdot z} \frac{W_\infty ^+(T_z\omega )}{W_\infty ^+(\omega )}. \end{aligned}$$

Property \(\sum _{z\in \mathcal{R }} q^\theta _0(\omega , z)=1\) comes from (one of) the identities

$$\begin{aligned} W_\infty ^\pm =\sum _{z\in \mathcal{R }} p_z e^{\beta g(\omega _{a^{(\pm )}},z)-\lambda (\beta ,\theta )+\theta \cdot z} W_\infty ^\pm \circ T_{\pm z} \quad \mathbb{P }\text{-a.s. } \end{aligned}$$
(6.5)

where \(a^{(+)}=0\) and \(a^{(-)}=-z\). These are inherited from the one-step Markov decomposition of \(Z_n^\pm \). For \(\ell \ge 0\), on \(\varvec{\Omega }_\ell \) define the probability measure \(\mu ^\theta \) by

$$\begin{aligned} \mu ^\theta (d\omega , z_{1,\ell })= \mu ^\theta _0(d\omega )\, q^\theta _0(\omega , z_1)\,q^\theta _0(T_{x_{1}}\omega , z_2)\cdots q^\theta _0(T_{x_{\ell -1}}\omega , z_\ell ) \end{aligned}$$
(6.6)

where \(x_j=z_1+\cdots +z_j\), and the stochastic kernel

$$\begin{aligned} q^\theta ((\omega , z_{1,\ell }), (T_{z_1}\omega , z_{2,\ell }z))= q^\theta _0(T_{x_{\ell }}\omega , z) . \end{aligned}$$
(6.7)
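For the reader's convenience, here is a sketch of how the "+" identity in (6.5) arises and why it normalizes \(q^\theta _0\). Conditioning on the first step of the walk gives the one-step decomposition of \(Z_{n+1}^+\), which after division by \(e^{(n+1)\lambda (\beta ,\theta )}\) reads

$$\begin{aligned} W_{n+1}^+(\omega )=\sum _{z\in \mathcal{R}}p_z\,e^{\beta g(\omega _0,z)-\lambda (\beta ,\theta )+\theta \cdot z}\,W_n^+(T_z\omega ). \end{aligned}$$

Letting \(n\rightarrow \infty \) yields (6.5) for \(W_\infty ^+\). Since \(W_\infty ^+>0\) \(\mathbb{P}\)-a.s., dividing by \(W_\infty ^+(\omega )\) gives

$$\begin{aligned} \sum _{z\in \mathcal{R}}q^\theta _0(\omega ,z)=\frac{1}{W_\infty ^+(\omega )}\sum _{z\in \mathcal{R}}p_z\,e^{\beta g(\omega _0,z)-\lambda (\beta ,\theta )+\theta \cdot z}\,W_\infty ^+(T_z\omega )=1 \quad \mathbb{P}\text{-a.s.} \end{aligned}$$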

We think of \(\beta \) fixed and \(\theta \) varying and so include only \(\theta \) in the notation of \(\mu ^\theta \) and \(q^\theta \). Identities (6.5) can be used to show that \(\mu ^\theta \) is invariant under the kernel \(q^\theta \), or explicitly, for any bounded measurable test function \(f\),

$$\begin{aligned} \sum _{z_{1,\ell }, z} \int _\Omega \mu ^\theta (d\omega , z_{1,\ell }) q^\theta _0(T_{x_{\ell }}\omega , z) f(T_{z_1}\omega , z_{2,\ell }z) \;=\; \int _{\varvec{\Omega }_\ell } f\,d\mu ^\theta . \end{aligned}$$
(6.8)

By Lemma 4.1 of [32] the Markov chain with transition \(q^\theta \) started with \(\mu ^\theta \) is an ergodic process. Let us call in general \((\mu ,q)\) a measure-kernel pair if \(q\) is a Markov kernel and \(\mu \) is an invariant probability measure: \(\mu q=\mu \).

Theorem 6.1

Fix a compact subset \(\mathcal{U }_1\) in the relative interior of \(\mathcal{U }\). Then there exists \(\beta _0>0\) such that, for \(\beta \in (0,\beta _0]\) and \(\zeta \in \mathcal{U }_1\), we can choose \(\theta \in \mathbb{R }^{d+1}\) such that the following holds. First \(\nabla _\theta \lambda (\beta ,\theta )=\zeta \) and \(\theta \) is a maximizer in (6.3). The martingales \(W_n^\pm \) are uniformly integrable and the pair \((\mu ^\theta , q^\theta )\) is well-defined by (6.6)–(6.7). We have

$$\begin{aligned} \Lambda _1(\beta g, \zeta )=-\lambda ^*(\beta ,\zeta ). \end{aligned}$$
(6.9)

A measure-kernel pair \((\mu ,q)\) on \(\varvec{\Omega }_1\) such that \(\mu _0\ll \mathbb{P }\) satisfies

$$\begin{aligned} \Lambda _1(\beta g, \zeta )= E^\mu [\beta g] - H(\mu \times q\vert \mu \times \hat{p}_1) \end{aligned}$$
(6.10)

if and only if \((\mu ,q)=(\mu ^\theta , q^\theta )\).

Remark 6.2

Note that even though \(\nabla _\theta \lambda (\beta ,\theta )=\zeta \) does not pick a unique \(\theta \), by (6.4) replacing \(\theta \) by another maximizer does not change the martingales \(W_n^\pm \) or the pair \((\mu ^\theta , q^\theta )\). Thus \(\zeta \) determines \((\mu ^\theta , q^\theta )\) uniquely.
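The invariance claimed in the remark can be checked in a few lines. The following is a sketch, with \(\kappa \) denoting the constant value of \((\theta -\eta )\cdot z\) from (6.4), a symbol introduced here. If \(\eta \) is another maximizer, then \(\lambda (\beta ,\eta )=\lambda (\beta ,\theta )-\kappa \), and because \(X_n\) and \(-X_{-n}\) are each sums of \(n\) steps,

$$\begin{aligned} e^{-n\lambda (\beta ,\eta )}\,e^{\eta \cdot X_n}&=e^{-n\lambda (\beta ,\theta )+n\kappa }\,e^{\theta \cdot X_n-n\kappa }=e^{-n\lambda (\beta ,\theta )}\,e^{\theta \cdot X_n},\\ e^{-n\lambda (\beta ,\eta )}\,e^{-\eta \cdot X_{-n}}&=e^{-n\lambda (\beta ,\theta )}\,e^{-\theta \cdot X_{-n}},\\ e^{-\lambda (\beta ,\eta )+\eta \cdot z}&=e^{-\lambda (\beta ,\theta )+\theta \cdot z}\quad \text{for } z\in \mathcal{R}. \end{aligned}$$

The first two lines show that \(W_n^\pm \) are the same for \(\theta \) and \(\eta \), and the third shows that \(q^\theta _0=q^\eta _0\). Consequently \(\mu ^\theta _0\), \(\mu ^\theta \) and \(q^\theta \) are unchanged.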

We omit the proof of Theorem 6.1. Details appear in the preprint [33].