Abstract
This paper addresses the estimation of the expectation of a real-valued random variable of the form \(X = g(\mathbf {U})\), where g is a deterministic function and \(\mathbf {U}\) is a finite- or infinite-dimensional random vector. Using recent results on rare event simulation, we propose a unified framework for both probability and mean estimation for such random variables, linking algorithms such as the Tootsie Pop Algorithm and the Last Particle Algorithm with nested sampling. In particular, it extends nested sampling in several directions. First, the random variable X no longer needs to be bounded: we give the principle of an ideal estimator with an infinite number of terms that is unbiased and always better than a classical Monte Carlo estimator; in particular, it has a finite variance as soon as there exists \(k > 1\) such that \({\text {E}}\left[ X^k \right] < \infty \). Moreover, we address the issue of nested sampling termination and show that a random truncation of the sum can preserve unbiasedness while increasing the variance by a factor of at most 2 compared to the ideal case. We also build an unbiased estimator with a fixed computational budget which supports a Central Limit Theorem, and we discuss a parallel implementation of nested sampling, which can dramatically reduce its running time. Finally, we study extensively the case where X is heavy-tailed.
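The estimators considered throughout rest on the layer-cake representation \({\text {E}}[X] = \int _0^\infty p_x \,\mathrm {d}x\) with \(p_x = {\text {P}}[X > x]\), for non-negative X. A toy numerical illustration of this identity (not the paper's algorithm), assuming for concreteness that X follows an Exp(1) distribution:

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer-cake representation: for a non-negative random variable X,
# E[X] = integral of P(X > x) over [0, infinity).
# Here X ~ Exp(1), so E[X] = 1 and P(X > x) = exp(-x).
grid = np.linspace(0.0, 30.0, 10_000)
dx = grid[1] - grid[0]
p_x = np.exp(-grid)                      # survival function of Exp(1)
mean_from_survival = (p_x * dx).sum()    # Riemann sum of P(X > x)

samples = rng.exponential(size=100_000)
mean_mc = samples.mean()                 # crude Monte Carlo estimate

print(mean_from_survival, mean_mc)       # both close to 1
```

Both quantities agree with \({\text {E}}[X] = 1\) up to discretisation and sampling error, which is the identity the point-process estimators exploit.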
Acknowledgments
The author would like to thank his advisors Josselin Garnier (University Paris Diderot) and Gilles Defaux (Commissariat à l’Energie Atomique et aux Energies Alternatives) for their advice and suggestions, as well as the reviewers for their very relevant comments, which helped improve the manuscript. This work was partially supported by the ANR project Chorus.
Appendix
Proof of Proposition 2
One has:
For the variance, one uses the fact that, for \(x>x'\), \(M_x - M_{x'}\) and \(M_{x'}\) are independent to expand \({\text {E}}\left[ \widehat{m}^2 \right] \):
Furthermore, the renewal property of the Poisson process gives \(M_x - M_{x'} \sim \mathscr {P}(-\log (p_x/p_{x'}))\). One can then conclude using the results of Proposition 1. \(\square \)
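The Poisson count structure invoked above can be checked numerically in the simplest case \(N = 1\): the construction then reduces to multiplicative uniform records \(U_0 = 1\), \(U_{k+1} = U_k V_{k+1}\) with \(V_k\) i.i.d. uniform, and since \(-\log U_k\) is a sum of k Exp(1) variables, the count \(M_p = \#\{k \ge 1 : U_k > p\}\) follows \(\mathscr {P}(-\log p)\). A sketch under that assumption (the paper's construction is more general):

```python
import numpy as np

rng = np.random.default_rng(1)

# For N = 1: U_0 = 1, U_{k+1} = U_k * V_{k+1} with V ~ U(0,1) i.i.d.
# The number of records strictly above level p is Poisson(-log p),
# because -log U_k is a standard Poisson process on (0, infinity).
def count_points(p, rng):
    """Number of multiplicative uniform records strictly above level p."""
    u, count = 1.0, 0
    while True:
        u *= rng.random()
        if u <= p:
            return count
        count += 1

p = 0.05                                  # level, so -log p is about 3.0
counts = np.array([count_points(p, rng) for _ in range(20_000)])
print(counts.mean(), counts.var(), -np.log(p))
```

For a Poisson variable, mean and variance coincide; both empirical moments should be close to \(-\log p \approx 3\).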
Proof of Proposition 3
Starting from the expression of the variance found in Proposition 2:
we make use of Hölder’s inequality:
and therefore:
Using Hölder’s inequality again, one gets:
\(\square \)
Proof of Proposition 4
On the one hand one has:
and on the other hand one can write:
Considering \(f :p \mapsto p \left[ N (p^{-1/N}-1)+1 \right] \), we have \(f(1)=1\) and:
Thus: \(\forall p \in [0,1], f(p) \le 1\). Therefore
which shows that \({\text {var}}\left[ \widehat{m} \right] \le {\text {var}}\left[ \widehat{m}_{MC} \right] \). \(\square \)
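The key bound in this proof, \(f(p) = p \left[ N (p^{-1/N}-1)+1 \right] \le 1\) on [0, 1], is easy to verify numerically; a quick grid check for several values of N:

```python
import numpy as np

# Numerical check of the bound used in the proof of Proposition 4:
# f(p) = p * (N * (p**(-1/N) - 1) + 1) satisfies f(1) = 1 and
# f(p) <= 1 on (0, 1], for any number of particles N >= 1.
def f(p, N):
    return p * (N * (p ** (-1.0 / N) - 1.0) + 1.0)

ps = np.linspace(1e-6, 1.0, 100_000)
for N in (1, 2, 5, 10, 100):
    vals = f(ps, N)
    assert vals.max() <= 1.0 + 1e-12   # f never exceeds 1 on the grid
    assert abs(f(1.0, N) - 1.0) < 1e-12  # equality at p = 1
print("f(p) <= 1 verified on the grid")
```

This matches the analytic argument: \(f'(p) = (N-1)(p^{-1/N} - 1) \ge 0\) on (0, 1], so f is non-decreasing with maximum f(1) = 1.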
Proof of Proposition 5
Starting from the last formulation in (10) for \(\widehat{Z}\), one uses the fact that T and \((X_i)_i\) are independent. Then (4) and Proposition 2 allow us to conclude that \({\text {E}}\left[ \widehat{Z} \right] = m\).
For the second-order moment, we use the fact that \(\widehat{Z}\), like \(\widehat{m}\), can be written with an integral:
and apply the same reasoning as for \({\text {E}}\left[ \widehat{m}^2 \right] \): given \(x > x'\), the random variables \(M_x - M_{x'}\), \(M_{x'}\) and T are independent, which brings:
Then using this equality in \({\text {E}}\left[ \widehat{Z}^2 \right] \) gives the solution. \(\square \)
Proof of Lemma 1
Let \(\varepsilon > 0\) be such that \({\text {E}}\left[ X^{1+\varepsilon } \right] < \infty \), let \(N \in \mathbb {N}\) with \(N > 1/\varepsilon \) and let \(i \ge 0\). We extend the definition of \({\text {var}}\left[ \widehat{m} \right] \) given in Proposition 1, Eq. (8), to any \(N \in \mathbb {R}\). The proof of Proposition 3 is based on Hölder’s inequality and still holds in this case, and so does Corollary 1. Hence, according to Corollary 1, there exists \(N' \in \mathbb {R}\) with \(N' < N\) such that \({\text {var}}\left[ \widehat{m} \right] (N') < \infty \). Furthermore, given x and \(x'\), one can write:
Moreover, the function \(p \in (0, 1) \mapsto p^{N + 1/N' - 2} (-\log p)^i\) is bounded above by \(e^{-i} i^i (N + 1/N' - 2)^{-i}\). Using the Stirling lower bound \(i! \ge i^i e^{-i} \sqrt{2 \pi i}\), we can write:
Finally, this inequality brings:
and \((N + 1/N - 2)/(N + 1/N' - 2) < 1\), which concludes the first part of the proof.
Let us now assume that X has a density \(f_X\). One has:
Denote by \(x_L\) the left end point of the support of X (recall that X is non-negative valued, so \(x_L \ge 0\)). Then:
We then consider the change of variables \(u = -\log p_x\) and \(u' = -\log p_{x'}\); for all \(i \ge 1\) one has:
\(\square \)
Proof of Proposition 6
Let \(\alpha > 0\) be such that \((1-1/N) = e^{-\alpha }\). The argument is the same as the one in Proposition 5. One has:
with:
\(\square \)
Proof of Corollary 2
Noticing that for any \(N \ge 2\) one has \(\gamma (\beta _\text {app}(N), N) = (N+1)/2\) gives the first equality. One then concludes by recalling that \({\text {var}}\left[ \widehat{m} \right] \) typically scales as \(1/N\) (a usual result on nested sampling). \(\square \)
Proof of Proposition 7
If \(T = 0\), no simulation is done apart from the first element of each Markov chain, i.e. N simulations are done. Then each step requires the simulation of the next stopping time, i.e. one simulation. This brings \(\tau = N + T\). \(\square \)
Proof of Corollary 3
Note that \({\text {var}}\left[ \widehat{m} \right] = \sum \limits _{i=0}^{\infty } q_{i,N} - m^2\). Hence, one has \({\text {var}}\left[ \widehat{Z} \right] > {\text {var}}\left[ \widehat{m} \right] \) because \({\text {var}}\left[ \widehat{Z} \right] = {\text {var}}\left[ \widehat{m} \right] \Leftrightarrow \forall i \in \mathbb {N},\; \beta _i = {\text {P}}\left[ T \ge i \right] = 1\) and \({\text {E}}\left[ \tau \right] > N\) because \({\text {E}}\left[ \tau \right] = N \Leftrightarrow {\text {E}}\left[ T \right] = 0\) while \(\forall i \in \mathbb {N},\; {\text {P}}\left[ T \ge i \right] > 0\). Furthermore, the power series expansion of the exponential function and the dominated convergence theorem let us rewrite \({\text {var}}\left[ \widehat{m} \right] \):
which brings: \({\text {var}}\left[ \widehat{m} \right] = q_{1,2} \cdot 2/N + O \left( 1/N^2 \right) \). Altogether, these inequalities complete the proof. \(\square \)
Proof of Corollary 4
Denote \(B = 1/(e^\beta - 1)\); one has:
With \(\beta = \varTheta (1/N^{1+\varepsilon })\), \(\varepsilon \ge 0\), one has \(B \sim 1/\beta \sim N^{1+\varepsilon }\). Finally, this gives:
which concludes the proof. \(\square \)
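The equivalence \(B = 1/(e^\beta - 1) \sim 1/\beta \) used in this proof follows from \(e^\beta - 1 \sim \beta \) as \(\beta \rightarrow 0\); a quick numerical check in the case \(\varepsilon = 0\), i.e. \(\beta = 1/N\):

```python
import numpy as np

# In Corollary 4, B = 1/(exp(beta) - 1) with beta = Theta(1/N^(1+eps)).
# As beta -> 0, exp(beta) - 1 ~ beta, hence B ~ 1/beta ~ N^(1+eps).
for N in (10, 100, 1000, 10_000):
    beta = 1.0 / N               # eps = 0 case
    B = 1.0 / np.expm1(beta)     # expm1 avoids cancellation for small beta
    ratio = B * beta             # tends to 1 as N grows
    print(N, ratio)
```

The printed ratios approach 1 at rate \(\beta /2\), consistent with the first-order expansion \(B\beta = \beta /(e^\beta -1) = 1 - \beta /2 + O(\beta ^2)\).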
Proof of Proposition 8
First one shows that \(i_0\) is well determined. The sequence \((\varDelta _i)_i\) defined by:
is increasing:
Furthermore \(q_0 - m^2 = 2 \displaystyle \int _{0}^{\infty } \displaystyle \int _{x'}^{\infty } p_x p_{x'} \left( p_{x'}^{N-2} - 1 \right) \mathrm {d}x\mathrm {d}x' \le 0 < N q_{1,N}\), so \(\varDelta _0 < 0\); and \(\varDelta _i \rightarrow {\text {var}}\left[ \widehat{m} \right] \) as \(i \rightarrow \infty \) because \((q_{i,N})_i\) decreases at an exponential rate. Hence there exists \(i_0 \in \mathbb {N}\) such that \(\varDelta _{i_0 - 1} \le 0\) and \(\varDelta _{i_0} > 0\).
Let us now consider the auxiliary problem:
with \(\beta > 0\) and \(q \in \mathbb {R}\). We show that it has a solution if and only if \(q > 0\). Let \(i \ge 1\); setting the partial derivatives to zero gives:
Then the solution should be of the form \(\beta _i = c_0 \sqrt{q_i}\) for all \(i \ge 1\) and some \(c_0 > 0\). Solving now the problem in \(c_0\), the derivative writes \(q - \beta /c_0^2\). If \(q \le 0\), it is strictly decreasing and there is no global minimiser. On the contrary, \(q > 0\) brings \(c_0 = \sqrt{\beta /q}\) and \(\beta _i = c_0 \sqrt{q_i}\) for all \(i \ge 1\).
Thus, in our context with the constraint \(\beta _i \le 1\) for all \(i \in \mathbb {N}\), solving the optimisation problem sets \(\beta _i = 1\) iteratively until the minimiser is feasible, i.e. until \(i_0 \mathop {=}\limits ^{\text {def}} \min \{i \in \mathbb {N}\mid \sum \limits _{j=0}^{i} q_{j,N} - m^2 > (N+i) q_{(i+1),N} \}\). The solution is then given by:
\(\square \)
Proof of Corollary 5
By definition of \(i_0\), one has:
which concludes the proof. \(\square \)
Proof of Proposition 9
Denote:
the quantity one seeks to minimise.
First, we show that for any fixed N, there exists a global minimiser of \(Q_N(\beta )\). One has \(Q_N(\beta ) \rightarrow \infty \) when \(\beta \rightarrow 0\) and \(\gamma (\beta ,N) \rightarrow 0\) when \(\beta \rightarrow \infty \). Hence, either \(\exists \beta _\infty \in (0, \infty ]\) such that:
Then \(Q_N\) is continuous on \((0, \beta _\infty )\) with infinite limits on 0 and \(\beta _\infty \), so it reaches its minimum on \((0, \beta _\infty )\); or \(\exists \beta _\infty \in (0, \infty )\) such that:
Since \(Q_N\) is left-continuous at \(\beta _\infty \) by the Monotone Convergence Theorem, it reaches its minimum on \((0, \beta _\infty ]\).
Let \(\beta _\text {opt}(N) > 0\) be such that \(\inf _\beta Q_N(\beta ) = Q_N(\beta _\text {opt})\). We now show that there exists an optimal N. It is sufficient to show \(Q_N(\beta _\text {opt}) \rightarrow \infty \) when \(N \rightarrow \infty \). Denote \(B = 1/(e^\beta - 1)\); one has:
Hence, depending on the growth rate of B when \(N \rightarrow \infty \), one would have:
Then in any case \(Q_N(\beta _\text {opt}) \rightarrow \infty \) as \(N \rightarrow \infty \), which means that there exists \(N_\text {opt} \in \mathbb {N}\) such that \(Q_{N_\text {opt}}(\beta _\text {opt}) = \inf _N Q_N(\beta _\text {opt})\).
We now show the relationship between \(\beta _\text {opt}\) and \(N_\text {opt}\): the partial derivatives of \({\text {E}}\left[ \tau \right] \cdot {\text {var}}\left[ \widehat{Z}\, \right] \) with respect to B and N write:
At the point \((\beta _\text {opt}, N_\text {opt})\), both derivatives vanish, which gives:
Recalling \(\gamma (B,N) = N B /(B + (N-1)^2)\), this gives the equation \(B_\text {opt}^2 - (N_\text {opt}^2-1)B_\text {opt} - N_\text {opt}(N_\text {opt}-1)^2 = 0\). One can solve it for \(B_\text {opt}\) and keep the positive root, which gives the solution. \(\square \)
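The positive root of the quadratic above has a closed form; a small computation, using the proof's notation for N and B, checks that it indeed satisfies the equation:

```python
import math

# The proof of Proposition 9 reduces the optimality condition to the
# quadratic  B^2 - (N^2 - 1) B - N (N - 1)^2 = 0  and keeps the
# positive root.
def b_opt(N):
    a = float(N * N - 1)
    disc = a * a + 4.0 * N * (N - 1) ** 2   # discriminant, always > 0
    return (a + math.sqrt(disc)) / 2.0      # positive root

for N in (2, 5, 10, 100):
    B = b_opt(N)
    residual = B * B - (N * N - 1) * B - N * (N - 1) ** 2
    assert B > 0
    assert abs(residual) < 1e-6 * max(1.0, B * B)  # root check
print("positive root verified for several N")
```

Since the constant term \(-N(N-1)^2\) is non-positive, the quadratic always has exactly one positive root, so \(B_\text {opt}\) is well defined for every \(N \ge 2\).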
Proof of Proposition 10
For the first equality:
for the second one:
and for the third one:
with \(B = b/(a-1)\). \(\square \)
Proof of Proposition 11
Let \(i \ge 0\), one has:
with \(\varGamma \) standing here for the Gamma function. Furthermore:
\((q_{i,N})_i\) is decreasing iff:
which is indeed the condition for the finiteness of \({\text {var}}\left[ \widehat{m} \right] \) already stated in Proposition 10. \(\square \)
Proof of Proposition 12
The problem can be rewritten:
Furthermore one has:
which shows that the left-hand term equals \((m / 2)^2 + o(1)\). Writing \(i = N(k_0 + k_1 \log N + k_2 \log \log N)\) gives:
Hence one has to choose \(k_0\), \(k_1\) and \(k_2\) such that the right-hand term also equals \((m/2)^2 + o(1)\), which gives the solution. \(\square \)
Proof of Corollary 6
Using the asymptotic expansion of \(i_0\) one finds \(q_{i_0} \sim (N^2 \log N )^{-1} (m - 1)^2\). Furthermore, one has \({\text {E}}\left[ \tau \right] \sim i_0\). Finally, the use of \({\text {E}}\left[ \tau \right] \cdot {\text {var}}\left[ \widehat{Z} \right] \sim q_{i_0} {\text {E}}\left[ \tau \right] ^2\) gives the result. \(\square \)
Proof of Proposition 13
One gets the expression of the variance directly from Sect. 2.2 with \(\gamma (N,\beta )\) instead of N. Then, denoting \(B = 1/(e^\beta - 1)\), one solves the problem:
\(\square \)
Walter, C. Point process-based Monte Carlo estimation. Stat Comput 27, 219–236 (2017). https://doi.org/10.1007/s11222-015-9617-y