A PAC algorithm in relative precision for bandit problem with costly sampling

  • Original Article
  • Published in Mathematical Methods of Operations Research

Abstract

This paper considers the problem of maximizing an expectation function over a finite set, or finite-arm bandit problem. We first propose a naive stochastic bandit algorithm for obtaining a probably approximately correct (PAC) solution to this discrete optimization problem in relative precision, that is, a solution which solves the optimization problem up to a relative error smaller than a prescribed tolerance, with high probability. We also propose an adaptive stochastic bandit algorithm which provides a PAC solution with the same guarantees. The adaptive algorithm outperforms the naive algorithm in terms of the mean number of generated samples and is particularly well suited to applications with high sampling cost.


Notes

  1. A random variable Z is said to be sub-Gaussian of parameter \(\gamma > 0\) if, for all \(s \ge 0\), we have \(\mathbb {E}[\exp (s(Z - \mathbb {E}[Z]))] \le \exp (s^2 \gamma / 2)\).
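
A quick numerical illustration of this footnote: any variable bounded in \([a,b]\) is sub-Gaussian with parameter \(\gamma = (b-a)^2/4\) by Hoeffding's lemma. The sketch below, assuming \(Z\) uniform on \([0,1]\) (an illustrative choice, not from the paper), checks the moment-generating-function bound empirically for a few values of s.

```python
import numpy as np

# Sub-Gaussian check, assuming Z ~ U(0, 1): by Hoeffding's lemma a variable
# bounded in [a, b] is sub-Gaussian with parameter gamma = (b - a)^2 / 4,
# here gamma = 1/4, so E[exp(s(Z - E[Z]))] <= exp(s^2 * gamma / 2).
rng = np.random.default_rng(0)
z = rng.uniform(0.0, 1.0, size=200_000)
gamma = 0.25                                  # (b - a)^2 / 4 for [0, 1]

for s in (0.5, 1.0, 2.0, 5.0):
    mgf = np.exp(s * (z - 0.5)).mean()        # empirical E[exp(s(Z - E[Z]))]
    bound = np.exp(s ** 2 * gamma / 2)        # sub-Gaussian bound
    assert mgf <= bound
```

For the uniform variable the empirical moment-generating function stays well below the bound at every s tested.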

References

  • Audibert JY, Bubeck S, Munos R (2010) Best arm identification in multi-armed bandits. In: Annual conference on learning theory (COLT)

  • Audibert JY, Bubeck S, Munos R (2011) Bandit view on noisy optimization. Optim Mach Learn 431

  • Audibert JY, Munos R, Szepesvári C (2009) Exploration-exploitation tradeoff using variance estimates in multi-armed bandits. Theoret Comput Sci 410(19):1876–1902

  • Beyer HG, Sendhoff B (2007) Robust optimization—a comprehensive survey. Comput Methods Appl Mech Eng 196(33–34):3190–3218

  • Bubeck S, Munos R, Stoltz G (2011) Pure exploration in finitely-armed and continuous-armed bandits. Theoret Comput Sci 412(19):1832–1852

  • Dupač V, Herkenrath U (1982) Stochastic approximation on a discrete set and the multi-armed bandit problem. Seq Anal 1(1):1–25

  • Even-Dar E, Mannor S, Mansour Y (2002) PAC bounds for multi-armed bandit and Markov decision processes. In: International conference on computational learning theory. Springer, pp 255–270

  • Garivier A, Cappé O (2011) The KL-UCB algorithm for bounded stochastic bandits and beyond. In: Proceedings of the 24th annual conference on learning theory, pp 359–376

  • Gong WB, Ho YC, Zhai W (2000) Stochastic comparison algorithm for discrete optimization with estimation. SIAM J Optim 10(2):384–404

  • Kalyanakrishnan S, Tewari A, Auer P, Stone P (2012) PAC subset selection in stochastic multi-armed bandits. In: ICML, vol 12, pp 655–662

  • Kano H, Honda J, Sakamaki K, Matsuura K, Nakamura A, Sugiyama M (2019) Good arm identification via bandit feedback. Mach Learn 108(5):721–745

  • Kaufmann E, Cappé O, Garivier A (2016) On the complexity of best-arm identification in multi-armed bandit models. J Mach Learn Res 17(1):1–42

  • Kaufmann E, Kalyanakrishnan S (2013) Information complexity in bandit subset selection. In: Conference on learning theory. PMLR, pp 228–251

  • Kuleshov V, Precup D (2014) Algorithms for multi-armed bandit problems. arXiv preprint arXiv:1402.6028

  • Lattimore T, Szepesvári C (2020) Bandit algorithms. Cambridge University Press, Cambridge

  • Locatelli A, Gutzeit M, Carpentier A (2016) An optimal algorithm for the thresholding bandit problem. In: International conference on machine learning, pp 1690–1698

  • Mnih V (2008) Efficient stopping rules. Ph.D. thesis, University of Alberta

  • Mnih V, Szepesvári C, Audibert JY (2008) Empirical Bernstein stopping. In: Proceedings of the 25th international conference on Machine learning, pp 672–679

  • Mukherjee S, Naveen KP, Sudarsanam N, Ravindran B (2017) Thresholding bandits with augmented UCB. In: International joint conference on artificial intelligence

  • Nemirovski A, Juditsky A, Lan G, Shapiro A (2009) Robust stochastic approximation approach to stochastic programming. SIAM J Optim 19(4):1574–1609

  • Sutton RS, Barto AG (2018) Reinforcement learning: an introduction. MIT Press, Cambridge

  • Tao C, Blanco S, Peng J, Zhou Y (2019) Thresholding bandit with optimal aggregate regret. In: Advances in neural information processing systems, pp 11664–11673

  • Yan D, Mukai H (1992) Stochastic discrete optimization. SIAM J Control Optim 30(3):594–612

Author information

Corresponding author

Correspondence to Anthony Nouy.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This research was partially supported by project MATHAMSUD FANTASTIC 20-MATH-05.

Appendices

A. Intermediate results

Here we provide intermediate results used in the proof of Proposition 2.3. We first recall a version of Bennett’s inequality from Audibert et al. (2009, Lemma 5).

Lemma A.1

Let U be a random variable defined on \((\Omega , \mathcal {F}, \mathbb {P})\) such that \(U \le b\) almost surely, with \(b \in \mathbb {R}\). Let \(U_1, \ldots , U_m\) be i.i.d. copies of U and \(\overline{U}_{\ell } = \frac{1}{\ell }\sum _{i=1}^{\ell } U_i\). For any \(x>0\), it holds with probability at least \(1 - \exp (-x)\), simultaneously for all \(1 \le \ell \le m\), that

$$\begin{aligned} \ell \left( \overline{U}_\ell - \mathbb {E} \left[ U \right] \right) \le \sqrt{2m \mathbb {E} \left[ U^2 \right] x} + b_+ x / 3, \end{aligned}$$
(38)

with \(b_+ = \max (0, b)\).
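
Lemma A.1 is easy to probe numerically. The Monte Carlo sketch below, assuming U uniform on \([0,1]\) (so \(b = 1\), \(\mathbb{E}[U] = 1/2\), \(\mathbb{E}[U^2] = 1/3\); an illustrative choice only), estimates how often (38) holds simultaneously for all \(\ell \le m\).

```python
import numpy as np

# Monte Carlo sanity check of Lemma A.1 (inequality (38)), assuming U uniform
# on [0, 1]: b = 1, E[U] = 1/2, E[U^2] = 1/3.  The event should hold
# simultaneously for all 1 <= ell <= m with frequency at least 1 - exp(-x).
rng = np.random.default_rng(1)
m, x, trials = 50, 1.0, 20_000
b_plus, mean_u, mean_u2 = 1.0, 0.5, 1.0 / 3.0
rhs = np.sqrt(2 * m * mean_u2 * x) + b_plus * x / 3   # right-hand side of (38)

u = rng.uniform(0.0, 1.0, size=(trials, m))
lhs = np.cumsum(u - mean_u, axis=1)     # ell * (U_bar_ell - E[U]) for all ell
ok = (lhs <= rhs).all(axis=1)           # (38) holds for every ell <= m?
assert ok.mean() >= 1 - np.exp(-x)      # empirical frequency vs 1 - e^{-x}
```

In practice the observed frequency is far above the guaranteed level \(1 - e^{-1} \approx 0.63\), reflecting the slack in the bound for this particular distribution.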

Now, the following result provides a bound with high probability for the estimated variance of an i.i.d. sequence of bounded random variables.

Lemma A.2

Let X be a bounded random variable defined on \((\Omega , \mathcal {F}, \mathbb {P})\) such that \(a \le X \le b\) almost surely, where \(a<b\) are two real numbers. Let \(X_1, \ldots , X_m\) be i.i.d. copies of X and \(\overline{V}_m = \dfrac{1}{m} \sum _{i=1}^m (X_i - \overline{X}_m)^2\), where \(\overline{X}_m = \frac{1}{m}\sum _{i=1}^m X_i\). Then, for any \(x>0\),

$$\begin{aligned} \mathbb {P} \left( \overline{V}_m \le \mathbb {V}[X] + \sqrt{2\mathbb {V}[X] \dfrac{(b-a)^2x}{m}} + \dfrac{x(b-a)^2}{3m} \right) \ge 1 - \exp (-x). \end{aligned}$$
(39)

Proof

Let us define \(U= (X - \mathbb {E}[X])^2\), which satisfies \(U \le (b-a)^2\) almost surely. Applying Lemma A.1 to this U with \(\ell =m\) gives, for any \(x>0\),

$$\begin{aligned} \mathbb {P} \left( m \left( \overline{U}_m - \mathbb {E}[U] \right) \le \sqrt{2m \mathbb {E}[U^2]x} + \dfrac{x(b-a)^2}{3}\right) \ge 1 - \exp (-x). \end{aligned}$$

Moreover, since \(\overline{U}_m = \overline{V}_m + (\overline{X}_m - \mathbb {E}[X])^2 \ge \overline{V}_m\) and \(\mathbb {E}[U^2] \le (b-a)^2 \mathbb {E}[U]\) by the boundedness of U, we get

$$\begin{aligned} \mathbb {P} \left( \overline{V}_m \le \mathbb {E}[U] + \sqrt{2\mathbb {E}[U] \dfrac{(b-a)^2x}{m}} + \dfrac{x(b-a)^2}{3m} \right) \ge 1 - \exp (-x), \end{aligned}$$

which ends the proof since \(\mathbb {E}[U] = \mathbb {V}[X]\). \(\square \)
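
As a numerical companion to Lemma A.2, the Monte Carlo sketch below, assuming X uniform on \([0,1]\) (so \(a=0\), \(b=1\), \(\mathbb{V}[X]=1/12\); an illustrative choice only), estimates how often the bound (39) holds.

```python
import numpy as np

# Monte Carlo sanity check of Lemma A.2 (inequality (39)), assuming X uniform
# on [0, 1]: a = 0, b = 1, V[X] = 1/12.  The bound on the biased empirical
# variance should fail with frequency at most exp(-x).
rng = np.random.default_rng(2)
m, x, trials = 30, 1.0, 20_000
var_x, range2 = 1.0 / 12.0, 1.0   # V[X] and (b - a)^2

samples = rng.uniform(0.0, 1.0, size=(trials, m))
v_bar = samples.var(axis=1)       # (1/m) * sum_i (X_i - X_bar_m)^2
bound = var_x + np.sqrt(2 * var_x * range2 * x / m) + x * range2 / (3 * m)
assert (v_bar <= bound).mean() >= 1 - np.exp(-x)
```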

We next recall a second result, in the spirit of Mnih (2008, Lemma 3).

Lemma A.3

Let q and k be positive real numbers. If \(t >0\) is a solution of

$$\begin{aligned} \frac{\log (qt)}{t} = k, \end{aligned}$$
(40)

then

$$\begin{aligned} t \le \dfrac{2}{k} \log \dfrac{2q}{k}. \end{aligned}$$
(41)

Moreover, if \(t'\) is such that

$$\begin{aligned} t' \ge \dfrac{2}{k} \log \dfrac{2q}{k}, \end{aligned}$$
(42)

then

$$\begin{aligned} \dfrac{\log (qt')}{t'} \le k. \end{aligned}$$
(43)

Proof

Let \(t>0\) be a solution of (40). Since the function \(\log \) is concave, it holds for all \(s>0\)

$$\begin{aligned} kt = \log (q t)\le \log (q s) + \frac{ t - s}{s} . \end{aligned}$$

In particular, for \(s = \frac{2}{k} > 0\) we get

$$\begin{aligned} t \le \frac{2}{k} \left( \log \dfrac{2q}{k}- 1 \right) \le \frac{2}{k} \log \dfrac{2q}{k} , \end{aligned}$$
(44)

which yields (41).

Now, let \(\varphi : s \mapsto \frac{\log (qs)}{s}\) be defined for \(s>0\). This function is continuous, strictly increasing on \((0, \frac{e}{q}]\) and strictly decreasing on \([\frac{e}{q}, \infty )\), so it attains its maximum \(\frac{q}{e}\) at \(s=\frac{e}{q}\). The existence of a solution \(t>0\) of (40) therefore implies \( k \le \frac{q}{e}\). If \( k = \frac{q}{e}\) then \(t = \frac{e}{q}\) and \(\varphi (t)\) is the maximum of \(\varphi \). For any \(t' >0\), in particular one satisfying (42), we have \(\varphi (t') \le \varphi (t) = k\), which is (43). If \( 0< k < \frac{q}{e}\), there are two solutions \(t_1,t_2\) of (40) such that \(0< t_1< \frac{e}{q} < t_2\). By (41) and (42) we have \(t'\ge t_2 >\frac{e}{q} \), and since \(\varphi \) is strictly decreasing on \([\frac{e}{q}, \infty )\), it holds that \(\varphi (t')\le \varphi (t_2) = k\), that is, (43). \(\square \)
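
Lemma A.3 can also be verified numerically: the sketch below finds the larger root of \(\log(qt)/t = k\) by bisection (the values of q and k are arbitrary illustrative choices) and checks inequalities (41) and (43).

```python
import math

# Numerical check of Lemma A.3: locate the larger root of log(q t) / t = k
# by bisection on [e/q, inf), where phi(s) = log(q s) / s is strictly
# decreasing, then verify the bounds (41) and (43).
def larger_root(q, k, hi=1e9):
    phi = lambda t: math.log(q * t) / t
    lo = math.e / q                      # maximiser of phi, phi(lo) = q/e
    assert phi(lo) >= k >= phi(hi)       # a root exists in [lo, hi]
    for _ in range(200):                 # plain bisection
        mid = 0.5 * (lo + hi)
        if phi(mid) >= k:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for q, k in [(10.0, 0.5), (100.0, 0.01), (3.0, 1.0)]:
    t = larger_root(q, k)
    bound = (2.0 / k) * math.log(2.0 * q / k)
    assert t <= bound                                  # inequality (41)
    assert math.log(q * bound) / bound <= k + 1e-9     # (43) with t' = bound
```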

B. Proof of Proposition 2.3

Let us define the two events \(A = \bigcap _{m \ge 1} A_m\) and \(B = \bigcap _{m \ge 1} B_m\) with

$$\begin{aligned} A_m = \left\{ \overline{V}_m \le \sigma ^2 + \sqrt{2 \sigma ^2 (b-a)^2 \log (3/d_m)/m} + \log (3/d_m) (b-a)^2 /3m \right\} , \end{aligned}$$

and

$$\begin{aligned} B_m = \left\{ | \overline{Z}_m - \mu | \le c_m \right\} . \end{aligned}$$

Applying Lemma A.2 with \(x=\log (3/d_m)\) to each \(A_m\), \(m \ge 1\), together with a union bound argument, leads to \(\mathbb {P}(A) \ge 1 - \delta /3\). Similarly, applying Theorem 2.2 with \(x=\log (3/d_m)\) to each \(B_m\), \(m \ge 1\), and a union bound argument gives \(\mathbb {P}(B) \ge 1 - \delta \). Combining these two results, we have

$$\begin{aligned} \mathbb {P} \left( A \cap B \right) \ge 1 - \left( \mathbb {P} (A^c) + \mathbb {P}(B^c) \right) \ge 1 - \frac{4 \delta }{3}, \end{aligned}$$
(45)

where \(A^c\) and \(B^c\) correspond respectively to the complementary events of A and B.

It remains to prove that \(A\cap B\) implies

$$\begin{aligned} M \le \left\lceil \frac{2}{\nu } \left[ \log \left( \frac{3}{\delta c} \right) + p\log ( \frac{2p}{\nu } ) \right] \right\rceil , \end{aligned}$$
(46)

which will prove (19). In what follows, we suppose that \(A \cap B\) holds.

First we derive an upper bound for \(\overline{V}_m\). Since A holds, we have

$$\begin{aligned} \overline{V}_m \le \sigma ^2 + \sqrt{2 \sigma ^2 (b-a)^2 \log (3/d_m)/m} + \log (3/d_m) (b-a)^2 /3m. \end{aligned}$$
(47)

Lemma A.3 with \(k = \frac{\sigma ^2}{p (b-a)^2}\) and \(q = \left( {\frac{3}{\delta c}}\right) ^{1/p}\) gives for any integer \(m \ge M_{\sigma ^2}\)

$$\begin{aligned} \dfrac{(b-a)^2}{ m } \log \dfrac{ 3 }{ d_m} \le \sigma ^2, \end{aligned}$$
(48)

where

$$\begin{aligned} M_{\sigma ^2} = \dfrac{2(b-a)^2}{\sigma ^2} \left( p \log \left( \dfrac{2 p (b-a)^2}{\sigma ^2} \right) + \log \left( \dfrac{3}{c \delta } \right) \right) . \end{aligned}$$

Again, Lemma A.3 with \(k = \frac{\epsilon ^2 \mu ^2}{p (b-a)^2}\) and \(q = \left( {\frac{3}{\delta c}}\right) ^{1/p}\) gives for any integer \(m \ge M_{\epsilon ^2 \mu ^2}\)

$$\begin{aligned} \dfrac{(b-a)^2}{ m } \log \dfrac{ 3 }{ d_m} \le \epsilon ^2 \mu ^2, \end{aligned}$$
(49)

where

$$\begin{aligned} M_{\epsilon ^2 \mu ^2} = \dfrac{2(b-a)^2}{\epsilon ^2 \mu ^2} \left( p \log \left( \dfrac{2p (b - a)^2}{\epsilon ^2 \mu ^2} \right) + \log \left( \dfrac{3}{c \delta } \right) \right) . \end{aligned}$$

For all \(m \ge \min \left( M_{\sigma ^2}, M_{\epsilon ^2 \mu ^2} \right) \), i.e. \(m\ge M_{\sigma ^2}\) or \(m\ge M_{\epsilon ^2 \mu ^2}\), we obtain from (47) and (48), or (47) and (49), that

$$\begin{aligned} \overline{V}_m \le (1 + \sqrt{2} + 1/3) \max (\sigma ^2,\epsilon ^2 \mu ^2). \end{aligned}$$
(50)

In what follows, we define \(\underline{M}= \min \left( M_{\sigma ^2}, M_{\epsilon ^2 \mu ^2} \right) \). Now, we deduce from (50) an upper bound for \(c_m\). By definition,

$$\begin{aligned} c_m = \sqrt{\dfrac{2\overline{V}_m \log (3/d_m)}{m}} + \dfrac{3 (b-a) \log (3/d_m)}{m}, \end{aligned}$$

then, for all integers \(m \ge \underline{M}\), using either (48) or (49) we have

$$\begin{aligned} c_m \le \sqrt{\dfrac{ \alpha \log (3/d_m) }{m}}, \end{aligned}$$
(51)

with \( \alpha := (\sqrt{2 + 2\sqrt{2} + 2/3} + 3)^2 \max (\sigma ^2,\epsilon ^2 \mu ^2)\).

Now, using (51), we seek a bound for M, the smallest integer such that \(c_M \le \epsilon | \overline{Z}_M |\). To that aim, let us introduce the integer \(M^\star \),

$$\begin{aligned} M^\star = \min \left\{ m \in \mathbb {N}^*: m \ge \underline{M}, \sqrt{\dfrac{\alpha \log (3/d_{m}) }{{m}}} \le \dfrac{\epsilon | \mu |}{1 + \epsilon }\right\} , \end{aligned}$$
(52)

and the integer-valued random variable \(M_+\)

$$\begin{aligned} M_+ = \min \left\{ m \in \mathbb {N}^* : c_m \le \dfrac{\epsilon | \mu |}{1 + \epsilon } \right\} . \end{aligned}$$
(53)

If \(\underline{M}\ge M_+\) then \(M^\star \ge M_+\).

Otherwise, \( \underline{M} < M_+\) and we have \( M_+ = \min \left\{ m \ge \underline{M} : c_m \le \dfrac{\epsilon | \mu |}{1 + \epsilon } \right\} . \) Moreover, as (51) holds for all \(m \ge \underline{M} \), we get the inclusion

$$\begin{aligned} \left\{ m \in \mathbb {N}^*: m \ge \underline{M}, \sqrt{\dfrac{\alpha \log (3/d_{m}) }{{m}}} \le \dfrac{\epsilon | \mu |}{1 + \epsilon }\right\} \subset \left\{ m \in \mathbb {N}^* : m \ge \underline{M} , c_m \le \dfrac{\epsilon | \mu |}{1 + \epsilon } \right\} . \end{aligned}$$

Taking the \(\min \) leads again to \(M^\star \ge M_+\). Moreover, since B holds, \(|\mu | - c_{M_+} \le | \overline{Z}_{M_+} |\), which together with (53) implies \(c_{M_+} \le \epsilon | \overline{Z}_{M_+} |\). By definition of M we get \(M_+ \ge M\), hence \(M^\star \ge M\). To conclude the proof, it remains to find an upper bound for \(M^\star \). Applying Lemma A.3 again with \(k = \frac{\epsilon ^2 \mu ^2}{(1+\epsilon )^2 \alpha p }\) and \(q = \left( \frac{3}{\delta c}\right) ^{1/p}\) gives for any integer \(m \ge M_f\)

$$\begin{aligned} \dfrac{\alpha \log (3/d_{m}) }{{m}} \le \dfrac{\epsilon ^2 \mu ^2}{(1 + \epsilon )^2} \end{aligned}$$
(54)

with

$$\begin{aligned} M_f = \dfrac{2(1+\epsilon )^2\alpha }{\epsilon ^2 \mu ^2} \left( p \log \left( \dfrac{2p(1+\epsilon )^2\alpha }{\epsilon ^2 \mu ^2} \right) + \log \left( \dfrac{3}{c \delta } \right) \right) . \end{aligned}$$

If \(M_f \le \underline{M}\), then (52) and (54) imply \(M^\star = \lceil \underline{M}\rceil \), where \( \lceil \cdot \rceil \) denotes the ceiling function. Otherwise \(M_f > \underline{M}\) and we obtain \(M^\star \le \lceil M_f \rceil \). This provides the following upper bound

$$\begin{aligned} M^\star \le \max \left( \lceil \underline{M}\rceil , \lceil M_f \rceil \right) = \lceil \max \left( \underline{M} ,M_f \right) \rceil . \end{aligned}$$

Introducing \(\nu = \min \left( \frac{\max (\sigma ^2,\epsilon ^2 \mu ^2)}{(b-a)^2} , \frac{\epsilon ^2 \mu ^2}{(1+\epsilon )^2 \alpha } \right) \) we have from the definition of \(M_{\sigma ^2}, M_{\epsilon ^2\mu ^2}\) and \(M_f\)

$$\begin{aligned} M^\star \le \left\lceil \dfrac{2}{\nu } \left( p \log \left( \dfrac{2p}{\nu } \right) + \log \left( \dfrac{3}{c \delta } \right) \right) \right\rceil . \end{aligned}$$
(55)

Since \(M^\star \ge M\) and \(A \cap B\) implies (55), we deduce that \(A \cap B\) implies (46), which concludes the proof of the first result.

Let us now prove the result in expectation. Let \(K := \left\lceil \dfrac{2}{\nu } \left( p \log \left( \dfrac{2p}{\nu } \right) + \log \left( \dfrac{3}{c \delta } \right) \right) \right\rceil .\) We first note that

$$\begin{aligned} \mathbb {E}(M) = \sum _{k=0}^\infty \mathbb {P}(M>k) \le K + \sum _{k=K}^\infty \mathbb {P}(M>k) . \end{aligned}$$

If \(M>k \), then \(c_k >\epsilon \vert \bar{Z}_k \vert \). For \(k\ge K\), we would like to prove that \(c_k >\epsilon \vert \bar{Z}_k \vert \) implies \((A_k\cap B_k)^c\), or equivalently that \(A_k\cap B_k\) implies \(c_k \le \epsilon \vert \bar{Z}_k \vert \). For \(k\ge K\), \(A_k\) implies (51) and (54), and therefore \(c_k \le \frac{\epsilon \vert \mu \vert }{1+\epsilon }\). Also, \(B_k\) implies \(\vert \mu \vert \le \vert \bar{Z}_k \vert + c_k\). Combining the previous inequalities, we easily conclude that \(A_k\cap B_k\) implies \(c_k \le \epsilon \vert \bar{Z}_k \vert \). For \(k\ge K\), we then have \(\mathbb {P}(M>k) \le \mathbb {P}(c_k >\epsilon \vert \bar{Z}_k \vert ) \le \mathbb {P}((A_k \cap B_k)^c) \le \mathbb {P}(A_k^c) + \mathbb {P}(B_k^c) \le 4d_k/3\), and then

$$\begin{aligned} \mathbb {E}(M) \le K + \sum _{k=K}^\infty 4d_k/3 \le K + 4\delta /3 , \end{aligned}$$

which ends the proof.
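
To make the stopping rule analysed above concrete, the sketch below simulates the adaptive sequence \(c_m\) with the empirical Bernstein radius and the confidence schedule \(d_m = c\,\delta\, m^{-p}\) implied by the proof. The choices \(p = 2\) with \(c = 6/\pi^2\) (so that \(\sum_m d_m = \delta\)) and the test variable \(Z \sim U(0.4, 0.8)\) are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Sketch of the adaptive relative-precision stopping rule: sample until
# c_M <= eps * |Z_bar_M|, where c_m is the empirical Bernstein radius and
# d_m = c * delta / m**p.  Illustrative assumptions: p = 2, c = 6/pi^2
# (so sum_m d_m = delta), and Z ~ U(0.4, 0.8), bounded in [a, b] = [0, 1].
rng = np.random.default_rng(3)
a, b = 0.0, 1.0                     # almost-sure bounds on Z
eps, delta, p = 0.1, 0.05, 2.0
c = 6.0 / np.pi ** 2                # 1/zeta(2): makes the d_m sum to delta

def stop(draw, m_max=100_000):
    """Sample until c_M <= eps * |Z_bar_M|; return (M, Z_bar_M)."""
    zs = []
    for m in range(1, m_max + 1):
        zs.append(draw())
        if m == 1:
            continue                # need at least 2 samples for a variance
        z = np.asarray(zs)
        log_term = np.log(3.0 * m ** p / (c * delta))   # log(3 / d_m)
        c_m = (np.sqrt(2.0 * z.var() * log_term / m)
               + 3.0 * (b - a) * log_term / m)
        if c_m <= eps * abs(z.mean()):
            return m, float(z.mean())
    return m_max, float(np.mean(zs))

M, est = stop(lambda: rng.uniform(0.4, 0.8))   # true mean mu = 0.6
assert abs(est - 0.6) <= 2 * eps * 0.6         # relative error ~ eps w.h.p.
```

With these parameters the rule stops after on the order of \(10^3\) samples, and the returned mean is within a relative error of roughly \(\epsilon\) of \(\mu\), as guaranteed with high probability by Proposition 2.3.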


Cite this article

Friess, M.B., Macherey, A., Nouy, A. et al. A PAC algorithm in relative precision for bandit problem with costly sampling. Math Meth Oper Res 96, 161–185 (2022). https://doi.org/10.1007/s00186-022-00769-x
