A PAC algorithm in relative precision for bandit problem with costly sampling

  • Original Article
  • Published in Mathematical Methods of Operations Research

Abstract

This paper considers the problem of maximizing an expectation function over a finite set, or finite-arm bandit problem. We first propose a naive stochastic bandit algorithm for obtaining a probably approximately correct (PAC) solution to this discrete optimization problem in relative precision, that is, a solution which solves the optimization problem up to a relative error smaller than a prescribed tolerance, with high probability. We also propose an adaptive stochastic bandit algorithm which provides a PAC solution with the same guarantees. The adaptive algorithm outperforms the naive algorithm in terms of the mean number of generated samples and is particularly well suited to applications with high sampling cost.


Notes

  1. A random variable Z is said to be sub-Gaussian of parameter \(\gamma > 0\) if, for all \(s \ge 0\), we have \(\mathbb {E}[\exp (s(Z - \mathbb {E}[Z]))] \le \exp (s^2 \gamma / 2)\).
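
A quick numerical illustration of this footnote: any variable bounded in \([a,b]\) is sub-Gaussian with parameter \(\gamma = (b-a)^2/4\) by Hoeffding's lemma. The sketch below, assuming \(Z\) uniform on \([0,1]\) (an illustrative choice, not from the paper), checks the moment-generating-function bound empirically for a few values of s.

```python
import numpy as np

# Sub-Gaussian check, assuming Z ~ U(0, 1): by Hoeffding's lemma a variable
# bounded in [a, b] is sub-Gaussian with parameter gamma = (b - a)^2 / 4,
# here gamma = 1/4, so E[exp(s(Z - E[Z]))] <= exp(s^2 * gamma / 2).
rng = np.random.default_rng(0)
z = rng.uniform(0.0, 1.0, size=200_000)
gamma = 0.25                                  # (b - a)^2 / 4 for [0, 1]

for s in (0.5, 1.0, 2.0, 5.0):
    mgf = np.exp(s * (z - 0.5)).mean()        # empirical E[exp(s(Z - E[Z]))]
    bound = np.exp(s ** 2 * gamma / 2)        # sub-Gaussian bound
    assert mgf <= bound
```

For the uniform variable the empirical moment-generating function stays well below the bound at every s tested.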

References

  • Audibert JY, Bubeck S, Munos R (2010) Best arm identification in multi-armed bandits. In: Annual conference on learning theory (COLT)

  • Audibert JY, Bubeck S, Munos R (2011) Bandit view on noisy optimization. Optim Mach Learn 431

  • Audibert JY, Munos R, Szepesvári C (2009) Exploration-exploitation tradeoff using variance estimates in multi-armed bandits. Theoret Comput Sci 410(19):1876–1902

  • Beyer HG, Sendhoff B (2007) Robust optimization—a comprehensive survey. Comput Methods Appl Mech Eng 196(33–34):3190–3218

  • Bubeck S, Munos R, Stoltz G (2011) Pure exploration in finitely-armed and continuous-armed bandits. Theoret Comput Sci 412(19):1832–1852

  • Dupač V, Herkenrath U (1982) Stochastic approximation on a discrete set and the multi-armed bandit problem. Seq Anal 1(1):1–25

  • Even-Dar E, Mannor S, Mansour Y (2002) PAC bounds for multi-armed bandit and Markov decision processes. In: International conference on computational learning theory. Springer, pp 255–270

  • Garivier A, Cappé O (2011) The KL-UCB algorithm for bounded stochastic bandits and beyond. In: Proceedings of the 24th annual conference on learning theory, pp 359–376

  • Gong WB, Ho YC, Zhai W (2000) Stochastic comparison algorithm for discrete optimization with estimation. SIAM J Optim 10(2):384–404

  • Kalyanakrishnan S, Tewari A, Auer P, Stone P (2012) PAC subset selection in stochastic multi-armed bandits. In: ICML, vol 12, pp 655–662

  • Kano H, Honda J, Sakamaki K, Matsuura K, Nakamura A, Sugiyama M (2019) Good arm identification via bandit feedback. Mach Learn 108(5):721–745

  • Kaufmann E, Cappé O, Garivier A (2016) On the complexity of best-arm identification in multi-armed bandit models. J Mach Learn Res 17(1):1–42

  • Kaufmann E, Kalyanakrishnan S (2013) Information complexity in bandit subset selection. In: Conference on learning theory. PMLR, pp 228–251

  • Kuleshov V, Precup D (2014) Algorithms for multi-armed bandit problems. arXiv preprint arXiv:1402.6028

  • Lattimore T, Szepesvári C (2020) Bandit algorithms. Cambridge University Press, Cambridge

  • Locatelli A, Gutzeit M, Carpentier A (2016) An optimal algorithm for the thresholding bandit problem. In: International conference on machine learning, pp 1690–1698

  • Mnih V (2008) Efficient stopping rules. Ph.D. thesis, University of Alberta

  • Mnih V, Szepesvári C, Audibert JY (2008) Empirical Bernstein stopping. In: Proceedings of the 25th international conference on Machine learning, pp 672–679

  • Mukherjee S, Naveen KP, Sudarsanam N, Ravindran B (2017) Thresholding bandits with augmented UCB. In: International joint conference on artificial intelligence

  • Nemirovski A, Juditsky A, Lan G, Shapiro A (2009) Robust stochastic approximation approach to stochastic programming. SIAM J Optim 19(4):1574–1609

  • Sutton RS, Barto AG (2018) Reinforcement learning: an introduction. MIT Press, Cambridge

  • Tao C, Blanco S, Peng J, Zhou Y (2019) Thresholding bandit with optimal aggregate regret. In: Advances in neural information processing systems, pp 11664–11673

  • Yan D, Mukai H (1992) Stochastic discrete optimization. SIAM J Control Optim 30(3):594–612

Author information

Corresponding author

Correspondence to Anthony Nouy.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This research was partially supported by project MATHAMSUD FANTASTIC 20-MATH-05.

Appendices

A. Intermediate results

Here we provide intermediate results used in the proof of Proposition 2.3. We first recall a version of Bennett’s inequality from Audibert et al. (2009, Lemma 5).

Lemma A.1

Let U be a random variable defined on \((\Omega , \mathcal {F}, \mathbb {P})\) such that \(U \le b\) almost surely, with \(b \in \mathbb {R}\). Let \(U_1, \ldots , U_m\) be i.i.d. copies of U and \(\overline{U}_{\ell } = \frac{1}{\ell }\sum _{i=1}^{\ell } U_i\). For any \(x>0\), it holds with probability at least \(1 - \exp (-x)\), simultaneously for all \(1 \le \ell \le m\), that

$$\begin{aligned} \ell \left( \overline{U}_\ell - \mathbb {E} \left[ U \right] \right) \le \sqrt{2m \mathbb {E} \left[ U^2 \right] x} + b_+ x / 3, \end{aligned}$$
(38)

with \(b_+ = \max (0, b)\).
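
Lemma A.1 is easy to probe numerically. The Monte Carlo sketch below, assuming U uniform on \([0,1]\) (so \(b = 1\), \(\mathbb{E}[U] = 1/2\), \(\mathbb{E}[U^2] = 1/3\); an illustrative choice only), estimates how often (38) holds simultaneously for all \(\ell \le m\).

```python
import numpy as np

# Monte Carlo sanity check of Lemma A.1 (inequality (38)), assuming U uniform
# on [0, 1]: b = 1, E[U] = 1/2, E[U^2] = 1/3.  The event should hold
# simultaneously for all 1 <= ell <= m with frequency at least 1 - exp(-x).
rng = np.random.default_rng(1)
m, x, trials = 50, 1.0, 20_000
b_plus, mean_u, mean_u2 = 1.0, 0.5, 1.0 / 3.0
rhs = np.sqrt(2 * m * mean_u2 * x) + b_plus * x / 3   # right-hand side of (38)

u = rng.uniform(0.0, 1.0, size=(trials, m))
lhs = np.cumsum(u - mean_u, axis=1)     # ell * (U_bar_ell - E[U]) for all ell
ok = (lhs <= rhs).all(axis=1)           # (38) holds for every ell <= m?
assert ok.mean() >= 1 - np.exp(-x)      # empirical frequency vs 1 - e^{-x}
```

In practice the observed frequency is far above the guaranteed level \(1 - e^{-1} \approx 0.63\), reflecting the slack in the bound for this particular distribution.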

Now, the following result provides a bound with high probability for the estimated variance of an i.i.d. sequence of bounded random variables.

Lemma A.2

Let X be a bounded random variable defined on \((\Omega , \mathcal {F}, \mathbb {P})\) such that \(a \le X \le b\) almost surely, where \(a<b\) are two real numbers. Let \(X_1, \ldots , X_m\) be i.i.d. copies of X and \(\overline{V}_m = \dfrac{1}{m} \sum _{i=1}^m (X_i - \overline{X}_m)^2\), where \(\overline{X}_m = \frac{1}{m}\sum _{i=1}^m X_i\). Then, for any \(x>0\),

$$\begin{aligned} \mathbb {P} \left( \overline{V}_m \le \mathbb {V}[X] + \sqrt{2\mathbb {V}[X] \dfrac{(b-a)^2x}{m}} + \dfrac{x(b-a)^2}{3m} \right) \ge 1 - \exp (-x). \end{aligned}$$
(39)

Proof

Let us define \(U= (X - \mathbb {E}[X])^2\), which satisfies \(U \le (b-a)^2\) almost surely. Applying Lemma A.1 to this U with \(\ell =m\) gives, for any \(x>0\),

$$\begin{aligned} \mathbb {P} \left( m \left( \overline{U}_m - \mathbb {E}[U] \right) \le \sqrt{2m \mathbb {E}[U^2]x} + \dfrac{x(b-a)^2}{3}\right) \ge 1 - \exp (-x). \end{aligned}$$

Moreover, since \(\overline{U}_m = \overline{V}_m + (\overline{X}_m - \mathbb {E}[X])^2 \ge \overline{V}_m\) and \(\mathbb {E}[U^2] \le (b-a)^2 \mathbb {E}[U]\) by the boundedness of U, we get

$$\begin{aligned} \mathbb {P} \left( \overline{V}_m \le \mathbb {E}[U] + \sqrt{2\mathbb {E}[U] \dfrac{(b-a)^2x}{m}} + \dfrac{x(b-a)^2}{3m} \right) \ge 1 - \exp (-x), \end{aligned}$$

which ends the proof since \(\mathbb {E}[U] = \mathbb {V}[X]\). \(\square \)
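
As a numerical companion to Lemma A.2, the Monte Carlo sketch below, assuming X uniform on \([0,1]\) (so \(a=0\), \(b=1\), \(\mathbb{V}[X]=1/12\); an illustrative choice only), estimates how often the bound (39) holds.

```python
import numpy as np

# Monte Carlo sanity check of Lemma A.2 (inequality (39)), assuming X uniform
# on [0, 1]: a = 0, b = 1, V[X] = 1/12.  The bound on the biased empirical
# variance should fail with frequency at most exp(-x).
rng = np.random.default_rng(2)
m, x, trials = 30, 1.0, 20_000
var_x, range2 = 1.0 / 12.0, 1.0   # V[X] and (b - a)^2

samples = rng.uniform(0.0, 1.0, size=(trials, m))
v_bar = samples.var(axis=1)       # (1/m) * sum_i (X_i - X_bar_m)^2
bound = var_x + np.sqrt(2 * var_x * range2 * x / m) + x * range2 / (3 * m)
assert (v_bar <= bound).mean() >= 1 - np.exp(-x)
```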

We next recall a second result, in the spirit of Mnih (2008, Lemma 3).

Lemma A.3

Let q and k be positive real numbers. If \(t >0\) is a solution of

$$\begin{aligned} \frac{\log (qt)}{t} = k, \end{aligned}$$
(40)

then

$$\begin{aligned} t \le \dfrac{2}{k} \log \dfrac{2q}{k}. \end{aligned}$$
(41)

Moreover, if \(t'\) is such that

$$\begin{aligned} t' \ge \dfrac{2}{k} \log \dfrac{2q}{k}, \end{aligned}$$
(42)

then

$$\begin{aligned} \dfrac{\log (qt')}{t'} \le k. \end{aligned}$$
(43)

Proof

Let \(t>0\) be a solution of (40). Since the function \(\log \) is concave, it holds for all \(s>0\)

$$\begin{aligned} kt = \log (q t)\le \log (q s) + \frac{ t - s}{s} . \end{aligned}$$

In particular, for \(s = \frac{2}{k} > 0\) we get

$$\begin{aligned} t \le \frac{2}{k} \left( \log \dfrac{2q}{k}- 1 \right) \le \frac{2}{k} \log \dfrac{2q}{k} , \end{aligned}$$
(44)

which yields (41).

Now, let \(\varphi : s \mapsto \frac{\log (qs)}{s}\) be defined for \(s>0\). This function is continuous, strictly increasing on \((0, \frac{e}{q}]\) and strictly decreasing on \([\frac{e}{q}, \infty )\), so it attains its maximum \(\frac{q}{e}\) at \(s=\frac{e}{q}\). The existence of a solution \(t>0\) of (40) therefore implies \( k \le \frac{q}{e}\). If \( k = \frac{q}{e}\) then \(t = \frac{e}{q}\) and \(\varphi (t)\) is the maximum of \(\varphi \). For any \(t' >0\), in particular one satisfying (42), we have \(\varphi (t') \le \varphi (t) = k\), which is (43). If \( 0< k < \frac{q}{e}\), there are two solutions \(t_1,t_2\) of (40) such that \(0< t_1< \frac{e}{q} < t_2\). By (41) and (42) we have \(t'\ge t_2 >\frac{e}{q} \), and since \(\varphi \) is strictly decreasing on \([\frac{e}{q}, \infty )\), it holds that \(\varphi (t')\le \varphi (t_2) = k\), that is, (43). \(\square \)
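
Lemma A.3 can also be verified numerically: the sketch below finds the larger root of \(\log(qt)/t = k\) by bisection (the values of q and k are arbitrary illustrative choices) and checks inequalities (41) and (43).

```python
import math

# Numerical check of Lemma A.3: locate the larger root of log(q t) / t = k
# by bisection on [e/q, inf), where phi(s) = log(q s) / s is strictly
# decreasing, then verify the bounds (41) and (43).
def larger_root(q, k, hi=1e9):
    phi = lambda t: math.log(q * t) / t
    lo = math.e / q                      # maximiser of phi, phi(lo) = q/e
    assert phi(lo) >= k >= phi(hi)       # a root exists in [lo, hi]
    for _ in range(200):                 # plain bisection
        mid = 0.5 * (lo + hi)
        if phi(mid) >= k:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for q, k in [(10.0, 0.5), (100.0, 0.01), (3.0, 1.0)]:
    t = larger_root(q, k)
    bound = (2.0 / k) * math.log(2.0 * q / k)
    assert t <= bound                                  # inequality (41)
    assert math.log(q * bound) / bound <= k + 1e-9     # (43) with t' = bound
```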

B. Proof of Proposition 2.3

Let us define the two events \(A = \bigcap _{m \ge 1} A_m\) and \(B = \bigcap _{m \ge 1} B_m\) with

$$\begin{aligned} A_m = \left\{ \overline{V}_m \le \sigma ^2 + \sqrt{2 \sigma ^2 (b-a)^2 \log (3/d_m)/m} + \log (3/d_m) (b-a)^2 /3m \right\} , \end{aligned}$$

and

$$\begin{aligned} B_m = \left\{ | \overline{Z}_m - \mu | \le c_m \right\} . \end{aligned}$$

Applying Lemma A.2 with \(x=\log (3/d_m)\) to each \(A_m\), \(m \ge 1\), together with a union bound argument, leads to \(\mathbb {P}(A) \ge 1 - \delta /3\). Similarly, applying Theorem 2.2 with \(x=\log (3/d_m)\) to each \(B_m\), \(m \ge 1\), and a union bound argument gives \(\mathbb {P}(B) \ge 1 - \delta \). Combining these two results, we have

$$\begin{aligned} \mathbb {P} \left( A \cap B \right) \ge 1 - \left( \mathbb {P} (A^c) + \mathbb {P}(B^c) \right) \ge 1 - \frac{4 \delta }{3}, \end{aligned}$$
(45)

where \(A^c\) and \(B^c\) correspond respectively to the complementary events of A and B.

It remains to prove that \(A\cap B\) implies

$$\begin{aligned} M \le \left\lceil \frac{2}{\nu } \left[ \log \left( \frac{3}{\delta c} \right) + p\log ( \frac{2p}{\nu } ) \right] \right\rceil , \end{aligned}$$
(46)

which will prove (19). In what follows, we suppose that \(A \cap B\) holds.

First we derive an upper bound for \(\overline{V}_m\). Since A holds, we have

$$\begin{aligned} \overline{V}_m \le \sigma ^2 + \sqrt{2 \sigma ^2 (b-a)^2 \log (3/d_m)/m} + \log (3/d_m) (b-a)^2 /3m. \end{aligned}$$
(47)

Lemma A.3 with \(k = \frac{\sigma ^2}{p (b-a)^2}\) and \(q = \left( {\frac{3}{\delta c}}\right) ^{1/p}\) gives for any integer \(m \ge M_{\sigma ^2}\)

$$\begin{aligned} \dfrac{(b-a)^2}{ m } \log \dfrac{ 3 }{ d_m} \le \sigma ^2, \end{aligned}$$
(48)

where

$$\begin{aligned} M_{\sigma ^2} = \dfrac{2(b-a)^2}{\sigma ^2} \left( p \log \left( \dfrac{2 p (b-a)^2}{\sigma ^2} \right) + \log \left( \dfrac{3}{c \delta } \right) \right) . \end{aligned}$$

Again, Lemma A.3 with \(k = \frac{\epsilon ^2 \mu ^2}{p (b-a)^2}\) and \(q = \left( {\frac{3}{\delta c}}\right) ^{1/p}\) gives for any integer \(m \ge M_{\epsilon ^2 \mu ^2}\)

$$\begin{aligned} \dfrac{(b-a)^2}{ m } \log \dfrac{ 3 }{ d_m} \le \epsilon ^2 \mu ^2, \end{aligned}$$
(49)

where

$$\begin{aligned} M_{\epsilon ^2 \mu ^2} = \dfrac{2(b-a)^2}{\epsilon ^2 \mu ^2} \left( p \log \left( \dfrac{2p (b - a)^2}{\epsilon ^2 \mu ^2} \right) + \log \left( \dfrac{3}{c \delta } \right) \right) . \end{aligned}$$

For all \(m \ge \min \left( M_{\sigma ^2}, M_{\epsilon ^2 \mu ^2} \right) \), i.e. \(m\ge M_{\sigma ^2}\) or \(m\ge M_{\epsilon ^2 \mu ^2}\), we obtain from (47) and (48), or (47) and (49), that

$$\begin{aligned} \overline{V}_m \le (1 + \sqrt{2} + 1/3) \max (\sigma ^2,\epsilon ^2 \mu ^2). \end{aligned}$$
(50)

In what follows, we define \(\underline{M}= \min \left( M_{\sigma ^2}, M_{\epsilon ^2 \mu ^2} \right) \). Now, we deduce from (50) an upper bound for \(c_m\). By definition,

$$\begin{aligned} c_m = \sqrt{\dfrac{2\overline{V}_m \log (3/d_m)}{m}} + \dfrac{3 (b-a) \log (3/d_m)}{m}, \end{aligned}$$

then, for all integers \(m \ge \underline{M}\), using either (48) or (49) we have

$$\begin{aligned} c_m \le \sqrt{\dfrac{ \alpha \log (3/d_m) }{m}}, \end{aligned}$$
(51)

with \( \alpha := (\sqrt{2 + 2\sqrt{2} + 2/3} + 3)^2 \max (\sigma ^2,\epsilon ^2 \mu ^2)\).

Now, using (51), we seek a bound for M, the smallest integer such that \(c_M \le \epsilon | \overline{Z}_M |\). To that aim, let us introduce the integer \(M^\star \),

$$\begin{aligned} M^\star = \min \left\{ m \in \mathbb {N}^*: m \ge \underline{M}, \sqrt{\dfrac{\alpha \log (3/d_{m}) }{{m}}} \le \dfrac{\epsilon | \mu |}{1 + \epsilon }\right\} , \end{aligned}$$
(52)

and the integer-valued random variable \(M_+\)

$$\begin{aligned} M_+ = \min \left\{ m \in \mathbb {N}^* : c_m \le \dfrac{\epsilon | \mu |}{1 + \epsilon } \right\} . \end{aligned}$$
(53)

If \(\underline{M}\ge M_+\) then \(M^\star \ge M_+\).

Otherwise, \( \underline{M} < M_+\) and we have \( M_+ = \min \left\{ m \ge \underline{M} : c_m \le \dfrac{\epsilon | \mu |}{1 + \epsilon } \right\} . \) Moreover, as (51) holds for all \(m \ge \underline{M} \), we get the inclusion

$$\begin{aligned} \left\{ m \in \mathbb {N}^*: m \ge \underline{M}, \sqrt{\dfrac{\alpha \log (3/d_{m}) }{{m}}} \le \dfrac{\epsilon | \mu |}{1 + \epsilon }\right\} \subset \left\{ m \in \mathbb {N}^* : m \ge \underline{M} , c_m \le \dfrac{\epsilon | \mu |}{1 + \epsilon } \right\} . \end{aligned}$$

Taking the \(\min \) leads again to \(M^\star \ge M_+\). Moreover, since B holds, \(|\mu | - c_{M_+} \le | \overline{Z}_{M_+} |\), which together with (53) implies \(c_{M_+} \le \epsilon | \overline{Z}_{M_+} |\). By definition of M we get \(M_+ \ge M\), hence \(M^\star \ge M\). To conclude the proof, it remains to find an upper bound for \(M^\star \). Applying Lemma A.3 again with \(k = \frac{\epsilon ^2 \mu ^2}{(1+\epsilon )^2 \alpha p }\) and \(q = \left( \frac{3}{\delta c}\right) ^{1/p}\) gives for any integer \(m \ge M_f\)

$$\begin{aligned} \dfrac{\alpha \log (3/d_{m}) }{{m}} \le \dfrac{\epsilon ^2 \mu ^2}{(1 + \epsilon )^2} \end{aligned}$$
(54)

with

$$\begin{aligned} M_f = \dfrac{2(1+\epsilon )^2\alpha }{\epsilon ^2 \mu ^2} \left( p \log \left( \dfrac{2p(1+\epsilon )^2\alpha }{\epsilon ^2 \mu ^2} \right) + \log \left( \dfrac{3}{c \delta } \right) \right) . \end{aligned}$$

If \(M_f \le \underline{M}\), then (52) and (54) imply \(M^\star = \lceil \underline{M}\rceil \), where \( \lceil \cdot \rceil \) denotes the ceiling function. Otherwise \(M_f > \underline{M}\) and we obtain \(M^\star \le \lceil M_f \rceil \). This provides the following upper bound

$$\begin{aligned} M^\star \le \max \left( \lceil \underline{M}\rceil , \lceil M_f \rceil \right) = \lceil \max \left( \underline{M} ,M_f \right) \rceil . \end{aligned}$$

Introducing \(\nu = \min \left( \frac{\max (\sigma ^2,\epsilon ^2 \mu ^2)}{(b-a)^2} , \frac{\epsilon ^2 \mu ^2}{(1+\epsilon )^2 \alpha } \right) \) we have from the definition of \(M_{\sigma ^2}, M_{\epsilon ^2\mu ^2}\) and \(M_f\)

$$\begin{aligned} M^\star \le \left\lceil \dfrac{2}{\nu } \left( p \log \left( \dfrac{2p}{\nu } \right) + \log \left( \dfrac{3}{c \delta } \right) \right) \right\rceil . \end{aligned}$$
(55)

Since \(M^\star \ge M\) and \(A \cap B\) implies (55), we deduce that \(A \cap B\) implies (46), which concludes the proof of the first result.

Let us now prove the result in expectation. Let \(K := \left\lceil \dfrac{2}{\nu } \left( p \log \left( \dfrac{2p}{\nu } \right) + \log \left( \dfrac{3}{c \delta } \right) \right) \right\rceil .\) We first note that

$$\begin{aligned} \mathbb {E}(M) = \sum _{k=0}^\infty \mathbb {P}(M>k) \le K + \sum _{k=K}^\infty \mathbb {P}(M>k) . \end{aligned}$$

If \(M>k \), then \(c_k >\epsilon \vert \bar{Z}_k \vert \). For \(k\ge K\), we would like to prove that \(c_k >\epsilon \vert \bar{Z}_k \vert \) implies \((A_k\cap B_k)^c\), or equivalently that \(A_k\cap B_k\) implies \(c_k \le \epsilon \vert \bar{Z}_k \vert \). For \(k\ge K\), \(A_k\) implies (51) and (54), and therefore \(c_k \le \frac{\epsilon \vert \mu \vert }{1+\epsilon }\). Also, \(B_k\) implies \(\vert \mu \vert \le \vert \bar{Z}_k \vert + c_k\). Combining the previous inequalities, we easily conclude that \(A_k\cap B_k\) implies \(c_k \le \epsilon \vert \bar{Z}_k \vert \). For \(k\ge K\), we then have \(\mathbb {P}(M>k) \le \mathbb {P}(c_k >\epsilon \vert \bar{Z}_k \vert ) \le \mathbb {P}((A_k \cap B_k)^c) \le \mathbb {P}(A_k^c) + \mathbb {P}(B_k^c) \le 4d_k/3\), and then

$$\begin{aligned} \mathbb {E}(M) \le K + \sum _{k=K}^\infty 4d_k/3 \le K + 4\delta /3 , \end{aligned}$$

which ends the proof.
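
To make the stopping rule analysed above concrete, the sketch below simulates the adaptive sequence \(c_m\) with the empirical Bernstein radius and the confidence schedule \(d_m = c\,\delta\, m^{-p}\) implied by the proof. The choices \(p = 2\) with \(c = 6/\pi^2\) (so that \(\sum_m d_m = \delta\)) and the test variable \(Z \sim U(0.4, 0.8)\) are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Sketch of the adaptive relative-precision stopping rule: sample until
# c_M <= eps * |Z_bar_M|, where c_m is the empirical Bernstein radius and
# d_m = c * delta / m**p.  Illustrative assumptions: p = 2, c = 6/pi^2
# (so sum_m d_m = delta), and Z ~ U(0.4, 0.8), bounded in [a, b] = [0, 1].
rng = np.random.default_rng(3)
a, b = 0.0, 1.0                     # almost-sure bounds on Z
eps, delta, p = 0.1, 0.05, 2.0
c = 6.0 / np.pi ** 2                # 1/zeta(2): makes the d_m sum to delta

def stop(draw, m_max=100_000):
    """Sample until c_M <= eps * |Z_bar_M|; return (M, Z_bar_M)."""
    zs = []
    for m in range(1, m_max + 1):
        zs.append(draw())
        if m == 1:
            continue                # need at least 2 samples for a variance
        z = np.asarray(zs)
        log_term = np.log(3.0 * m ** p / (c * delta))   # log(3 / d_m)
        c_m = (np.sqrt(2.0 * z.var() * log_term / m)
               + 3.0 * (b - a) * log_term / m)
        if c_m <= eps * abs(z.mean()):
            return m, float(z.mean())
    return m_max, float(np.mean(zs))

M, est = stop(lambda: rng.uniform(0.4, 0.8))   # true mean mu = 0.6
assert abs(est - 0.6) <= 2 * eps * 0.6         # relative error ~ eps w.h.p.
```

With these parameters the rule stops after on the order of \(10^3\) samples, and the returned mean is within a relative error of roughly \(\epsilon\) of \(\mu\), as guaranteed with high probability by Proposition 2.3.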


Cite this article

Friess, M.B., Macherey, A., Nouy, A. et al. A PAC algorithm in relative precision for bandit problem with costly sampling. Math Meth Oper Res 96, 161–185 (2022). https://doi.org/10.1007/s00186-022-00769-x
