
Optimal decision making in heterogeneous and biased environments

Psychonomic Bulletin & Review

Abstract

The issue of optimal performance in speeded two-choice tasks has played a substantial role in the development and evaluation of decision making theories. For difficulty-homogeneous environments, the means to achieve optimality are prescribed by the sequential probability ratio test (SPRT), or equivalently, by the drift diffusion model (DDM). Biases in the external environment are easily accommodated into these models by adopting a prior integration bias. However, for difficulty-heterogeneous environments, the issue is more elusive. I show that in such cases, the SPRT and the DDM are no longer equivalent and both are suboptimal. Optimality is achieved by a diffusion-like accumulation of evidence while adjusting the choice thresholds during the time course of a trial. In the second part of the paper, assuming that decisions are made according to the popular DDM, I show that optimal performance in biased environments mandates incorporating a dynamic-bias component (a shift in the drift rate) in addition to the prior bias (a shift in the starting point) into the model. These conclusions support a conjecture by Hanks, Mazurek, Kiani, Hopp, and Shadlen (The Journal of Neuroscience, 31(17), 6339–6352, 2011) and contradict a recent attempt to refute this conjecture by arguing that optimality is achieved with the aid of prior bias alone (van Ravenzwaaij et al., 2012). The psychological plausibility of such “mathematically optimal” strategies is discussed. The current paper contributes to the ongoing effort to understand optimal behavior in biased and heterogeneous environments and corrects prior conclusions with respect to optimality in such conditions.


Notes

  1. In Usher & McClelland’s Leaky Competing Accumulators model (LCA; 2001), the evidence in favor of each alternative is taxed by mutual inhibition from the other alternative.

  2. Drugowitsch et al. (2012) allowed for the possibility that integration of information is associated with a temporal cost c(t). Throughout the paper, I ignore such costs, i.e., I assume that c(t) = 0.

  3. Note that the term evidence refers to the variable that is integrated by the diffuser in order to make a decision (e.g., perceptual samples). Once the statistical properties of such evidence, i.e., its distribution under both response alternatives, are specified, the observer can form his or her belief, i.e., calculate the probability that each response alternative is correct (or the likelihood ratio), given the stream of evidence that has been collected (see Appendix B). Both processes, evidence integration and belief updating, are subsumed under the general term ‘integrating information’.

  4. I note from the outset that the van Ravenzwaaij et al. article consists of both a theoretical study of optimal behavior (in both homogeneous and heterogeneous biased environments) and an empirical study of actual behavior. Here, I question only the conclusions with respect to the theoretical analysis of the heterogeneous environments.

  5. Note that this residual time component is subsumed in the term \( t_{res} \) in the definition of the reward rate; see Eq. (1) below.

  6. Throughout the paper I follow the customary convention and fix s = 0.1. This practice reflects the assumption that the noise level is identical across all conditions (difficulty levels, in the current case). However, mathematically speaking, this procedure imposes an ‘over-constraint’ on the model (Donkin, Brown & Heathcote, 2009).

  7. This in effect assumes no integration costs. When there are such costs, integration may terminate prior to the interrogation time T (see Drugowitsch et al., 2012). See also Footnote 2.

  8. Here I assume that the dynamic bias is time-constant and hence that the integration bias, \( \frac{a}{2}-z+{v}_ct \), builds up linearly during the trial. More generally, \( {v}_c \) could be a function of time, but I do not consider this possibility here.

  9. Some formulations of the reward rate assume that errors are followed by negative-reward penalties and/or an increase in the inter-trial temporal interval. Here, for simplicity, I assume that no such penalties exist.

  10. For a homogeneous environment, no threshold adjustment is necessary according to the dynamic-programming-based decision rule (see Drugowitsch et al., 2012), and hence both the DDM and the SPRT are optimal.

  11. The two linear functions of T on the two sides of Eq. 4 either coincide (for biases that are selected according to Eqs. 5–6) or otherwise intersect at no more than a single value of T.

  12. These studies used a different criterion for optimality, namely the Bayes Risk (BR), a weighted sum of the mean RT and the error rate that the decision rule minimizes (Wald & Wolfowitz, 1948).

  13. For example, if an observer adjusts his or her threshold every 100 ms during a two-second interval, then 20 parameters are required to describe the adjustment procedure.

  14. Typically, when the Simplex algorithm converges, the search simplex has shrunk to a small diameter. Starting a novel Simplex iteration increases the diameter of the search simplex; thus, the next iteration can converge to a different point.

References

  • Balci, F., Simen, P., Niyogi, R., Saxe, A., Hughes, J. A., Holmes, P., & Cohen, J. D. (2011). Acquisition of decision making criteria: Reward rate ultimately beats accuracy. Attention, Perception, & Psychophysics, 73(2), 640–657.


  • Bitzer, S., Park, H., Blankenburg, F., & Kiebel, S. J. (2014). Perceptual decision making: Drift-diffusion model is equivalent to a Bayesian model. Frontiers in Human Neuroscience, 8.

  • Bogacz, R. (2009). Optimal decision making theories. In J. C. Dreher & L. Tremblay (Eds.), Handbook of reward and decision making. Elsevier.

  • Bogacz, R., Brown, E., Moehlis, J., Holmes, P., & Cohen, J. D. (2006). The physics of optimal decision making: A formal analysis of models of performance in two-alternative forced-choice tasks. Psychological Review, 113(4), 700–765.


  • Brown, S. D., & Heathcote, A. (2008). The simplest complete model of choice response time: Linear ballistic accumulation. Cognitive Psychology, 57(3), 153–178.


  • Busemeyer, J. R., & Myung, I. J. (1992). An adaptive approach to human decision making: Learning theory and human performance. Journal of Experimental Psychology: General, 121, 177–194.


  • Cisek, P., Puskas, G. A., & El-Murr, S. (2009). Decisions in changing conditions: The urgency-gating model. The Journal of Neuroscience, 29(37), 11560–11571.


  • Deneve, S. (2012). Making decisions with unknown sensory reliability. Frontiers in Neuroscience, 6.

  • Diederich, A., & Busemeyer, J. R. (2006). Modeling the effects of payoff on response bias in a perceptual discrimination task: Bound-change, drift-rate-change, or two-stage-processing hypothesis. Perception & Psychophysics, 68(2), 194–207.


  • Donkin, C., Brown, S. D., & Heathcote, A. (2009). The overconstraint of response time models: Rethinking the scaling problem. Psychonomic Bulletin & Review, 16(6), 1129–1135.


  • Drugowitsch, J., Moreno-Bote, R., Churchland, A. K., Shadlen, M. N., & Pouget, A. (2012). The cost of accumulating evidence in perceptual decision making. The Journal of Neuroscience, 32(11), 3612–3628.


  • Edwards, W. (1965). Optimal strategies for seeking information: Models for statistics, choice reaction times, and human information processing. Journal of Mathematical Psychology, 2(2), 312–329.


  • Geisler, W. S. (2003). Ideal observer analysis. In L. Chalupa & J. Werner (Eds.), The visual neurosciences (pp. 825–837). Cambridge, MA: MIT Press.


  • Gold, J. I., & Shadlen, M. N. (2002). Banburismus and the brain: Decoding the relationship between sensory stimuli, decisions and reward. Neuron, 36, 299–308.


  • Gold, J. I., & Shadlen, M. N. (2007). The neural basis of decision making. Annual Review of Neuroscience, 30, 535–574.


  • Hanks, T. D., Mazurek, M. E., Kiani, R., Hopp, E., & Shadlen, M. N. (2011). Elapsed decision time affects the weighting of prior probability in a perceptual decision task. The Journal of Neuroscience, 31(17), 6339–6352.


  • Kiani, R., & Shadlen, M. N. (2009). Representation of confidence associated with a decision by neurons in the parietal cortex. Science, 324(5928), 759–764.


  • Laming, D. R. J. (1968). Information theory of choice-reaction times. London: Academic Press.


  • Mozer, M. C., Kinoshita, S., & Davis, C. (2004). Control of response initiation: Mechanisms of adaptation to recent experience. In M. Hahn & S. C. Stoness (Eds.), Proceedings of the Twenty Sixth Annual Conference of the Cognitive Science Society (pp. 981–986). Hillsdale, NJ: Erlbaum.


  • Mulder, M. J., Wagenmakers, E. J., Ratcliff, R., Boekel, W., & Forstmann, B. U. (2012). Bias in the brain: A diffusion model analysis of prior probability and potential payoff. The Journal of Neuroscience, 32(7), 2335–2343.


  • Myung, I. J., & Busemeyer, J. R. (1989). Criterion learning in a deferred decision making task. American Journal of Psychology, 102, 1–16.


  • Nelder, J. A., & Mead, R. (1965). A simplex method for function minimization. The Computer Journal, 7(4), 308–313.


  • Norris, D. (2006). The Bayesian Reader: Explaining word recognition as an optimal Bayesian decision process. Psychological Review, 113(2), 327–357.


  • Norris, D. (2009). Putting it all together: A unified account of word recognition and reaction-time distributions. Psychological Review, 116(1), 207–219.


  • Rao, R. P. (2004). Bayesian computation in recurrent neural circuits. Neural Computation, 16, 1–38.


  • Ratcliff, R. (1978). A theory of memory retrieval. Psychological Review, 85(2), 59–108.


  • Ratcliff, R., Gomez, P., & McKoon, G. (2004). A diffusion model account of the lexical decision task. Psychological Review, 111, 159–182.


  • Ratcliff, R., & McKoon, G. (2008). The diffusion decision model: Theory and data for two-choice decision tasks. Neural Computation, 20(4), 873–922.


  • Ratcliff, R., & Rouder, J. N. (2000). A diffusion model account of masking in two-choice letter identification. Journal of Experimental Psychology: Human Perception and Performance, 26(1), 127–140.


  • Thura, D., Beauregard-Racine, J., Fradet, C. W., & Cisek, P. (2012). Decision making by urgency gating: Theory and experimental support. Journal of Neurophysiology, 108(11), 2912–2930.


  • Turner, B. M., Van Zandt, T., & Brown, S. (2011). A dynamic stimulus-driven model of signal detection. Psychological Review, 118(4), 583–613.


  • Tversky, A., & Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases. Science, 185(4157), 1124–1131.


  • Usher, M., & McClelland, J. L. (2001). The time course of perceptual choice: The leaky, competing accumulator model. Psychological Review, 108(3), 550–592.


  • van Ravenzwaaij, D., Mulder, M. J., Tuerlinckx, F., & Wagenmakers, E. J. (2012). Do the dynamics of prior information depend on task context? An analysis of optimal performance and an empirical test. Frontiers in Psychology, 3.

  • Vickers, D. (1979). Decision processes in visual perception. New York: Academic Press.


  • Wagenmakers, E. J. (2009). Methodological and empirical developments for the Ratcliff diffusion model of response times and accuracy. European Journal of Cognitive Psychology, 21(5), 641–671.


  • Wagenmakers, E. J., Ratcliff, R., Gomez, P., & McKoon, G. (2008). A diffusion model account of criterion shifts in the lexical decision task. Journal of Memory and Language, 58, 140–159.


  • Wald, A. (1947). Sequential analysis. New York: Wiley.


  • Wald, A., & Wolfowitz, J. (1948). Optimum character of the sequential probability ratio test. The Annals of Mathematical Statistics, 19(3), 326–339.



Author Notes

The author thanks Marius Usher (MU), Eric-Jan Wagenmakers (EJW), and Don van Ravenzwaaij (DVR) for helpful discussions, and also EJW, DVR, and Andrew Heathcote for providing excellent suggestions during the revision process, and finally, Konstantinos Tsetsos for comments about an earlier version of this manuscript.

Author information


Correspondence to Rani Moran.

Appendices

Appendix A: Relationship between Wald and RR optimality

In this appendix, I show that WO and RRO are equivalent. This means that a decision rule that achieves one form of optimality also achieves the other, as specified below.

An RRO decision rule is also WO

In this section, I show that if a decision rule maximizes the reward rate, then it is also a Wald-optimal strategy (Bogacz et al., 2006; Bogacz, 2009). Denote by \( \tilde{D} \) a decision rule that achieves the maximal reward rate for a given environment and a given mean residual time \( t_{res} \). Denote the accuracy and mean decision time of \( \tilde{D} \) by \( AC_{RR}(t_{res}) \) and \( t_{d,RR}(t_{res}) \), respectively. I argue that \( \tilde{D} \) is also Wald optimal, in that \( AC_{RR}(t_{res}) \) must be the maximal possible accuracy (among all decision rules) with mean decision time \( t_{d,RR}(t_{res}) \). That is:

$$ A{C}_{RR}\left({t}_{res}\right)=A{C}_{Wald}\left({t}_{d, RR}\left({t}_{res}\right)\right), $$
(A1)

To see this, note that the reward rate achieved by a WO decision rule with mean decision time \( t_{d,RR}(t_{res}) \) is \( \frac{AC_{Wald}\left(t_{d,RR}\left(t_{res}\right)\right)}{t_{d,RR}\left(t_{res}\right)+t_{res}} \). Since the Wald rule provides the maximal accuracy for a given mean decision time,

$$ A{C}_{Wald}\left({t}_{d, RR}\left({t}_{res}\right)\right)\ge A{C}_{RR}\left({t}_{res}\right), $$
(A2)

Thus, the reward rate of the Wald-optimal rule satisfies:

$$ \frac{A{C}_{Wald}\left({t}_{d, RR}\left({t}_{res}\right)\right)}{t_{d, RR}\left({t}_{res}\right)+{t}_{res}}\ge \frac{A{C}_{RR}\left({t}_{res}\right)}{t_{d, RR}\left({t}_{res}\right)+{t}_{res}}= RR\left(\tilde{D}\right), $$
(A3)

In words, the RR of the Wald-optimal rule (with mean decision time \( t_{d,RR}(t_{res}) \)) is at least as large as the reward rate obtained by \( \tilde{D} \). However, by definition, \( \tilde{D} \) is the RR-optimal rule, and therefore an equality must hold in Eq. A3, and thus in Eq. A2 as well. Hence, Eq. A1 is satisfied.

A WO decision rule is also RRO

In the current section I show that, given a target mean decision time \( t_0 \), there exists a positive mean residual time \( t_{res}^{*} \) for which the WO rule (with mean decision time \( t_0 \) and its associated Wald-optimal accuracy \( AC_{Wald}(t_0) \)) maximizes the reward rate. To simplify notation, henceforth I denote the mean decision time by t (instead of \( t_d \)) and the maximal accuracy by A(t) (instead of \( AC_{Wald}(t_d) \)).

The function A(t) has several important properties. First, with a decision time of 0, an observer can achieve a maximal accuracy of \( \max \left\{{p}_1, 1-{p}_1\right\} \), where \( p_1 \) is the a priori probability that option ‘1’ (rather than ‘2’) is correct. Without loss of generality, we can assume that \( {p}_1\ge 0.5 \) and hence \( A(0)={p}_1 \).

Second, A(t) is a monotonically increasing function of t. Indeed, if \( t_1 > t_2 \), then one potential decision rule with mean decision time \( t_1 \) is to adopt the WO decision rule for mean decision time \( t_2 \) and then ‘sit and wait’ for a duration of \( t_1 - t_2 \) before issuing a decision. This yields an accuracy of \( A(t_2) \). Of course, waiting without integrating information is suboptimal, because observers can collect further information that improves accuracy. Therefore \( A(t_1) > A(t_2) \).

Third, A(t) is a concave function of t. This means that for all \( t_1, t_2 \) and \( \lambda \in [0, 1] \):

$$ A\left(\lambda {t}_1+\left(1-\lambda \right){t}_2\right)\ge \lambda A\left({t}_1\right)+\left(1-\lambda \right)A\left({t}_2\right), $$
(A4)

Indeed, consider the following ‘mixture’ decision rule: with probability \( \lambda \in [0, 1] \), the observer follows the Wald-optimal decision rule for mean decision time \( t_1 \); otherwise (i.e., with probability \( 1-\lambda \)), the observer follows the Wald-optimal decision rule for mean decision time \( t_2 \). This mixture rule provides an accuracy equal to the right-hand side of Eq. A4, and its mean decision time is \( \lambda t_1 + (1-\lambda)t_2 \). By definition, a Wald-optimal decision rule for mean decision time \( \lambda t_1 + (1-\lambda)t_2 \) provides at least the same accuracy, and so Eq. A4 is satisfied. If the Wald-optimal decision rule is more efficient than the mixture rule, then strict concavity is obtained (i.e., strict inequality in Eq. A4).

Assuming A(t) is a differentiable function, the monotonicity and concavity properties translate to

$$ A^{\prime }(t)>0,A^{\prime\prime }(t)<0 $$
(A5)

Consider next the reward rate, \( RR=\frac{AC}{t+{t}_{res}} \). We already know, from the previous subsection, that any rule that maximizes the RR is a Wald-optimal strategy. Therefore, the optimal reward rate is achieved by maximizing, with respect to t, the reward function:

$$ R(t)=\frac{A(t)}{t+{t}_{res}}, $$
(A6)

Taking the derivative with respect to t, we find that

$$ R^{\prime }(t)=\frac{A^{\prime }(t)\left(t+{t}_{res}\right)-A(t)}{{\left(t+{t}_{res}\right)}^2} $$
(A7)

The condition for stationary points is thus:

$$ A^{\prime }(t)\left(t+{t}_{res}\right)-A(t)=0, $$
(A8)

Consider a target mean decision time \( t_0 > 0 \). I next show that there exists some positive \( t_{res}^{*} \) for which \( t_0 \) is a stationary point. Indeed, defining

$$ {t}_{res}^{*}=\frac{A\left({t}_0\right)}{A^{\prime}\left({t}_0\right)}-{t}_0, $$
(A9)

we note that \( t_0 \) solves Eq. A8. It remains to show that \( t_{res}^{*} \) is indeed positive:

$$ {t}_{res}^{*}=\frac{A\left({t}_0\right)-{t}_0A^{\prime}\left({t}_0\right)}{A^{\prime}\left({t}_0\right)}=\frac{\left(A(0)+{\displaystyle {\int}_0^{t_0}}A^{\prime}\left(\tau \right)\, d\tau \right)-{t}_0A^{\prime}\left({t}_0\right)}{A^{\prime}\left({t}_0\right)}, $$
(A10)

Noting that \( A^{\prime}(\tau) \) is a decreasing function of τ (\( A^{\prime\prime}(\tau)<0 \), according to Eq. A5), we obtain that \( {\displaystyle {\int}_0^{t_0}A^{\prime}\left(\tau \right)\, d\tau >{t}_0A^{\prime}\left({t}_0\right)} \). Thus, continuing Eq. A10,

$$ {t}_{res}^{*}=\frac{A(0)+{\displaystyle {\int}_0^{t_0}}A^{\prime}\left(\tau \right) d\tau -{t}_0A^{\prime}\left({t}_0\right)}{A^{\prime}\left({t}_0\right)}>\frac{p_1+{t}_0A^{\prime}\left({t}_0\right)-{t}_0A^{\prime}\left({t}_0\right)}{A^{\prime}\left({t}_0\right)}=\frac{p_1}{A^{\prime}\left({t}_0\right)}>0, $$
(A11)

Next, I show that the stationary point \( t_0 \) is a maximum point. Indeed, differentiating Eq. A7 once more, we obtain:

$$ R^{\prime\prime }(t)=\frac{A^{\prime\prime }(t){\left(t+{t}_{res}\right)}^3-2\left(t+{t}_{res}\right)\left[A^{\prime }(t)\left(t+{t}_{res}\right)-A(t)\right]}{{\left(t+{t}_{res}\right)}^4}, $$
(A12)

Evaluating Eq. A12 at the stationary point \( t_0 \) yields \( R^{\prime\prime}\left({t}_0\right)=\frac{A^{\prime\prime}\left({t}_0\right)}{t_0+{t}_{res}^{*}}<0 \), which shows that \( t_0 \) is a local maximum point of the reward rate.

Next, I show that \( t_0 \) is in fact a global maximum. Suppose it is not; then Eq. A8 has another root (stationary point) at the global maximum, so Eq. A8 has at least two different roots. By Rolle’s theorem, the derivative of the left-hand side of Eq. A8 must then vanish somewhere between them. Thus, there exists a positive t such that \( A^{\prime\prime}(t)\left(t+{t}_{res}^{*}\right)=0 \), which is impossible because \( A^{\prime\prime}(t)<0 \) and \( t+{t}_{res}^{*}>0 \). Therefore, \( t_0 \) must be a global maximum.

To conclude, given a target mean decision time \( t_0 \), I found a positive mean residual time \( {t}_{res}^{*} \) (Eqs. A9 and A11) for which the WO decision rule (with mean decision time \( t_0 \) and accuracy \( A(t_0) \)) is RR-optimal.
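As a numerical sanity check on this construction, the following Python sketch assumes a particular concave, increasing accuracy function A(t) (an illustrative choice, not taken from the paper, as are all parameter values below), computes \( {t}_{res}^{*} \) via Eq. A9, and verifies that \( t_0 \) maximizes the reward rate of Eq. A6:

```python
import numpy as np

# Hypothetical concave accuracy function with A(0) = p1: A(t) = 1 - (1 - p1)e^(-kt).
p1, k, t0 = 0.6, 1.5, 0.8

A  = lambda t: 1 - (1 - p1) * np.exp(-k * t)   # A'(t) > 0, A''(t) < 0 (Eq. A5)
dA = lambda t: k * (1 - p1) * np.exp(-k * t)   # its derivative A'(t)

t_res = A(t0) / dA(t0) - t0                    # Eq. A9
assert t_res > 0                               # positivity, as guaranteed by Eq. A11

ts = np.linspace(1e-4, 10, 100_000)
R = A(ts) / (ts + t_res)                       # reward rate, Eq. A6
print(f"t_res* = {t_res:.4f}; argmax of R = {ts[np.argmax(R)]:.4f} (target t0 = {t0})")
```

On this grid the maximizer of R coincides with \( t_0 \) (up to grid resolution), as the derivation above predicts.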

Appendix B: SPRT in Gaussian environments

In the current appendix, I extend the SPRT model to Gaussian heterogeneous environments. I assume that on each trial, the difficulty level is drawn from a Gaussian distribution. The unique source of uncertainty concerns which of the two response alternatives is correct. In each temporal interval dt, a new independent perceptual sample is generated, distributed \( N(v\,dt, s^2 dt) \), where \( s^2 \) is the variance rate and v is the drift rate for the current trial. The participant needs to decide in favor of one of two hypotheses:

  • \( H_0 \): The current drift rate v was generated from a \( N(v_0, \eta^2) \) distribution; or

  • \( H_1 \): The current drift rate v was generated from a \( N(-v_0, \eta^2) \) distribution.

Importantly, the positive parameters \( v_0 \) and η, which correspond to the mean and the standard deviation of the difficulty distribution, respectively, are known.

Denote by \( \tilde{x}(t) \) and x(t), respectively, the entire stream of accumulated perceptual evidence and the total accumulated evidence obtained by time t (thus, x(t) is simply the state of \( \tilde{x}(t) \) at time t). According to Bayes’ rule, the posterior odds equal the product of the prior odds and the Bayes factor (BF):

$$ \frac{P\left({H}_0\Big|\tilde{x}(t)\ \right)}{P\left({H}_1\Big|\tilde{x}(t)\right)}=\frac{P\left(\tilde{x}(t)\Big|{H}_0\right)}{P\left(\tilde{x}(t)\Big|{H}_1\right)}\frac{P\left({H}_0\right)}{P\left({H}_1\right)} $$
(B1)

Let us next focus on the numerator term \( P\left(\tilde{x}(t)\Big|{H}_0\right) \). It can be shown (see Drugowitsch et al., 2012, Eq. 10) that conditional on a drift rate v:

$$ P\left(\tilde{x}(t)\Big|v\right)=D\left(\tilde{x}(t)\right){e}^{\frac{2x(t)v-t{v}^2}{2{s}^2}} $$
(B2)

where \( D\left(\tilde{x}(t)\right) \) depends on the specific stream \( \tilde{x}(t) \) but not on the drift rate v.

Throughout the following derivation, proportionality (∝) denotes equality up to a multiplicative term that is invariant with respect to \( v_0 \) (and its sign) but may depend on the specific stream \( \tilde{x}(t) \). Note that \( P\left(\tilde{x}(t)\Big|{H}_0\right) \) is obtained by integrating \( P\left(\tilde{x}(t)\Big|v\right) \) over the drift distribution. Thus:

$$ \begin{array}{c}\hfill P\left(\left.\tilde{x}(t)\right|{H}_0\right)\propto {\displaystyle {\int}_{-\infty}^{\infty }P\left(\left.\tilde{x}(t)\right|v\right){e}^{-\frac{{\left(v-{v}_0\right)}^2}{2{\eta}^2}} dv=D\left(\tilde{x}(t)\right){\displaystyle {\int}_{-\infty}^{\infty }{e}^{\frac{2x(t)v-t{v}^2}{2{s}^2}}}{e}^{-\frac{{\left(v-{v}_0\right)}^2}{2{\eta}^2}} dv}\hfill \\ {}\hfill \propto {\displaystyle {\int}_{-\infty}^{\infty }{e}^{\frac{\eta^2v\left(2x(t)- vt\right)-{s}^2{\left(v-{v}_0\right)}^2}{2{s}^2{\eta}^2}}} dv\hfill \\ {}\hfill ={e}^{-\frac{v_0^2}{2{\eta}^2}}{\displaystyle {\int}_{-\infty}^{\infty }{e}^{-\frac{\left({s}^2+{\eta}^2t\right){v}^2-2\left({\eta}^2x(t)+{s}^2{v}_0\right)v}{2{s}^2{\eta}^2}}} dv\hfill \\ {}\hfill ={e}^{-\frac{v_0^2}{2{\eta}^2}}{\displaystyle {\int}_{-\infty}^{\infty }{e}^{-\frac{v^2-2\left(\frac{\eta^2x(t)+{s}^2{v}_0}{s^2+{\eta}^2t}\right)v}{\frac{2{s}^2{\eta}^2}{s^2+{\eta}^2t}}}} dv\hfill \\ {}\hfill ={e}^{-\frac{v_0^2}{2{\eta}^2}}{\displaystyle {\int}_{-\infty}^{\infty }{e}^{-\frac{{\left(v-\frac{\eta^2x(t)+{s}^2{v}_0}{s^2+{\eta}^2t}\right)}^2}{\frac{2{s}^2{\eta}^2}{s^2+{\eta}^2t}}}}{e}^{\frac{{\left({\eta}^2x(t)+{s}^2{v}_0\right)}^2}{2{s}^2{\eta}^2\left({s}^2+{\eta}^2t\right)}} dv\hfill \\ {}\hfill \propto {e}^{-\frac{v_0^2}{2{\eta}^2}+\frac{2{s}^2{\eta}^2x(t){v}_0+{s}^4{v}_0^2}{2{s}^2{\eta}^2\left({s}^2+{\eta}^2t\right)}}{\displaystyle {\int}_{-\infty}^{\infty }{e}^{-\frac{{\left(v-\frac{\eta^2x(t)+{s}^2{v}_0}{s^2+{\eta}^2t}\right)}^2}{\frac{2{s}^2{\eta}^2}{s^2+{\eta}^2t}}}} dv\hfill \end{array} $$
(B3)

Examining the integrand in the final term, we note that it is proportional to the probability density function of a normal distribution with mean \( \frac{\eta^2x(t)+{s}^2{v}_0}{s^2+{\eta}^2t} \) and variance \( \frac{s^2{\eta}^2}{s^2+{\eta}^2t} \). Therefore, this integral is independent of \( v_0 \), so we obtain:

$$ P\left(\tilde{x}(t)\Big|{H}_0\right)\propto {e}^{-\frac{v_0^2}{2{\eta}^2}+\frac{2{\eta}^2x(t){v}_0+{s}^2{v}_0^2}{2{\eta}^2\left({s}^2+{\eta}^2t\right)}} $$
(B4)

We can now derive the term \( P\left(\tilde{x}(t)\Big|{H}_1\right) \) by replacing \( v_0 \) with \( -{v}_0 \) in Eq. B4, to obtain:

$$ P\left(\tilde{x}(t)\Big|{H}_1\right)\propto {e}^{-\frac{v_0^2}{2{\eta}^2}+\frac{-2{\eta}^2x(t){v}_0+{s}^2{v}_0^2}{2{\eta}^2\left({s}^2+{\eta}^2t\right)}} $$
(B5)

Equations B4 and B5 share the same proportionality factor; hence, returning to the BF, it follows that

$$ \frac{P\left(\tilde{x}(t)\Big|{H}_0\right)}{P\left(\tilde{x}(t)\Big|{H}_1\right)}={e}^{\frac{2x(t){v}_0}{\left({s}^2+{\eta}^2t\right)}} $$
(B6)

Finally, taking logarithms of Eq. B1 and using Eq. B6, we obtain:

$$ \tilde{\uppi}=\frac{2x(t){v}_0}{\left({s}^2+{\eta}^2t\right)}+\pi $$
(B7)

where π and \( \tilde{\uppi} \) are the log-prior and log-posterior odds respectively.

In the SPRT, integration of perceptual evidence continues until the log-posterior odds reach a target level \( \pm \alpha \), where \( \alpha \equiv \ln \left(\frac{A}{1-A}\right) \) and A is a target level of accuracy. From Eq. B7, it follows that integration continues until

$$ x(t)\in \left\{-\frac{s^2\left(\alpha +\pi \right)}{2{v}_0}-\frac{\eta^2\left(\alpha +\pi \right)}{2{v}_0}\,t,\ \frac{s^2\left(\alpha -\pi \right)}{2{v}_0}+\frac{\eta^2\left(\alpha -\pi \right)}{2{v}_0}\,t\right\} $$
(B8)

This means that a diffuser (with starting point x(0) = z) will terminate all trials at the same posterior level of \( \pm \alpha \) if the time-variant response thresholds are set at distances \( -\frac{s^2\left(\alpha +\pi \right)}{2{v}_0}-\frac{\eta^2\left(\alpha +\pi \right)}{2{v}_0}t \) (the lower threshold) and \( \frac{s^2\left(\alpha -\pi \right)}{2{v}_0}+\frac{\eta^2\left(\alpha -\pi \right)}{2{v}_0}\ t \) (the upper threshold) from the starting point. Note that the lower and upper response thresholds are, respectively, linearly decreasing and increasing functions of time, and that the boundary separation increases at rate \( \frac{\eta^2\alpha }{v_0} \). Additionally, in the particular case that the environment is unbiased (i.e., π = 0), both thresholds change at equal absolute rates \( \frac{\eta^2\alpha }{2{v}_0} \) but in opposite directions.
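To make the threshold dynamics concrete, here is a minimal Python sketch of Eq. B8; all parameter values (s, \( v_0 \), η, the target accuracy, and the prior odds) are hypothetical choices for illustration:

```python
import numpy as np

# Time-variant SPRT thresholds of Eq. B8 (relative to the starting point).
s, v0, eta = 0.1, 0.1, 0.08           # noise, mean drift, drift-rate SD (hypothetical)
A_target = 0.9                         # target accuracy level A
alpha = np.log(A_target / (1 - A_target))
pi = np.log(0.7 / 0.3)                 # log-prior odds of a biased environment

def thresholds(t):
    """Lower and upper response thresholds at time t (Eq. B8)."""
    lower = -(s**2 + eta**2 * t) * (alpha + pi) / (2 * v0)
    upper = (s**2 + eta**2 * t) * (alpha - pi) / (2 * v0)
    return lower, upper

for t in (0.0, 0.5, 1.0, 2.0):
    lo, up = thresholds(t)
    print(f"t = {t:.1f}s: lower = {lo:.4f}, upper = {up:.4f}, separation = {up - lo:.4f}")
# The separation grows linearly at rate eta**2 * alpha / v0, as noted above.
```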

Another implication of Eq. B7 is that in the DDM, where integration stops when either of the (constant) thresholds is reached (at distances −z and a − z from the starting point), the log odds are \( \tilde{\uppi}=\pi -\frac{2z{v}_0}{\left({s}^2+{\eta}^2t\right)} \) at the lower boundary and \( \tilde{\uppi}=\frac{2\left(a-z\right){v}_0}{\left({s}^2+{\eta}^2t\right)}+\pi \) at the upper threshold. Recall that the log odds are formulated in terms of the ‘upper’ (\( H_0 \)) choice alternative relative to the ‘lower’ (\( H_1 \)) choice alternative. If, instead, the log odds are formulated with respect to the chosen relative to the non-chosen alternative, the log odds at the lower threshold are obtained by flipping the sign: \( \tilde{\uppi}=\frac{2z{v}_0}{\left({s}^2+{\eta}^2t\right)}-\pi \). Note that the log odds for both alternatives decrease monotonically as a function of t, tending toward the prior odds (± π) as t → ∞.
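The decay of these boundary log odds toward the prior odds can be checked numerically with the same kind of sketch (again, all parameter values are hypothetical):

```python
import numpy as np

# Log-posterior odds at the fixed DDM thresholds (chosen vs. non-chosen alternative).
s, v0, eta = 0.1, 0.1, 0.08
a, z = 0.12, 0.05                      # boundary separation and starting point (assumed)
pi = np.log(0.7 / 0.3)                 # log-prior odds

for t in (0.1, 1.0, 10.0, 100.0):
    upper = 2 * (a - z) * v0 / (s**2 + eta**2 * t) + pi
    lower = 2 * z * v0 / (s**2 + eta**2 * t) - pi
    print(f"t = {t:6.1f}s: upper = {upper:+.4f}, lower = {lower:+.4f}")
# upper tends to +pi and lower to -pi as t grows, as stated above.
```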

Appendix C: Simulation methods

In this appendix, I describe the method I used for finding the optimal triplet \( (a, z, {v}_c) \) for the DDM in biased heterogeneous environments. For a single difficulty level v, the accuracy and the MRT are given by (cf. Eqs. 8–12 in van Ravenzwaaij et al., 2012):

$$ Acc\left(v|a,z,{v}_c,\beta, s\right)=\beta \frac{\left({e}^{\frac{2a\left(v+{v}_c\right)}{s^2}}-{e}^{\frac{2\left(a-z\right)\left(v+{v}_c\right)}{s^2}}\right)}{e^{\frac{2a\left(v+{v}_c\right)}{s^2}}-1}+\left(1-\beta \right)\frac{\left({e}^{\frac{2a\left(v-{v}_c\right)}{s^2}}-{e}^{\frac{2z\left(v-{v}_c\right)}{s^2}}\right)}{e^{\frac{2a\left(v-{v}_c\right)}{s^2}}-1} $$
(C1)
$$ MRT\left(v|a,z,{v}_c,\beta, s\right)=\beta \left(-\frac{z}{v+{v}_c}+\frac{a\left({e}^{-\frac{2z\left(v+{v}_c\right)}{s^2}}-1\right)}{\left(v+{v}_c\right)\left({e}^{-\frac{2a\left(v+{v}_c\right)}{s^2}}-1\right)}\right)+\left(1-\beta \right)\left(-\frac{a-z}{v-{v}_c}+\frac{a\left({e}^{-\frac{2\left(a-z\right)\left(v-{v}_c\right)}{s^2}}-1\right)}{\left(v-{v}_c\right)\left({e}^{-\frac{2a\left(v-{v}_c\right)}{s^2}}-1\right)}\right) $$
(C2)

When the environment is heterogeneous, so that the drift rate is distributed according to f(v), I found the accuracy and MRT by integrating the corresponding terms over the distribution f. The two cases explored in the paper are a Gaussian f and a discrete f with two equiprobable drift rates. The Gaussian case was handled by numerical integration, and the discrete case by arithmetic averaging.
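As a concrete illustration, the following Python sketch transcribes Eqs. C1–C2 directly and averages them over a discrete environment with two equiprobable drift rates; the drift values and the triplet \( (a, z, {v}_c) \) below are hypothetical:

```python
import numpy as np

# Accuracy (Eq. C1) and mean RT (Eq. C2) of a biased DDM for a single drift level v;
# beta is the probability that the 'upper' response is the correct one.
def acc(v, a, z, vc, beta, s=0.1):
    up, dn = v + vc, v - vc
    p_up = (np.exp(2*a*up/s**2) - np.exp(2*(a - z)*up/s**2)) / (np.exp(2*a*up/s**2) - 1)
    p_dn = (np.exp(2*a*dn/s**2) - np.exp(2*z*dn/s**2)) / (np.exp(2*a*dn/s**2) - 1)
    return beta * p_up + (1 - beta) * p_dn

def mrt(v, a, z, vc, beta, s=0.1):
    up, dn = v + vc, v - vc
    t_up = -z/up + a*(np.exp(-2*z*up/s**2) - 1) / (up*(np.exp(-2*a*up/s**2) - 1))
    t_dn = -(a - z)/dn + a*(np.exp(-2*(a - z)*dn/s**2) - 1) / (dn*(np.exp(-2*a*dn/s**2) - 1))
    return beta * t_up + (1 - beta) * t_dn

# Discrete heterogeneous environment: two equiprobable (hypothetical) drift rates.
drifts = np.array([0.05, 0.2])
a, z, vc, beta = 0.12, 0.07, 0.01, 0.7
print("mean Acc:", np.mean([acc(v, a, z, vc, beta) for v in drifts]))
print("mean MRT:", np.mean([mrt(v, a, z, vc, beta) for v in drifts]))
```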

The optimization problem can now be formulated as

$$ \begin{array}{l}\left(a,z,{v}_c\right)= \operatorname{argmin} {\displaystyle \int MRT\left(v|a,z,{v}_c,\beta, s\right)\, df(v)}\hfill \\ {}\text{s.t.}\ {\displaystyle \int Acc\left(v|a,z,{v}_c,\beta, s\right)\, df(v)\ge A}\hfill \end{array} $$
(C3)

where A is the desired accuracy level. Note that for the optimal triplet the constraint is always satisfied with equality; otherwise, a sufficiently small reduction of the threshold separation a would diminish the MRT while maintaining accuracy above the desired level, contradicting the optimality of the triplet. Defining:

$$ F\left(a,z,{v}_c,\beta, s\right)=\begin{cases}{\displaystyle \int MRT\left(v|a,z,{v}_c,\beta, s\right)\, df(v)}, & {\displaystyle \int Acc\left(v|a,z,{v}_c,\beta, s\right)\, df(v)}\ge A\\[4pt] \infty, & {\displaystyle \int Acc\left(v|a,z,{v}_c,\beta, s\right)\, df(v)}<A\end{cases} $$
(C4)

the optimal triplet is defined by \( \left(a, z, {v}_c\right)=\operatorname{argmin}\, F\left(a, z, {v}_c, \beta, s\right) \).

I took considerable measures to avoid local minima in the search for the triplet that minimizes F. The search was conducted with a combination of genetic algorithms and the iterative Nelder-Mead simplex method (Nelder & Mead, 1965), implemented by the routines “ga” and “fminsearch” in MathWorks’ MATLAB. I repeated the following steps 10,000 times. First, I minimized the objective function by running the genetic algorithm. The output triplet was then fed as the starting point for the simplex algorithm. The simplex algorithm, in turn, was iterated several times; each iteration started with the parameters obtained at the termination of the previous iteration (Footnote 14). This was repeated until the objective function improved by less than 1e-5 on two consecutive runs. The triplet that minimized the objective function across the 10,000 iterations of a genetic-algorithm run followed by a sequence of simplex iterations was taken to be the optimal triplet; a minimal sketch of this procedure appears below.
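Continuing the previous sketch (it reuses the acc and mrt helpers defined there), the following Python code illustrates the penalty objective of Eq. C4 with restarted Nelder-Mead as a stand-in for MATLAB’s fminsearch; the genetic-algorithm stage is omitted, and all parameter values are hypothetical:

```python
import numpy as np
from scipy.optimize import minimize

A_target, beta = 0.80, 0.7                    # assumed accuracy target and bias
drifts = np.array([0.05, 0.2])                # two equiprobable difficulty levels

def F(params):
    """Penalty objective of Eq. C4: mean MRT, or infinity if the accuracy constraint fails."""
    a, z, vc = params
    if not (0 < z < a):                       # the starting point must lie between the bounds
        return np.inf
    accs = [acc(v, a, z, vc, beta) for v in drifts]   # acc/mrt from the previous sketch
    if not np.all(np.isfinite(accs)) or np.mean(accs) < A_target:
        return np.inf
    return np.mean([mrt(v, a, z, vc, beta) for v in drifts])

x = np.array([0.12, 0.07, 0.01])              # feasible initial triplet (a, z, v_c)
prev, small = np.inf, 0
while small < 2:                              # stop: improvement < 1e-5 on two consecutive runs
    res = minimize(F, x, method="Nelder-Mead")
    small = small + 1 if prev - res.fun < 1e-5 else 0
    prev, x = res.fun, res.x
print("optimal triplet (a, z, v_c):", x, "| mean MRT:", prev)
```

Restarting the simplex from its previous terminal point mirrors the iteration scheme described above (and Footnote 14): each restart re-inflates the search simplex, so a subsequent run can escape a premature convergence point.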


Cite this article

Moran, R. Optimal decision making in heterogeneous and biased environments. Psychon Bull Rev 22, 38–53 (2015). https://doi.org/10.3758/s13423-014-0669-3
