Abstract
The issue of optimal performance in speeded two-choice tasks has played a substantial role in the development and evaluation of decision making theories. For difficulty-homogeneous environments, the means to achieve optimality are prescribed by the sequential probability ratio test (SPRT), or equivalently, by the drift diffusion model (DDM). Biases in the external environments are easily accommodated into these models by adopting a prior integration bias. However, for difficulty-heterogeneous environments, the issue is more elusive. I show that in such cases, the SPRT and the DDM are no longer equivalent and both are suboptimal. Optimality is achieved by a diffusion-like accumulation of evidence while adjusting the choice thresholds during the time course of a trial. In the second part of the paper, assuming that decisions are made according to the popular DDM, I show that optimal performance in biased environments mandates incorporating a dynamic-bias component (a shift in the drift rate) in addition to the prior bias (a shift in the starting point) into the model. These conclusions support a conjecture by Hanks, Mazurek, Kiani, Hopp, and Shadlen (The Journal of Neuroscience, 31(17), 6339–6352, 2011) and contradict a recent attempt to refute this conjecture by arguing that optimality is achieved with the aid of prior bias alone (van Ravenzwaaij et al., 2012). The psychological plausibility of such “mathematically optimal” strategies is discussed. The current paper contributes to the ongoing effort to understand optimal behavior in biased and heterogeneous environments and corrects prior conclusions with respect to optimality in such conditions.
Notes
In Usher & McClelland’s Leaking Competing Accumulators model (LCA; 2001), the evidence in favor of each alternative is taxed by mutual inhibition from the other alternative.
Drugowitsch et al. (2012) allowed for the possibility that integration of information is associated with a temporal cost c(t). Throughout the paper, I ignore such costs, i.e., assume that c(t) = 0.
Note that the term evidence refers to the variable that is integrated by the diffuser in order to make a decision (e.g., perceptual samples). Once the statistical properties of such evidence (i.e., its distribution under both response alternatives) are specified, the observer can form his or her belief, i.e., calculate the probability that each response alternative is correct (or the likelihood ratio), given the stream of evidence that has been collected (see Appendix B). Both processes, evidence integration and belief updating, are subsumed under the general term ‘integrating information’.
I note from the outset that the van Ravenzwaaij et al. article consists of both a theoretical study of optimal behavior (in both homogeneous and heterogeneous biased environments) and an empirical study of actual behavior. Here, I question only the conclusions with respect to the theoretical analysis of the heterogeneous environments.
Note that this residual-time component is subsumed in the term \( t_{res} \) in the definition of reward rate; see Eq. (1) below.
Throughout the paper I follow the customary convention and fix s = 0.1. This practice reflects the assumption that the noise level is identical across all conditions (difficulty levels, in the current case). However, mathematically speaking, this procedure poses an ‘over-constraint’ on the model (Donkin, Brown, & Heathcote, 2009).
This in effect assumes no integration costs. When there are such costs, integration may terminate prior to the interrogation time T (see Drugowitsch et al., 2012). See also Footnote 2.
Here I make the assumption that the dynamic bias is time-constant and hence that the integration bias, \( \frac{a}{2}-z+{v}_ct, \) builds up linearly during the trial. More generally, v c could be a function of time, but I do not consider this possibility here.
Some formulations of the reward rate assume that errors are followed with negative-reward penalties and/or an increase in the inter-trial temporal interval. Here, for simplicity, I assume that no such penalties exist.
For a homogeneous environment, no threshold adjustment is necessary according to the dynamic-programming-based decision rule (see Drugowitsch et al., 2012), and hence both the DDM and the SPRT are optimal.
These studies used a different criterion for optimality, namely the Bayes Risk (BR), which minimizes a weighted sum of the mean RT and error rate (Wald & Wolfowitz, 1948).
For example, if an observer adjusts his or her threshold every 100 ms during a two-second interval, then 20 parameters are required to describe the adjustment procedure.
Typically, when the Simplex algorithm converges, the search simplex has shrunk to a small diameter. By starting a novel Simplex iteration one increases the diameter of the search simplex. Thus, the next iteration can converge to a different point.
References
Balci, F., Simen, P., Niyogi, R., Saxe, A., Hughes, J. A., Holmes, P., & Cohen, J. D. (2011). Acquisition of decision making criteria: Reward rate ultimately beats accuracy. Attention, Perception, & Psychophysics, 73(2), 640–657.
Bitzer, S., Park, H., Blankenburg, F., & Kiebel, S. J. (2014). Perceptual decision making: Drift-diffusion model is equivalent to a Bayesian model. Frontiers in Human Neuroscience, 8.
Bogacz, R. (2009) Optimal decision making theories. In J. C. Dreher & L. Tremblay (Eds.), Handbook of reward and decision making. Elsevier.
Bogacz, R., Brown, E., Moehlis, J., Holmes, P., & Cohen, J. D. (2006). The physics of optimal decision making: A formal analysis of models of performance in two-alternative forced-choice tasks. Psychological Review, 113(4), 700–765.
Brown, S. D., & Heathcote, A. (2008). The simplest complete model of choice response time: Linear ballistic accumulation. Cognitive Psychology, 57(3), 153–178.
Busemeyer, J. R., & Myung, I. J. (1992). An adaptive approach to human decision making: Learning theory and human performance. Journal of Experimental Psychology: General, 121, 177–194.
Cisek, P., Puskas, G. A., & El-Murr, S. (2009). Decisions in changing conditions: The urgency-gating model. The Journal of Neuroscience, 29(37), 11560–11571.
Deneve, S. (2012). Making decisions with unknown sensory reliability. Frontiers in Neuroscience, 6.
Diederich, A., & Busemeyer, J. R. (2006). Modeling the effects of payoff on response bias in a perceptual discrimination task: Bound-change, drift-rate-change, or two-stage-processing hypothesis. Perception & Psychophysics, 68(2), 194–207.
Donkin, C., Brown, S. D., & Heathcote, A. (2009). The overconstraint of response time models: Rethinking the scaling problem. Psychonomic Bulletin & Review, 16(6), 1129–1135.
Drugowitsch, J., Moreno-Bote, R., Churchland, A. K., Shadlen, M. N., & Pouget, A. (2012). The cost of accumulating evidence in perceptual decision making. The Journal of Neuroscience, 32(11), 3612–3628.
Edwards, W. (1965). Optimal strategies for seeking information: Models for statistics, choice reaction times, and human information processing. Journal of Mathematical Psychology, 2(2), 312–329.
Geisler, W. S. (2003). Ideal observer analysis. In L. Chalupa & J. Werner (Eds.), The visual neurosciences (pp. 825–837). Cambridge, MA: MIT Press.
Gold, J. I., & Shadlen, M. N. (2002). Banburismus and the brain: Decoding the relationship between sensory stimuli, decisions and reward. Neuron, 36, 299–308.
Gold, J. I., & Shadlen, M. N. (2007). The neural basis of decision making. Annual Review of Neuroscience, 30, 535–574.
Hanks, T. D., Mazurek, M. E., Kiani, R., Hopp, E., & Shadlen, M. N. (2011). Elapsed decision time affects the weighting of prior probability in a perceptual decision task. The Journal of Neuroscience, 31(17), 6339–6352.
Kiani, R., & Shadlen, M. N. (2009). Representation of confidence associated with a decision by neurons in the parietal cortex. Science, 324(5928), 759–764.
Laming, D. R. J. (1968). Information theory of choice-reaction times. London: Academic Press.
Mozer, M. C., Kinoshita, S., & Davis, C. (2004). Control of response initiation: Mechanisms of adaptation to recent experience. In M. Hahn & S. C. Stoness (Eds.), Proceedings of the Twenty Sixth Annual Conference of the Cognitive Science Society (pp. 981–986). Hillsdale, NJ: Erlbaum.
Mulder, M. J., Wagenmakers, E. J., Ratcliff, R., Boekel, W., & Forstmann, B. U. (2012). Bias in the brain: a diffusion model analysis of prior probability and potential payoff. The Journal of Neuroscience, 32(7), 2335–2343.
Myung, I. J., & Busemeyer, J. R. (1989). Criterion learning in a deferred decision making task. American Journal of Psychology, 102, 1–16.
Nelder, J. A., & Mead, R. (1965). A simplex method for function minimization. The Computer Journal, 7(4), 308–313.
Norris, D. (2006). The Bayesian Reader: Explaining word recognition as an optimal Bayesian decision process. Psychological Review, 113(2), 327–357.
Norris, D. (2009). Putting it all together: A unified account of word recognition and reaction-time distributions. Psychological Review, 116(1), 207–219.
Rao, R. P. (2004). Bayesian computation in recurrent neural circuits. Neural Computation, 16, 1–38.
Ratcliff, R. (1978). A theory of memory retrieval. Psychological Review, 85(2), 59–108.
Ratcliff, R., Gomez, P., & McKoon, G. (2004). A diffusion model account of the lexical decision task. Psychological Review, 111, 159–182.
Ratcliff, R., & McKoon, G. (2008). The diffusion decision model: Theory and data for two-choice decision tasks. Neural Computation, 20(4), 873–922.
Ratcliff, R., & Rouder, J. N. (2000). A diffusion model account of masking in two-choice letter identification. Journal of Experimental Psychology: Human Perception and Performance, 26(1), 127–140.
Thura, D., Beauregard-Racine, J., Fradet, C. W., & Cisek, P. (2012). Decision making by urgency gating: theory and experimental support. Journal of Neurophysiology, 108(11), 2912–2930.
Turner, B. M., Van Zandt, T., & Brown, S. (2011). A dynamic stimulus-driven model of signal detection. Psychological Review, 118(4), 583–613.
Tversky, A., & Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases. Science, 185(4157), 1124–1131.
Usher, M., & McClelland, J. L. (2001). The time course of perceptual choice: The leaky, competing accumulator model. Psychological Review, 108(3), 550–592.
van Ravenzwaaij, D., Mulder, M. J., Tuerlinckx, F., & Wagenmakers, E. J. (2012). Do the dynamics of prior information depend on task context? An analysis of optimal performance and an empirical test. Frontiers in Psychology, 3.
Vickers, D. (1979). Decision processes in visual perception. New York: Academic Press.
Wagenmakers, E. J. (2009). Methodological and empirical developments for the Ratcliff diffusion model of response times and accuracy. European Journal of Cognitive Psychology, 21(5), 641–671.
Wagenmakers, E. J., Ratcliff, R., Gomez, P., & McKoon, G. (2008). A diffusion model account of criterion shifts in the lexical decision task. Journal of Memory and Language, 58, 140–159.
Wald, A. (1947). Sequential analysis. New York: Wiley.
Wald, A., & Wolfowitz, J. (1948). Optimum character of the sequential probability ratio test. The Annals of Mathematical Statistics, 19(3), 326–339.
Author Notes
The author thanks Marius Usher (MU), Eric-Jan Wagenmakers (EJW), and Don van Ravenzwaaij (DVR) for helpful discussions, and also EJW, DVR, and Andrew Heathcote for providing excellent suggestions during the revision process, and finally, Konstantinos Tsetsos for comments about an earlier version of this manuscript.
Appendices
Appendix A: Relationship between Wald and RR optimality
In this appendix, I show that WO and RRO are equivalent. This means that a decision rule which achieves one form of optimality also achieves the other, as specified below.
An RRO decision rule is also WO
In this section, I show that if a decision rule maximizes the reward rate then it is also a Wald-optimal strategy (Bogacz et al., 2006; Bogacz, 2009). Denote by \( \tilde{D} \) a decision rule that achieves a maximal reward rate for a given environment and for a given value of the mean residual time \( t_{res} \). The accuracy and mean decision time for \( \tilde{D} \) are \( A{C}_{RR}\left({t}_{res}\right) \) and \( {t}_{d,RR}\left({t}_{res}\right) \), respectively. I argue that \( \tilde{D} \) is also Wald optimal in that \( A{C}_{RR}\left({t}_{res}\right) \) must be the maximal possible accuracy (among all decision rules) with mean decision time \( {t}_{d,RR}\left({t}_{res}\right) \). That is:

\( A{C}_{RR}\left({t}_{res}\right)=A{C}_{Wald}\left({t}_{d,RR}\left({t}_{res}\right)\right) \)   (A1)
To see this, note that the reward rate that a WO decision rule with mean decision time \( {t}_{d,RR}\left({t}_{res}\right) \) achieves is given by \( \frac{A{C}_{Wald}\left({t}_{d, RR}\left({t}_{res}\right)\right)}{t_{d, RR}\left({t}_{res}\right)+{t}_{res}} \). Since the Wald rule provides the maximal accuracy for a given mean decision time,

\( A{C}_{Wald}\left({t}_{d,RR}\left({t}_{res}\right)\right)\ge A{C}_{RR}\left({t}_{res}\right) \)   (A2)
Thus, the reward rate of the Wald-optimal rule satisfies:

\( R{R}_{Wald}=\frac{A{C}_{Wald}\left({t}_{d,RR}\left({t}_{res}\right)\right)}{t_{d,RR}\left({t}_{res}\right)+{t}_{res}}\ge \frac{A{C}_{RR}\left({t}_{res}\right)}{t_{d,RR}\left({t}_{res}\right)+{t}_{res}}=RR\left(\tilde{D}\right) \)   (A3)
In words, the RR of the Wald-optimal rule (with mean decision time \( {t}_{d,RR}\left({t}_{res}\right) \)) is at least as large as the reward rate obtained by \( \tilde{D} \). However, by definition \( \tilde{D} \) is the RR-optimal rule, and therefore equality must hold in Eq. A3 and thus in Eq. A2 as well. Hence Eq. A1 is satisfied.
WO decision rule is also RRO
In the current section I show that, given a target mean decision time \( t_0 \), there exists a positive mean residual time \( {t}_{res}^{*} \) for which the WO rule (with mean decision time \( t_0 \) and its associated Wald-optimal accuracy \( A{C}_{Wald}\left({t}_0\right) \)) maximizes the reward rate. To simplify notation, henceforth I denote the mean decision time by t (instead of \( t_d \)) and the maximal accuracy by A(t) (instead of \( A{C}_{Wald}\left({t}_d\right) \)).
The function A(t) has several important properties. First, with a decision time of 0 an observer can achieve a maximal accuracy of \( \max \left\{{p}_1,1-{p}_1\right\} \), where \( p_1 \) is the a priori probability that option ‘1’ (rather than ‘2’) is correct. Without loss of generality we can assume that \( {p}_1\ge 0.5 \) and hence \( A(0)={p}_1 \).
Second, A(t) is a monotonically increasing function of t. Indeed, if \( {t}_1>{t}_2 \), then one potential decision rule with mean decision time \( t_1 \) is to adopt the WO decision rule for mean decision time \( t_2 \) and then ‘sit and wait’ for a duration of \( {t}_1-{t}_2 \) before issuing a decision. This yields an accuracy of \( A\left({t}_2\right) \). Of course, waiting without integrating information is suboptimal, because observers can instead collect further information, which improves accuracy. Therefore \( A\left({t}_1\right)>A\left({t}_2\right) \).
Third, A(t) is a concave function of t. This means that for all \( {t}_1,{t}_2 \) and \( \lambda \in \left[0,1\right] \):

\( A\left(\lambda {t}_1+\left(1-\lambda \right){t}_2\right)\ge \lambda A\left({t}_1\right)+\left(1-\lambda \right)A\left({t}_2\right) \)   (A4)
Indeed, consider the following ‘mixture’ decision rule: with probability \( \lambda \in \left[0,1\right] \) the observer follows the Wald-optimal decision rule for mean decision time \( t_1 \), and otherwise (i.e., with probability 1 − λ) the observer follows the Wald-optimal decision rule for mean decision time \( t_2 \). This mixture rule provides accuracy equal to the right-hand side of Eq. A4, and its mean decision time is \( \lambda {t}_1+\left(1-\lambda \right){t}_2 \). By definition, a Wald-optimal decision rule for mean decision time \( \lambda {t}_1+\left(1-\lambda \right){t}_2 \) provides at least the same accuracy, and so Eq. A4 is satisfied. If the Wald-optimal decision rule is strictly more efficient than the mixture rule, then strict concavity is obtained (i.e., the inequality in Eq. A4 is strict).
Assuming A(t) is a differentiable function, the monotonicity and concavity properties translate to:

\( A^{\prime }(t)>0,\kern1em A^{\prime \prime }(t)<0 \)   (A5)
Consider next the reward rate, \( RR=\frac{AC}{t+{t}_{res}} \). We already know, from the previous subsection, that any rule that maximizes the RR is a Wald-optimal strategy. Therefore, the optimal reward rate is achieved by maximizing, with respect to t, the reward function:

\( R(t)=\frac{A(t)}{t+{t}_{res}} \)   (A6)
Taking the derivative with respect to t, we find that

\( R^{\prime }(t)=\frac{A^{\prime }(t)\left(t+{t}_{res}\right)-A(t)}{{\left(t+{t}_{res}\right)}^2} \)   (A7)
and the condition for stationary points is thus:

\( A^{\prime }(t)\left(t+{t}_{res}\right)-A(t)=0 \)   (A8)
Consider a target mean decision time \( {t}_0>0 \). I next show that there exists some positive \( {t}_{res}^{*} \) for which \( t_0 \) is a stationary point. Indeed, defining

\( {t}_{res}^{*}=\frac{A\left({t}_0\right)}{A^{\prime }\left({t}_0\right)}-{t}_0 \)   (A9)
we note that \( t_0 \) solves Eq. A8 with \( {t}_{res}={t}_{res}^{*} \). So it remains to be seen that \( {t}_{res}^{*} \) is indeed positive:

\( {t}_{res}^{*}=\frac{A\left({t}_0\right)-{t}_0A^{\prime }\left({t}_0\right)}{A^{\prime }\left({t}_0\right)}=\frac{p_1+{\displaystyle {\int}_0^{t_0}A^{\prime }\left(\tau \right)d\tau }-{t}_0A^{\prime }\left({t}_0\right)}{A^{\prime }\left({t}_0\right)} \)   (A10)
Noting that \( A^{\prime}\left(\tau \right) \) is a decreasing function of τ (\( A^{\prime \prime}\left(\tau \right)<0 \) according to Eq. A5), we obtain that \( {\displaystyle {\int}_0^{t_0}A^{\prime}\left(\tau \right)d\tau >{t}_0A^{\prime}\left({t}_0\right)} \). Thus, continuing Eq. A10,

\( {t}_{res}^{*}>\frac{p_1}{A^{\prime }\left({t}_0\right)}>0 \)   (A11)
Next, I show that the stationary point \( t_0 \) is a maximum point. Indeed, taking another derivative of Eq. A7, we obtain:

\( R^{\prime \prime }(t)=\frac{A^{\prime \prime }(t)}{t+{t}_{res}}-\frac{2\left[A^{\prime }(t)\left(t+{t}_{res}\right)-A(t)\right]}{{\left(t+{t}_{res}\right)}^3} \)   (A12)
Evaluating Eq. A12 at the stationary point \( t_0 \) (where, by Eq. A8, the second term vanishes) simplifies it to \( R^{\prime \prime}\left({t}_0\right)=\frac{A^{\prime \prime}\left({t}_0\right)}{t_0+{t}_{res}^{*}}<0 \), which shows that \( t_0 \) is a local maximum point of the reward rate.
Next, I show that \( t_0 \) is in fact a global maximum. If we assume it is not, then Eq. A8 has another root (stationary point) at the global maximum; thus, Eq. A8 has at least two different roots. By Rolle’s theorem, the derivative of the left-hand side of Eq. A8 must then also have a root, i.e., there exists a positive t such that \( A^{\prime \prime }(t)\left(t+{t}_{res}^{*}\right)=0 \), which is impossible because \( A^{\prime \prime }(t)<0 \) and \( t+{t}_{res}^{*}>0 \). Therefore, \( t_0 \) must be a global maximum.
To conclude, given a target mean decision time \( t_0 \), I found a positive mean residual time \( {t}_{res}^{*} \) (Eq. A11) for which the WO decision rule (with mean decision time \( t_0 \) and accuracy \( A\left({t}_0\right) \)) is RR-optimal.
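This construction can be checked numerically. The sketch below assumes a hypothetical concave accuracy function \( A(t)=1-0.5{e}^{-t} \) (so that \( {p}_1=A(0)=0.5 \)), computes \( {t}_{res}^{*}=A\left({t}_0\right)/A^{\prime}\left({t}_0\right)-{t}_0 \), and verifies that the reward rate is maximized exactly at the target decision time \( t_0 \):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def A(t):          # hypothetical concave, increasing accuracy function; A(0) = p1 = 0.5
    return 1.0 - 0.5 * np.exp(-t)

def A_prime(t):    # its derivative
    return 0.5 * np.exp(-t)

t0 = 1.0                                  # target mean decision time
t_res = A(t0) / A_prime(t0) - t0          # the residual time constructed above
print(t_res)                              # ≈ 3.4366 > 0

# The reward rate R(t) = A(t) / (t + t_res) should peak exactly at t0
res = minimize_scalar(lambda t: -A(t) / (t + t_res), bounds=(1e-6, 20.0), method='bounded')
print(res.x)                              # ≈ 1.0 = t0
```

Any increasing, strictly concave A(t) with \( A(0)={p}_1 \) would serve equally well here.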
Appendix B: SPRT in Gaussian environments
In the current Appendix, I extend the SPRT model to Gaussian heterogeneous environments. I assume that on each trial the difficulty level is drawn from a Gaussian distribution; the unique source of uncertainty concerns which of the two response alternatives is correct. On each temporal interval dt, a new independent perceptual sample is generated, distributed \( \sim N\left(vdt,{s}^2dt\right) \), where \( s^2 \) is the variance rate and v is the drift rate for the current trial. The participant needs to decide in favor of one of two hypotheses:
- \( H_0 \): The current drift rate v was generated from a \( N\left({v}_0,{\eta}^2\right) \) distribution; or
- \( H_1 \): The current drift rate v was generated from a \( N\left(-{v}_0,{\eta}^2\right) \) distribution.
Importantly, the positive parameters \( v_0 \) and η, which correspond to the mean and the standard deviation of the difficulty distribution, respectively, are known.
Denote by \( \tilde{x}(t) \) and x(t), respectively, the entire stream of accumulated perceptual evidence and the total accumulated evidence obtained by time t (thus x(t) is simply the state of \( \tilde{x}(t) \) at time t). According to Bayes’ rule, the posterior odds are the product of the prior odds and the Bayes factor (BF):

\( \frac{P\left({H}_0\Big|\tilde{x}(t)\right)}{P\left({H}_1\Big|\tilde{x}(t)\right)}=\frac{P\left({H}_0\right)}{P\left({H}_1\right)}\cdot \frac{P\left(\tilde{x}(t)\Big|{H}_0\right)}{P\left(\tilde{x}(t)\Big|{H}_1\right)} \)   (B1)
Let us next focus on the numerator term \( P\left(\tilde{x}(t)\Big|{H}_0\right) \). It can be shown (see Drugowitsch et al., 2012, Eq. 10) that, conditional on a drift rate v:

\( P\left(\tilde{x}(t)\Big|v\right)=D\left(\tilde{x}(t)\right) \exp \left(\frac{vx(t)}{s^2}-\frac{v^2t}{2{s}^2}\right) \)   (B2)
where \( D\left(\tilde{x}(t)\right) \) depends on the specific stream \( \tilde{x}(t) \) but not on the drift rate v.
Throughout the following derivation, proportionality (∝) denotes equality up to a multiplicative term that is invariant with respect to \( v_0 \) (and its sign) but may depend on the specific stream \( \tilde{x}(t) \). Note that \( P\left(\tilde{x}(t)\Big|{H}_0\right) \) is obtained by integrating \( P\left(\tilde{x}(t)\Big|v\right) \) over the drift distribution. Thus:

\( P\left(\tilde{x}(t)\Big|{H}_0\right)={\displaystyle \int P\left(\tilde{x}(t)\Big|v\right)\frac{1}{\sqrt{2\pi }\eta }{e}^{-\frac{{\left(v-{v}_0\right)}^2}{2{\eta}^2}}}dv\propto {\displaystyle \int \exp \left(\frac{vx(t)}{s^2}-\frac{v^2t}{2{s}^2}-\frac{{\left(v-{v}_0\right)}^2}{2{\eta}^2}\right)}dv \)   (B3)
Examining the integrand in the final term and completing the square, we note that it is proportional to the probability density function of a Normal distribution with mean \( \frac{\eta^2x(t)+{s}^2{v}_0}{s^2+{\eta}^2t} \) and variance \( \frac{s^2{\eta}^2}{s^2+{\eta}^2t} \), with a proportionality factor that, under the ∝ convention above, reduces to \( \exp \left(\frac{v_0x(t)}{s^2+{\eta}^2t}\right) \). Since the density integrates to 1, we obtain:

\( P\left(\tilde{x}(t)\Big|{H}_0\right)\propto \exp \left(\frac{v_0x(t)}{s^2+{\eta}^2t}\right) \)   (B4)
We can now derive the term \( P\left(\tilde{x}(t)\Big|{H}_1\right) \) by replacing \( v_0 \) with \( -{v}_0 \) in Eq. B4, to obtain:

\( P\left(\tilde{x}(t)\Big|{H}_1\right)\propto \exp \left(-\frac{v_0x(t)}{s^2+{\eta}^2t}\right) \)   (B5)
Equations B4–B5 share the same proportionality factor; hence, returning to the BF, it follows that

\( BF=\frac{P\left(\tilde{x}(t)\Big|{H}_0\right)}{P\left(\tilde{x}(t)\Big|{H}_1\right)}= \exp \left(\frac{2{v}_0x(t)}{s^2+{\eta}^2t}\right) \)   (B6)
Finally, taking logarithms of Eq. B1 and using Eq. B6, we obtain:

\( \tilde{\uppi}=\pi +\frac{2{v}_0x(t)}{s^2+{\eta}^2t} \)   (B7)
where π and \( \tilde{\uppi} \) are the log-prior and log-posterior odds respectively.
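The closed-form Bayes factor of Eq. B6 can be verified numerically: integrating the single-trial likelihood over each drift distribution and taking the ratio recovers it exactly, because the \( v_0 \)-sign-invariant factors cancel. A minimal sketch with illustrative (assumed) parameter values:

```python
import math
from scipy.integrate import quad

s, eta, v0, t, x = 0.1, 0.1, 0.05, 1.0, 0.02   # illustrative values (assumptions)

def likelihood(mean):
    """P(stream | H) up to a v0-sign-invariant factor: integrate
    exp(v*x/s^2 - v^2*t/(2*s^2)) over v ~ N(mean, eta^2)."""
    def integrand(v):
        return math.exp(v * x / s**2 - v**2 * t / (2 * s**2)) \
             * math.exp(-(v - mean)**2 / (2 * eta**2)) / (math.sqrt(2 * math.pi) * eta)
    return quad(integrand, -math.inf, math.inf)[0]

bf = likelihood(v0) / likelihood(-v0)                   # numerical Bayes factor
print(bf)                                               # ≈ 1.10517
print(math.exp(2 * v0 * x / (s**2 + eta**2 * t)))       # closed form, also ≈ 1.10517
```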
In the SPRT, integration of perceptual evidence proceeds until the posterior reaches a target level \( \pm \alpha, \alpha \equiv \ln \left(\frac{A}{1-A}\right) \), where A is a target level of accuracy. From Eq. B7 it follows that integration continues until

\( \pi +\frac{2{v}_0x(t)}{s^2+{\eta}^2t}=\pm \alpha, \) i.e., until \( x(t)=\frac{\left(\pm \alpha -\pi \right)\left({s}^2+{\eta}^2t\right)}{2{v}_0} \)   (B8)
This means that a diffuser (with starting point x(0) = z) will terminate all trials with the same posterior level of ± α if the time-variant response thresholds are set at distances \( -\frac{s^2\left(\alpha +\pi \right)}{2{v}_0}-\frac{\eta^2\left(\alpha +\pi \right)}{2{v}_0}t \) (the lower threshold) and \( \frac{s^2\left(\alpha -\pi \right)}{2{v}_0}+\frac{\eta^2\left(\alpha -\pi \right)}{2{v}_0}t \) (the upper threshold) from the starting point. Note that the lower and upper response thresholds are, respectively, linearly decreasing and increasing functions of time, and that the boundary separation increases at rate \( \frac{\eta^2\alpha }{v_0} \). Additionally, in the particular case in which the environment is unbiased (i.e., π = 0), both thresholds change with equal absolute rates \( \frac{\eta^2\alpha }{2{v}_0} \) but in opposite directions.
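To illustrate, the following sketch (with assumed illustrative parameter values, and this trial's drift fixed for reproducibility) simulates a single diffusion trial against these linearly expanding thresholds and confirms that, at termination, the posterior log odds of Eq. B7 equal ±α up to discretization error:

```python
import numpy as np

rng = np.random.default_rng(0)
s, eta, v0, prior = 0.1, 0.1, 0.1, 0.0     # noise, difficulty spread, mean drift, prior log odds
alpha = np.log(0.9 / 0.1)                  # target posterior level for 90% accuracy
dt = 1e-4
v = 0.3                                    # this trial's drift (a sample from N(v0, eta^2); fixed here)

x, t = 0.0, 0.0                            # accumulated evidence relative to the starting point
while t < 5.0:
    upper = (alpha - prior) * (s**2 + eta**2 * t) / (2 * v0)
    lower = -(alpha + prior) * (s**2 + eta**2 * t) / (2 * v0)
    if x >= upper or x <= lower:           # linearly expanding thresholds (unbiased: prior = 0)
        break
    x += v * dt + s * np.sqrt(dt) * rng.normal()
    t += dt

posterior = prior + 2 * v0 * x / (s**2 + eta**2 * t)
print(abs(posterior))                      # close to alpha, up to one-step overshoot
```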
Another implication of Eq. B7 is that in the DDM, where integration stops when either of the (constant) thresholds is reached (located at distances − z or a − z from the starting point), the log odds are \( \tilde{\uppi}=\pi -\frac{2z{v}_0}{\left({s}^2+{\eta}^2t\right)} \) for the lower boundary and \( \tilde{\uppi}=\pi +\frac{2\left(a-z\right){v}_0}{\left({s}^2+{\eta}^2t\right)} \) for the upper. Recall that the log odds are formulated in terms of the ‘upper’ (\( H_0 \)) choice alternative relative to the ‘lower’ (\( H_1 \)) choice alternative. If instead the log odds are formulated with respect to the chosen relative to the non-chosen alternative, the log odds for the lower threshold are obtained by flipping the sign: \( \tilde{\uppi}=\frac{2z{v}_0}{\left({s}^2+{\eta}^2t\right)}-\pi \). Note that the log odds for both alternatives decrease monotonically as a function of t, tending towards the prior odds (± π) as t → ∞.
Appendix C: Simulation methods
In this appendix, I describe the method I used for finding the optimal triplet \( \left(a,z,{v}_c\right) \) for the DDM in biased heterogeneous environments. For a single difficulty level v, the accuracy and the MRT are given by the standard closed-form expressions (cf. Eqs. 8–12 in van Ravenzwaaij et al., 2012).
When the environment is heterogeneous, so that the drift is distributed ∼ f(v), I found the accuracy and MRT by integrating the corresponding terms over the distribution f. The two cases explored in the paper are a Gaussian f and a discrete f with two equiprobable drift rates. For the Gaussian case the integration was performed numerically; the discrete case was handled by arithmetic averaging.
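In code, this computation can be sketched as follows. The single-level quantities use the standard first-passage results for the DDM (the absorption probability from the exponential martingale, and the mean decision time via Wald's identity); the discrete heterogeneous case is then a plain average over the two equiprobable levels. Parameter values are illustrative, and the dynamic bias \( v_c \) of the main text would enter by replacing v with \( v+{v}_c \):

```python
import math

def p_upper(v, a, z, s=0.1):
    """Probability that a DDM with drift v, separation a, and starting point z
    absorbs at the upper boundary (standard first-passage result)."""
    return math.expm1(-2 * v * z / s**2) / math.expm1(-2 * v * a / s**2)

def mean_dt(v, a, z, s=0.1):
    """Mean decision time via Wald's identity: E[x(T)] - z = v * E[T]."""
    return (a * p_upper(v, a, z, s) - z) / v

# Single difficulty level, unbiased geometry (z = a/2): accuracy reduces to
# the familiar logistic form 1 / (1 + exp(-v * a / s**2)).
a, z, v = 0.1, 0.05, 0.2
print(p_upper(v, a, z))        # ≈ 0.8808 = 1 / (1 + exp(-2))

# Discrete heterogeneous environment: two equiprobable difficulty levels.
v1, v2 = 0.1, 0.3
acc = 0.5 * (p_upper(v1, a, z) + p_upper(v2, a, z))
mrt = 0.5 * (mean_dt(v1, a, z) + mean_dt(v2, a, z))
```

For a Gaussian f, the same averages are instead taken by numerical quadrature over f(v).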
The optimization problem can now be formulated as

\( \underset{a,z,{v}_c}{ \min }\ MRT\left(a,z,{v}_c\right)\kern1em \mathrm{subject}\ \mathrm{to}\kern0.5em AC\left(a,z,{v}_c\right)\ge A \)
where A is the desired accuracy level. Note that for the optimal triplet the constraint is always satisfied with equality; otherwise, a sufficiently slight reduction of the threshold separation a would diminish the MRT while maintaining accuracy above the desired level, contradicting the optimality of the triplet. Defining the penalized objective function F (with penalty weight β),
the optimal triplet is defined by \( \left(a,z,{v}_c\right)= \arg \min F\left(a,z,{v}_c,\beta, s\right) \).
I took considerable measures to avoid local minima in the search for the triplet that minimizes F. This search was conducted with a combination of genetic algorithms and the iterative Nelder–Mead Simplex method (Nelder & Mead, 1965), implemented by the routines “ga” and “fminsearch” available in MathWorks’ MATLAB. I repeated the following steps 10,000 times. First, I minimized the objective function by running the genetic algorithm. The output triplet was then fed as the starting point for the Simplex algorithm. The Simplex algorithm, in turn, was iterated several times, each iteration starting with the parameters obtained at the termination of the previous iteration (see the note on Simplex convergence above). This was repeated until the objective function improved by less than 1e-5 on two consecutive runs. The triplet that minimized the objective function over the 10,000 iterations of a genetic algorithm followed by a sequence of Simplex iterations was considered the optimal triplet.
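An open-source analogue of this search loop can be sketched with SciPy, using `differential_evolution` in place of MATLAB's `ga` and restarted Nelder–Mead in place of `fminsearch`. The objective below is a simplified stand-in (an unbiased two-level environment over \( (a,z) \) only, with an assumed quadratic accuracy penalty; it is not the paper's F), intended only to illustrate the restart logic:

```python
import math
import numpy as np
from scipy.optimize import differential_evolution, minimize

v, s, A_target, beta = 0.2, 0.1, 0.9, 100.0     # illustrative values; beta is an assumed penalty weight

def p_upper(drift, a, z):
    return math.expm1(-2 * drift * z / s**2) / math.expm1(-2 * drift * a / s**2)

def objective(params):
    a, z = params
    if not (1e-3 < z < a - 1e-3) or a > 1.0:
        return 1e6                              # penalize degenerate geometries
    # Unbiased environment: drift +v (upper correct) and -v (lower correct), equiprobable
    acc = 0.5 * (p_upper(v, a, z) + 1.0 - p_upper(-v, a, z))
    t_plus = (a * p_upper(v, a, z) - z) / v     # mean decision times via Wald's identity
    t_minus = (a * p_upper(-v, a, z) - z) / (-v)
    mdt = 0.5 * (t_plus + t_minus)
    return mdt + beta * (acc - A_target)**2     # assumed penalty form of F

# Global stage ('ga' analogue), then restarted Nelder-Mead ('fminsearch' analogue)
res = differential_evolution(objective, bounds=[(0.01, 1.0), (0.005, 0.5)], seed=1, tol=1e-8)
x, f_prev = res.x, res.fun
while True:
    nm = minimize(objective, x, method='Nelder-Mead')
    x = nm.x
    if f_prev - nm.fun < 1e-5:                  # stop when improvement stalls
        break
    f_prev = nm.fun

a_opt, z_opt = x
print(a_opt, z_opt)                             # symmetric optimum: z ≈ a/2
```

Each Nelder–Mead restart re-inflates the search simplex around the previous solution, which is what allows consecutive runs to keep improving until the 1e-5 criterion is met.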
Cite this article
Moran, R. Optimal decision making in heterogeneous and biased environments. Psychon Bull Rev 22, 38–53 (2015). https://doi.org/10.3758/s13423-014-0669-3