In silico development and validation of Bayesian methods for optimizing deep brain stimulation to enhance cognitive control

Sumedh S Nagrale; Ali Yousefi; Theoden I Netoff; Alik S Widge

doi:10.1088/1741-2552/acd0d5

1. Introduction

Deep brain stimulation (DBS) is a well established, clinically successful surgical treatment for medication-refractory movement disorders [1]. It is being explored as a treatment for mental disorders, including obsessive compulsive disorder (OCD), major depression (MDD), addiction, and schizophrenia [2–5], in cases where conventional therapies such as medication and psychotherapy have failed. The ventral internal capsule/ventral striatum (VCVS) is a particularly promising DBS target. It is approved by the US Food and Drug Administration for treatment of OCD and is the only psychiatric DBS target to meet its endpoint in a randomized controlled trial for MDD [6]. At the same time, VCVS DBS consistently shows a 60%–70% response rate in large open label studies [7–9], meaning that roughly 1/3 of patients undergo this invasive intervention for little benefit, and another 1/3 are technically responders but still have life-impairing residual symptoms.

These difficulties arise in part because the success of VCVS DBS depends upon proper programming. The only successful VCVS DBS trial in MDD used an extended programming period, with a year of open-label treatment and adjustment before blinded testing [6]. DBS programming is challenging regardless of the target or disorder. The large number of free parameters (which contacts are active at which polarity, waveform amplitude, pulse width, and stimulation frequency) creates a combinatorially explosive space. Worse, brain responses to stimulation are often non-linear [10–12]. In tremor-related disorders (the primary clinical application for DBS), this challenge is partly overcome by the speed of response. Effective settings cause tremor changes within seconds, providing immediate feedback to help a clinician identify reasonable parameters. In mental disorders, while there are immediate subjective effects of some setting changes [13, 14], these do not consistently predict long-term outcomes [2, 15, 16]. Rather, programming is usually based on patient self-report of symptoms over days to weeks [8, 15, 16]. Self report has a poor signal-to-noise ratio, as patients' symptoms can be heavily influenced by external events (e.g. mood and compulsive symptoms being dramatically worsened by the recent pandemic [17]) and the subjective stimulation response can depend on those externally-influenced states [18]. Although such state-dependent effects may be clinically relevant, they also confound clinicians' attempts to identify reliable and consistent effects. The long latency between setting changes makes it much more difficult to fully explore the DBS parameter space. Thus, much of the apparent non-response to VCVS DBS, across disorders, may represent a target engagement problem—some patients may never receive stimulation that adequately affects the brain circuits mediating clinical response [2, 19, 20].

Recent work, at VCVS and other targets [21–23], argues that target engagement could be resolved through imaging. In this view, the correct contact and other settings can be identified based on electric field modeling, with the goal being to electrically engage specific white matter bundles. Follow-on studies, however, found that imaging-based target engagement does not fully predict or determine clinical response [24, 25]. This suggests that, beyond anatomy, the programmed parameters need to cause a specific physiologic response in the target and its connected network to achieve clinical response. In theory, closed loop sensing, e.g. of the local field potential (LFP) at the DBS target or a connected structure, might allow monitoring of that physiologic response [26–29]. This approach has been successful in movement disorders [30–32]. The challenge is that, again owing to the poor signal-to-noise ratio of self report and lack of quantitative, predictive behavioral assays and reliable neural or other biomarkers, it has been difficult to identify reliable physiologic markers for DBS tuning in non-motor applications [27, 28, 33]. It may be possible to derive such markers by extensive data collection and patient-specific model fitting [18, 26, 34], but the analytic pipelines required for such modeling may not scale well to clinical practice.

A more scalable approach might be to measure physiologic engagement indirectly, through effects on behavior. Changes in clinically relevant circuits should be reflected in relatively rapid changes in brain functions linked to those circuits [35]. Just as DBS for tremor can be rapidly titrated by monitoring tremor in-office, it should be possible to continuously monitor an objective behavior (e.g. performance on a psychophysical task) and titrate stimulation to optimize that behavior. We recently demonstrated a prototype of that approach. In two human studies, we showed that DBS-frequency stimulation (∼130 Hz) of VCVS enhances cognitive control [36, 37]. Cognitive control is the ability to withhold a prepotent response in favor of a more adaptive, long-term-goal-oriented response. It is an attractive target for measurement and enhancement with DBS, since cognitive control deficits are common across a range of mental disorders [38–40]. A method for engaging cognitive control circuits could thus make VCVS DBS applicable for a wide range of clinical problems. We showed that cognitive control could be measured through DBS-induced changes in reaction time (RT) on a standard cognitive conflict task, the multi source interference task (MSIT) [36, 37]. In both prior pilot studies, the participants who experienced stimulation-induced control enhancement also reported improved depressive and anxious symptoms. Most importantly, these improvements were sensitive to stimulation location within the VCVS. We saw generally larger effects with right-sided stimulation in the more dorsal capsule, but the best location varied between patients. Further, the behavioral effects of stimulation began and ended within a few seconds of stimulation changes [37]. It should thus be possible to use these rapid-onset effects to identify when the optimal contact is being stimulated. The challenge is that because the RT effects are subtle (5%–10% of the overall scale), they cannot be immediately detected by a human operator, nor can the effects of different contacts be quickly distinguished. These subtle changes can be detected by a monitoring algorithm, if it uses a filtering/smoothing approach to ignore stochastic RT variability [37]. The missing ingredient is an approach for using that monitoring to quickly but reliably identify the optimal stimulation site and other parameters for engaging cognitive control circuitry.

In other DBS applications, Bayesian approaches have been suggested and demonstrated to be an useful tool to optimize DBS settings [41–45] in a closed loop neuromodulation [41]. Used Bayesian optimization to determine the best DBS parameters that reduced beta power in a basal ganglia-thalamocortical computational model of Parkinson's disease (PD). Here, Bayesian optimization, at its heart, relies on a Gaussian process fitted from the observed data. Similar in silico models were developed in [42] from intraoperative cortical and motor evoked potential data from PD patients to test optimization algorithms to identify pareto sets for multi-objective optimization problems. They employed two offline methods, fixed grid and random search, as well as two online methods, evolutionary search and Bayesian optimization with expected hypervolume improvement acquisition function. On the ground truth model, a set of stimulation settings were evaluated, and a surrogate model was constructed to estimate the whole stimulation parameter setting. They found that Bayesian optimization outperformed other methods [45]. Shows a semi-automated technique for optimizing DBS settings in PD patients in order to minimize rigidity. Based on the participant's expressed preferences for stimulation settings [46], used Bayesian preference learning to find individualized optimal stimulation patterns. Recently, Bayesian algorithms have been demonstrated to be safe and viable options in humans, as in [43, 44], with additional constraint to the objective function to be optimized for PD. Generalizing beyond movement disorders, we recently demonstrated Bayesian optimization in neurostimulation for pain [47]. By tracking patient self-report of pain relief, a Bayesian algorithm was able to identify a program to control two interacting cortical stimulation leads.

Here, we combine these ideas to present an approach for selecting a contact for psychiatric DBS, based on Bayesian optimization. Bayesian optimizers maintain a prior estimate of the relative effects of different parameters (e.g. of different contacts, or of changes in amplitude within a contact). They then select a next contact to test, based on an information gain function that varies with the specific optimizer (see examples in Methods). Based on the observed behavioral response, the prior estimate is updated (posterior distribution calculation), then the next test value is selected. The process repeats until a convergence criterion is met. Using a behavioral simulator derived from our prior empirical data, we show that a specific optimizer (upper confidence bound (UCB1)) and rate of change in stimulation parameters can reliably converge to a global optimum, the vast majority of the time, in a clinically feasible amount of testing.

2. Methods

2.1. Overall design

Testing a wide range of optimization strategies directly in patients undergoing VCVS DBS is impractical. It would require thousands of task trials across hundreds of runs, which is unlikely to be tolerated by most patients. Thus, we developed a simulation testbed for our cognitive control paradigm. The simulation is meant to model a patient continuously performing the MSIT (or a similar cognitive control assay), while stimulation is adjusted to improve task performance (to minimize RTs without inducing errors). The overall system (figure 1(A)) includes a data generator, a sensor, and an optimizer. The generator models a patient performing the MSIT and emitting behavioral data, in the form of RT values that are simultaneously influenced by stimulation, task factors, and noise processes. The sensor models the real-time estimation we would then need to do with a patient, analogous to the processing in [37]. It attempts to filter out the noise and task processes to identify stimulation-driven RT changes, without explicit information about the current stimulation settings. This smoothed derivative of RT (sometimes referred to as a 'cognitive state') is then used by the optimization algorithm to update its distribution estimate. Based on the update, the optimizer then modifies the simulated stimulation applied to the generator. This process continues until the algorithm converges.

**Figure 1.** System design. (A) The generator model imitates a DBS recipient. A sensor model infers the generator's internal/unobservable cognitive state based on the observable reaction time (RT). The optimization algorithm then takes the cognitive state estimate as input and selects a new stimulation regime to be applied to the generator. Over time, the algorithms converge and the optimal or near optimal location is determined. (B) The internal parameters of the generator are inferred by model fitting to data from actual participants who performed MSIT and received stimulation. (C) Data is simulated using the generator model and a gamma distribution to approximate human RTs. New simulated patients can be created by slightly modifying the internal generator parameters or initial state ('model tuning'). (D) The sensor model is derived by fitting a new model to a block of simulated RT data from the generator. Once derived, it continuously estimates the generator's internal state trial-by-trial through a Kalman-like filtering process [48].
Download figure:
Standard image High-resolution image

In the current work, we focused on the problem described in [37], optimizing the stimulation location (choosing among available DBS contacts or combinations of contacts), assuming that other parameters such as amplitude, pulse width, and frequency are already reasonable. We had empirical data to constrain this problem (see below), making it a logical starting point.

2.2. Empirical data

The dataset includes six participants with long-standing pharmaco-resistant epilepsy. They were reported as participants 8–13 in [37]. We relabel them here as S1–S6. Participants voluntarily enrolled with fully informed consent (obtained by a member of the study staff who was not the participant's primary clinician) in accordance with guidelines and procedures approved by the local institutional review boards at Partners Healthcare (Massachusetts General Hospital), with secondary review from the US Army Human Research Protections Office.

Participants performed the MSIT with simultaneous behavior and LFP recordings. Stimuli for the MSIT were presented on a computer screen with either Presentation software (neurobehavioral systems) or Psychophysics Toolbox. Each participant performed 1–3 sessions, and each session consisted of multiple blocks of 32 or 64 trials, (ranging between 5–10 blocks) with brief rest periods (median break time of 5.25 min) in between the blocks. They were instructed to be as fast and as accurate as possible. Median success rates of 100% ± 2.47% and 97.1 ± 5.52% were reported during congruent and incongruent conditions, respectively [37]. The task contained a roughly equal number of congruent and incongruent trials. Stimuli were presented for 1.75 s, with an inter-trial interval randomly jittered within 2–4 s. Simultaneously, LFPs were recorded using implanted depth electrodes; these data were not considered in the current study. The electrode locations were determined by a team of caregivers solely on clinical grounds and were not in any way modified for the research.

Only one site was stimulated during each MSIT block. Stimulation was a 600 ms long train of symmetric biphasic (charge balanced) 2–4 mA, with 90 μs square pulses at a frequency of 130 Hz, delivered using a Cerestim 96 (Blackrock Instruments). Stimulation was delivered through a neighboring pair of contacts on a single depth electrode (bipolar) with parameters set manually by the experimenter. Depth electrodes (Ad-Tech Medical or PMT) had a diameter of 0.8–1.0 mm and had 8–16 contacts (platinum/iridium), each 1–2.4 mm long. The stimulations were triggered by a separate PC that was either delivering or monitoring the task or behavioral stimuli. All stimulations were delivered at the image onset to influence a decision-making process that begins with that onset. Participants S1–S6 completed 343, 378, 320, 383, and 440 trials, respectively, with 233, 254, 192, 255, 224, and 249 trials of stimulation and 110, 124, 128, 128, 159, and 191 trials of non-stimulation. S1-S6 performed 6, 6, 5, 8, 9, and 10 blocks respectively. All stimulations were given to varying sites within the internal capsule as described in [37] where sites were chosen to have one site each in the ventral and the dorsal IC. Up to four sites were tested per patient, with 1–3 blocks per tested site. All stimulation was administered at the image's onset in order to impact a decision-making process that began at that time. Prior to task-linked stimulation, all participants were tested at 1, 2, and 4 mA for 1 s, repeated five times, with 5–10 s between each 130 Hz pulse train. Participants were told that stimulation was active and asked to describe any acute perceptions. The absence of epileptiform after-discharges was confirmed, ensuring that the individuals could not detect stimulation, such as odd feelings. If participants reported any sensation (for example, tactile sensations in the limbs or head), task-linked stimulation was reduced to the next lowest intensity. Note that although this experiment used intermittent IC stimulation, we have previously shown that the same effects hold with continuous IC stimulation as would be used in clinical DBS [36].

2.3. Data model

The MSIT is described extensively in [36, 37]. In brief, it is a standard cognitive control/conflict paradigm where roughly 50% of trials contain distractors that induce pre-potent (incorrect) responses. These are referred to as 'incongruent', 'interference', or 'high conflict' trials. Successful task performance requires suppression of those responses, both through reactive processes (e.g. stopping an incorrect action once started) and proactive processes (e.g. engaging attentional systems to detect impending errors). In this context, task RTs measure a participant's ability to apply cognitive control—faster RTs without errors imply more efficient control systems.

Task RTs are positive random variables with right-skewed distributions [49, 50]. In particular, RTs are often well described by Gamma distributions, with parameters that vary depending on the trial type (e.g. the high vs. low conflict trials in MSIT). Thus, we model the RT of a participant performing MSIT as described in [50],

$\begin{align}{Z_{{\text{RT}}}}\sim \Gamma \left( {\mu ,v} \right) + \alpha \end{align} \tag{ 1.1 }$

where ZRT is the observed RT, α is the offset term representing the lower bound of the RT ( $\alpha \geqslant 0$ ), 1/v is the dispersion observed in the RT, and μ is the mean RT. Here, the rate parameter for the Gamma distribution will be, $\beta = v/\mu$ and its shape parameter is defined by, $\alpha = v$ .

That mean RT varies both systematically and stochastically. Systematically, it is higher on high conflict trials. Stochastically, it has both random variation (captured in $v$ ) and slow change over time. The latter may be due to learning processes (participants get slightly better at the task over the first 100–200 trials) or, more commonly, due to the effect of brain stimulation that alters the participant's 'baseline' RT. As in [37], we express this in terms of two unobservable 'cognitive states', ${x_{{\text{base}}}}$ and ${x_{{\text{conflict}}}}$ , connected to the RT by a log link function [50, 51],

$\begin{align}{\text{ln(}}\mu ) = {b_0} + {b_1}{x_{{\text{base}}}} + {b_2}{I_{{\text{conflict}}}}{x_{{\text{conflict}}}}^{}\end{align} \tag{ 1.2 }$

${x_{{\text{base}}}}$ is the baseline cognitive state, which represents the mean RT when no conflict is present. ${x_{{\text{conflict}}}}$ is the mean amount of increase in RT during high conflict. ${I_{{\text{conflict}}}}$ is an indicator variable representing presence or absence of conflict in a given trial of MSIT. b₀, b₁, b₂ are then weights (effectively, regression coefficients) that relate cognitive states to the distribution mean μ. b₀ provides a bound on RT, whereas b₁, b₂ are scaling factors. We have demonstrated this model as well suited to MSIT datasets in multiple prior papers [37, 48, 51].

The temporal evolution of cognitive states at trial k is defined by a first order autoregressive model, AR(1), as in [50, 51].

$\begin{align}{x_{{\text{base}},k + 1}} = {a_1}{x_{{\text{base}},k}} + {b_{in1}}{u_m} + {w_{1,k}}\end{align} \tag{ 2.1 }$

$\begin{align}{x_{{\text{conflict}},k + 1}} = {a_2}{x_{{\text{conflict}},k}} + {b_{in2}}{u_m} + {w_{2,k}}\end{align} \tag{ 2.2 }$

$\begin{equation*}{w_{1,k}}\sim N\left( {0,{\sigma _1}} \right)\end{equation*}$

$\begin{equation*}{w_{2,k}}\sim N\left( {0,{\sigma _2}} \right).\end{equation*}$

The next states ${x_{{\text{base}},k + 1}}$ , ${x_{{\text{conflict}},k + 1}}$ are dependent on the current states ${x_{{\text{base}},k}}$ , ${x_{{\text{conflict}},k}}$ , the input ${u_m}_{}$ (which represents the presence of applied DBS at one of the m stimulation sites), and the Gaussian state noise processes $w$ 1, $k$ , $w$ 2, $w$ with zero mean and variance σ1, σ2 respectively.

Here, we simulated a case analogous to our empirical experiments in [37], where we tested different contacts along a cylindrical lead while holding other stimulation parameters (amplitude, pulse width) constant. In this simulation, the input vector ${u_m}$ has length equal to the number of available contacts, all of which are assumed to be stimulated in monopolar mode (as in a clinical monopolar survey). The elements of u are 1 when a given contact is active, and 0 when it is not. The weighting vectors ${b_{in1}},{b_{in2}}$ then describe how stimulation alters the state variables. Note that in this formulation, effective DBS changes these underlying generating states, but does not act directly on the mean RT $\mu$ . Thus, when a new stimulation regime is applied, the states ${x_{{\text{base}}}}$ and ${x_{{\text{conflict}}}}$ will not immediately change but will drift to a new steady state. This is consistent with our observations in [37]. The rate of drift will depend on a and ${b_{{\text{in}}}}$ . As further described below, this influences the design of the optimization approach.

2.4. Generator model

The state space model of (1) and (2) is applied to generate new samples of behavior (MSIT RTs) that model those we would expect to observe from a patient undergoing DBS. The evolution of behavior is represented by stepping (2.1) and (2.2) forward in time. The modeling requires identifying reasonable values for the parameters in these equations.

We perform parameter and state estimation using the COMPASS toolkit [48], which uses a combination of an expectation maximization (EM) algorithm and a Kalman-type filtering/smoothing approach to simultaneously infer model parameters and cognitive states given the observed behavior in a block of trials.

The parameter estimation problem for MSIT as described in (1) and (2) would be

$\begin{align}\phi = \left( {\alpha ,v,{b_0},{b_1},{b_2},{a_1},{a_2},{w_1},{w_2},{b_{in1}},{b_{in2}}} \right).\end{align} \tag{ 3 }$

This forms a complex estimation problem with a large number of unknown parameters. This increases the possibility of converging to a local maximum, resulting in sub optimal estimates. To prevent this, we reduce the parameter estimation search space. First, we temporarily ignore the input weightings ${b_{in1}}$ and ${b_{in2}}$ , by setting them to 0. As described below in section 'Optimizer test with simulated stimulation response surfaces', we will assign new values to these parameters when performing the simulation, making it irrelevant to estimate them from patients' data. Note that we can still use data from patients receiving stimulation (as in [37]) to estimate the model—any drift in ${x_{{\text{base}}}}$ and ${x_{{\text{conflict}}}}$ caused by stimulation is captured by the dispersion 1/v and state Gaussian noise W. For convenience, here the generator models were estimated only from non-stimulation blocks within the empirical dataset.

As a further simplification, we apply no scaling to the cognitive states, which is done by setting scaling parameters ${b_1}$ and ${b_2}$ to 1. Finally, we recognize that the state noise process $W$ is the most challenging free parameter to estimate, as a wide range of values can be consistent with observed data. We therefore assign values to ${\sigma _1}$ and ${\sigma _2}$ that we identify as working well across patients; these are described further in 'Grid search for model noise processes' below. This simplifies the parameter search space to

$\begin{align}{\phi _{{\text{generator}}}} = \left( {\alpha ,v,{b_0}} \right).\end{align} \tag{ 4 }$

The problem (4) is then estimated using the COMPASS EM algorithm. The convergence of the EM algorithm is determined by assessing the maximum likelihood (ML) of the data at each iteration, given the current parameters ${\phi _{generator}}$ . The convergence criterion for EM algorithm is a plateau in the ML change given by (4.1)

$\begin{align}\left| {\frac{{M{L_t} - M{L_{t - 1}}}}{{M{L_{t - 1}}}}} \right| < 0.001.\end{align} \tag{ 4.1 }$

To demonstrate the models' ability to simulate new participants that are very similar to the empirical data, 1000 new models ('generator pool') were estimated based on each participant's data (S1–S6). For each model fit, the initial conditions ${x_0}$ in (2) were selected randomly from a normal distribution N(0,1), autoregressive parameters ${a_{}}$ were selected close to 1 [ ${a_1}$ = 0.9999, ${a_2}$ = 0.9999] and for $W$ values for ${\sigma _1}$ and ${\sigma _2}$ were randomly selected from the scale range [7, 16] and [13, 27] respectively identified in 'Model noise process determination via grid search'.

2.5. Generator validation

To validate the generator model, we computed the Kolmogorov–Smirnov (KS) distance between the empirical distribution of those simulated RTs and the empirical probability distribution function (pdf) of a gamma distribution fit to the simulated RTs (MATLAB fitdist function). A KS distance/test at p > 0.05 would suggest that the generator is producing RT-like, gamma-distributed data. We performed two versions of the simulation, one with model tuning and other without model tuning. Model tuning was performed to demonstrate that, if necessary, the generator model could closely replicate the actual distribution obtained from any given empirical participant. Model tuning was a trial and error method of modifying model parameters A, v, w and $\alpha$ by hand. We simulated both the tuned model and untuned model each for 1000 trials as described above. These simulations randomly generated interference trials by setting ${I_{{\text{conflict}}}}$ to 0 or 1 on each trial. No stimulation effects were included. This was repeated 1000 times for each participant, followed by the KS test as just noted. We report the fraction of simulations that had p > 0.05. Further, a KS distance/test was performed between the simulated RT from tuned models and empirical data to verify if the models can be tuned to generate distribution similar to the empirical data. Again we reported with a fraction of simulation that had p > 0.05.

2.6. Sensor model

In actual patients, we do not have direct access to their underlying cognitive states but infer these by fitting the model of (1) and (2) to the observed RTs. In our simulations, however, the ground truth is known, and we can confirm that our sensor model accurately recovers it. The sensor model fitting problem, analogous to the generator problem, is given by

$\begin{align}{\phi _{{\text{sensor}}}} = \left( {\alpha ,v,{b_0}} \right).\end{align} \tag{ 5 }$

The difference between the sensor and the generator model is the data used for modeling and evaluation. For the generator model, we fit it and tested its deviance against the empirical data from S1–S6. For the sensor model, we used data simulated from a pool of randomly initialized generators. We used the same COMPASS EM algorithm, with convergence rule (4.1), and ${\sigma _1}$ , ${\sigma _2}$ drawn from the scale range [7, 12] and [13, 27]; the origin of this choice is described below in 'Grid search for model noise processes'. Further, we added 0.0625 (equal to scale parameter 8) to both ${\sigma _1}$ and ${\sigma _2}$ . This slight increase in noise covariance is helpful in later modeling steps, when we attempt to capture the effect of stimulation. Because the sensor model does not explicitly represent stimulation, a slightly higher state noise leads to more rapid adaptation when stimulation is applied and the state variable is forced away from its default AR(1) behavior.

For each sensor model, the initial state ${x_0}$ in (2) was randomly selected from a normal distribution N(0,1). To further accommodate a wide range of generators, we set the lower bound $\alpha$ of RT to 0.1. We simulated 1000 fits of a sensor model to a generator model. The generator for each simulation was chosen from the 'generator pool' described above under 'Generator model'. We then evaluated sensor tracking ability on this randomly selected generator model using normalized root mean squared error (NRMSE), as described in 'Sensor tracking validation' below.

2.7. Grid search for model noise processes

For the problem described in (3), a large number of unknown parameters increases the probability of incorrect convergence of the EM algorithm. In initial pilots, we noticed that this was particularly problematic when attempting to estimate the covariance of the state noise process W. (Here, W is a diagonal matrix containing parameters ${\sigma _1},\,{\sigma _2}$ ). This estimation often converged to different values, with diagonal elements between [ ${10^{ - 1}}$ , ${10^{ - 7}}$ ]. All other parameters in $\phi$ would be altered in turn. This wide range of W values is not reasonable on face value, suggesting a need for a constraint. We theorized that there might be acceptable values of W that may work well across all patients, simplifying the problem of (3)–(5).

We tested this theory by conducting a two-dimensional grid search using a logarithmically spaced grid. Grid search was selected above other approaches such as random, gradient descent, or Bayesian search because it performs an exhaustive search for the underlying hyperparameter W. The advantages of the other search methods is faster convergence, but for an offline problem with no time constraints, grid search is the better choice. Further grid search has the advantage of revealing information on the underlying unknown structure of the response surface as a function of W. At each grid point, we set the diagonal elements of W to the specified values, estimated ${\phi _{{\text{generator}}}}$ using COMPASS, then calculated the deviance between observed RTs and those predicted by the model (μ from equation (1)). The logarithmic grid is given by

$\begin{align}{\text{W}} = \left| {\begin{array}{*{20}{c}} {\frac{1}{{{{\sqrt 2 }^{{\text{scal}}{{\text{e}}_1}}}}}}&0 \\ 0&{\frac{1}{{{{\sqrt 2 }^{{\text{scal}}{{\text{e}}_2}}}}}} \end{array}} \right|\end{align} \tag{ 6 }$

where ${\text{scal}}{{\text{e}}_1}\epsilon$ [1, 40] and ${\text{scal}}{{\text{e}}_2}\epsilon$ [1, 40]. The range was selected empirically based on values observed in previous studies [37]. Numerically, ${\sigma _1},{\sigma _2}$ takes values from [9.5367 × 10⁻⁰⁷, 0.707] for logarithmic search. For each point in the grid search, the problem (4) was estimated by fitting to data including both non-stimulation and stimulation trials from the empirical dataset using the COMPASS EM algorithm. We then evaluated the deviance between observed RTs and the one predicted by the model. This fit was performed separately for each dataset for participants S1–S6. It was repeated five times for each participant, using a random ${x_0}$ selected from N(0,1), plus an additional run with initial condition ${x_0}$ set to 0. For this model inference we set ${A_k}$ close to 1 [ ${a_1}$ = 0.9999, ${a_2}$ = 0.9999], $W$ was set based on the current grid point, b was 0 and the overall distribution was assumed to be gamma.

2.8. Sensor tracking validation

In previous reports, we validated sensor models by showing that the residuals of predicted RTs were consistent with a white noise process [37, 51]. Here, because we had access to the ground truth generator internal states, we directly measured whether the sensor model accurately tracked those states. We calculated the NRMSE between the ${x_{{\text{base}}}}$ and ${x_{{\text{conflict}}}}$ values in the generator and those inferred by the sensor. NRMSE is the root mean square error between the generator and sensor states after both variables have been re-scaled multiplicatively to the [−1, 1] interval. Because our optimization is concerned primarily with changes in a state variable relative to its own baseline, and not with the specific numeric value of that variable, NRMSE is a more appropriate metric. It tracks whether the sensor's estimate changes in the same direction and scale as the generator ground truth.

NRMSE was calculated between the sensor model (predicted RT and cognitive states) and the corresponding ground truth from the generator. We selected generators randomly from the generator pool (1000 models from 6 participants as noted above). To ensure that we demonstrated that the sensors tracked well over all assumed noise process values, we enforced a further constraint that ${\sigma _1}$ in the generator be drawn uniformly from the selected scale range of [7, 12]. The generator-sensor models were simulated for 1000 trials each, for 1000 randomly selected generators and sensors. During these trials, interference variable ${I_{{\text{conflict}}}}$ was set randomly on each trial, and the value of this indicator was visible to both sensor and generator.

To illustrate the sensor's timescale for settling to a steady state when stimulation is changed, we also simulated the two-site, high contrast stimulation ${b_{{\text{SS}}}}$ as described in 'Optimizer test with simulated stimulation response surfaces'. This simulation also used 1000 trials, but stimulation was changed every 25 trials between the two elements of ${b_{{\text{SS}}}}$ .

2.9. Bayesian optimization to identify optimal stimulation site

Identifying the optimal stimulation contact on a cylindrical DBS lead, assuming that other parameters are held constant during this survey, is a multi arm bandit (MAB) problem. MABs consist of competing discrete options (here, k independent stimulation sites from available stimulation sites in u), of which one of the options produces the maximum effect (here, reduction in the RT). Common algorithms for solving MABs include a greedy algorithm [52, 53], $\epsilon$ -greedy algorithm [52, 53], UCB1 [52], Bayesian upper confidence bound (Bayes-UCB) [54]and Thompson sampling (TS) [53].

In general, the goal of these algorithms is to minimize the approximate cost function $Q\left( u \right)$ by performing a trade off between exploring (selecting arms with uncertain payoff) and exploiting (selecting the arm currently estimated to have the best payoff). Algorithms differ in the cost functions and the steps they take to perform the trade off between exploration-exploitation, as described below.

2.9.1. Greedy and $\epsilon$ -greedy algorithm

The pure greedy algorithm solely exploits and is a degenerate case of $\epsilon$ -greedy with $\epsilon$ = 0. The $\epsilon$ -greedy algorithm is an improvement to the greedy algorithm which primarily exploits, but also explores a new stimulation site ${u_k}$ randomly with a small probability $\epsilon$ (here selected as 0.1). The cost function $Q\left( u \right)$ is adapted from [52]

$\begin{align} Q\left( u \right) & = \frac{1}{{{N_t}\left( u \right)}}\mathop \sum \limits_{t = 1}^T RT_{}^t \nonumber\\[8pt] {u_k}^t & = \frac{{argmin}}{{{u_k} \in u}}\left( {Q\left( u \right)} \right)\,\,\,p = 1 - \epsilon \nonumber\\[8pt] {u_k}^t & = random\left( u \right)p \geqslant \epsilon \end{align} \tag{ 7 }$

where u is the input vector to (2), with 1 indicating the selected site ${u_k}$ , ${N_t}\left( u \right)$ is the number of times a site is selected, T is the overall number of trials done so far, p is a random probability drawn uniformly from [0,1], and ${u_k}^t$ is the stimulation site k selected at time t based on cost function.

2.9.2. UCB1

UCB1 explores and exploits based on the Hoeffding inequality [52] that accounts for the number of times a particular stimulation site ${u_k}$ has been selected. It maintains an estimate of each site's worst-case performance, with a confidence interval around that estimate that shrinks as the site is selected more. Over time, as all sites have been repeatedly selected, UCB1 becomes less exploratory and more exploitative. The cost function [52] for UCB1 is

$\begin{align} Q\left( u \right) & = {\mu _t}\left( u \right) - \sqrt {\frac{{2ln\left( T \right)}}{{{N_t}\left( u \right)}}} \nonumber\\[8pt] {u_k}^t & = \frac{{{\text{argmin}}}}{{{u_k} \in u}}\left( {Q\left( u \right)} \right) \end{align} \tag{ 8 }$

where ${\mu _t}\left( u \right)$ is the vector of the mean RT of stimulation sites in u.

2.9.3. Bayesian inference and conjugate prior

Bayesian algorithms differ from UCB1 in that UCB1 directly maintains an estimate of the distribution of observed RTs, whereas Bayesian algorithms maintain an estimate of the parameters defining that distribution.

The exploration-exploitation of stimulation sites is based on these belief dynamics about the quality of each available site. This belief is expressed in terms of prior ${\Pi^o}$ and posterior distributions ${\Pi^t}$ . The ${\Pi^o}$ is the probability distribution before the RT is observed at t, whereas the ${\Pi^t}$ is the probability distribution after observation at t. On each trial t, a random hypothetical RT ${h^t}$ is drawn for each stimulation site ${u_k}$ in u, using the prior estimate of the distribution parameters ${\theta _k}$ , and the site with the best draw is selected by (9)

$\begin{align} {h^t}_{} & = X\left( {{\theta _k}} \right){\text{for}}\,{\text{each}}\,k \nonumber\\[4pt] u & = {\text{min}}\left( {{h^t}} \right) .\end{align} \tag{ 9 }$

The Bayesian algorithms assume the RT observed on each trial to be an independent identically distributed draw. With each observation, they perform a posterior update of the parameter belief to obtain a posterior belief ${\Pi^t}$ , following Bayes' rule. This ${\Pi^t}$ then is the prior for the subsequent trial at t + 1.

Prior and posterior distributions are selected from conjugate pairs. For conjugate distributions the prior (conjugate prior) and the posterior distribution fall in the same probability distribution family [53]. Here, the posterior update is performed in conjunction with a rule described in (10), which transforms the observed RT into a reward/payoff metric for the action of selecting stimulation site ${u_k}$ .

For any arbitrary distribution parameters ${\theta _1},$ ${\theta _2} \in {\theta _n}$ , if $R{T_k}^t \geqslant R{T_{val}}$ : update ${\theta _1}$ ; else: update ${\theta _2}$

$\begin{align} {\text{Where }}R{T_{{\text{val}}}} & = R{T^{t - 1}}\,{\text{if }}{u_k}^t \ne {u_k}^{t - 1} \nonumber\\[4pt] & = {\text{ min}}\,{\text{(}}\mu \left( {{u_k}^{^{\prime}}} \right))\,{\text{if }}{u_k}{u_k}^t = {u_k}^{t - 1} \end{align} \tag{ 10 }$

where ${u_k}^{^{\prime}} \subseteq u|{u_{k}} \ne {u_t}$ . (10) states that if the current and previously selected stimulation site are different, then the threshold $R{T_{{\text{val}}}}$ is the previous $R{T^{t - 1}}$ Otherwise, $R{T_{{\text{val}}}}$ is the minimum RT among the mean RTs of the rest of the stimulation sites. The parameters ${\theta _1}$ and ${\theta _2}$ depend on the specific choice of conjugate prior. The relevant distributions and their update rules are expressed in table 1.

Table 1. Conjugate prior and posterior distributions and update rules for Bayesian multi-arm bandit algorithms.

Algorithms	Likelihood	Conjugate prior ${\Pi^0}$	Posterior hyperparameters update ${\Pi^t}$
Algorithms	Likelihood	Conjugate prior ${\Pi^0}$	RT $\geqslant$ X	RT $<$ X
Bayes-UCB	Normal RT ∼ N( $\mu ,{\sigma ^2}$ )	Gamma $\left( {\alpha ,\beta } \right)$	$\alpha = \alpha$ + $\frac{n}{2}$	$\beta = \beta + \frac{{\mathop \sum \nolimits _{t = 1}^T{{(R{T^t} - \mu )}^2}}}{2}$
TS-Bernoulli	Bernoulli ${r_t}\sim Ber\left( p \right)$	Beta ( $\alpha ,\beta )$	$\alpha = \alpha + {r_t}$ ${r_{t}} \subset \left[ {0,1} \right]$	$\beta = \beta + 1 - {r_t}$ ${r_{t}} \subset \left[ {0,1} \right]$
TS-Poisson	Poisson ${r_{t}}\sim Pois\left( \lambda \right)$	Gamma $\left( {\alpha ,\beta } \right)$	$\alpha = \alpha + R{T^t}$	$\beta = \beta + 1 - {r_t}$ ${r_{t}} \subset \left[ {0,1} \right]$
TS-Normal	Normal RT ∼ N( $\mu$ , $\frac{1}{\tau }$ )	Gamma $\left( {\alpha ,\beta } \right)$	$\alpha = \alpha$ + $\frac{n}{2}$	$\beta = \beta + \frac{{\mathop \sum \nolimits _{t = 1}^T{{(R{T^t} - \mu )}^2}}}{2}$
C-TS	Normal RT ∼ N( $\mu ,{\sigma ^2}$ )	Normal ( ${\mu _z}$ , $\frac{1}{{1 + {k_z}}}$ )	${\mu _z} = \frac{{{\mu _z}{k_z} + R{T^t}}}{{{k_{z}} + 1}}$	${k_z} = {k_z} + 1$

2.9.4. Bayesian UCB

Bayes-UCB assumes that the process generating the observed RT is normally distributed, where the parameters mean $\mu$ and variance $\sigma$ for that distribution are unknown. A gamma distribution $\Gamma \left( {\alpha ,\beta } \right)$ is an appropriate choice of conjugate prior ( ${\Pi^o}$ ) for such a normally distributed process [54]. Prior to each trial t, Bayes-UCB estimates $\mu$ and $\sigma$ using the prior shape ( $\alpha$ ) and rate ( $\beta$ ) hyperparameters (11). It then calculates a cost function/hypothetical RT ${h^t}$ to select the best stimulation according to the current belief. Once the actual RT is observed, as per Bayes theorem the prior belief is updated using posterior update ${\Pi^t} = \Gamma \left( {\alpha + \frac{n}{2},\beta + \frac{{\mathop \sum \nolimits _{t = 1}^T{{(R{T_t} - \mu )}^2}}}{2}} \right)$ in combination with (10),

$\begin{align} \mu & = \frac{\alpha }{\beta }\,,\sigma \, = \,\frac{\alpha }{{{\beta ^2}}}, \nonumber\\[6pt] {h^t} & = {\mu _t}^*\left( u \right) - c*\frac{\sigma }{{\sqrt {{N_t}\left( u \right)} }} \nonumber\\[6pt] {u_k} & = \frac{{argmin}}{{{u_k}\epsilon u}}\left( {{h^t}} \right) \end{align} \tag{ 11 }$

where c is a confidence level value (here selected as 1.96 corresponding to 95% confidence level) which adds to the exploration-exploitation. The margin of error $\frac{\sigma }{{\sqrt {{N_t}\left( u \right)} }}$ decreases as the sample size increases and algorithm is more certain about the mean ${\mu _t}\left( u \right)$ .

2.9.5. TS

TS is a further class of Bayesian algorithms that can achieve high efficiency sampling of complex MAB problems [53]. We tested multiple variants of TS that differed in their assumptions about the payoff's underlying generative distribution. These included Bernoulli, Poisson, and normal distributions. See the Supplementary Methods [55, 56] for specific descriptions of these TS variants.

2.9.6. Improper priors

As in most MAB algorithms, initially the priors are set as improper priors. To make the posterior proper, we force selection of each stimulation site at least once before control of the selection is given to the optimization algorithm [54].

2.9.7. Brute force (BF)

We compared all of the above optimizers against simple BF. The BF algorithm selects each stimulation site repeatedly for a proportional fraction of the simulated trials. The exact number of applications is determined by the block size, see 'Block-wise vs. trial-wise optimization' below. It then declares the optimal site to be the site with the lowest mean RT, i.e. the one with maximum stimulation effect.

2.10. Optimizer test with simulated stimulation response surfaces

The performance of a given optimizer depends on the problem structure. For instance, if there is a single obvious global best setting, the greedy and BF algorithms will suffice. If the response surface is shallow, with local minima very close to the global minimum, the more complex optimizers are more likely necessary. Greedy and BF algorithms also are expected to scale less well as the number of options (stimulation sites) expands, e.g. as DBS leads move from legacy quadripolar designs to modern multi-contact segmented designs [57, 58].

We thus simulated a range of increasingly complex optimization problems, from 2 to 8 stimulation sites (table 2). We modeled the effects of individual sites after the empirical effects observed in [37]. The source data for changes in the RT due to the stimulation are shown in supplementary table 3(a) of [37]. We assumed that all sites either decrease ${x_{{\text{base}}}}$ (the desired effect) or have no effect. Table 2 gives the values for ${b_{{\text{in}}}}$ 's effect on ${x_{{\text{base}}}}$ ; the effect on ${x_{{\text{conflict}}}}$ was assumed to be 10-fold smaller (consistent with the empirical findings that VCVS stimulation mainly affects ${x_{{\text{base}}}}$ ). For each re-simulation of the problem (see below), the order of these sites was randomly shuffled.

Table 2. Stimulation sites and stimulation effect for different number of stimulation sites. The units are effectively log-seconds, consistent with the definition of ${x_{{\text{base}}}}$ .

Problem size	Stimulation effect
	1	2	3	4	5	6	7	8
8	0	−0.005	−0.01	−0.02	−0.03	−0.031	−0.04	−0.07
6	0	−0.005	−0.01	−0.02	−0.04	−0.07
4	0	−0.005	−0.01	−0.04
2	−0.01	−0.04

Another key performance factor for an optimization algorithm is its ability to distinguish between two closely spaced minima. We therefore also simulated the two-site case, but with increasingly small separation between the two options, all the way to a degenerate case with no difference between options. These values b1–b7 are given in table 3.

Table 3. Stimulation effects for two stimulation site locations for the optimization problems, with decreasing differences between the two sites. ${b_{{\text{SS}}}}$ is a special case used only for demonstrating steady-state response/settling of the sensor model. It is specifically chosen to have very large and visibly different effects.

Stimulation sites	Stimulation effect		Stimulation sites	Stimulation effect
	1	2		1	2
b1	0.05	−0.05	b5	0	−0.02
b2	0	−0.10	b6	0	−0.01
b3	0	−0.05	b7	0	0
b4	0	−0.03	${b_{ss}}$	0.25	−0.25

The primary metric for optimizer evaluation is accuracy, defined as the fraction of times that a given algorithm identifies the known best site.

2.11. Optimization simulations

Optimizer accuracy may be sensitive to the steady-state settling behavior of the specific problem, the convergence rules, and aspects of the problem. We tested the above algorithms across a range of assumptions (table 4), including:

Table 4. Simulation configuration for overall closed loop simulations to identify the effects of different parameters on the algorithm's accuracy.

	Replicates	Trials per replicate	Block size	Stimulation sites
Effect of block-wise vs. trial-wise stimulation	1000	600	1,5, 10, 15, 20, 25, 30, 35, 40, 45, 50	8
Effect of convergence criteria	1000	100, 200, 300, 400, 500, 600, 700, 800, 900, 1000	15	8
Effect of number of stimulation sites	1000	600	15	2, 4, 6, 8
Effect of signal to noise ratio	1000	600	15	8
Discriminability of different stimulation effects	1000	600	15	2
Ensemble (1–7)	1000	600	15	2, 4, 6, 8

2.11.1. Effect of block-wise vs. trial-wise stimulation

By default, all the above optimization algorithms explore a new setting (stimulation site) on every trial. This will work poorly, however, with the specific linear dynamical system model underlying our generator-sensor model. As described in [37], the model is meant to smooth out stochastic variation in RTs, and as such, the underlying state variables change slowly in response to perturbation. The effect of a new stimulation site may not be clearly detectable until it has been applied for several trials in a row. We thus compared trial-wise optimization to a block-wise optimization, where the stimulation was only able to change every n trials, and where the mean RT over that block of trials is used as the outcome to update the algorithm's internal model. We refer to n as the 'blocksize'. The block approach creates a trade-off—higher blocksize will stabilize the estimate and make subtle effects more obvious, but will require more total trials to converge. We thus evaluated convergence properties for blocksize ranging from 1 to 50, with a step size of 5. For each of these, we simulated only 600 trials. This represents 2–3 h of total optimization time (assuming some breaks), which is about the most a patient can be expected to tolerate in a clinical setting [15]. Using this more limited trial count also helps highlight the trade off just described. For each value of blocksize and for each proposed algorithm, we simulated 1000 task runs each with 600 trials, with a generator randomly drawn from the above generator model pool (1000 models seeded from 6 participants). The sensor model was similarly selected randomly as described under 'Sensor model'. For each run, the optimizer was only permitted to choose stimulation sites once all sites had been selected at least once (i.e. we initialized the algorithm by a forced exploration). This leads to prior propers and encourages convergence of the sensor to the true current value of ${x_{{\text{base}}}}$ regardless of the value of ${x_0}$ . We simulated 8 stimulation sites for this problem, as described in table 2.

2.11.2. Effect of convergence criterion

The convergence of a bandit algorithm depends on the complexity of the problem under consideration. Waiting for algorithms to converge could be impractical if the algorithm takes longer to converge, as real humans will have difficulty tolerating more than 2–3 h of testing. We thus tested how optimization performance depends on the number of trials (a fixed stopping criterion). We tested 100, 200, 300, 400, 500, 600, 700, 800, 900, and 1000 trial runs. The overall simulation was repeated 1000 times with a problem of eight stimulation sites (table 2) and blocksize 15. The position of the optimal stimulation site was shuffled for each repetition. The generator models were randomly selected for each simulation run as above.

2.11.3. Effect of number of stimulation sites

As the number of available options (stimulation sites) increases, algorithms may take longer to reliably estimate each site's effect. This could give an advantage to the Bayesian algorithms, which generally explore more efficiently. To test this possibility, we evaluated how algorithm performance depends on the number of potential stimulation sites. For this simulation, we tested 2, 4, 6, and 8 sites, with surfaces described in table 2. These simulations were repeated for 1000 replicates, with each replicate having 600 trials and a blocksize of 15. The location of the optimal stimulation site was shuffled for each repetition.

2.11.4. Effect of state noise/signal to noise ratio (SNR)

The SNR influences how well bandit algorithms perform. A noisy system would produce noisy RTs, and it would be difficult for the algorithm to identify any changes caused by the stimulation effect. We evaluated how different noise floors affect the algorithm accuracy. For this simulation, rather than using the values from the grid search, we selected $W$ ( ${\sigma _1}$ and ${\sigma _2}$ ) as described in table 5. These values were perturbed by N(0, 0.0001) in order to use a slightly different generator model in each simulation. These values correspond to different SNRs, expressed as the ratio of the largest stimulation effect in b to the size of the noise variance. We simulated each SNR for 1000 replicates, using 600 trials per replicate, a blocksize of 15, and 8 stimulation sites as described in table 2.

Table 5. Different state noise levels. ${b_{{\text{diff}}}}$ represents the difference between the most effective and second most effective stimulation site.

$\sigma$	0.5	0.4	0.3	0.2	0.1	0.05	0.04	0.03	0.02	0.01	0.005
$\frac{{{b_{diff}}}}{\sigma }$	0.06	0.075	0.1	0.15	0.3	0.6	0.75	1	1.5	3	6
dB	−24.44	−22.5	−20	−16.48	−10.46	−4.44	−2.5	0	3.5	9.54	15.56

2.11.5. Discriminability of different stimulation effects

It may be challenging to detect a global optimum if two sites have very similar performance. We evaluated how close two stimulation sites could be while still being reliably distinguishable. For simulations, generators were randomly selected from the generator pool and sensor models selected as described in 'Sensor model'. Each row of table 3 was simulated with 1000 replicates, 600 trials per replicate, 2 stimulation sites, and blocksize of 15.

2.12. Ensemble simulations

Even the best optimization algorithm can still converge away from the global optimum, particularly if the measurement (RT) is noisy. This might be mitigated by an ensemble approach, where optimization is performed repeatedly from different initial conditions, or simply by allowing the random choices inherent in the algorithms to proceed from different seeds. Clinically, this could be realized by redoing the optimization with the same patient on multiple days, as has been suggested in [34]. We simulated this ensemble case, by repeatedly re-simulating the same generator and different sensor model. We tested ensembles of 1, 2, 3, 4, 5, 6 and 7 repetitions over 1000 generator-sensor pairs selected as described above, with 600 trials per replicate, blocksize 15, stimulation problem of size 2, 4, 6 and 8, and noise level as defined in the 'Grid search for model noise processes' section. The selection for computing accuracy was then based on a hard majority vote across repetitions.

3. Results

Successful optimization depends on: (1) selection of reasonable parameters for the sensor model, particularly the noise process terms, (2) the generator's ability to simulate RTs with a distribution comparable to actual participants, (3) the sensor model's ability to accurately track the generator's underlying unobservable states, without direct information about them and (4) the selection of an optimizer that can reliably converge to the correct answer in this specific use case.

3.1. Model noise process determination via grid search

Modeling includes estimation of parameters using the COMPASS EM algorithm. The full parameter estimation problem given in (3) has many unknowns, which in practice often leads to convergence on a local maximum. This is particularly problematic when estimating the state noise W, leading to the simplifications in (4) and (5). To enable those simplifications, we performed a grid search to identify reasonable fixed/cross-patient values for W, seeking values that minimize the deviance between actual and predicted data during the expectation step of EM.

Figure 2 shows the deviance for the state-space model of (1) and (2), fitted using different noise variances for ${x_{{\text{base}}}}$ and ${x_{{\text{conflict}}}}$ . A region of values corresponding to [4, 19] ( ${\sigma _1}$ ≡ [0.0014,0.25]) for ${x_{{\text{base}}}}$ and [2, 34] ( ${\sigma _2}$ ≡ [0.000007629,0.5]) for ${x_{{\text{conflict}}}}$ produced small deviance on average (figure 2(A)). Due to the unavailability of ground truth about optimal scale values, the specific best point within this region cannot be determined. In practice, any W drawn from within the low-deviance zones would likely work well. We verified that this region of low deviance is present in single-participant data (examples in figures 2(B) and (C)). In inspecting those data, we noted that some participants (e.g. S1 shown in figure 2(B)) have a narrower range of acceptable W values. We thus used this narrower range for all further simulations described below. This corresponded to scale [7, 16] ( ${\sigma _1}$ ≡ [0.0039, 0.0884]) for ${x_{{\text{base}}}}$ and [13, 27] ( ${\sigma _2}$ ≡ [0.00008,0.0110]) for ${x_{{\text{conflict}}}}$ .

**Figure 2.** Logarithmic grid search for W. The deviance is calculated for the model fitted with different values of ${\sigma _1}$ and ${\sigma _2}$ (here expressed as scale factors, see equation 6. Deviance values greater than 0 were replaced by 0 to better highlight the contours. (A) Averaged deviance across individuals. There is a zone of low deviance across participants. (B)–(C) Examples of the same deviance plot from 1000 simulations of individual participants. The region shown in (A) is present in the individual plots, i.e. is not an artifact of averaging.
Download figure:
Standard image High-resolution image

3.2. Generator model validation

A good model fit using (4) and (5) acts as an approximation of actual participants receiving stimulation, which means that a good model would generate the same distribution as that of the participants' (S1–S6) data. However, empirically due to the assumptions for (4) and (5), the estimated model requires additional model tuning for exactly matching the distribution of any individual participant. We validated that the generator models, with and without tuning, produced plausible behavior.

Without any tuning (simple fitting to empirical data via COMPASS), the RTs emitted by the generator follow the expected gamma distribution (figures 3(A) and S1). While not a good model for S1 specifically, the generator model output thus follows the expected statistics of actual patient data, i.e. could be used as an example of a new, unobserved patient. To verify this, we fit a gamma distribution to the simulated RT, then calculated the empirical distribution of the RTs expected given that fit. The KS distance between the generator-model RT's pdf and the pdf of a gamma distribution corresponded to p > 0.05 in 100% of the simulation runs across patients. As further verification, we showed that interference trials have higher RTs by the amount that would be expected during MSIT (figure 3(B)). We also verified that, if necessary, a specific patient's data could be matched through the generator model. Hand-tuning parameters A, $\mu$ , w, v and $\alpha$ generated simulations that almost perfectly matched with the empirical observations (figures 3(A) and S1; table S1). The calculated KS distance between the empirical and the simulated RT (1000 trials) from the tuned model had a p > 0.05 for 98.9, 96.2, 90.4, 100.0, 91.5, 98.6% of the 1000 simulation runs for S1–S6 respectively.

**Figure 3.** Simulations of MSIT behavior as emitted by the generator model. (A) Comparison of actual and simulated behavior (RT distributions) for S1. The curves represent the mean distribution of 1000 runs of 110 trials (number of trials S1 performed) with trials randomly set as interference/non-interference. The KS distance for p > 0.05 between empirical RT and model tuned RT was 95.2%. The shaded bound represents the standard error of that mean over 1000 re-simulations. The empirical behavior (black) matches well with the RT distribution emitted by a tuned model fit to those data (red). The distribution of an untuned model has the same overall shape, but a different mode and slightly different skew, demonstrating that slight changes to the model parameters can effectively simulate new patients. (B) Distribution of RTs during interference and non-interference conditions for 1000 simulated trials from S2. As expected, interference trials have higher RT by approximately 200 ms.
Download figure:
Standard image High-resolution image

3.3. Sensor model tracking of ground truth and steady state response

The sensor model tracks the generator well (figures 4(A)–(C) and S2) across the simulations, with a median NRMSE of 0.0935 for the actual/estimated RT, 0.0541 for ${x_{{\text{base}}}}$ , and 0.0784 for ${x_{{\text{conflict}}}}$ . This suggests that the sensor models can accurately track the unobservable generator state, and that this holds true across many random realizations. However, this tracking is not instantaneous. As seen particularly in figure 4(C), the nature of the sensor model, the random initialization of ${x_0},$ and the chosen noise covariances cause the tracking to be temporarily inaccurate for the first 20 trials.

**Figure 4.** Sensor model tracking the generator's output and unobserved internal states. (A) Comparison between emitted generator RTs and sensor's prediction of those RTs. This panel shows the first 150 out of 1000 simulated trials. As expected, the sensor output is a smoothed (expected value) version of the generator output. (B)–(C), sensor state estimates compared to generator ground truth. There is accurate tracking of both states after about 20 trials, with this 'lead in' period necessary because ${x_0}$ of the sensor and generator models are initialized to different random numbers. (D) Generator and sensor steady state response for a single patient. At trial 0, stimulation switches between the two values of ${b_{{\text{ss}}}}$ , applying a strong forcing. The generator's internal state settles to steady state in roughly ten trials, and the sensor tracks this closely. Line represents the mean value of ${x_{{\text{base}}}}$ for generator and sensor over a blocksize of 25 with 40 replicates (1000 trials), whereas the shading represents the standard error of the mean over replicates.
Download figure:
Standard image High-resolution image

**Figure 4.** Sensor model tracking the generator's output and unobserved internal states. (A) Comparison between emitted generator RTs and sensor's prediction of those RTs. This panel shows the first 150 out of 1000 simulated trials. As expected, the sensor output is a smoothed (expected value) version of the generator output. (B)–(C), sensor state estimates compared to generator ground truth. There is accurate tracking of both states after about 20 trials, with this 'lead in' period necessary because ${x_0}$ of the sensor and generator models are initialized to different random numbers. (D) Generator and sensor steady state response for a single patient. At trial 0, stimulation switches between the two values of ${b_{{\text{ss}}}}$ , applying a strong forcing. The generator's internal state settles to steady state in roughly ten trials, and the sensor tracks this closely. Line represents the mean value of ${x_{{\text{base}}}}$ for generator and sensor over a blocksize of 25 with 40 replicates (1000 trials), whereas the shading represents the standard error of the mean over replicates.
Download figure:
Standard image High-resolution image

Similarly, the model structure means that the tracked state variable ${x_{{\text{base}}}}$ does not change immediately when stimulation is applied. Since we assume a linear dynamical system with a substantial autoregressive component, it takes 5–10 trials for the effect of a continuous perturbation to settle to a new steady state (figure 4(D)). In general, the number of trials to reach steady state is dependent on the stimulation effect strength, i.e. stronger perturbations will take longer, but also will be more easily detected despite small stochastic variations. Regardless, this result argues that the effects of stimulation can only be evaluated over blocks of trials, not on individual trials.

3.4. Bayesian optimization

Having validated the generator and sensor model and identified settling time as a consideration, we evaluated which optimization algorithms could best identify optimal stimulation sites for improving MSIT performance (decreasing task RTs). We measured the accuracy (probability of identifying a known global optimum) of multiple algorithms as a function of (1) blocksize for stimulation, (2) number of stimulation sites ('problem size'), (3) number of trials performed, (4) distance between the stimulation effects of the stimulation sites, (5) SNR of stimulation effect relative to state noise, and (6) number of times an algorithm is repeated during an ensemble strategy.

3.5. Blocksize for stimulation

The accuracy of all algorithms improves with increasing blocksize, but plateaus around a blocksize of 10 (figure 5(A) and tables 6, S2). This is visible across all algorithms. This is consistent with the observation in figure 4 that the stimulation effect is not instantaneous, but takes a few trials to manifest in ${x_{{\text{base}}}}$ . Further, at these higher blocksizes, UCB1 is generally the best algorithm. When we compared all algorithms across problem sizes and blocksizes, UCB1 was the most likely to converge to the known optimum (tables 6 and S2). Bayesian algorithms were the best for small blocksizes and single trial simulations, but lost this advantage in part because as the blocksize increases, the estimate of ${x_{{\text{base}}}}$ tends to the actual ${x_{{\text{base}}}}$ . At the same time, most algorithms consistently outperform BF, emphasizing the overall value of more advanced optimization strategies.

Table 6. Different algorithm performance with different blocksize for problem sizes of 4 and 8 sites. The bold values show algorithm accuracies that exceed brute force (BF). The final column, 'Wins', indicates the number of times that a given algorithm was best across all blocksizes. For all problem sizes, UCB1 had the most wins, i.e. it was more likely than other algorithms to find the best solution.

Problem size 4	Blocksize
Algorithms	1	5	10	15	20	25	30	35	40	45	50	Wins
BF	0.307	0.47	0.602	0.615	0.642	0.614	0.625	0.644	0.66	0.637	0.61	2
Greedy	0.28	0.392	0.538	0.582	0.576	0.598	0.638	0.602	0.612	0.615	0.651	0
ε-greedy	0.398	0.522	0.554	0.586	0.603	0.612	0.637	0.619	0.63	0.616	0.637	0
UCB1	0.439	0.589	0.675	0.664	0.679	0.637	0.67	0.677	0.654	0.628	0.679	6
Bayes-UCB	0.475	0.573	0.631	0.618	0.652	0.662	0.649	0.663	0.631	0.629	0.636	2
TS-Bernoulli	0.417	0.542	0.639	0.619	0.612	0.635	0.593	0.603	0.61	0.582	0.641	0
TS-Poisson	0.459	0.53	0.591	0.586	0.603	0.643	0.612	0.633	0.619	0.626	0.647	0
TS-Normal	0.4	0.601	0.612	0.623	0.619	0.618	0.631	0.623	0.611	0.595	0.594	1
C-TS	0.356	0.527	0.522	0.553	0.582	0.555	0.556	0.55	0.593	0.583	0.588	0
Problem size 8	Blocksize
Algorithms	1	5	10	15	20	25	30	35	40	45	50	Wins
BF	0.163	0.334	0.4	0.388	0.403	0.4	0.39	0.387	0.363	0.379	0.362	0
Greedy	0.162	0.33	0.419	0.457	0.455	0.439	0.467	0.42	0.454	0.407	0.427	1
ε-greedy	0.279	0.387	0.443	0.441	0.42	0.449	0.444	0.437	0.474	0.439	0.408	3
UCB1	0.24	0.37	0.456	0.5	0.472	0.456	0.437	0.41	0.429	0.423	0.439	5
Bayes-UCB	0.343	0.394	0.412	0.406	0.423	0.394	0.382	0.353	0.418	0.4	0.367	2
TS-Bernoulli	0.267	0.368	0.384	0.403	0.383	0.379	0.377	0.405	0.374	0.374	0.382	0
TS-Poisson	0.322	0.372	0.416	0.434	0.411	0.391	0.388	0.385	0.381	0.35	0.389	0
TS-Normal	0.204	0.368	0.4	0.406	0.405	0.409	0.368	0.348	0.352	0.367	0.372	0

Note: Underlined values show the best algorithm for the given blocksize.

Table 7. Different algorithm performance with different convergence criteria. UCB1 outperforms all other algorithms across problem sizes (most Wins). Convergence and accuracy are linearly related.

Problem size 4	Convergence criteria
Algorithms	100	200	300	400	500	600	700	800	900	1000	Wins
BF	0.36	0.438	0.518	0.548	0.577	0.615	0.626	0.692	0.687	0.7	0
Greedy	0.388	0.484	0.5	0.556	0.532	0.563	0.573	0.576	0.587	0.597	1
ε-greedy	0.355	0.444	0.495	0.562	0.581	0.608	0.607	0.614	0.647	0.649	0
UCB1	0.402	0.467	0.54	0.57	0.628	0.66	0.664	0.701	0.705	0.751	9
Bayes-UCB	0.373	0.453	0.502	0.568	0.609	0.608	0.65	0.691	0.72	0.723	0
TS-Bernoulli	0.362	0.439	0.485	0.527	0.589	0.607	0.629	0.673	0.655	0.694	0
TS-Poisson	0.344	0.474	0.511	0.537	0.579	0.619	0.631	0.629	0.646	0.695	0
TS-Normal	0.362	0.454	0.487	0.532	0.584	0.657	0.653	0.675	0.703	0.713	0
C-TS	0.35	0.431	0.48	0.512	0.523	0.542	0.589	0.586	0.595	0.621	0
Problem size 8	Convergence Criteria
Algorithms	100	200	300	400	500	600	700	800	900	1000	Wins
BF	0.103	0.266	0.302	0.314	0.363	0.365	0.444	0.457	0.459	0.512	0
Greedy	0.12	0.269	0.357	0.395	0.405	0.481	0.45	0.477	0.477	0.508	3
ε-greedy	0.132	0.266	0.327	0.395	0.454	0.425	0.467	0.507	0.515	0.524	2
UCB1	0.137	0.294	0.313	0.367	0.434	0.449	0.483	0.523	0.548	0.559	5
Bayes-UCB	0.138	0.247	0.281	0.321	0.379	0.402	0.417	0.499	0.508	0.493	1
TS-Bernoulli	0.12	0.245	0.303	0.322	0.365	0.387	0.413	0.458	0.456	0.473	0
TS-Poisson	0.107	0.275	0.27	0.317	0.363	0.384	0.423	0.459	0.507	0.542	0
TS-Normal	0.135	0.244	0.274	0.339	0.355	0.404	0.436	0.475	0.493	0.541	0
C-TS	0.104	0.228	0.292	0.322	0.311	0.331	0.348	0.38	0.378	0.396	0

Note: Underlined values show the best algorithm for the given Convergence criteria.

3.6. Problem size

Accuracy of all algorithms decreases with increases in the number of stimulation sites. This is a consequence of a decrease in the available samples for a fixed convergence criterion (number of trials) and blocksize. Most algorithms outperformed BF in most cases as the problem size (number of sites) increased, and UCB1 in particular consistently dominated BF (figures 5(C) and S3). However, UCB1 also maintained its advantage over the Bayesian and greedy algorithms (tables 6 and S2). The decrease in accuracy affects the Bayesian algorithms more since they are based on inferring the underlying distribution of the data, whereas greedy algorithms comparatively start performing better.

3.7. Convergence criterion

Accuracy increases with increase in the convergence criterion (number of allowed trials). However, in practice, the number of trials is limited, as a participant can only tolerate a certain number of trials/certain time on task on a given day before fatiguing. In our clinical experience, a 64-trial block takes a bit over 5 min, and it is difficult for a participant to perform more than 9–10 such blocks in close succession. Hence, we emphasized 600 trials as an approximation of this upper limit. In this analysis UCB1 again outperformed other algorithms (tables 7 and S3). It specifically maintained its advantage over BF at all convergence criteria above 500 trials (figure 5(B)), again demonstrating the benefits of a more efficiently exploring optimizer. Importantly, the number of trials collected is the sole measure of an algorithm's efficiency. The tested algorithms do vary in the time needed to compute each trial's outcomes, but all are far faster than the available inter-trial time (table S4).

3.8. Discriminability of different stimulation effects

Accuracy depends on the closeness of local optima to the global optimum. When given two stimulation choices to optimize, BF and UCB1 could identify the optimum with probability above chance, even in very hard problems (figure 5(D)). They only fell to chance in the degenerate case b7, when the two sites were identical. In this specific analysis, there was no advantage for UCB1 or other algorithms over BF; both had almost identical performance as discriminability fell. All algorithms are compared in table S5.

3.9. SNR

Accuracy of the algorithms is sensitive to noise. In other words, if there is more noise in the data generator, it will mask the stimulation effect and the overall accuracy will decrease. We compared algorithm performance across different levels of SNR, defined based on ratio of the stimulation effect to the state noise process variance. UCB1's advantage was less clear in this case; Bayesian and even greedy algorithms could outperform it in specific cases, although UCB1 was still either first or second across problem sizes (tables 8 and S6).

Table 8. Algorithm performance across varying SNRs. UCB1 outperformed or tied all other algorithms at problem sizes 4 and 6 (table S6), whereas $\epsilon$ -greedy performed better for the most challenging problem of 8 sites.

Problem size 4	SNR(dB)
Algorithms	−24.44	−22.5	−20	−16.48	−10.46	−4.44	−2.5	0	3.52	9.54	15.56	Wins
BF	0.372	0.381	0.394	0.468	0.549	0.602	0.619	0.649	0.668	0.708	0.715	0
Greedy	0.326	0.353	0.371	0.41	0.493	0.538	0.551	0.557	0.571	0.638	0.651	0
ε-greedy	0.337	0.366	0.385	0.442	0.497	0.585	0.611	0.61	0.643	0.647	0.7	0
UCB1	0.408	0.405	0.399	0.466	0.534	0.607	0.635	0.699	0.737	0.717	0.778	4
Bayes-UCB	0.322	0.403	0.427	0.432	0.529	0.649	0.642	0.653	0.701	0.761	0.811	4
TS-Bernoulli	0.342	0.368	0.401	0.438	0.557	0.56	0.583	0.679	0.665	0.728	0.77	0
TS-Poisson	0.369	0.385	0.395	0.44	0.507	0.59	0.615	0.623	0.681	0.714	0.784	0
TS-Normal	0.362	0.37	0.401	0.469	0.549	0.607	0.616	0.652	0.682	0.763	0.807	2
C-TS	0.34	0.347	0.355	0.388	0.479	0.505	0.534	0.545	0.621	0.696	0.701	0
Problem size 8	SNR(dB)
Algorithms	−24.44	−22.5	−20	−16.48	−10.46	−4.44	−2.5	0	3.52	9.54	15.56	Wins
BF	0.212	0.209	0.221	0.259	0.348	0.375	0.397	0.401	0.438	0.474	0.492	0
Greedy	0.221	0.236	0.255	0.303	0.387	0.451	0.458	0.476	0.52	0.533	0.548	3
ε-greedy	0.246	0.239	0.263	0.275	0.378	0.436	0.44	0.489	0.53	0.534	0.563	5
UCB1	0.232	0.245	0.24	0.32	0.369	0.44	0.454	0.453	0.502	0.529	0.577	3
Bayes-UCB	0.214	0.224	0.229	0.26	0.349	0.38	0.4	0.43	0.476	0.487	0.507	0
TS-Bernoulli	0.209	0.199	0.214	0.229	0.337	0.34	0.409	0.401	0.459	0.499	0.504	0
TS-Poisson	0.19	0.217	0.259	0.264	0.343	0.403	0.411	0.437	0.441	0.495	0.539	0
TS-Normal	0.207	0.219	0.226	0.259	0.322	0.382	0.403	0.431	0.457	0.466	0.509	0
C-TS	0.183	0.2	0.221	0.256	0.287	0.316	0.343	0.371	0.401	0.425	0.476	0

Note: Underlined values show the best algorithm for the given SNR.

3.10. Ensemble method for increased accuracy

Ensemble optimization further improved performance (figure 6). Even in the most difficult problem of eight stimulation sites, UCB1 maintained accuracy of nearly 80%, as compared to 47% when simulated once. In simpler problems, e.g. four stimulation sites as was originally tested in [37], UCB1's accuracy exceeded 90%. Performance slightly improved as the ensemble process lengthened from 3 to 5 d, but only asymptotically. UCB1 performed better than almost all algorithms over the ensemble process (table S7). It was also possible to improve performance above 90%, even on difficult problems, by accepting a 'good enough' local minimum (e.g. the 2nd best rather than optimal stimulation site). This improvement in accuracy was observed across algorithms. TS in particular improved compared to its accuracy in finding the single global optimum. Greedy algorithms moved closer to the performance of UCB1, but UCB1 still held a slight advantage over all algorithms. All alternate algorithms performed better than BF. (Tables S8 and S9).

4. Discussion

We have demonstrated an online approach to optimizing DBS programming for psychiatric disorders, and specifically, selection of the optimal site during a monopolar survey in the VCVS target. This approach is based on a robust framework for modeling and tracking individual behavior during a cognitive conflict task and evaluating the effect of different DBS contacts/settings on task performance. We have shown that models in this framework can rapidly be fit/adapted to individual participants, and that the most challenging parameter, the state noise covariance, does not need to be patient-specific. Using that general covariance, we have specifically shown that our model framework can track known ground truth, and can detect the effects of stimulation changes within a few task trials. Further, any individual can be precisely replicated and simulated by appropriate model parameter tuning. Finally, we showed that combining this modeling framework with adaptive optimization algorithms is an effective approach to identifying stimulation sites/parameters for a patient. The UCB1 algorithm in particular outperforms the BF approach described in [37] and a variety of alternate algorithms. There were specific situations where other algorithms outperformed UCB, but given that the best algorithm for a patient will not be known in advance, UCB was the most reliable performer and thus would be the logical choice for clinical use. When deployed in an ensemble, UCB can reliably identify a known optimum under realistic assumptions of problem size and time available for optimization with an accuracy of about 80% even for difficult problems. If the problem space can be narrowed, e.g. if a clinician has a prior hypothesis regarding specific contacts being more likely to be helpful, the chance of convergence increases. More generally, specific choices within the optimization (block size, number of trials per run, ensemble structure) provide a trade space for customizing optimization to a particular problem or stimulation effect size. For instance, if a participant could tolerate more trials per day, or if the stimulation effect were larger/more distinguished from a contact's neighbors, then fewer ensemble days might be needed.

In the process of validating those optimization algorithms, we developed a simulation framework for exercising and evaluating optimizers more generally, a framework that could in theory apply to other DBS optimization problems. We show that this simulation engine is an accurate model of real-world patients. It is able to replicate individual patients' data, to generate new patient-like data, and to recover known parameters. For model parameters that are sensitive to the poor estimation caused by the curse of dimensionality, such as the state noise variance, we demonstrate that one can estimate the value from a small range of values that work well across actual patients. This provides a starting point for future investigation and the formulation of plans for enhancing DBS programming using an objective methodology. These simulations were directly based on empirical observations in humans, potentially increasing the likelihood of their replication in vivo.

UCB1's relative superiority over other algorithms, particularly the more sophisticated parametric approach of TS, may initially be surprising. This result is likely an effect of the blocked stimulation paradigm we emphasize, where stimulation parameters are held constant for some number of trials, and where the total number of samples is highly constrained. In addition to allowing the filtered variable ${x_{{\text{base}}}}$ to reach steady state, the blocked approach averages out stochastic noise, meaning that the RT or state estimate for each block is a reliable reporter of stimulation's true effect. In the presence of reliable information/lowered noise, UCB1 and even greedy algorithms can outperform parametric approaches, because they do not require sufficient trials to reliably estimate a parametric distribution. In another scenario, with single trial measurements and more rapid changes in stimulation, TS might be superior as evident in tables 6 and S2 for blocksize 1. If the search space is not exclusively restricted to one-dimensional problems as it was here, TS might also produce better outcomes. Finally, UCB1 may have been the best-performing algorithm in this study because we specifically assumed the effects of the stimulation sites to be uncorrelated, to formulate a MAB problem. In reality, DBS leads have spatial structure, particularly newer leads that fractionate individual contacts into 'segments'. In these higher-density leads, it is possible and in fact likely that the effects of nearby sites are correlated. If we explicitly modeled that correlation, e.g. by propagating information between sites via a spatial kernel, TS or other algorithms might be more accurate.

The overall approach may address the target engagement problem that has limited the use of DBS in non-motor applications [2, 19]. In the most straightforward example, the framework described above could be used to optimize VCVS DBS for depression or OCD. Current practice asks patients to rate their subjective mood as a clinician tests different contacts/settings [15, 16]. In our proposed framework, patients would instead perform the MSIT or a similar cognitive task. Our prior work shows that patients can tolerate doing so for several hundred trials if given adequate breaks [36, 37]. Even if the ensemble approach is needed, the current standard in VCVS DBS optimization is a 2+ day office procedure taking 8–10 h [15]. Our proposed approach could actually take less time. Further, because task performance and task-based optimization could easily be performed in a clinical/ office setting, it would require little change to the clinical workflow and no specialized equipment. Verification of target engagement through this task-based paradigm may lead to more robust clinical trial designs for new DBS indications. For instance, the algorithms above all can identify stimulation sites that do not improve RT/ ${x_{{\text{base}}}}$ . Stimulation settings that do and do not improve cognitive control could be tested against each other in an ABAB or similar within-subjects design. Those designs have higher statistical power than the between-groups designs that failed in [59, 60].

In the longer term, an objective assessment for behavioral target engagement could also inform surgical practice. For instance, multiple studies argue that clinical response to DBS, as measured with standard rating scales, is correlated with activation of specific white matter bundles [21, 22, 61, 62]. This is likely also true of cognitive response—more dorsal fibers within the VCVS have larger behavioral effects [37], and specific cognitive domains are differentially linked to different sub-parts of the cortico-striato-thalamic circuitry that runs through VCVS [63]. By retrospective analysis, it may be possible to identify VCVS sub-bundles associated with the cognitive response described here, and to then target DBS electrodes specifically to those bundles.

The framework of optimizing DBS to engage a specific, objective, task-measured behavior may also lead to more robust animal models. The availability of rodent and non-human primate models for movement disorders DBS has enabled more rapid advances, because novel stimulation sequences and closed-loop paradigms can be more easily tested [64–66]. This would likely also be true for psychiatric DBS, but current animal models of mental disorders, grounded in human symptoms, have limited predictive power for clinical outcomes [67]. Cognitive assays, on the other hand, might be more reliably translated across species [68]. The method described here, using UCB1 to optimize between discrete stimulation sites, could equally be used to test among discrete types/paradigms of stimulation applied to the same electrode. Similar to [66], many different approaches might be compared sequentially in an animal cognitive assay, and the few reliable performers advanced to human pilots.

Conversely, the concept of combining a Bayesian optimizer with a filtering/ latent state model could be applied on longer timescales consistent with clinical outcomes, in both psychiatric disorders and neighboring applications such as chronic pain. Multiple authors have argued that DBS is difficult to develop for non-motor applications because the clinical outcomes are self reports, which are overly influenced by expectancy, environmental events, and other non-structured variations [2, 19, 20, 69–71]. If self report could be made more reliable, there are case reports of successful neurostimulator programming through optimization algorithms [46, 47]. Non-DBS-related changes in self-report outcomes can be considered as stochastic noise, in principle similar to the variations in RT that we smooth out through the filtering described in equations (1) and (2). The same approach could be applied to patient self report, e.g. collecting multiple noisy ratings at dense timescales and filtering to identify the underlying 'true' rating. In cases where there are known environmental influences (e.g. a patient whose pain is influenced by time of day or by emotional stressors), these could be modeled as inputs, similar to how we modeled interference effects through ${x_{{\text{conflict}}}}$ .

5. Limitations

The largest limitation of the current system is that it can optimize only a single discrete parameter (stimulation site). In practice, clinically effective DBS requires finding the correct combination of site(s), frequency, pulse width, amplitude, and temporal patterning (e.g. cycling). Algorithms exist for optimizing these continuous-valued parameters [41], but efficient approaches to multivariate, discrete-continuous hybrid optimization are not yet developed. As more variables are simultaneously considered, the response space explodes combinatorially and it may be difficult to converge in clinically feasible times. Stimulation parameters may need to be optimized sequentially by univariate algorithms, or multivariate optimization might be constrained, e.g. by stronger prior probabilities derived from prior patients. In real-world situations, directly attempting to optimize the full joint parameter space may lead to a reduction of the performance of the algorithms. However, the effect in [37] was specifically achieved using a fixed amplitude. Thus, sequential optimization should be feasible: it should be possible to use the technique used here to identify the optimal contact and then fine-tune the amplitude manually or using an algorithm [41]. Further, if the algorithm fails to converge, it still produces a response surface. That information may be useful to the clinician in selecting DBS settings.

The current approach does not consider adverse effects. DBS at VCVS is well documented to cause negative side effects, including immediate high anxiety [14, 15, 72] and slower-onset impulsivity and mania [13, 73]. Optimization algorithms should avoid settings known to cause these adverse effects, but the optimizers described here have no way to represent extreme negative outcomes. They also do not provide feedback as to whether the difference between two sites can be considered statistically significant, which may be clinically important if more than one site is close to the global optimum. All of these factors would need to be re-considered in optimizing DBS for a different cognitive effect, presumably with a different behavioral assay. The underlying COMPASS toolkit is flexible and can model most standard tasks [48], but model structure would still need to be customized for each new application, as would the noise covariance assumptions. This poses a challenge for DBS targets beyond VCVS [22, 74]. It is not yet known which tasks/assays can measure engagement at other targets, which limits the use of this approach as a general-purpose tool in psychiatric DBS.

More broadly, the current approach is an indirect read-out of DBS target engagement. DBS ultimately exerts its effects by changing neurophysiology [2, 75]. Other approaches to DBS optimization in psychiatry focus on identifying stimulation parameters that directly produce a desired physiological change [34], or on responding to changes in a target biomarker to leverage the state dependence of DBS effects [18, 26]. We cannot inform responsive DBS with our proposed approach, nor can we match the precision of neurophysiology. Those more advanced approaches, however, require correspondingly advanced inpatient monitoring and analytic capabilities. The method proposed here requires only commodity computer hardware, and thus may be more cost-effective and scalable.

6. Conclusion

In summary, we have demonstrated a proof of concept system for psychiatric DBS programming and online optimization, using cognitive control as the measure of effective target engagement. This may lead to more reliable clinical response and/or more robust design for clinical trials of the specific VCVS target studied here. Future work will include extending optimization to multiple DBS parameters and handling of stimulation-induced adverse events. Above all, any potential clinical use would require optimization software that was tested for usability and design robustness [76], a critical next step for validating our proposed approach.

Acknowledgments

This work was supported by the Minnesota's Discovery, Research, and Innovation Economy (MnDRIVE) initiative, the Minnesota Medical Discovery Team on Addictions, and the National Institutes of Health (R01MH124687, R01NS120851). The opinions herein are the position of the authors, not any governmental body or funding agency.

Data availability statement

The data that support the findings of this study are openly available at the following URL/DOI: https://github.com/tne-lab/DBSParameterOptimization2022.git. Data will be available from 31 August 2022.

Code availability

At the time of publication, all code and data required for performing the above-described simulations will be made available in a GitHub repository https://github.com/tne-lab/DBSParameterOptimization2022.git.

Conflict of interest

A S W, A Y, and T I N have unlicensed intellectual property in the area of brain stimulation optimization, including patent applications related to the subject of this paper. T I N holds equity in StimSherpa, a company developing optimization methods for neuromodulation. All other authors declare no financial conflicts of interest.