Use of machine learning to examine disparities in completion of substance use disorder treatment

  • Aaron Baird,

    Contributed equally to this work with: Aaron Baird, Yichen Cheng, Yusen Xia

    Roles Conceptualization, Data curation, Supervision, Visualization, Writing – original draft, Writing – review & editing

    abaird@gsu.edu

    Affiliation Institute of Health Administration, Robinson College of Business, Georgia State University, Atlanta, Georgia, United States of America

  • Yichen Cheng,

    Roles Formal analysis, Validation, Visualization, Writing – review & editing

    Affiliation Institute for Insight, Robinson College of Business, Georgia State University, Atlanta, Georgia, United States of America

  • Yusen Xia

    Roles Conceptualization, Formal analysis, Methodology, Supervision, Writing – review & editing

    Affiliation Institute for Insight, Robinson College of Business, Georgia State University, Atlanta, Georgia, United States of America

Abstract

The objective of this work is to examine disparities in the completion of substance use disorder treatment in the U.S. Our data are from the Treatment Episode Dataset Discharge (TEDS-D) datasets from the U.S. Substance Abuse and Mental Health Services Administration (SAMHSA) for 2017–2019. We apply a two-stage virtual twins model (random forest + decision tree) where, in the first stage (random forest), we determine differences in treatment completion probability associated with race/ethnicity, income source, no co-occurrence of mental health disorders, gender (biological), no health insurance, veteran status, age, and primary substance (alcohol or opioid). In the second stage (decision tree), we identify subgroups associated with probability differences, where such subgroups are more or less likely to complete treatment. We find that the subgroups most likely to complete substance use disorder treatment, when the subgroup represents more than 1% of the sample, are those with no mental health condition co-occurrence (4.8% more likely when discharged from an ambulatory outpatient treatment program, representing 62% of the sample; and 10% more likely for one of the more specifically defined subgroups representing 10% of the sample), those with an income source of job-related wages/salary (4.3% more likely when not having used in the 30 days prior to discharge and when the primary substance is not alcohol only, representing 28% of the sample), and white non-Hispanics (2.7% more likely when discharged from residential long-term treatment, representing 9% of the sample). Important implications are that: 1) those without a co-occurring mental health condition are the most likely to complete treatment, 2) those with job-related wages or income are more likely to complete treatment, and 3) racial/ethnic disparities persist in favor of white non-Hispanic individuals seeking to complete treatment. Thus, additional resources may be needed to combat such disparities.

Introduction

According to the 2020 National Survey on Drug Use and Health (NSDUH), 58.7% (or 162.5 million people) were current users of tobacco, alcohol, or an illicit drug [1]. A total of 14.5 percent (or 40.3 million people) were found to have a substance use disorder [1]. 1.4 percent (or 4.0 million people) aged 12 or older in the U.S. “received any substance use disorder treatment in the past year, and 1.0 percent (or 2.7 million people) received substance use disorder treatment at a specialty facility in the past year” [1]. Further, it is well known that treatment can effectively reduce substance dependence and improve related factors, such as associated mental health conditions, criminal behavior, and access to employment [2, 3].

Unfortunately, though, health care is subject to disparities [4] and, specific to our study, substance use disorder treatment outcomes can vary between subgroups [1, 5–11]. Work in this area has found that African Americans often wait longer to receive substance use disorder treatment service for opioids than their white counterparts [12, 13], and are less likely to complete treatment [14, 15]. It has also been found that racial disparities can differ by substance used, such as for methamphetamines vs. alcohol [5, 16]. Further, the most recent National Healthcare Quality and Disparities Report (2019) noted that income and being uninsured, in addition to race, were significant underlying factors in the presence of disparities in health quality [17]. Other studies have found that health care quality disparities persist by gender and age [18] and for those with mental health disease [19, 20]. More generally, factors including income and race have been shown to be associated with disparities in health care quality and outcomes; these disparities were shown to be decreasing between 2006 and 2012 [4] and, more recently, persisting in some areas but decreasing in others [17]. Thus, we have an opportunity to comprehensively assess disparities in substance use disorder treatment completion, inclusive of a variety of determinants, as well as to examine how present findings compare to prior findings. We also have an opportunity to apply state-of-the-art methods.

One promising methodological approach is the use of machine learning (ML), which has been receiving considerable attention in the context of health care [21, 22]. ML has been applied specifically to analysis of substance use disorder treatment, resulting in interesting findings [23] with improved granularity and accuracy in many cases [9]. For instance, using a two-stage virtual twins method (random forest + logistic regression), racial disparities were found to be present in wait times for treatment of opioid users [12]. Using a series of XGBoost models, one study found a number of complex interactions in factors associated with treatment completion, such as longer treatment times typically improving chances of treatment success yet success probability attenuating somewhat when frequency of substance use was at the 75th percentile or higher [24]. Finally, using data on Medicare beneficiaries with emergency department admissions for opioid overdoses, ML has been used to develop accurate opioid overdose prediction models [25].

While excellent research has been conducted in this area, we claim: 1) the types of disparities considered should be inclusive of not only race and ethnicity, but also other determinants, 2) application of ML-based methods may help to more accurately identify subgroups more likely to complete treatment, and 3) the use of counterfactual research designs can help to establish causality. Given these claims, the objective of this study is to determine which subgroups are the most likely in the U.S. to complete substance use disorder treatment, using a method that combines the strengths of ML with the strengths of counterfactual research designs.

Conceptually, our work builds upon a growing body of social determinants of health [26] and disparities research [27–29] seeking to understand how systematic differences in subgroups, identified by intersections of characteristics [30], result in health outcome variations. We specifically consider how patient demographics, substance use characteristics, and treatment characteristics impact treatment completion. This approach is consistent with work seeking to understand where heterogeneous treatment effects are present, especially in observational data [31–33]. We assume heterogeneity in health care processes and outcomes, where some subgroups experience more favorable outcomes than others. We also assume, however, that heterogeneous treatment effects with respect to disparities are either not immediately obvious to those providing care or are not always evaluated in depth. We use the most recently available national data (2017–2019), without restricting by substance type [12] or region [14, 24], and without focusing only on racially-based disparities [15]. Our findings contribute to research by leveraging a causal ML approach, applied to a number of disparities, and ultimately elucidating where disparities are persisting. Our findings contribute to practice by helping those who provide care to identify and be cognizant of what types of treatment episodes are more likely to result in completed treatment, even if not immediately obvious to care givers.

Data and methods

Study design

This study utilizes a counterfactual research design toward identification of subgroup differences, specifically designed to blend causal inference and ML methods [12, 31, 34]. This type of analysis allows for identification of subgroups with heterogeneous treatment effects as well as identification of factors causing such differences [12]. This study was approved as exempt by an IRB as the data are anonymized and publicly available.

Data source and sample

The data for this study comes from the publicly available, nationally representative TEDS-D datasets from SAMHSA for years 2017–2019. SAMHSA provides aggregated data for both admissions and discharges from substance use disorder treatment programs, for participating states (e.g., in 2019, all U.S. states participated except for Oregon, Washington, and West Virginia) and the District of Columbia and Puerto Rico. We selected the discharge data, as opposed to the admission data, to assess effects on treatment completion. Each observation is for one discharge, rather than one individual, which means that one individual may have multiple observations in the data. For the years of data we analyzed, a total of 649,479 discharges were included in our analyses. This sample represents 13.3% of the data. Missing data is explained in the S1 Appendix.

Measures

Dependent variable (First stage).

Reason for discharge (REASON) from a substance use disorder treatment program was chosen as the primary dependent variable for the first stage estimation. This variable is categorical with seven categories: 1) treatment completed, which was coded as ‘1’ in our analysis for successful completion of treatment, and 2) six other categories coded as ‘0’ in our analysis to capture unsuccessful completion (i.e., dropped out of treatment, terminated by facility, transferred to another treatment program or facility, incarcerated, death, and other that captures a life circumstance change, such as hospitalization or change of residence).
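The binary coding described above can be sketched as follows. This is a minimal illustration, not the authors' code: the label strings are taken from the category descriptions in the text, and how they are actually stored in TEDS-D is an assumption, as is the function name `code_completion`.

```python
# Labels follow the category descriptions in the text; the exact values
# stored in the TEDS-D REASON variable are an assumption.
COMPLETED = "treatment completed"
NOT_COMPLETED = {
    "dropped out of treatment",
    "terminated by facility",
    "transferred to another treatment program or facility",
    "incarcerated",
    "death",
    "other",
}


def code_completion(reason: str) -> int:
    """Return 1 for completed treatment, 0 for any other discharge reason."""
    label = reason.strip().lower()
    if label == COMPLETED:
        return 1
    if label in NOT_COMPLETED:
        return 0
    # Fail loudly rather than silently coding an unknown category as 0.
    raise ValueError(f"unrecognized discharge reason: {reason!r}")


reasons = ["Treatment completed", "Dropped out of treatment", "Death"]
outcomes = [code_completion(r) for r in reasons]
```

Raising an error on unknown labels is a design choice in this sketch: it keeps missing or miscoded values from being counted as unsuccessful completions.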

Disparity variables.

While prior studies have typically focused on one type of disparity at a time, such as disparities related to race and ethnicity [12, 14, 15], we evaluate multiple disparities. As depicted in Fig 1, we evaluate disparities relative to: race/ethnicity (white non-Hispanic vs. rest), income source (income from wages or salary vs. rest), no mental health disease co-occurrence (no mental health disease co-occurrence vs. discharges for patients with mental health disease co-occurrence), gender (biological) (male vs. female), no health insurance (no health insurance vs. having health insurance), veteran status (veteran vs. not a veteran), age (<35 years old vs. ≥35 years old), and primary substance (alcohol vs. rest; opioid vs. rest). Only some of these results are reported in this paper, with the rest included in the S1 Appendix, as some analyses found no disparities or found them only at very low levels.
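Each disparity above is a binary contrast, so it can be represented as a 0/1 indicator per discharge. The sketch below is only illustrative: every field name and value (`race`, `income_source`, `has_insurance`, and so on) is hypothetical, and TEDS-D itself stores these attributes as coded categories (age, for example, is recorded in bands rather than as a number of years).

```python
def disparity_indicators(d: dict) -> dict:
    """Build one 0/1 subgroup indicator per disparity contrast.

    All keys read from `d` are hypothetical field names for illustration.
    """
    return {
        "white_non_hispanic": int(
            d["race"] == "white" and d["ethnicity"] == "non-hispanic"
        ),
        "wages_income": int(d["income_source"] == "wages/salary"),
        "no_mh_cooccurrence": int(not d["mental_health_cooccurrence"]),
        "male": int(d["gender"] == "male"),
        "no_insurance": int(not d["has_insurance"]),
        "veteran": int(d["veteran"]),
        "under_35": int(d["age"] < 35),
        "alcohol_primary": int(d["primary_substance"] == "alcohol"),
        "opioid_primary": int(d["primary_substance"] == "opioid"),
    }


example = {
    "race": "white",
    "ethnicity": "non-hispanic",
    "income_source": "wages/salary",
    "mental_health_cooccurrence": False,
    "gender": "female",
    "has_insurance": True,
    "veteran": False,
    "age": 29,
    "primary_substance": "opioid",
}
flags = disparity_indicators(example)
```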

Explanatory variables.

Three types of variables were included as explanatory variables. Some variables available in the TEDS-D data were dropped due to collinearity. See the S1 Appendix for details. Explanatory variables that were retained include:

  • Patient demographics: Age at admission, gender (biological), race, ethnicity, marital status, education level, employment status at admission/discharge, veteran status, living arrangement at admission/discharge, primary income source, and arrests in the past month prior to the discharge.
  • Substance use characteristics: Primary/secondary substance use, frequency of use at admission/discharge, and primary substance type reported at admission.
  • Treatment characteristics: Type of treatment/service setting at admission/discharge, length of stay in treatment, referral source, detailed criminal justice referral, and previous substance use treatment episodes.

Descriptive analyses

We generated descriptive statistics for the entire sample as well as for those who successfully completed treatment vs. those who did not. Additional descriptions, such as for missing data, were also generated and are provided in the S1 Appendix.

Disparities analyses

In disparities research [12], we cannot observe outcomes under different values of an immutable characteristic for the same observation. In the virtual twins approach, in the first stage estimation, a probability is determined for an outcome for every observation, which is treatment success (completion) in this study [34]. To estimate this probability, we follow prior work in this area [12, 24] and apply a machine learning approach. Specifically, we estimated the probability of successful treatment completion with a random forest, XGBoost, a neural network, and a logistic regression. We applied a 70% training and 30% testing random data split with 10 iterations through the procedures to address variation due to randomness. We used the R package “h2o” to implement all the methods. For the neural network, we set the hidden layer sizes to (64, 64), that is, 64 neurons in each of the two hidden layers. For all the other methods, we used the default settings. Random forest had the highest accuracy, AUC, and F1 and was selected as the final model for the first stage estimation. Random forest is an ensemble method based on multiple decision trees. The model takes input covariate values (Xi, Ti), where Ti is the binary indicator variable (subgroup variable) for whether an observation belongs to the treatment group (i.e., the subgroup under consideration) or not, and the output is P(Yi = 1), i.e., the probability of successful completion, for that set of covariate values. For discharge i, we denote the probability as P1i if the discharge is in the treatment group, and P0i otherwise.
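The repeated 70%/30% random split described above can be sketched in a few lines. This is a generic illustration using only the Python standard library, not the h2o procedure the authors ran; the function name `repeated_splits` and the seed handling are assumptions.

```python
import random


def repeated_splits(n_rows, train_frac=0.7, iterations=10, seed=0):
    """Yield (train_indices, test_indices) for several random splits."""
    rng = random.Random(seed)  # fixed seed so the splits are reproducible
    n_train = round(n_rows * train_frac)
    for _ in range(iterations):
        idx = list(range(n_rows))
        rng.shuffle(idx)
        # First n_train shuffled rows train the model, the rest test it.
        yield idx[:n_train], idx[n_train:]


splits = list(repeated_splits(n_rows=20))
```

A model would be fit on each training set and scored on the matching test set, with the metrics then compared or averaged across the 10 iterations.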

To establish a counterfactual or a “virtual twin,” a second probability is calculated for every observation with the subgroup variable switched to its opposite value. The difference of these two probabilities is then calculated per observation (e.g., P(white non-Hispanic) – P(not white non-Hispanic)). This procedure was repeated for every disparity type evaluated, reported earlier. That is, we create a new variable for each discharge, defined as the difference in the probability for assuming discharge i is from the treatment group vs. the control group: Zi = P1i – P0i.
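The counterfactual step can be illustrated with a small sketch. The fitted first-stage random forest is replaced here by `toy_model`, a made-up logistic function with arbitrary coefficients, so only the mechanics of flipping the subgroup indicator and differencing the two predictions carry over; the feature names `t` and `no_recent_use` are hypothetical.

```python
import math


def toy_model(features: dict) -> float:
    """Stand-in for the fitted first-stage model; returns P(Y = 1).

    The coefficients are arbitrary and chosen only for illustration.
    """
    score = -0.5 + 0.3 * features["t"] + 0.8 * features["no_recent_use"]
    return 1.0 / (1.0 + math.exp(-score))


def twin_difference(predict, features: dict, subgroup_key: str = "t") -> float:
    """Compute Zi = P1i - P0i for one discharge.

    P1i is the prediction with the subgroup indicator set to 1 and P0i the
    prediction with it set to 0; all other covariates are held fixed.
    """
    p1 = predict({**features, subgroup_key: 1})
    p0 = predict({**features, subgroup_key: 0})
    return p1 - p0


discharges = [
    {"t": 1, "no_recent_use": 1},
    {"t": 0, "no_recent_use": 0},
]
z = [twin_difference(toy_model, d) for d in discharges]
```

Because both predictions are computed for every discharge, the result does not depend on which subgroup the discharge was actually observed in.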

This difference is the primary variable in the second stage. In the second stage, we apply a decision tree to determine which factors, i.e., the same independent variables in the first stage other than the disparity variable under consideration, cause the probability difference [34, 35].
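To make the second stage concrete, the sketch below finds a single best split on the probability difference, which is the basic step a regression tree repeats recursively. It is a depth-one illustration in plain Python rather than the tree-growing procedure used in the study; the binary features (`outpatient`, `veteran`) and the difference values are invented for the example.

```python
def best_split(rows, z, feature_names):
    """Pick the binary feature whose split minimizes squared error in z."""

    def sse(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    best = None
    for name in feature_names:
        yes = [zi for row, zi in zip(rows, z) if row[name] == 1]
        no = [zi for row, zi in zip(rows, z) if row[name] == 0]
        if not yes or not no:
            continue  # a split must send observations down both branches
        error = sse(yes) + sse(no)
        if best is None or error < best["error"]:
            best = {
                "feature": name,
                "error": error,
                "mean_yes": sum(yes) / len(yes),  # node value, "yes" branch
                "mean_no": sum(no) / len(no),     # node value, "no" branch
                "share_yes": len(yes) / len(rows),
            }
    return best


# Invented example: four discharges with their probability differences.
rows = [
    {"outpatient": 1, "veteran": 0},
    {"outpatient": 1, "veteran": 1},
    {"outpatient": 0, "veteran": 0},
    {"outpatient": 0, "veteran": 1},
]
z = [0.05, 0.05, 0.01, 0.01]
split = best_split(rows, z, ["outpatient", "veteran"])
```

The mean of the differences in each branch plays the role of the decimal value shown in a tree node, and the branch share corresponds to the percentage of the sample that node represents.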

Results

Data description

The full dataset is described in Table 1. The most common reason for discharge in the full sample was “treatment completed” (33.8%). The next most common categories were “transferred to another treatment program” (29.2%) and “dropped out of treatment” (22.8%). In the treatment not completed subgroup, 34.5% of discharges are for “dropped out of treatment,” while 44.1% are for “transferred to another treatment program.” Most of the treatment discharges in the full sample were from ambulatory outpatient centers (15.2% for intensive outpatient and 47.0% for non-intensive outpatient). Lengths of stay of between 1 and 30 days accounted for 57.0% of discharges.

Table 1. TEDS-D sample description (2017–2019) including differences for substance use disorder treatment completed vs. not completed subgroups.

https://doi.org/10.1371/journal.pone.0275054.t001

Regarding the disparities reported in this paper, starting with race and ethnicity, white patients make up 70.1% of the full sample, Black or African American patients make up 17.6%, and the remainder of races identified make up 12.3% of the sample. Non-Hispanic patients make up 88.9% of the sample. For primary income source, in the full sample, 27.1% of discharges were associated with patients who had income from wages/salary, 44.3% did not have or did not report a primary income source, and the remainder received income from public assistance, retirement/pension or disability, or other sources. For co-occurrence of a mental health disorder, 56.5% of discharges were associated with patients with at least one co-occurring mental health disorder while 43.5% were associated with patients without a co-occurring mental health disorder.

Full missing data details are reported in the S1 Appendix, with the largest amounts of missing data (>20%) occurring within variables for DSM diagnosis (DSMCRIT), frequency of use at discharge (primary) (FREQ_D), and living arrangement at discharge (LIVARAG_D).

Virtual twins: First stage results

The resulting feature importances from this first stage random forest were as follows, with the scaled importance in parentheses, where 1 is the most important: type of service discharged from (1.00), frequency of use of primary substance at discharge (0.58), DSM diagnosis (0.49), length of stay (0.47), age (0.29), secondary substance used (0.23), primary substance used (0.22), referral source (0.18), frequency of use of primary substance at admission (0.16), and employment status at discharge (0.15).
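A scaled importance of this kind can be sketched as each variable's raw importance divided by the largest raw importance, which is consistent with the top variable having a value of 1.00. That definition is an assumption here, and the raw values below are made up purely so the arithmetic is easy to follow; they are not from the study.

```python
def scale_importance(raw: dict) -> dict:
    """Divide each raw importance by the largest one, so the top value is 1."""
    top = max(raw.values())
    return {name: value / top for name, value in raw.items()}


# Hypothetical raw importances, invented for illustration only.
raw = {
    "service_type": 250.0,
    "freq_use_discharge": 145.0,
    "length_of_stay": 117.5,
}
scaled = scale_importance(raw)
```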

Virtual twins: Second stage results

For the second stage results, we report the decision trees developed using R (package: h2o) applied to the disparity in question. All left branches mean “yes” the branching condition was met. All right branches mean “no” the branching condition was not met. The decimal values represent the increased probability of completing substance use disorder treatment due to being in the subgroup identified by the branching conditions. When higher, these decimal values indicate greater likelihood of completing treatment. The hues represent lower (lighter) or higher (darker) probabilities of completing treatment. The percentage indicates percentage of the discharges in the sample represented by the specific node.

Fig 2 depicts the decision tree for race/ethnicity disparity, where the probability difference was calculated as P1i (white non-Hispanic)–P0i (all other races and ethnicities). Thus, the nodes represent the increased (or decreased) probability of completing treatment successfully when white non-Hispanic. Overall, the highest probability is 2.7% (representing 9% of the sample) when the service is rehab/residential, long term (>30 days), which is the only service not in the list of services specified in the branching node. This suggests that a racial disparity exists particularly for longer-term treatment. On the other end of the spectrum, we find that completing treatment successfully is 12% less likely for white non-Hispanic patients when admitted to ambulatory detox, but the percentage of the sample represented is near 0%, suggesting that this difference applies to few discharges. Disparities for other subgroups identified are less than 1%.

Fig 2. Race/ethnicity decision tree (P1i = white non-Hispanic).

https://doi.org/10.1371/journal.pone.0275054.g002

Fig 3 depicts the decision tree for income source disparity, where the probability difference was calculated as P1i (wages/salary)–P0i (all other income sources). The nodes represent the increased (or decreased) probability of completing treatment successfully when a regular source of job-related income is available. Overall, all the probabilities are positive, suggesting that those with job-related income are more likely to successfully complete treatment. Those who have not used in the past month (54% of the sample) have a 3.5% higher probability of completing treatment if their income source is from wages or salary. Further, the highest probability is 4.3%, representing 28% of the sample, for those with no use in the past month and using either drugs only or drugs in addition to alcohol. The next highest probability is 4.2%, for those who had no use in the past month, used alcohol only, and were discharged from either Detox 24-hour free-standing residential or any of the rehab/residential types of programs. We also note that these probabilities (4.3% and 4.2%, respectively) are higher than the highest probability associated with racial disparities (2.7%), suggesting that income source disparities are somewhat larger than race/ethnicity disparities for some subgroups.

Fig 4 depicts the decision tree for no co-occurring mental health disorder disparity, where the probability difference was calculated as P1i (no co-occurring substance use and mental health disorder)–P0i (co-occurring). We note that this decision tree was grown for discharges where PSYPROB (co-occurring mental and substance use disorders) is equal to “No.” We mention this as the TEDS-D data also include a variable called DSMCRIT (i.e., DSM diagnosis) that includes values for both substance use and mental health diagnoses, but each discharge is only assigned one primary diagnosis within this variable. Thus, it is impossible to tell from this variable whether there is a co-occurring substance use and mental health diagnosis. The PSYPROB variable is a Yes/No variable that captures whether there are co-occurring substance use and mental health diagnoses. While there is some overlap between PSYPROB and DSMCRIT, we based the tree on the PSYPROB variable, as it accurately reflects dual diagnoses.

Fig 4. No co-occurring mental health disorder decision tree (P1i = no co-occurring mental health disorder).

https://doi.org/10.1371/journal.pone.0275054.g004

The nodes in the tree represent the increased (or decreased) probability of completing treatment successfully when one does not have co-occurring substance use and mental health disorders. Overall, all the probabilities are positive, suggesting that one is more likely to successfully complete treatment if not also diagnosed with a mental health disorder. The highest probability of 10% (representing 10% of the sample) is for the subgroup of those who were discharged from ambulatory, outpatient services (either intensive or non-intensive), had not used in the past month, and had a primary DSMCRIT diagnosis of a (alcohol-induced disorder), b (substance-induced disorder), c (alcohol intoxication), d (alcohol dependence), i (alcohol abuse), j (cannabis abuse), n (anxiety disorders), o (depressive disorders), p (schizophrenia/other psychotic disorders), q (bipolar disorders), or r (attention deficit/disruptive behavior disorders). We note that this probability (10.0%) is the highest observed in this study (when >1% of the sample is represented). We also note that, even higher up in the tree, the node with a 4.8% probability representing 62% of the sample (which is for those discharged from an ambulatory outpatient treatment program) is also higher than the probabilities observed in the other decision tree results (race and income from wages/job), when >1% of the sample is represented. These results are consistent with the results from the robustness checks (Fig 3 in S1 Appendix). Thus, we conclude that those with no mental health co-occurrence have the highest probability of completing treatment successfully.

Robustness

First, we evaluated the potential impacts of imbalanced data, associated with our first stage dependent variable, by assessing accuracy as well as AUC, precision, recall and F1 scores for the balanced data using the Synthetic Minority Over-sampling Technique (SMOTE) (see the statistics in the S1 Appendix). We find that the out-of-sample statistics using SMOTE are very similar to the statistics resulting for the analyses run using the original data. Given that both precision and recall are high and consistent with each other, we conclude that the results of the prediction models are not imbalanced in favor of only one class (or a minority of classes).
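For reference, the metrics named above can be computed from predicted and observed classes as in the sketch below. It covers accuracy, precision, recall, and F1 for a binary outcome using their standard definitions; AUC and the SMOTE resampling itself are not shown, and the example labels are invented.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, recall, and F1 for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Invented labels: 3 completed and 5 not completed discharges.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
```

With imbalanced classes, accuracy alone can look good for a model that mostly predicts the majority class, which is why precision and recall are reported alongside it.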

Second, a potential issue with our first stage dependent variable (treatment completion) is that some of the unsuccessful completion categories, such as transfer, incarcerated, death, or other, may not reflect a disparity in treatment completion, but rather changes or issues that occurred outside of the control of the individual or treatment program. Some studies have addressed this issue by only focusing on planned discharges [15, 24] or by dropping detox related readmissions [15]. Thus, for robustness, we re-ran the analyses with a subset of the data for only two categories: treatment completed (coded as 1), and both dropped out of treatment and terminated by facility (coded as 0), with all observations for other reasons dropped. As can be seen in the S1 Appendix, while there are some minor differences in the results for these robustness checks, the probabilities and subgroups identified are very similar to the main analyses. We do note two differences, however. In the race/ethnicity disparity robustness check, for nodes representing >1% of the sample, one of the subgroups has a -1.1% probability (representing 49% of the sample) of successfully completing treatment. This suggests that disparities may be present in the other direction (i.e., in favor of minorities) in some cases. For the income source disparity robustness check, the highest disparity is 3.4% (representing 55% of the sample), which is 0.9 percentage points lower than the highest reported income source disparity in the main results (4.3%) and is similar to the highest race/ethnicity probability in its respective robustness check results (3.3%; representing 5% of the sample). Although these highest probabilities for income source and race/ethnicity disparity are similar, given that the robustness check for income source disparity represents a much higher percentage of the sample (55% vs. 5%), we maintain that disparities evaluated in this study, for certain subgroups, occur in this order: no mental health co-occurrence, income source (job-related wages), and race/ethnicity (white non-Hispanic).
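The subset used in this second robustness check can be sketched as a filter that keeps three discharge reasons and drops the rest. As elsewhere, the label strings follow the wording in the text and are assumptions about how the categories are stored; `robustness_subset` is a hypothetical helper name.

```python
# Discharge reasons retained in the robustness check, with their coding.
KEEP = {
    "treatment completed": 1,
    "dropped out of treatment": 0,
    "terminated by facility": 0,
}


def robustness_subset(reasons):
    """Return (row_index, outcome) pairs for the retained discharges."""
    kept = []
    for i, reason in enumerate(reasons):
        label = reason.strip().lower()
        if label in KEEP:  # transfers, incarceration, death, other are dropped
            kept.append((i, KEEP[label]))
    return kept


reasons = [
    "Treatment completed",
    "Incarcerated",
    "Terminated by facility",
    "Death",
    "Dropped out of treatment",
]
subset = robustness_subset(reasons)
```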

Discussion

Leveraging a national dataset of substance use disorder treatment discharges for 2017–2019 in the U.S., this study has examined disparities in substance use disorder treatment completion. After evaluating several potential disparities, the three most prominent disparities found are: no co-occurrence of substance use and mental health disorders, income source, and race/ethnicity. Through application of a virtual twins method, which is a counterfactual approach used to identify subgroups subject to differences in outcomes, we find that disparities are indeed present and should be considered in more depth by researchers and practitioners alike.

Our primary finding is that, for the disparities considered in this study, the highest probability for successfully completing treatment when the subgroup represents more than 1% of the sample, is for one of the subgroups within the no mental health condition co-occurrence (10% more likely to complete treatment; representing 10% of the sample). The second highest is for a subgroup within the income source from wages/job decision tree (4.3% more likely to complete treatment; representing 28% of the sample). The third highest is for a white non-Hispanic subgroup in the race/ethnicity decision tree (2.7% more likely to complete treatment; representing 9% of the sample).

Prior studies have shown that racial disparities are present in substance use disorder treatment completion [14, 15]. Our findings confirm that racial/ethnic disparities persist, particularly when admitted to residential, long-term (>30 days) treatment programs. This finding suggests that disparities may exist when decisions are made as to which type of program to admit a patient to or retain within. This implies that biases associated with race or ethnicity should be particularly examined in the process of determining which program to refer or admit patients into as well as in treatment continuation decisions.

We also find that other disparities exist that require practitioner and policy maker attention. Subgroups associated with having job-related income or not having a co-occurring mental health condition have the largest probabilities of successfully completing treatment in this study. Prior work has shown that disparities are present for those with mental health conditions [36] and that co-occurrence of substance use disorder and mental health conditions is often associated with barriers to sufficient care [37]. Prior work has also shown that racial/ethnic minorities with lower income often lack equitable access to substance use disorder treatment [38]. However, to our knowledge, the heterogeneous treatment effects associated with income source disparities and co-occurrence of substance use disorder and mental health disorders have not yet been fully considered in relation to substance use disorder treatment completion. Thus, we contribute by identifying additional subgroups for whom treatment completion is more or less likely.

Regarding mental health disorder co-occurrence, the highest treatment completion probabilities for this subgroup were for those who were discharged from ambulatory (non-detox) services. This suggests that more investments are likely needed in services for patients with dual diagnoses and, if dual diagnosis patients are routed to ambulatory services, specialized programs or tailored resources may be needed to reduce this disparity.

Regarding income source, those with job-related income who had not used their primary substance in the last 30 days upon discharge were the most likely to complete treatment. This suggests that job retention or placement programs, for individuals who are willing and able to work, may reduce disparities in treatment completion. This may require that substance use treatment also include either social programs or readily available connections to those offering such programs. Further, as is the case throughout health care, more emphasis on coordinating treatment goals with social goals may be required of those assisting patients in treatment.

This study is primarily limited by two data issues: missing data and data not submitted by some U.S. states (e.g., Georgia, Oregon, Washington, and West Virginia did not submit data to SAMHSA in some years). We sought to address these issues by analyzing available data across the entire U.S. (i.e., not just for specific states). Secondarily, this study is limited by not being able to observe effects for immutable characteristics for the same discharge (e.g., being white non-Hispanic and another race or ethnicity at the same time). The virtual twins analysis counterfactual design was specifically selected to address this issue.

Overall, this study has shown that disparities exist and persist in substance use disorder treatment completion. Given that this study is based on a national sample, substance use disorder treatment programs can use these results to apply customized approaches toward mitigating disparity risk. For instance, while race/ethnicity is an important disparity to continue to consider, we also find that other types of disparities are present, suggesting that policy makers and practitioners should consider at least income and co-occurring diagnoses, in addition to race and ethnicity, when making resource allocation and programmatic design decisions.

References

  1. Richesson D, Hoenig JM. Key Substance Use and Mental Health Indicators in the United States: Results from the 2020 National Survey on Drug Use and Health. Substance Abuse and Mental Health Services Administration (SAMHSA), 2021 PEP21-07-01-003.
  2. McLellan AT, Luborsky L, O’Brien CP, Woody GE, Druley KA. Is treatment for substance abuse effective? JAMA. 1982;247(10):1423–8. pmid:7057531
  3. McCarty D, Braude L, Lyman DR, Dougherty RH, Daniels AS, Ghose SS, et al. Substance abuse intensive outpatient programs: assessing the evidence. Psychiatric Services. 2014;65(6):718–26. pmid:24445620
  4. Fiscella K, Sanders MR. Racial and ethnic disparities in the quality of health care. Annual Review of Public Health. 2016;37:375–94. pmid:26789384
  5. Mennis J, Stahler GJ. Racial and ethnic disparities in outpatient substance use disorder treatment episode completion for different substances. Journal of Substance Abuse Treatment. 2016;63:25–33. pmid:26818489
  6. Stahler GJ, Mennis J, DuCette JP. Residential and outpatient treatment completion for substance use disorders in the US: Moderation analysis by demographics and drug of choice. Addictive Behaviors. 2016;58:129–35.
  7. Suntai Z. Substance use among women who are pregnant: Examining treatment completion by race and ethnicity. Journal of Substance Abuse Treatment. 2021;131:108437. pmid:34098297
  8. Sahker E, Toussaint MN, Ramirez M, Ali SR, Arndt S. Evaluating racial disparity in referral source and successful completion of substance abuse treatment. Addictive Behaviors. 2015;48:25–9. pmid:25935719
  9. Acion L, Kelmansky D, van der Laan M, Sahker E, Jones D, Arndt S. Use of a machine learning framework to predict substance use disorder treatment success. PLoS ONE. 2017;12(4):e0175383. pmid:28394905
  10. Marotta PL, Tolou-Shams M, Cunningham-Williams RM, Washington DM Sr, Voisin D. Racial and ethnic disparities, referral source and attrition from outpatient substance use disorder treatment among adolescents in the United States. Youth & Society. 2022;54(1):148–73.
  11. Lagisetty PA, Ross R, Bohnert A, Clay M, Maust DT. Buprenorphine treatment divide by race/ethnicity and payment. JAMA Psychiatry. 2019;76(9):979–81. pmid:31066881
  12. Kong Y, Zhou J, Zheng Z, Amaro H, Guerrero E. Using Machine Learning to Advance Disparities Research: Subgroup Analyses of Access to Opioid Treatment. Health Services Research. 2021;Forthcoming:1–11. pmid:34657287
  13. Schiff DM, Nielsen T, Hoeppner BB, Terplan M, Hansen H, Bernson D, et al. Assessment of racial and ethnic disparities in the use of medication to treat opioid use disorder among pregnant women in Massachusetts. JAMA Network Open. 2020;3(5):e205734. pmid:32453384
  14. Guerrero EG, Marsh JC, Duan L, Oh C, Perron B, Lee B. Disparities in Completion of Substance Abuse Treatment between and within Racial and Ethnic Groups. Health Services Research. 2013;48(4):1450–67. pmid:23350871
  15. Saloner B, Cook BL. Blacks and Hispanics are less likely than whites to complete addiction treatment, largely due to socioeconomic factors. Health Affairs. 2013;32(1):135–45. pmid:23297281
  16. Racial/ethnic differences in substance use, substance use disorders, and substance use treatment utilization among people aged 12 or older (2015–2019). Rockville, MD: Center for Behavioral Health Statistics and Quality, 2021 PEP21-07-01-001.
  17. 2019 National Healthcare Quality and Disparities Report. Rockville, MD: Agency for Healthcare Research and Quality, 2020.
  18. Steinberg MB, Akincigil A, Delnevo CD, Crystal S, Carson JL. Gender and age disparities for smoking-cessation treatment. American Journal of Preventive Medicine. 2006;30(5):405–12. pmid:16627128
  19. Safran MA, Mays RA Jr, Huang LN, McCuan R, Pham PK, Fisher SK, et al. Mental health disparities. American Journal of Public Health. 2009;99(11):1962–6. pmid:19820213
  20. Alegría M, NeMoyer A, Falgàs Bagué I, Wang Y, Alvarez K. Social determinants of mental health: where we are and where we need to go. Current Psychiatry Reports. 2018;20(11):1–13. pmid:30221308
  21. Beam AL, Kohane IS. Big data and machine learning in health care. JAMA. 2018;319(13):1317–8. pmid:29532063
  22. Rajkomar A, Dean J, Kohane I. Machine learning in medicine. New England Journal of Medicine. 2019;380(14):1347–58. pmid:30943338
  23. Barenholtz E, Fitzgerald ND, Hahn WE. Machine-learning approaches to substance-abuse research: emerging trends and their implications. Current Opinion in Psychiatry. 2020;33(4):334–42. pmid:32304429
  24. Nasir M, Summerfield NS, Oztekin A, Knight M, Ackerson LK, Carreiro S. Machine learning–based outcome prediction and novel hypotheses generation for substance use disorder treatment. Journal of the American Medical Informatics Association. 2021;28(6):1216–24. pmid:33570148
  25. Lo-Ciganic W-H, Huang JL, Zhang HH, Weiss JC, Wu Y, Kwoh CK, et al. Evaluation of machine-learning algorithms for predicting opioid overdose risk among medicare beneficiaries with opioid prescriptions. JAMA Network Open. 2019;2(3):e190968. pmid:30901048
  26. Kino S, Hsu Y-T, Shiba K, Chien Y-S, Mita C, Kawachi I, et al. A scoping review on the use of machine learning in research on social determinants of health: Trends and research prospects. SSM-Population Health. 2021;15:100836. pmid:34169138
  27. Pampel FC, Krueger PM, Denney JT. Socioeconomic disparities in health behaviors. Annual Review of Sociology. 2010;36:349–70. pmid:21909182
  28. Mantwill S, Monestel-Umaña S, Schulz PJ. The relationship between health literacy and health disparities: a systematic review. PLoS ONE. 2015;10(12):e0145455. pmid:26698310
  29. Ganju KK, Atasoy H, McCullough J, Greenwood B. The role of decision support systems in attenuating racial biases in healthcare delivery. Management Science. 2020;66(11):5171–81.
  30. Horner-Johnson W, Fujiura GT, Goode TD. Promoting a new research agenda: Health disparities research at the intersection of disability, race, and ethnicity. Medical Care. 2014;52:S1–S2.
  31. Wager S, Athey S. Estimation and inference of heterogeneous treatment effects using random forests. Journal of the American Statistical Association. 2018;113(523):1228–42.
  32. Seligman B, Tuljapurkar S, Rehkopf D. Machine learning approaches to the social determinants of health in the health and retirement study. SSM-Population Health. 2018;4:95–9. pmid:29349278
  33. Xie Y, Brand JE, Jann B. Estimating heterogeneous treatment effects with observational data. Sociological Methodology. 2012;42(1):314–47. pmid:23482633
  34. Foster JC, Taylor JM, Ruberg SJ. Subgroup identification from randomized clinical trial data. Statistics in Medicine. 2011;30(24):2867–80. pmid:21815180
  35. Lu M, Sadiq S, Feaster DJ, Ishwaran H. Estimating individual treatment effect in observational data using random forest methods. Journal of Computational and Graphical Statistics. 2018;27(1):209–19. pmid:29706752
  36. Creedon TB, Cook BL. Access to mental health care increased but not for substance use, while disparities remain. Health Affairs. 2016;35(6):1017–21. pmid:27269017
  37. Harris KM, Edlund MJ. Use of mental health care and substance abuse treatment among adults with co-occurring disorders. Psychiatric Services. 2005;56(8):954–9. pmid:16088012
  38. Priester MA, Browne T, Iachini A, Clone S, DeHart D, Seay KD. Treatment access barriers and disparities among individuals with co-occurring mental health and substance use disorders: an integrative literature review. Journal of Substance Abuse Treatment. 2016;61:47–59. pmid:26531892