Methods for testing theory and evaluating impact in randomized field trials: Intent-to-treat analyses for integrating the perspectives of person, place, and time

https://doi.org/10.1016/j.drugalcdep.2007.11.013

Abstract

Randomized field trials provide unique opportunities to examine the effectiveness of an intervention in real world settings and to test and extend both theory of etiology and theory of intervention. These trials are designed not only to test for overall intervention impact but also to examine how impact varies as a function of individual level characteristics, context, and across time. Examination of such variation in impact requires analytical methods that take into account the trial's multiple nested structure and the evolving changes in outcomes over time. The models that we describe here merge multilevel modeling with growth modeling, allowing for variation in impact to be represented through discrete mixtures—growth mixture models—and nonparametric smooth functions—generalized additive mixed models. These methods are part of an emerging class of multilevel growth mixture models, and we illustrate these with models that examine overall impact and variation in impact. In this paper, we define intent-to-treat analyses in group-randomized multilevel field trials and discuss appropriate ways to identify, examine, and test for variation in impact without inflating the Type I error rate. We describe how to make causal inferences more robust to misspecification of covariates in such analyses and how to summarize and present these interactive intervention effects clearly. Practical strategies for reducing model complexity, checking model fit, and handling missing data are discussed using six randomized field trials to show how these methods may be used across trials randomized at different levels.

Introduction

Randomized field trials (RFTs) provide a powerful means of testing a defined intervention under realistic conditions. Just as important as the empirical evidence of overall impact that a trial provides (Flay et al., 2005), an RFT can also refine and extend both etiologic theory and intervention theory. Etiologic theory examines the role of risk and protective factors in prevention, and an RFT formally tests whether changes in these hypothesized factors lead to the prevention of targeted outcomes. Theories of intervention characterize how changes in risk or protective factors affect immediate and distal targets and how specific theory-driven mediators produce such changes (Kellam and Rebok, 1992, Kellam et al., 1999). The theoretical elaborations that an RFT can provide draw on understanding how individual-level variation in response interacts with environmental influences over time. An adolescent drug abuse prevention program that addresses perceived norms, for example, may differentially affect those already using substances compared to nonusers. This intervention's effect may also differ between schools with norms favoring use and schools with norms favoring nonuse. Finally, the impact may differ in middle and high school as early benefits wane or strengthen over time.

This paper presents a general analytic framework and a range of analytic methods for characterizing intervention impact in RFTs that may vary across individuals, contexts, and time. The framework begins by distinguishing the types of research questions that RFTs address, then continues by introducing a general three-level description of RFT designs. Six different RFTs are described briefly in terms of these three levels, and illustrations are used to show how to test theoretically driven hypotheses of impact variation across person, place, and time. In this paper, we focus on intent-to-treat (ITT) analyses that examine the influence of baseline factors on impact, and leave all post-assignment analyses, such as mediation analysis, for discussions elsewhere. This separation into two parts is for pragmatic and space considerations only, as post-assignment analyses provide valuable insights into ITT results and are generally included in major evaluations of impact. For these intent-to-treat analyses, we present standards for determining which subjects should be included in analyses, how missing data and differences in intervention exposure should be handled, and what causal interpretations can legitimately be drawn from the statistical summaries. We present the full range of modeling strategies available for examining variation in impact, and we emphasize those statistical models that are the most flexible in addressing individual-level and contextual factors across time. Two underutilized methods for examining impact, generalized additive mixed models (GAMM) and growth mixture models (GMM), are presented in detail and applied to provide new findings on the impact of the Good Behavior Game (GBG) in the First Generation Baltimore Prevention Program trial.

We first define a randomized field trial and then describe the research questions it answers. An RFT uses randomization to test two or more defined psychosocial or educational intervention conditions against one another in the field or community under realistic conditions of training, supervision, program funding, implementation, and administration. All these conditions are relevant to evaluating effectiveness or impact within real world settings (Flay, 1986). In contrast, there are other randomized trials that test the efficacy of preventive interventions in early phases of development. These efficacy trials are designed to examine the maximal effect under restricted, highly standardized conditions that often reduce individual or contextual variation as much as possible. Testing efficacy requires that the intervention be implemented as intended and delivered with full fidelity. The interventions in efficacy trials are delivered by intervention agents (Snyder et al., 2006) who are carefully screened and highly trained; they are generally professionals brought in by an external research team. By contrast, the intervention agents of RFTs are often parents, community leaders, teachers, or other practitioners who come from within the indigenous community or institutional settings (Flay, 1986). The level of fidelity in RFTs is thus likely to vary considerably, and examining such variation in delivery can be important in evaluating impact (Brown and Liao, 1999). Both types of trials are part of a larger strategy to build new interventions and test their ultimate effects in target populations (Greenwald and Cullen, 1985).

As a special class of experiments, RFTs have some unique features. Most importantly, they differ from efficacy trials in the degree of control placed on implementation of the intervention. They are designed to address questions other than those of pure efficacy, and they often assess both mediator and moderator effects (Krull and MacKinnon, 1999, MacKinnon and Dwyer, 1993, MacKinnon et al., 1989, Tein et al., 2004). They also often differ from many traditional trials in the level at which randomization occurs and in the choice of target population. These differences are discussed below, starting with program implementation.

Program implementation is quite likely to vary in RFTs: differences in skills and other factors may make some teachers or parents more able than others to carry out the intervention, even when they receive the same amount of training. These trials are designed to test an intervention the way it would be implemented within its community, agency, institutional, or governmental home setting. In such settings, differences in early and continued training, support for the implementers, and differences in the aptitude of the implementers can lead to variation in implementation. The intervention implementers, who are typically not under the control of the research team the way they are in efficacy trials, are likely to deliver the program with varied fidelity, more adaptation, and less regularity than occurs in efficacy trials (Dane and Schneider, 1998, Domitrovich and Greenberg, 2000, Harachi et al., 1999). Traditional intent-to-treat analyses, which do not adjust for potential variations in implementation, fidelity, participation, or adherence, are often supplemented with “as-treated” analyses, mediation analysis, and other post-assignment analyses described elsewhere (Brown and Liao, 1999, Jo, 2002, MacKinnon, 2006).

A second common difference between RFTs and controlled efficacy trials is that the intervention often occurs at a group rather than individual level; random assignment in an efficacy trial is frequently at the level of the individual, while that for an RFT generally occurs at levels other than the individual, such as the classroom, school, or community. Individuals assigned to the same intervention cluster are assessed prior to and after the intervention, and their characteristics, as well as characteristics of their intervention group, may serve in multilevel analyses of mediation or moderation (Krull and MacKinnon, 1999). In addition, levels nested above the group level where intervention assignment occurs, such as the school in a classroom-randomized trial, can also be used in assessing variation in intervention impact. Examples of six recent multilevel designs are presented in Table 1; these are chosen because random assignment occurs at different levels, ranging from the individual to the classroom, school, district, and county. This table describes the different levels in each trial as well as the individual-level denominators used in intent-to-treat analyses, a topic we present in detail in Section 2.2. We continue to refer to these trials throughout this paper to illustrate the general approach to analyzing variation in impact for intent-to-treat, as-treated, and other analyses involving post-assignment outcomes.
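Clustering of individuals within classrooms or schools is why multilevel methods are needed: outcomes of students in the same cluster are correlated, which is quantified by the intraclass correlation (ICC). The following self-contained sketch (simulated data and parameter values are ours, not from any of the six trials) estimates the ICC with the classical one-way ANOVA moment estimator.

```python
# Hypothetical sketch: estimating the intraclass correlation (ICC) from
# simulated classroom-clustered data. All names and values are illustrative.
import random
import statistics

random.seed(1)
N_CLASSROOMS, N_STUDENTS = 40, 25
SIGMA2_BETWEEN, SIGMA2_WITHIN = 0.25, 1.0  # assumed variance components

# Simulate a continuous outcome with a random classroom effect.
data = []
for _ in range(N_CLASSROOMS):
    class_effect = random.gauss(0, SIGMA2_BETWEEN ** 0.5)
    data.append([class_effect + random.gauss(0, SIGMA2_WITHIN ** 0.5)
                 for _ in range(N_STUDENTS)])

# One-way ANOVA moment estimator of the variance components.
grand_mean = statistics.mean(y for cls in data for y in cls)
msb = (N_STUDENTS * sum((statistics.mean(cls) - grand_mean) ** 2
                        for cls in data) / (N_CLASSROOMS - 1))
msw = statistics.mean(statistics.variance(cls) for cls in data)
var_between = max((msb - msw) / N_STUDENTS, 0.0)
icc = var_between / (var_between + msw)
print(f"estimated ICC = {icc:.3f}")  # true ICC here is 0.25 / 1.25 = 0.20
```

Even a modest ICC of this size substantially inflates the variance of cluster-randomized treatment-effect estimates, which is why the analytic models below carry explicit cluster-level random effects.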

Finally, RFTs often target heterogeneous populations, whereas controlled experiments routinely use tight inclusion/exclusion criteria to test the intervention with a homogeneous group. Because they are population-based, RFTs can be used to examine variation in impact across the population, for example to understand whether a drug prevention program in middle school has a different impact on those who are already using substances at baseline compared to those who have not yet used substances. This naturally offers an opportunity to examine impact by baseline level of risk, and thereby to examine whether changes in this risk affect outcomes in accord with etiologic theory.

We are often just as interested in examining variation in impact in RFTs as we are in examining the main effect. For example, a universal, whole classroom intervention aimed proximally at reducing early aggressive, disruptive behavior and distally at preventing later drug abuse/dependence disorders may impact those children who were aggressive, disruptive at baseline but have little impact on low aggressive, disruptive children. It may work especially well in classes with high numbers of aggressive, disruptive children but show less impact in classrooms with low numbers of aggressive, disruptive children or in classrooms that are already well managed. Incorporating these contextual factors in multilevel analyses should also increase our ability to generalize results to broader settings (Cronbach, 1972, Shadish et al., 2002). Prevention of or delay in later drug abuse/dependence disorders may also depend on continued reduction in aggressive, disruptive behavior through time. Thus our analytic modeling of intervention impact in RFTs will often require us to incorporate growth trajectories as well as multilevel factors.

RFTs, such as that of the Baltimore Prevention Program (BPP) described in this issue of Drug and Alcohol Dependence (Kellam et al., 2008), are designed to examine the three fundamental questions of a prevention program's impact on a defined population: (1) who benefits; (2) for how long; and (3) under what conditions or contexts? Answering these three questions allows us to draw inferences and refine theories of intervention far beyond what we could do if we only addressed whether a significant overall program impact was found. The corresponding analytical approaches we use to answer these questions require greater sophistication and model checking than would ordinarily be required of analyses limited to addressing overall program impact. In this paper, we present integrative analytic strategies for addressing these three general questions from an RFT and illustrate how they test and build theory as well as lead to increased effectiveness at a population level. Appropriate uses of these methods to address specific research questions are given and illustrated on data related to the prevention of drug abuse/dependence disorders from the First Baltimore Prevention Program trial and other ongoing RFTs.

The prevention science goal of understanding who benefits, for how long, and under what conditions or contexts draws on similar perspectives from both theories of human behavior and from methodology that characterizes how behaviors change through time and context. In the developmental sciences, for example, the focus is on examining how individual behavior is shaped over time or stage of life by individual differences acting in environmental contexts (Weiss, 1949). In epidemiology, which seeks to identify the causes of a disorder in a population, we start descriptively by identifying the person, place, and time factors that distinguish those with the disorder from those without it (Lilienfeld and Lilienfeld, 1980).

From the perspective of prevention methodology, these same person, place, and time considerations play a fundamental role in trial design (Brown and Liao, 1999, Brown et al., 2006, Brown et al., 2007a, Brown et al., 2007b) and analysis (Brown et al., 2008, Bryk and Raudenbush, 1987, Goldstein, 2003, Hedeker and Gibbons, 1994, Muthén, 1997, Muthén and Shedden, 1999, Muthén et al., 2002, Raudenbush, 1997, Wang et al., 2005, Xu and Hedeker, 2001). Randomized trial designs have extended beyond those with individual-level randomization to those that randomize at the level of the group or place (Brown and Liao, 1999, Brown et al., 2006, Donner and Klar, 2000, Murray, 1998, Raudenbush, 1997, Raudenbush and Liu, 2000, Seltzer, 2004). Randomization can also occur simultaneously in time and place, as illustrated in dynamic wait-listed designs where schools are assigned to receive an intervention at randomly determined times (Brown et al., 2006). Finally, in a number of analytic approaches used by prevention methodologists that are derived from the fields of biostatistics, psychometrics, and the newly emerging ecometrics (Raudenbush and Sampson, 1999), there now exist ways to include characteristics of person and place in examining impact through time.

Extensive methodologic work has been done to develop analytic models that focus on person, place, and time. For modeling variation across persons, we often use two broad classes of models. Regression modeling is used to assess the impact of observed covariates, measured on individuals and contexts without error. Mixed effects modeling, random effects, latent variables, or latent classes are used when there is important measurement error, when there are unobserved variables or groupings, or when clustering in contexts produces intraclass correlation. For modeling the role of places or contexts, multilevel modeling or mixed modeling is commonly used. For models involving time, growth modeling is often used, although growth can be examined in a multilevel framework as well. While all these types of models—regression, random effects, latent variable, latent class, multilevel, mixed, and growth modeling—have been developed somewhat separately from one another, the recent trend has been to integrate many of these perspectives. There is a growing overlap in the overall models that are available from these different perspectives (Brown et al., 2008, Gibbons et al., 1988), and direct correspondences between these approaches can often be made (Wang et al., 2005). Indeed, the newest versions of many well-known software packages in multilevel modeling (HLM, MLWin), mixed or random effect modeling (SAS, Splus, R, SuperMix), and latent variable and growth modeling (Mplus, Amos) provide routines that can replicate models from several of the other packages.

Out of this new analytic integration come increased opportunities for examining complex research questions that are now being raised by our trials. In this paper, we provide a framework for carrying out such analyses with data from RFTs in pursuit of answers to the three questions of who benefits, for how long, and under what conditions or contexts. In Section 2, we describe analytic and modeling issues in examining individual and contextual effects on a single outcome measure. In this section, we deal with defining intent-to-treat analyses for multilevel trials, handling missing data, theoretical models of variation in impact, modeling and interpreting specific estimates as causal effects of the intervention, and methods for adjusting for different rates of assignment to the intervention. The first model we describe is a generalized linear mixed model (GLMM), which models a binary outcome using logistic regression and includes random effects as well. We conclude with a discussion of generalized additive mixed models, which represent the most integrative model in this class. Some of this section includes technical discussion of statistical issues; non-technical readers can skip these passages without losing the thread by attending to the concluding sentences that describe the findings in less technical terms, as well as to the examples and figures.
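As a compact sketch of the GLMM just described (the notation here is ours, not taken from the paper): for a binary outcome $Y_{ij}$ of individual $i$ in cluster $j$, with cluster-level intervention assignment $T_j$ and baseline covariates $\mathbf{x}_{ij}$,

```latex
\operatorname{logit}\Pr(Y_{ij}=1 \mid \mathbf{x}_{ij}, u_j)
  = \mathbf{x}_{ij}'\boldsymbol{\beta} + \theta\, T_j + u_j,
  \qquad u_j \sim N(0,\sigma_u^2),
```

where $\theta$ is the intent-to-treat effect on the log-odds scale and the random effect $u_j$ absorbs the within-cluster correlation. A GAMM extends this by replacing the linear term for a baseline covariate $x$ with a nonparametric smooth $s(x)$, so that impact can vary flexibly with baseline risk rather than being forced to follow a straight line.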

In Section 3, we discuss methods to examine intervention impact on growth trajectories. We discuss representing intervention impact in our models with specific coefficients that can be tested. Because of their importance to examining the effects of prevention programs, growth mixture models are highlighted, and we provide a causal interpretation of these parameters as well as discuss a number of methods to examine model fit. Again, non-technical readers can skip the equations and attend to introductory statements that precede the technical discussions.
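Growth mixture models posit discrete latent classes of trajectories. As a toy illustration of the underlying latent-class machinery (simulated data and all parameter values are hypothetical, and a univariate two-component Gaussian mixture is far simpler than the GMMs discussed here), the EM algorithm can recover class shares and class means:

```python
# Hypothetical sketch: EM for a two-class univariate Gaussian mixture,
# the simplest relative of growth mixture models. Values are illustrative.
import math
import random

random.seed(7)
# Simulate two latent classes (e.g., "low" vs. "high" trajectory levels).
data = ([random.gauss(0, 1) for _ in range(300)] +
        [random.gauss(4, 1) for _ in range(100)])

def norm_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

# Initialize mixing weight and class means; fix sd = 1 for simplicity.
pi, mu = 0.5, [min(data), max(data)]
for _ in range(50):  # EM iterations
    # E-step: posterior probability that each observation belongs to class 1.
    r = [pi * norm_pdf(x, mu[1], 1) /
         (pi * norm_pdf(x, mu[1], 1) + (1 - pi) * norm_pdf(x, mu[0], 1))
         for x in data]
    # M-step: update mixing weight and class means from responsibilities.
    pi = sum(r) / len(data)
    mu[1] = sum(ri * x for ri, x in zip(r, data)) / sum(r)
    mu[0] = sum((1 - ri) * x for ri, x in zip(r, data)) / (len(data) - sum(r))

print(f"class share = {pi:.2f}, means = {mu[0]:.2f}, {mu[1]:.2f}")
```

In a full GMM the class-specific parameters are intercepts and slopes of growth curves rather than single means, and intervention effects are allowed to differ by latent class, which is exactly what makes these models useful for asking "who benefits."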

Section 4 returns to the use of these analyses for testing impact and building theory. We also describe newer modeling techniques, called General Growth Mixture Models (GGMM), that are beginning to integrate the models described in Sections 2 and 3.

Section snippets

Using an RFT to determine who benefits from or is harmed by an intervention on a single outcome measure

This question is centrally concerned with assessing intervention impact across a range of individual, group, and context level characteristics. We note first that population-based randomized preventive field trials have the flexibility of addressing this question much more broadly than do traditional clinic-based randomized trials where selection into the clinic makes it hard to study variation in impact. With classic pharmaceutical randomized clinical trials (P-RCT's), the most common type of …

Analytical strategies for examining variation in intervention impact over time

In this section, we summarize how growth modeling can characterize the patterns of change in repeated measures over time due to an intervention compared to control. We consider many of these models as intent-to-treat analyses, and for some trials a growth model analysis may provide the primary analysis of impact, just as in P-RCT's the primary analysis can be based on the rates of change in a repeated measure for intervention versus control (Muthén, 1997, Muthén, 2003, Muthén, 2004, Muthén, in …
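A minimal linear growth model of the kind summarized in this section (our notation; the trial models discussed are richer) lets intervention assignment $T_i$ shift the rate of change:

```latex
y_{it} = (\beta_0 + b_{0i}) + (\beta_1 + b_{1i})\,t + \theta\, T_i\, t + \varepsilon_{it},
\qquad (b_{0i}, b_{1i}) \sim N(\mathbf{0}, \boldsymbol{\Sigma}),
\quad \varepsilon_{it} \sim N(0, \sigma^2),
```

so $\theta$ represents the intervention effect on the slope of the trajectory. Growth mixture models additionally allow $(\beta_0, \beta_1, \theta)$ to differ across latent trajectory classes, so that impact can be concentrated in, say, a high-risk class.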

Discussion

RFTs are designed to answer research questions that examine interventions delivered in real world settings. The main question we address in ITT analyses involves assessing an intervention's effectiveness, in order to characterize conditions under which outcomes improve or worsen relative to a community standard. The methods described in this paper address standards for conducting ITT analyses, analytic tools that incorporate clustering and nonlinearity in the modeling, methods to handle …

Role of funding source

Funding for this study was provided by NIMH through grants R01 MH 40859, R01 MH 42968, P50 MH 38725, R01 MH068423, T32 MH018834, R34 MH071189, R01 MH076158, P30 MH068685 and NIDA R01 DA015409, R01 DA019984-02S1, P20 DA017592 as well as support from NIDA on each of the first three of these grants; NIAAA for K02 AA 00230, and Robert Wood Johnson Foundation Grant number 040371. None of these funding sources were involved in interpretation of data or in the writing of the report.

Conflict of Interest

Author Muthén is a co-developer of Mplus, which is discussed in this paper. There are no other conflicts of interest.

Acknowledgements

The authors are colleagues in the Prevention Science and Methodology Group (PSMG), which has had many helpful discussions that have shaped not only this paper but our fundamental approaches to understanding impact of preventive interventions over the last 18 years. We thank our colleagues who have conducted these preventive trials and shared their perspectives with PSMG. This paper has been heavily influenced by many leaders in the prevention and early intervention field, including Drs. Rick …

References (148)

  • Asparouhov, T., Muthén, B.O. Multilevel mixture models. In: Hancock, G.R., Samuelsen, K.M. (Eds.), Advances in Latent...
  • Baker, S.G., et al., 2006. Simple adjustments for randomized trials with nonrandomly missing or censored outcomes arising from informative covariates. Biostatistics.
  • Bandeen-Roche, K., et al., 1997. Latent variable regression for multiple discrete outcomes. J. Am. Stat. Assoc.
  • Bauer, J.D., et al., 2003. Distributional assumptions of growth mixture models: implications for overextraction of latent trajectory classes. Psychol. Methods.
  • Breiman, L., et al., 1984. Classification and Regression Trees.
  • Breslow, N., et al., 1993. Approximate inference in generalized linear mixed models. J. Am. Stat. Assoc.
  • Brooks-Gunn, J., et al., 1993. Do neighborhoods influence child and adolescent development? Am. J. Sociol.
  • Brown, C.H., 1991. Comparison of mediational selected strategies and sequential designs for preventive trials: comments on a proposal by Pillow et al. Am. J. Community Psychol.
  • Brown, C.H., 1993. Analyzing preventive trials with generalized additive models. Am. J. Community Psychol.
  • Brown, C.H., 1993. Statistical methods for preventive trials in mental health. Stat. Med.
  • Brown, C.H., et al. Data analytic frameworks: analysis of variance, latent growth, and hierarchical models.
  • Brown, C.H., et al., 2000. Power calculations for data missing by design: applications to a follow-up study of lead exposure and attention. J. Am. Stat. Assoc.
  • Brown, C.H., et al. Prevention of aggressive behavior through middle school using a first grade classroom-based intervention.
  • Brown, C.H., et al., 1999. Principles for designing randomized preventive trials in mental health: an emerging developmental epidemiology paradigm. Am. J. Community Psychol.
  • Brown, C.H., Wang, W., Guo, J., 2007b. Modeling variation in impact in randomized field trials. Technical report,...
  • Brown, C.H., Wang, W., Sandler, I., 2007c. Examining how context changes intervention impact: the use of effect sizes...
  • Brown, C.H., et al., 2007. The role of randomized trials in testing interventions for the prevention of youth suicide. Int. Rev. Psychiatry.
  • Brown, C.H., et al., 2006. Dynamic wait-listed designs for randomized trials: new designs for prevention of youth suicide. Clin. Trials.
  • Bryk, A.S., et al., 1987. Application of hierarchical linear models to assessing change. Psychol. Bull.
  • Carlin, J.B., et al., 2001. A case study on the choice, interpretation and checking of multilevel models for longitudinal, binary outcomes. Biostatistics.
  • Chamberlain, P., 2003. Treating Chronic Juvenile Offenders: Advances Made through the Oregon Multidimensional Treatment Foster Care Model.
  • Collins, L.M., et al., 2001. A comparison of inclusive and restrictive strategies in modern missing-data procedures. Psychol. Methods.
  • Cronbach, L.J., 1972. The Dependability of Behavioral Measurements: Theory of Generalizability for Scores and Profiles.
  • Dishion, T.J., et al., 2001. Peer group dynamics associated with iatrogenic effects in group interventions with high-risk young adolescents. New Dir. Child Adolesc. Dev.
  • Dishion, T.J., et al., 1999. When interventions harm: peer groups and problem behavior. Am. Psychol.
  • Domitrovich, C.E., et al., 2000. The study of implementation: current findings from effective programs that prevent mental disorders in school-aged children. J. Ed. Psychol. Consult.
  • Donner, A., et al., 2000. Design and Analysis of Cluster Randomization Trials in Health Research.
  • Fisher, P., et al., 1992. Diagnostic Interview Schedule for Children Users’ Manual.
  • Flay, B.R., et al., 2005. Standards of evidence: criteria for efficacy, effectiveness and dissemination. Prev. Sci.
  • Flay, B.R., et al., 2005. Historical review of school-based randomized trials for evaluating problem behavior prevention programs. Annals Amer. Acad. Polit. Soc. Sci.
  • Forgatch, M.S., DeGarmo, D.S. Accelerating recovery from poverty: prevention effects for recently separated mothers. J....
  • Frangakis, C.E., et al., 1999. Addressing complications of intention-to-treat analysis in the combined presence of all-or-none treatment-noncompliance and subsequent missing outcomes. Biometrika.
  • Frangakis, C.E., et al., 2002. Principal stratification in causal inference. Biometrics.
  • Friedman, L.M., et al., 1998. Fundamentals of Clinical Trials.
  • Gibbons, R.D., et al., 1997. Random effects probit and logistic regression models for three-level data. Biometrics.
  • Gibbons, R.D., et al., 1988. Random regression models: a comprehensive approach to the analysis of longitudinal psychiatric data. Psychopharmacol. Bull.
  • Goldstein, H., 2003. Multilevel Statistical Models.
  • Gould, M.S., et al., 2005. Evaluating iatrogenic risk of youth suicide screening programs: a randomized controlled trial. JAMA.
  • Graham, J.W., et al., 2007. How many imputations are really needed? Some practical clarifications of multiple imputation theory. Prev. Sci.
  • Graham, J.W., et al., 2006. Planned missing data designs in psychological research. Psychol. Methods.