
Child Abuse & Neglect

Volume 73, November 2017, Pages 71-88
Full length article
Predicting child maltreatment: A meta-analysis of the predictive validity of risk assessment instruments

https://doi.org/10.1016/j.chiabu.2017.09.016

Abstract

Risk assessment is crucial in preventing child maltreatment since it can identify high-risk cases in need of child protection intervention. Despite widespread use of risk assessment instruments in child welfare, it is unknown how well these instruments predict maltreatment and what instrument characteristics are associated with higher levels of predictive validity. Therefore, a multilevel meta-analysis was conducted to examine the predictive accuracy of (characteristics of) risk assessment instruments. A literature search yielded 30 independent studies (N = 87,329) examining the predictive validity of 27 different risk assessment instruments. From these studies, 67 effect sizes could be extracted. Overall, a medium significant effect was found (AUC = 0.681), indicating a moderate predictive accuracy. Moderator analyses revealed that onset of maltreatment can be better predicted than recurrence of maltreatment, which is a promising finding for early detection and prevention of child maltreatment. In addition, actuarial instruments were found to outperform clinical instruments. To bring risk and needs assessment in child welfare to a higher level, actuarial instruments should be further developed and strengthened by distinguishing risk assessment from needs assessment and by integrating risk assessment with case management.

Introduction

Child maltreatment is a widespread phenomenon affecting the lives of millions of children all over the world (Stoltenborgh, Bakermans-Kranenburg, Alink, & IJzendoorn, 2015). In cases of (suspected) child maltreatment, child welfare staff are asked to make extremely difficult decisions about whether, and how best, to intervene so that a child’s welfare is safeguarded (Arad-Davidson & Benbenishty, 2008; DePanfilis & Girvin, 2005; Munro, 1999; Pfister & Böhm, 2008). Identifying risks of maltreatment is of paramount importance in these decisions. In recent years, there has been a shift from using mainly unstructured clinical risk assessment to the widespread use of standardized risk assessment instruments (Munro, 2004; Tatara, 1996). Despite this shift, the development and evaluation of risk assessment instruments in the field of child protection is in its infancy. Risk assessment instruments are frequently implemented without proper empirical evaluation, and thus limited knowledge is available about their validity and effectiveness (Barlow, Fisher, & Jones, 2012; Knoke & Trocmé, 2005). Moreover, the child protection field is currently engaged in an intense debate about the most effective approach to assessing risks. However, the average performance of (different approaches to) risk assessment instruments is unknown, because meta-analyses evaluating the predictive accuracy of these instruments have not yet been performed in the child protection field. Therefore, the aim of the current study was to examine the overall predictive validity of risk assessment instruments for child maltreatment and to examine whether the overall predictive validity is influenced by study and instrument characteristics.

Currently, there are two main approaches to risk assessment in child welfare: the clinical and the actuarial (statistical) approach. In the actuarial approach, conclusions are based solely on empirically established relationships between risk factors and child maltreatment, whereas in the clinical approach, conclusions are based on the judgment of a professional who combines and weighs information in a subjective manner (Dawes, Faust, & Meehl, 1989). Clinical approaches can be further divided into consensus-based instruments and structured clinical judgment (SCJ) instruments. With consensus-based instruments, clinical professionals rate characteristics that are deemed relevant because of consensus among experts. Next, the professionals process these ratings in a subjective manner and come to a conclusion using their own judgment. Structured clinical judgment is a more recently developed method in which variables identified as risk factors in empirical research are assessed, but in which the weighting of risk factors as well as coming to the final decision is left to the professional. Several validation studies indicate that many implemented instruments perform questionably, especially instruments that are based on the clinical approach to risk assessment (see, for example, Barlow, Fisher, & Jones, 2012; D’Andrade, Austin, & Benton, 2008; Knoke & Trocmé, 2005). Some studies have even shown that clinical methods, which are widely used in practice, do not perform better than chance, meaning that in half of the cases an incorrect risk estimate is made (Baird & Wagner, 2000; Barber, Shlonsky, Black, Goodman, & Trocmé, 2008; Van der Put, Assink, & Stams, 2016b). This leads to many inappropriate clinical decisions, resulting in unjustified out-of-home placements or recurrence of maltreatment. Therefore, it is essential to gain insight into which types of instruments perform well and which instrumental characteristics influence the predictive validity either positively or negatively.

The development of risk assessment instruments in the field of child welfare lags behind other disciplines, such as the field of criminal (youth) justice. In criminal justice, the literature identifies four generations of risk assessment instruments (Andrews & Bonta, 2010). Clinical instruments are considered the first generation of instruments and actuarial instruments the second generation. Third generation actuarial instruments have been developed incorporating dynamic as well as static risk factors, so that risk assessment can be distinguished from needs assessment. The newest, fourth generation actuarial risk assessment instruments serve not only as a guide for the professional in determining appropriate goals for intervention, but also as a guide in case management planning by offering the possibility of linking re-assessments to the initial assessment, service plans, and service delivery (Andrews & Bonta, 2010). Instruments used in child welfare can be classified into either the first or the second generation of instruments. In most of these instruments, risk assessment is not discriminated from needs assessment. Moreover, the needs assessment instruments that are available have mainly been developed on the basis of expert consensus and have not been subjected to sound empirical validation (Schwalbe, 2008).

As mentioned, there is an intense debate about which risk assessment approach is most effective in assessing the risk of child maltreatment, also referred to as the “risk assessment wars” (Johnson, 2006a; Johnson, 2006b; Morton, 2003; White & Walsh, 2006). Earlier review studies on the predictive validity of risk assessment instruments for child maltreatment showed mixed results. D'Andrade et al. (2008) summarized findings of research on seven risk assessment instruments and concluded that actuarial instruments appear to have greater predictive validity and inter-rater reliability than consensus-based instruments. Barlow et al. (2012) conducted a systematic review on the accuracy of risk assessment instruments for child maltreatment and identified 13 different tools. These authors concluded that there is currently limited evidence for the effectiveness of risk assessment instruments in the field of child protection. However, there is evidence supporting the use of one specific actuarial tool, the California Family Risk Assessment, particularly at referral or during initial assessment (Barlow et al., 2012). Bartelink, Van Yperen, and Ten Berge (2015) conducted a review of studies that compared the predictive accuracy of (a) different risk assessment instruments or (b) a risk assessment instrument and unstructured clinical judgment (i.e., not using an instrument at all). Based on this review, the authors concluded that (a) actuarial instruments performed slightly better than consensus-based instruments, and that (b) the predictive validity of actuarial instruments did not outperform unstructured clinical judgment. However, the review of Bartelink and colleagues has been criticized by Van der Put, Assink, and Stams (2016a) because their decision to exclude articles reporting on the performance of individual instruments seems too restrictive. After all, studies comparing the predictive accuracy of at least two instruments for risk assessment using the same populations and outcome criteria are hardly available, as are studies in which the performance of a risk assessment instrument is compared to unstructured clinical judgment.

To date, only qualitative reviews have examined the predictive accuracy of risk assessment instruments used in child protection. Because these reviews lack meta-analysis of quantitative data, it is not yet known how these instruments perform on average. Furthermore, some primary studies report very low predictive accuracies (see, for instance, Barber et al., 2008; Ondersma, Chaffin, Mullins, & LeBreton, 2005), whereas others report far better predictive accuracies (see, for instance, Loman & Siegel, 2004; De Ruiter, Hildebrand, & Van der Hoorn, 2012). Given this rather wide range, synthesizing data in a quantitative manner is essential to gain insight into the overall predictive accuracy of risk assessment instruments. A second merit of a quantitative review is that it can reveal variables (such as instrument characteristics) that increase or decrease the overall accuracy, and thus act as moderators. Identifying moderators yields important knowledge that can be used in developing and/or improving risk assessment instruments. Therefore, the aim of the present study was to conduct a meta-analysis in which we estimate the average predictive accuracy and identify variables that may influence this accuracy, such as approach to and focus of risk assessment. We believe that such a meta-analysis contributes to improving decision-making strategies in child welfare, and thus to more effective child protection practices.

The following instrument characteristics were examined: type of risk assessment approach (actuarial, consensus-based, structured clinical judgment), length of instrument (number of items), type of assessor (professional, client (i.e., self-report), researcher, or computer system (i.e., automatic risk calculation based on variables stored in a computer database)), and focus of risk assessment (recurrence of child maltreatment, onset of maltreatment, both/not specified). In addition, the following study design characteristics were examined: study design (retrospective versus prospective design), type of sample (clinical or non-clinical sample, which is related to the focus of risk assessment), sample used for validation (validation versus construction sample), length of follow-up (in months) and type of follow-up (number of months after assessment, number of months after case closure, both/not specified), type of outcome measure (for which the categories were derived from outcomes reported in primary studies), type of maltreatment (multiple forms, physical abuse, neglect, maltreatment not specified), publication year, sample size, and percentage of cultural minorities in the sample. Below, we elaborate on the rationale for testing these specific characteristics.

We expected actuarial methods to outperform clinical methods (both consensus-based and SCJ instruments) for two reasons. First, the mathematical features of actuarial methods ensure not only that solely variables with predictive value are part of the instrument, but also that these variables are weighted in accordance with their independent contribution to the outcome of interest (Dawes et al., 1989). Earlier studies showed that it is difficult for professionals to accurately predict an outcome of interest using their clinical judgment, because professionals are unable either to focus on the most important factors or to properly weigh the observed risk factors (Dawes, 1994; Dawes et al., 1989). Second, the reliability of actuarial instruments is higher than that of clinical methods and hence the actuarial prediction is more consistent and accurate (e.g., Dawes et al., 1989; Gambrill & Shlonsky, 2000). That is because risk factors in actuarial prediction are scored according to a fixed algorithm, meaning that professionals use the same objective scoring rules, regardless of the expertise of the professional. In contrast, scoring risk factors in clinical methods is done subjectively (e.g., Dawes et al., 1989; Gambrill & Shlonsky, 2000). Further, we expected SCJ instruments to outperform consensus-based instruments, because a sound empirical basis is lacking for the latter, whereas the former is partly based on empirical evidence.
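To make the notion of a fixed scoring algorithm concrete, the following minimal sketch shows how an actuarial score could be computed. The item names, weights, and cut-offs are hypothetical and chosen purely for illustration; they are not taken from any instrument discussed in this article.

```python
# Hypothetical items and weights, for illustration only; in a real actuarial
# instrument these would be derived empirically from outcome data.
WEIGHTS = {
    "prior_reports": 2,
    "caregiver_substance_use": 1,
    "child_under_two": 1,
}

# Hypothetical cut-offs, ordered from highest to lowest threshold.
CUTOFFS = [(3, "high"), (1, "moderate"), (0, "low")]


def actuarial_score(item_present: dict) -> int:
    """Sum the fixed weights of all items scored as present."""
    return sum(w for item, w in WEIGHTS.items() if item_present.get(item, False))


def risk_class(score: int) -> str:
    """Return the first class whose threshold the score reaches."""
    for threshold, label in CUTOFFS:
        if score >= threshold:
            return label
    return "low"


case = {"prior_reports": True, "child_under_two": True}
print(actuarial_score(case), risk_class(actuarial_score(case)))
```

Because the weights and cut-offs are fixed, two assessors who score the same items as present necessarily arrive at the same risk class, which is the consistency argument made above. In a clinical approach, the final combination step would instead be left to the professional.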

The number of items that a risk assessment instrument comprises was examined because the predictive validity may vary with the length of the instrument. Schwalbe (2007) conducted a meta-analysis on juvenile justice risk assessment instruments and found that brief instruments yielded smaller effect sizes than other types of instruments. In line with this result, we expected to find a negatively moderating effect of the number of items that risk assessment instruments comprise, as briefer instruments may be less capable of assessing all relevant risk factors than longer instruments. After all, both juvenile delinquency (Loeber, Slot, & Stouthamer-Loeber, 2008) and child maltreatment (Belsky, 1993) are determined by the presence and absence of multiple and varying risk and protective factors in children and different environmental systems around children.

Predictive validity may vary depending on the type of assessor (a professional, the client through self-report, a researcher, or automatic risk calculation based on variables stored in a computer database). We examined exploratively whether assessor type had an effect on predictive validity, because no clear evidence of moderation was found in previous studies.

Two types of risk assessment instruments can be distinguished: 1) instruments screening for maltreatment in the general population (onset of maltreatment); and 2) instruments assessing the risk of recurrence of maltreatment in populations already investigated by child protection services. The predictive validity may vary depending on the focus of an instrument, since the populations assessed, their risk of maltreatment, and (effects of) risk factors within populations may differ (Cash, 2001). Screening aims to assess the risk of child maltreatment in the general population in which the risk of child maltreatment is relatively small, whereas risk assessment aims to assess the risk of (repeated) child maltreatment in high-risk groups, such as families involved in child protection services. In scientific literature, there is particular emphasis on instruments assessing the risk of recurrence of child maltreatment, whereas screening instruments for assessing the risk of child maltreatment in the general population get far less attention (Barlow et al., 2012). The reason is that assessing the risk of recurrence of child maltreatment is the most commonly employed prognostic process in child welfare services. However, screening instruments can be of great value for early prevention of child maltreatment. Related to this, we also tested whether estimates of predictive validity obtained in clinical samples differ from estimates obtained in non-clinical samples.

Whether a study has a prospective or retrospective design may influence predictive validity. Some researchers have argued that risk assessment instruments can be examined retrospectively, using file information from sources such as institutional files, psychological reports, and/or court reports (e.g., De Vogel, De Ruiter, Hildebrand, Bos, & Van de Ven, 2004). In contrast, other researchers have argued that prospective research is required to adequately examine the predictive validity of a risk assessment tool (Caldwell, Bogat, & Davidson, 1988). Therefore, we examined the effect of study design on predictive validity.

In some studies, the predictive validity of an instrument is examined in the same sample that was used to construct the instrument, whereas in other studies, the predictive validity is examined in a sample independent of the construction sample. We expected the predictive validity to be lower in validation samples than in construction samples, because the random sampling error that arises when an instrument is tested in a sample other than the construction sample reduces predictive validity estimates. In fact, models built in a construction (or training) sample tend to “overfit” the data (i.e., they capitalize on random variation). Thus, predictive validity estimates reported for construction samples are commonly inflated.
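The inflation described above can be illustrated with a small simulation. The sketch below is not part of the study; it assumes pure-noise items and a deliberately naive construction procedure (selecting the single item with the highest AUC), and all names and parameter values are hypothetical. The AUC is computed as the proportion of positive-negative pairs in which the positive case receives the higher score, with ties counted as one half.

```python
import random


def auc(scores, labels):
    """Probability that a random positive case outscores a random negative
    case, counting ties as one half (the Mann-Whitney formulation)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def simulate(n_cases=200, n_items=50, seed=1):
    """Select the best of many pure-noise items in a construction sample,
    then re-evaluate that same item in an independent validation sample."""
    rng = random.Random(seed)

    def draw_sample():
        labels = [i % 2 for i in range(n_cases)]  # half positive, half negative
        items = [[rng.random() for _ in range(n_cases)] for _ in range(n_items)]
        return items, labels

    c_items, c_labels = draw_sample()
    v_items, v_labels = draw_sample()
    construction_aucs = [auc(item, c_labels) for item in c_items]
    best = max(range(n_items), key=construction_aucs.__getitem__)
    return construction_aucs[best], auc(v_items[best], v_labels)


apparent, validated = simulate()
print(f"construction AUC: {apparent:.3f}, validation AUC: {validated:.3f}")
```

Since the items carry no real signal, any construction-sample AUC above 0.5 reflects chance alone. The selected item will typically look predictive in the construction sample while its AUC in the validation sample stays close to 0.5, which is the pattern that motivates testing sample type as a moderator.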

The potential moderating effect of the follow-up length was examined, because the predictive validity may vary over time and differences in follow-up length are frequently observed between studies. As studies also use different types of follow-up (assessing the time after assessment, the time after case closure, or both/not specified), we also examined follow-up type as a potential moderator.

Studies on the predictive validity of risk assessment instruments vary in the outcome that is predicted. We examined whether the predictive validity of instruments is influenced by type of outcome (new reports, investigations, substantiated maltreatment, supervision orders, out-of-home placements, recidivism/relapse), and type of abuse that is predicted (physical abuse, neglect, sexual abuse, and child abuse in general).

A number of additional variables were exploratively tested as potential moderating variables of the predictive validity of risk assessment instruments for child maltreatment. These variables were: the type of maltreatment assessed in primary validity studies, the publication year of primary studies, the size of the samples used in primary studies, and the percentage of cultural minorities in samples of primary studies.

In summary, despite the widespread use of risk assessment instruments in child welfare, it is unclear how well these instruments generally perform and whether the predictive validity is influenced by study and instrument characteristics. This knowledge is important not only scientifically but also for clinical practice, as it provides guidance on implementing the most effective risk assessment tools. Consequently, this review may contribute to decreasing the number of inappropriate decisions in child protection, resulting in fewer unjustified out-of-home placements and fewer recurrences of maltreatment. A three-level random-effects meta-analysis was performed to estimate the overall predictive validity of risk assessment instruments for child maltreatment and to identify variables that moderate this predictive validity.
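As a rough illustration of what random-effects pooling involves, the sketch below implements the conventional two-level DerSimonian-Laird estimator. This is a simplification and not the three-level model used in the study: it treats every effect size as independent, whereas a three-level model additionally accounts for effect sizes nested within the same study. The function name and inputs are hypothetical; `effects` may be on any scale for which sampling variances are available.

```python
def random_effects_pool(effects, variances):
    """DerSimonian-Laird random-effects estimate of the mean effect.

    Returns the pooled effect, its standard error, and the estimated
    between-effect variance (tau squared).
    """
    k = len(effects)
    # Inverse-variance (fixed-effect) weights and pooled estimate.
    w = [1.0 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q measures dispersion around the fixed-effect estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    # Method-of-moments estimate of heterogeneity, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c)
    # Random-effects weights add tau2 to each sampling variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = (1.0 / sum(w_star)) ** 0.5
    return pooled, se, tau2
```

Moderator analyses of the kind described above extend this idea by regressing the effect sizes on study or instrument characteristics rather than estimating a single mean.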

Review protocol

The Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) Statement (Moher, Liberati, Tetzlaff, & Altman, 2009) was followed in the present meta-analysis.

Sample of studies

For selecting relevant studies, several criteria were formulated. First, we selected studies that examined the predictive validity of risk assessment instruments that were specifically developed for the prediction of one or more forms of child maltreatment (physical abuse, sexual abuse, and neglect) in the (near) future.

Descriptive characteristics, central tendency, and variability

The present study included 30 studies (k) published between 1978 and 2016 (median publication year is 2005). In total, these studies reported on validation research of 27 different risk assessment instruments, from which 67 effect sizes could be extracted. Each effect size represented the discriminative accuracy of a particular risk assessment instrument or a statistical predictive model that was used for the purpose of risk assessment. An overview of all risk assessment instruments that have

Discussion

This meta-analysis investigated the predictive validity of risk assessment instruments for child maltreatment, and whether this is influenced by characteristics of instruments, studies, and samples. Overall, a significant medium effect was found (AUC = 0.681), indicating a moderate predictive accuracy of risk assessment instruments. This overall effect is comparable with effect sizes found in meta-analyses on risk assessment instruments in (juvenile) justice settings. For example, Schwalbe (2007)

Conclusion

The present study is the first meta-analysis on the predictive validity of risk assessment instruments for child maltreatment, with the aim to learn more about the general effectiveness of these instruments and about the characteristics that influence the predictive validity. This study showed that the discriminative accuracy of actuarial instruments is better than the discriminative accuracy of both consensus-based instruments and structured clinical judgment instruments, and therefore we

References* (110)

  • D. DePanfilis et al.

    Investigating child maltreatment in out-of-home care: Barriers to good decision-making

    Children and Youth Services Review

    (2005)
  • S. Dorsey et al.

    Caseworker assessments of risk for recurrent maltreatment: Association with case-specific risk factors and re-reports

    Child Abuse & Neglect

    (2008)
  • D. Finkelhor et al.

    Lifetime assessment of poly-victimization in a national sample of children and youth

    Child Abuse & Neglect

    (2009)
  • E. Gambrill et al.

    Risk assessment in context

    Children and Youth Services Review

    (2000)
  • R.M. Gershater-Molko et al.

    Assessing child neglect

    Aggression and Violent Behavior

    (2003)
  • H. *Horikawa et al.

    Development of a prediction model for child maltreatment recurrence in Japan: A historical cohort study using data from a Child Guidance Center

    Child Abuse & Neglect

    (2016)
  • B.Q. Jenkins et al.

    The complexity of child protection recurrence: The case for a systems approach

    Child Abuse & Neglect

    (2017)
  • W. *Johnson et al.

    Child abuse/neglect risk assessment under field practice conditions: Tests of external and temporal validity and comparison with heart disease prediction

    Children and Youth Services Review

    (2015)
  • W. Johnson

    Post-battle skirmish in the risk assessment wars: Rebuttal to the response of Baumann and colleagues to criticism of their paper: Evaluating the effectiveness of actuarial risk assessment models

    Children and Youth Services Review

    (2006)
  • W.L. *Johnson

    The validity and utility of the California Family Risk Assessment under practice conditions in the field: A prospective study

    Child Abuse & Neglect

    (2011)
  • G. *Lealman et al.

    Prediction and prevention of child abuse—an empty hope?

    The Lancet

    (1983)
  • H.L. MacMillan et al.

    Reported contact with child protection services among those reporting child physical and sexual abuse: results from a community survey

    Child Abuse & Neglect

    (2003)
  • E. Munro

    Common errors of reasoning in child protection work

    Child Abuse & Neglect

    (1999)
  • E. Munro

    A simpler way to understand the results of risk assessment instruments

    Children and Youth Services Review

    (2004)
  • S. *Murphy et al.

    Prenatal prediction of child abuse and neglect: A prospective study

    Child Abuse & Neglect

    (1985)
  • R. Rosenthal

    The file drawer problem and tolerance for null results

    Psychological Bulletin

    (1979)
  • C.S. Schwalbe

    Strengthening the integration of actuarial risk assessment with clinical judgment in an evidence based practice framework

    Children and Youth Services Review

    (2008)
  • A. Shlonsky et al.

    The next step: Integrating actuarial risk assessment and clinical judgment into an evidence-based practice framework in CPS case management

    Children and Youth Services Review

    (2005)
  • I.I. *Staal et al.

    Risk assessment of parents’ concerns at 18 months in preventive child health care predicted child abuse and neglect

    Child Abuse & Neglect

    (2013)
  • S.A. Stowman et al.

    Assessing child neglect: A review of standardized measures

    Aggression and Violent Behavior

    (2005)
  • R. *Vaithianathan et al.

    Children in the public benefit system at risk of maltreatment: Identification via predictive modeling

    American Journal of Preventive Medicine

    (2013)
  • S. Aegisdóttir et al.

    The meta-analysis of clinical judgment project: Fifty-six years of accumulated research on clinical versus statistical prediction

    The Counseling Psychologist

    (2006)
  • D.G. Altman et al.

    Diagnostic tests 3: Receiver operating characteristic plots

    British Medical Journal

    (1994)
  • D.A. Andrews et al.

    The psychology of criminal conduct

    (2010)
  • B. Arad-Davidson et al.

    The role of workers' attitudes and parent and child wishes in child protection workers' assessments and recommendation regarding removal and reunification

    Children and Youth Services Review

    (2008)
  • M. Assink et al.

    Fitting three-level meta-analytic models in R: A step-by-step tutorial

    The Quantitative Methods for Psychology

    (2016)
  • M. *Assink et al.

    The development and validation of the youth actuarial care needs assessment tool for non-Offenders (Y-ACNAT-NO)

    BMC Psychiatry

    (2015)
  • J.G. *Barber et al.

    Reliability and predictive validity of a consensus-based risk assessment tool

    Journal of Public Child Welfare

    (2008)
  • J. Barlow et al.

    Systematic review of models of analyzing significant harm

    (2012)
  • C. *Bartelink et al.

    Betrouwbaarheid en validiteit van de LIRIK: Eindrapport LIRIK Valideringsonderzoek

    (2015)
  • J. Belsky

    Etiology of child maltreatment: A developmental ecological analysis

    Psychological Bulletin

    (1993)
  • R.A. Caldwell et al.

    The assessment of child abuse potential and the prevention of child abuse and neglect: A policy analysis

    American Journal of Community Psychology

    (1988)
  • M.J. *Camasso et al.

    Prediction accuracy of the Washington and Illinois risk assessment instruments: An application of receiver operating characteristic curve analysis

    Social Work Research

    (1995)
  • M.W.L. Cheung

    Modeling dependent effect sizes with three-level meta-analyses: A structural equation modeling approach

    Psychological Methods

    (2014)
  • A. D'Andrade et al.

    Risk and safety assessment in child welfare: Instrument comparisons

    Journal of Evidence-Based Social Work

    (2008)
  • E.W. *Dankert et al.

    Risk assessment validation: A prospective study

    (2014)
  • R.M. Dawes et al.

    Clinical versus actuarial judgment

    Science

    (1989)
  • R.M. Dawes

    House of cards: Psychology and psychotherapy built on myth

    (1994)
  • C. *De Ruiter et al.

    Gestructureerde risicotaxatie bij kindermishandeling: De Child Abuse Risk Evaluation-Nederlandse versie [Structured risk assessment for child maltreatment: The Dutch version of the Child Abuse Risk Evaluation (CARE-NL)]

    Psychologie

    (2012)
  • V. De Vogel et al.

    Type of discharge and risk of recidivism measured by the HCR-20: A retrospective study in a Dutch sample of treated forensic psychiatric patients

    International Journal of Forensic Mental Health

    (2004)
* References marked with an asterisk were included in the meta-analysis.