
Diagnostic instruments for autism spectrum disorder (ASD)

This is not the most recent version


Abstract

This is a protocol for a Cochrane Review (Diagnostic test accuracy). The objectives are as follows:

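Primary objective

  • To determine which diagnostic instruments, alone or in combination, show the highest diagnostic accuracy for ASD in children, adolescents, and adults.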

Secondary objective

  • To compare the diagnostic test accuracy of clinical interviews with parents/primary caregivers or affected adults with that of direct observation measures. In addition, we will assess whether combining interview and direct observation measures improves diagnostic test accuracy.

Background

Autism spectrum disorders (ASDs) are neurodevelopmental disorders characterised by the core symptoms of impaired social interaction and communication, together with restricted, repetitive behaviours (APA 2013; Remschmidt 2012). Phenotypic variability within the spectrum is high, for example with regard to language or cognitive abilities and the severity of core symptoms. This heterogeneity is relevant for intervention planning and long‐term outcomes (see, for example, Howlin 2014; Stahmer 2011). In addition, rates of comorbid disorders are high (see, for example, Levy 2010; Melville 2008; Simonoff 2008).

Prevalence estimates for ASD have risen steadily in recent years, a finding that has been attributed mainly to the inclusion of all disorders of the spectrum in autism prevalence studies (Elsabbagh 2012). This inclusion of the whole spectrum is strongly justified by studies showing comparable long‐term outcomes of, for example, high‐functioning autism and Asperger's syndrome (Howlin 2003); by similar therapeutic options for all ASD (see, for example, Freitag 2014a; Oono 2013; Reichow 2012b); and, despite within‐disorder heterogeneity, by overlapping aetiology and underlying neurobiology across the diagnoses of the International Classification of Diseases, Tenth Revision (ICD‐10; WHO 1992), and the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition, Text Revision (DSM‐IV‐TR; APA 2000) (see, for example, Willsey 2015). Because ASDs are pervasive developmental disorders, diagnostic instruments need to take into account developmental changes in behaviour from toddlerhood to adulthood.

Since the 1980s, several diagnostic instruments have been developed; these instruments are the focus of this review. Professionals, patients and parents are faced with numerous individual diagnostic validity studies, often of insufficient quality. In clinical practice, this can result in incorrect use of diagnostic instruments and misinterpretation of their results. A valid, early and time‐saving diagnostic process is clinically important for several reasons:

  1. To allow families and children the necessary access to (early) intervention;

  2. To minimise the number of false positives, which avoids unnecessary intervention as well as unnecessary apprehension and fear among parents and patients; this is particularly important because ASDs are regarded as lifelong disorders; and

  3. To use resources effectively and allow for additional cognitive and language testing as well as diagnosis of comorbid psychiatric and medical disorders. The latter are not covered by this review, but need to be included in a comprehensive diagnostic work‐up (Baird 2011; Pilling 2012).

Thus, to guide clinicians with regard to the best available diagnostic instruments regarding core ASD symptoms, the objective of this systematic review is: to determine which diagnostic instruments, alone or in combination, show the highest diagnostic accuracy for ASD in children, adolescents, and adults.

Target condition being diagnosed

Autism, Asperger's syndrome, and atypical autism are severe neurodevelopmental disorders characterised by varying degrees of impairment in social interaction, communication, and stereotyped and repetitive behaviours and interests (Volkmar 2009). The disorders lie on a continuum of severity, and diagnostic criteria overlap to a great extent (Klin 2005). Therefore, all disorders of the autism spectrum are included in this review.

According to the ICD‐10, autism spectrum disorders (ASDs) comprise "F84.0 Childhood autism", "F84.1 Atypical autism", and "F84.5 Asperger's syndrome" (WHO 1992). The former DSM‐IV‐TR diagnoses of "299.00 Autistic Disorder", "299.10 Childhood Disintegrative Disorder", "299.80 Asperger’s Disorder", and "299.80 Pervasive Developmental Disorder ‐ Not Otherwise Specified" (PDD‐NOS) (APA 2000) are combined into a single diagnosis of "Autism Spectrum Disorder" in the current DSM‐5 (APA 2013). This systematic review of available diagnostic instruments for ASD will take the ICD‐10 diagnoses of "F84.0 Childhood autism", "F84.1 Atypical autism", and "F84.5 Asperger's syndrome" as the primary reference standard (WHO 1992). It will also include studies based on ICD‐9 (WHO 1979), DSM‐III‐R (APA 1987), and DSM‐IV‐TR (APA 2000) diagnoses of the above‐mentioned disorders, because the diagnostic criteria show strong continuity from ICD‐9 to ICD‐10 and from DSM‐III‐R to DSM‐IV‐TR, and the DSM‐IV‐TR criteria are closely comparable to those of ICD‐10 (WHO 1992; see Appendix 1). Diagnostic instruments for the DSM‐5 definition of ASD (APA 2013) are currently being developed on the basis of instruments for DSM‐IV‐TR or ICD‐10 defined disorders (APA 2000; WHO 1992, respectively). Given that many of the ICD‐10 or DSM‐IV‐TR criteria can be mapped onto DSM‐5 criteria (see Appendix 1; Freitag 2014b), the present systematic review will also help to improve the diagnostic standard for the DSM‐5 (APA 2013).

A recent worldwide epidemiological review (Elsabbagh 2012) estimated a prevalence of between 2.8 and 94 per 10,000 people, with a median of 17 per 10,000. An older study estimated a combined prevalence of about 1.2% for autism, Asperger's syndrome, and atypical autism or PDD‐NOS, with about 45% of 8‐ to 10‐year‐old children with ASD having an intelligence quotient (IQ) higher than 70 (Baird 2006). A similar prevalence rate (> 1%) was observed for adults living in the community in the United Kingdom (UK) (Brugha 2011). The rise in prevalence estimates has been attributed predominantly to the broadening of diagnostic criteria and greater awareness of the disorders and, to a lesser extent, to a true increase of the disorders in the general population (Fombonne 2006).

Pharmacological treatment options for ASD target stereotyped behaviour and comorbid behavioural problems, for example, hyperactivity, aggression or sleep problems (Freitag 2014a). Non‐pharmacological options include behaviourally based early interventions for toddlers and preschool‐aged children; group‐based therapy for school‐aged children, adolescents and adults; and psychoeducational parent training (NCCMH 2013; NICE 2012; Reichow 2012a; Reichow 2012b; Warren 2011).

As a rule, ASD are diagnosed by trained clinicians through direct behavioural observation, interviews with parents/primary caregivers or adult patients, or both. Baird 2006 found high agreement (weighted agreement 93%) between trained clinicians’ diagnoses of autism and autism spectrum disorders and a research team's diagnoses according to ICD‐10, which supports clinical diagnosis as a valid means of defining the target condition. The differential diagnosis between ASD and intellectual disability, attention‐deficit hyperactivity disorder (ADHD), conduct disorder, social phobia, selective mutism and other anxiety disorders, obsessive compulsive disorder, or various personality disorders is particularly complex. Nevertheless, good diagnostic instruments should be able to differentiate ASD from other psychiatric disorders.

Due to the complexity and severity of ASD, it is of utmost importance that professionals who provide diagnostic and therapeutic services to individuals with autism, Asperger's syndrome or atypical autism and their families use appropriate instruments and know how to interpret the results of diagnostic procedures correctly. A systematic review (including quality criteria and a meta‐analysis) of all available parent/caregiver interviews, patient interviews, and direct observation instruments will therefore have a direct impact on clinical practice and services for these individuals and their families.

Index test(s)

In this planned systematic review, we will include all diagnostic instruments for autism, Asperger's syndrome, atypical autism or autism spectrum disorder (ASD).

In preparation for this review, we conducted a pilot systematic search to identify all eligible index tests. The pilot search was conducted in two steps: first, we searched for all instruments concerning ASD; second, we searched for test accuracy studies of each instrument identified in the first step (Appendix 2). The first search, for available diagnostic instruments, was restricted to articles in English and German. This pilot search (conducted in June 2013) identified 14 diagnostic instruments for ASD, 11 of which will be included in this review. The remaining three instruments are not used for diagnosis, but to measure the severity of specific ASD symptoms or to document the longitudinal course of the disorders. For each of the 11 diagnostic instruments (which are either direct observation measures or interviews with parents/primary caregivers or adult patients), the most current version will be included in the review.

Below, we present the instruments that were derived from the search for diagnostic instruments and which will be included in this systematic review. Different versions (short form, other algorithms, etc.) are also reported below the main version.

Interview with parent/primary caregiver

1. The Developmental, Dimensional, and Diagnostic Interview

The Developmental, Dimensional, and Diagnostic Interview (3di; Skuse 2004) is a computer‐ and investigator‐based interview with parents/caregivers that is administered by trained clinicians. It contains 740 items, which are administered in a modular fashion; some items are obligatory and others can be skipped according to objective criteria. It comprises 183 items concerning demographic background, 266 items concerning ASD symptoms, and 291 items related to possible comorbidity. The interview is neither fully standardised nor semi‐structured, but a hybrid of both. Answers are coded as 0 = "no such behaviour", 1 = "minimal evidence of such behaviour", and 2 = "definite or persistent evidence of such behaviour". Because the interview is computerised, a full report containing detailed quantified symptom profiles is available immediately after completion (Skuse 2004). The ASD questions relate to social interaction, communication abilities, and stereotyped behaviour; the modules concerned with comorbidity are optional. The diagnostic algorithm for ASD is calculated on the basis of 112 items (Santosh 2009).

As the 3di is a third‐party assessment, it can be used for individuals with ASD from early childhood through to adulthood. The assessment takes between 1.5 and 2 hours. To save time, a pre‐interview package was designed that reduces the interview duration to only 45 minutes.

The Developmental, Dimensional, and Diagnostic Interview ‐ Short Version

Because assessing over 700 items is time‐intensive, a short version of the 3di was developed (3di‐sv; Santosh 2009). It contains 53 items and covers ASD items only. The 3di‐sv shows high agreement with the original 3di and the ADI‐R (Lord 1994). Two studies have examined the best cut‐off scores (Chuthapisith 2012; Santosh 2009); Table 1 shows their results.

Table 1: Best cut‐off scores for the 3di‐sv

Domain | Santosh 2009 | Chuthapisith 2012
Social interaction | 11.5 | 10
Communication | 8 | 8
Stereotyped behaviour | 5 | 3
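To make Table 1 concrete, the sketch below checks one set of 3di‐sv domain scores against each study's cut‐offs. It assumes that a domain reaches its cut‐off when the score is greater than or equal to the tabulated value; the dictionary and function names are hypothetical, and the sketch does not encode any rule for combining domains into a diagnosis, because Table 1 does not specify one.

```python
# Cut-off scores transcribed from Table 1 (3di-sv), keyed by study.
CUTOFFS_3DI_SV = {
    "Santosh 2009": {
        "social_interaction": 11.5,
        "communication": 8,
        "stereotyped_behaviour": 5,
    },
    "Chuthapisith 2012": {
        "social_interaction": 10,
        "communication": 8,
        "stereotyped_behaviour": 3,
    },
}


def domains_at_or_above_cutoff(scores, study):
    """Return {domain: bool} showing whether each domain score reaches
    the cut-off reported by `study`.

    Assumption (not stated in the text): a domain reaches its cut-off
    when score >= cut-off.
    """
    cutoffs = CUTOFFS_3DI_SV[study]
    return {domain: scores[domain] >= cutoff for domain, cutoff in cutoffs.items()}


# Hypothetical domain scores for one interview.
example = {"social_interaction": 11, "communication": 9, "stereotyped_behaviour": 4}
print(domains_at_or_above_cutoff(example, "Santosh 2009"))
# {'social_interaction': False, 'communication': True, 'stereotyped_behaviour': False}
print(domains_at_or_above_cutoff(example, "Chuthapisith 2012"))
# {'social_interaction': True, 'communication': True, 'stereotyped_behaviour': True}
```

The same scores reach all three cut‐offs under one set of values but only one under the other, which illustrates why the choice of cut‐off matters for classification.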

2. The Autism Diagnostic Interview ‐ Revised

The Autism Diagnostic Interview ‐ Revised (ADI‐R; Lord 1994) is a standardised, investigator‐based, semi‐structured interview containing 93 items on behaviour at different ages. Items are scored from 0 to 3, with additional codes available for some items. The rating scale includes the following options:

  • 0 = "Behaviour of the type specified in the coding is not present";

  • 1 = "Behaviour of the type specified is present in an abnormal form, but not sufficiently severe or frequent to meet the criteria for two";

  • 2 = "Definite abnormal behaviour"; or

  • 3 = "Extreme severity of the specified behaviour".

Some items also allow the following additional codes:

  • 7 = "Definite abnormality in the general area of the coding, but not of the type specified";

  • 8 = "Not applicable"; or

  • 9 = "Not known or asked".

The ADI‐R provides dimensional measures with cut‐offs for four domains: reciprocal social interaction; language and communication; stereotyped repetitive behaviours or interests; and age of onset. The cut‐off scores are 10 points for the social interaction subscale, 8 points for language and communication if the child is verbal (7 points if the child is non‐verbal), 3 points for restricted and repetitive behaviour, and 1 point for age of onset (Lord 1994). Meeting all four cut‐offs qualifies for a diagnosis of "F84.0 Childhood autism" (WHO 1992) or "299.00 Autistic Disorder" (APA 2000).
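The ADI‐R cut‐off rule can be sketched as a single check across the four domains. This is a minimal illustration, assuming that the domain totals have already been computed with the ADI‐R algorithm and that a domain is met when its total is greater than or equal to the cut‐off; the class and function names are hypothetical and not part of any published scoring software.

```python
from dataclasses import dataclass


@dataclass
class AdiRDomainTotals:
    """Hypothetical container for ADI-R algorithm domain totals."""
    social_interaction: int
    communication: int
    restricted_repetitive: int
    age_of_onset: int
    verbal: bool  # selects the verbal (8) or non-verbal (7) communication cut-off


def meets_adi_r_autism_cutoffs(t: AdiRDomainTotals) -> bool:
    """True only if all four domains reach their cut-offs (assumed: total >= cut-off)."""
    communication_cutoff = 8 if t.verbal else 7
    return (
        t.social_interaction >= 10
        and t.communication >= communication_cutoff
        and t.restricted_repetitive >= 3
        and t.age_of_onset >= 1
    )


# A communication total of 7 meets the non-verbal cut-off but not the verbal one.
print(meets_adi_r_autism_cutoffs(AdiRDomainTotals(14, 7, 4, 2, verbal=False)))  # True
print(meets_adi_r_autism_cutoffs(AdiRDomainTotals(14, 7, 4, 2, verbal=True)))   # False
```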

The interview takes two to three hours to complete (McClintock 2011). It can be used for individuals with a mental age of at least 24 months.

3. The Asperger Syndrome Diagnostic Interview

The Asperger Syndrome (and high‐functioning autism) Diagnostic Interview (ASDI; Gillberg 2001) is a short, standardised interview that is usually conducted with a parent or caregiver. The interviewee should know the person in question very well and should have known him or her during infancy. It includes 20 items, which are divided into six groups:

  1. Impairment in social interaction (four items);

  2. Restricted interests (three items);

  3. Routines (two items);

  4. Verbal and speech problems (five items);

  5. Non‐verbal communication problems (five items); and

  6. Motor clumsiness (one item).

It is a highly structured measure that scores answers from one to three (1 = "does not apply", 2 = "applies sometimes or somewhat", 3 = "definitely applies"). A later version of the interview, mostly used in clinical practice, merges the last two categories into one (0 = "does not apply" and 1 = "applies to some degree or very much"). For a diagnosis of Asperger's syndrome or high‐functioning autism, at least two items in group one, at least three items in group four, and at least one item in each of groups two, three, five and six must be rated positively.
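Using the later dichotomous coding, the ASDI decision rule amounts to counting positively rated items per group and comparing each count with a minimum. The sketch below encodes the group sizes and minimums described above; the function name and the input layout (a mapping from group number to a list of 0/1 ratings) are hypothetical choices for illustration.

```python
# ASDI symptom groups 1-6: (number of items, minimum positively rated items).
ASDI_GROUPS = {
    1: (4, 2),  # impairment in social interaction
    2: (3, 1),  # restricted interests
    3: (2, 1),  # routines
    4: (5, 3),  # verbal and speech problems
    5: (5, 1),  # non-verbal communication problems
    6: (1, 1),  # motor clumsiness
}


def meets_asdi_criteria(ratings):
    """`ratings` maps group number -> list of 0/1 item ratings
    (1 = "applies to some degree or very much")."""
    for group, (n_items, minimum) in ASDI_GROUPS.items():
        items = ratings[group]
        if len(items) != n_items or any(r not in (0, 1) for r in items):
            raise ValueError(f"group {group}: expected {n_items} ratings of 0 or 1")
        if sum(items) < minimum:
            return False
    return True


# Hypothetical interview that meets every group minimum exactly.
example = {1: [1, 1, 0, 0], 2: [0, 1, 0], 3: [1, 0], 4: [1, 1, 1, 0, 0], 5: [0, 0, 0, 0, 1], 6: [1]}
print(meets_asdi_criteria(example))  # True
```

Because every group must reach its minimum, a single group falling short (for example, only two positive items in group four) is enough for the criteria not to be met.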

4. The Diagnostic Interview for Social and Communication Disorders

The Diagnostic Interview for Social and Communication Disorders (DISCO; Leekam 2002; Maljaars 2011; Wing 2002) is a semi‐structured interview that is used with parents and caregivers. It consists of more than 300 questions, but only 93 are used for diagnosis. A numerical code is used for rating: 0 = "severe problem", 1 = "minor problem" and 2 = "no problem"; lower scores therefore indicate more symptoms. Thirty‐eight items refer to impairments in social interaction, 15 items to impairments in communication, and 29 items to stereotyped behaviours. The interview is used to diagnose autistic disorder, Asperger’s syndrome, psychiatric disorders, and other developmental disabilities. The DISCO has no age or ability restrictions. Administration takes two to three hours.

5. Autism Spectrum Disorder – Diagnosis Scale for Intellectually Disabled Adults

The Autism Spectrum Disorder – Diagnosis Scale for Intellectually Disabled Adults (ASD‐DA; Matson 2007a) is a structured interview designed specifically to distinguish adults with intellectual disability (ID) and ASD from adults with ID only. It is administered to direct‐care staff who have worked with the individual for at least six months (Matson 2007a). The scale measures autism, PDD‐NOS and Asperger's syndrome across the life span (Matson 2008), and is designed for individuals aged at least 16 years (Belva 2012; Matson 2008). The ASD‐DA consists of 31 items scored as either 0 = "not different, no impairment" or 1 = "different, some impairment" (Matson 2008). The staff member should compare the assessed individual with others of similar age living in the community. The factor structure corresponds to the three core symptom groups of ASD (social impairment, impairment in communication, and restricted interests/bizarre sensory responses; Matson 2007a). A total score ≥ 19 indicates comorbid ASD, while a score < 19 indicates that the adult has ID without ASD (Matson 2007b). Scores of 11 on Factor I (social impairment) and 8 on Factor III (restricted behaviour) were established as cut‐offs for distinguishing between autism and PDD‐NOS, with the PDD‐NOS group scoring lower because of milder symptomatology (Matson 2007b). The test can be administered in approximately 10 minutes.
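The ASD‐DA total‐score rule can be sketched as follows, assuming 31 item scores coded 0 or 1 as described above. The factor‐level cut‐offs for autism versus PDD‐NOS are deliberately not encoded, because the text does not state how the two factor scores are to be combined; the function name is hypothetical.

```python
ASD_DA_N_ITEMS = 31
ASD_DA_TOTAL_CUTOFF = 19  # total >= 19 indicates comorbid ASD (Matson 2007b)


def classify_asd_da_total(item_scores):
    """Classify an ASD-DA protocol from its 31 item scores (0 or 1)."""
    if len(item_scores) != ASD_DA_N_ITEMS or any(s not in (0, 1) for s in item_scores):
        raise ValueError("expected 31 item scores of 0 or 1")
    total = sum(item_scores)
    label = "ID with comorbid ASD" if total >= ASD_DA_TOTAL_CUTOFF else "ID without ASD"
    return total, label


print(classify_asd_da_total([1] * 19 + [0] * 12))  # (19, 'ID with comorbid ASD')
print(classify_asd_da_total([1] * 18 + [0] * 13))  # (18, 'ID without ASD')
```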

6. The Autistic Behavior Interview

The Autistic Behavior Interview (ABI; Cohen 1993) is a structured diagnostic interview that is conducted by a trained interviewer with a parent or caregiver. It consists of 28 subscales of 6 items each (168 items in total), which cover a range of appropriate and inappropriate behaviours. The interviewer rates how often the patient displays these behaviours on a scale from zero to three (0 = "never", 1 = "rarely emerging", 2 = "sometimes/partially", 3 = "often/typically").

Interview with the affected adolescent or adult

The Adult Asperger Assessment

The Adult Asperger Assessment (AAA; Baron‐Cohen 2005) is a semi‐structured interview developed for the diagnosis of Asperger's syndrome or high‐functioning autism (HFA) in adults. It is a clinical interview with the affected individual and a relative/informant that consists of three parts and is linked with two screening instruments: the Autism Spectrum Quotient (AQ) and the Empathy Quotient (EQ). The patient completes the AQ and the EQ before the clinician‐conducted interview, during which the item answers of the AQ and EQ are validated with examples. The AAA interview itself includes five sections: four symptom groups (A to D; 18 items) and a final section with five prerequisites (E). The symptom groups are:

  • A: Qualitative impairment in social interaction (four APA 1994 symptoms plus one extra symptom);

  • B: Restricted, repetitive and stereotyped patterns of behaviour, interests and activities (four APA 1994 symptoms plus one extra symptom);

  • C: Qualitative impairment in verbal or non‐verbal communication (five symptoms, none of which is a DSM‐IV criterion); and

  • D: Impairments in imagination (three symptoms; not listed as a criterion for autistic disorder in ICD‐10 (WHO 1992) or APA 1994).

Each AAA item is scored dichotomously (yes or no). To meet the criteria for an APA 1994 diagnosis of Asperger's syndrome, at least three of the five symptoms in each of the first three subscales of the AAA (A to C), at least one of the three symptoms from the imagination subscale (D), and all five prerequisites (E) must be rated as present (Baron‐Cohen 2005). The assessment can be used from 16 years of age and takes around three hours to complete.
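The AAA decision rule can be sketched on yes/no ratings as follows. This sketch assumes the reading that the "at least three of five" requirement applies to each of subscales A to C separately; the function name and the input format (one list of booleans per section) are hypothetical.

```python
def meets_aaa_asperger_criteria(a, b, c, d, e):
    """Apply the AAA decision rule to yes/no (True/False) ratings.

    a, b, c: five symptoms each (social interaction; restricted/repetitive
    behaviour; communication); d: three imagination symptoms;
    e: five prerequisites.
    Assumption: "at least three of five" applies to A, B and C separately.
    """
    for name, ratings, size in (("A", a, 5), ("B", b, 5), ("C", c, 5), ("D", d, 3), ("E", e, 5)):
        if len(ratings) != size:
            raise ValueError(f"section {name}: expected {size} ratings, got {len(ratings)}")
    return (
        all(sum(section) >= 3 for section in (a, b, c))  # True counts as 1
        and sum(d) >= 1
        and all(e)  # every prerequisite must be present
    )


three_of_five = [True, True, True, False, False]
print(meets_aaa_asperger_criteria(three_of_five, three_of_five, three_of_five,
                                  [False, True, False], [True] * 5))  # True
```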

Direct behavioural observation by a trained clinician

1. The Autism Diagnostic Observation Schedule

The Autism Diagnostic Observation Schedule ‐ Generic (ADOS‐G; Lord 1989; Lord 2000), published by Western Psychological Services (WPS) as the ADOS, is a semi‐structured, observer‐based instrument that can only be administered by a trained clinician. Standardised play and communication activities are administered, and the trained professional rates the observed behaviour from zero to three. Four modules are available, tailored to the individual's level of expressive language and cognitive development. Module one (29 items) is administered with children who use little or no phrase speech, whereas module two (28 items) is used with children who use phrase speech but are not verbally fluent. Module three (28 items) is used with verbally fluent children and young adolescents, and module four (31 items) with verbally fluent adolescents and adults. A diagnosis of autism or autism spectrum disorder is derived from module‐specific algorithms (see Appendix 3). The behaviours are coded in the ADOS as:

  • A: Language and communication;

  • B: Reciprocal social interaction;

  • C: Imagination;

  • D: Stereotyped behaviours and restricted interests; and

  • E: Other abnormal behaviours.

The ADOS ‐ Revised Algorithms for modules one to three were developed to increase specificity in classifying ASD versus non‐ASD in lower‐functioning populations (Gotham 2007). These improvements were incorporated into the algorithms of the second version (ADOS‐2; Lord 2012a).

The ADOS‐2 combines the ADOS ‐ Toddler Module (ADOS‐T; Luyster 2009; Lord 2012b), the ADOS ‐ Revised Algorithms for modules one to three, and the original module four of the ADOS. Its modules are:

  • Toddler Module: for children between 12 and 30 months of age who do not consistently use phrase speech;

  • Module one: for children aged 31 months and older who do not consistently use phrase speech (34 items);

  • Module two: for children of any age who use phrase speech but are not verbally fluent (29 items);

  • Module three: for verbally fluent children and young adolescents (29 items); and

  • Module four: for verbally fluent older adolescents and adults (32 items).

The Toddler Module has a similar basic structure to module one of the original ADOS and was adapted for use in children under 30 months old with a non‐verbal mental age of at least 12 months. The Toddler Module has 11 items, each coded on a 4‐point numerical scale, ranging from 0 ("no evidence of abnormality related to autism") to 3 ("definite evidence") (Luyster 2009).

Each module takes between 30 and 60 minutes to administer. Behaviours are grouped as in the ADOS‐G, from A (language and communication) to E (other abnormal behaviours); cut‐offs are shown in Appendix 3.

2. The Autism Spectrum Disorder ‐ Observation for Children

The Autism Spectrum Disorder ‐ Observation for Children (ASD‐OC; Neal 2012a; Neal 2012b) is a semi‐structured observation scale consisting of 45 items that can be used to observe and code autistic symptoms. All core features of ASD are included: socialisation impairments, communication deficits and repetitive or restricted interests, including sensory abnormalities. After a period of interactive play, the observer rates the child’s behaviour, comparing it to that of his/her age group. Each item is given a numerical score of '0' (indicating no impairment), '1' (indicating mild impairment), or '2' (indicating severe impairment) (Neal 2014). The assessment of the ASD‐OC does not require specialist training, but the clinician should at least have good knowledge of, and experience with, symptoms of ASD.

3. Behaviour Observation Scale for Autism

The Behaviour Observation Scale for Autism (BOS; Freeman 1978; Freeman 1980) is a direct observation measure for trained clinicians that can be used with autistic children, children with intellectual disability, and typically developing children. It is designed to distinguish between the following subgroups: autistic children and children with other conditions such as intellectual disability, language impairment, and specific sensory deficits (Freeman 1978). The BOS includes 67 defined behaviours, which are presented in checklist form and rated by an observer in each of nine three‐minute intervals. The occurrence of behaviours is rated as follows: 0 = "the behaviour did not occur in the three‐minute period", 1 = "it occurred once", 2 = "it occurred twice", or 3 = "the child engaged in the behaviour more or less continuously throughout the three‐minute interval". Factor analyses have shown that the majority of items can be assigned to four categories: interaction with people, interaction with objects, solitary behaviours, and response to stimuli (Freeman 1980).

Combination of interview and direct observation

Childhood Autism Rating Scale ‐ Second Edition

The Childhood Autism Rating Scale ‐ Second Edition, Standard Version (CARS‐2‐ST; Schopler 1980; Vaughan 2011), which corresponds to the original CARS, is a behaviour rating scale consisting of 15 items that are scored on a continuum from normal to severely abnormal. The standard version is used with children younger than six years of age and with those who have communication difficulties or below‐average estimated IQ.

In order to develop a test with a higher diagnostic accuracy for children with high‐functioning autism or Asperger's syndrome, the High Functioning (HF) autism scale was added to the second version (CARS‐2‐HF; Vaughan 2011). It is used in the assessment of verbally fluent individuals aged 6 years or older with an IQ higher than 80.

Both forms include 14 domains: nonverbal communication, verbal communication, relating to people, anxiety reaction, adaptation to change, imitation, affect, use of body, relation to non‐human objects, visual responsiveness, auditory responsiveness, near receptor responsiveness, activity level, and intellectual functioning, followed by a final item rating general impressions. The scale was designed to distinguish between autism and other developmental disorders (Klinger 2000). A score of "1" indicates behaviour within normal limits for the child’s age group, "2" mildly abnormal, "3" moderately abnormal, and "4" severely abnormal behaviour (Schopler 1980). The total score classifies the child as not autistic (below 30), mildly to moderately autistic (30 to 36.5), or severely autistic (above 36.5).
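The classification bands for the CARS total score can be expressed as a small function. It assumes 15 item ratings between 1 and 4 in steps of 0.5 (half‐point ratings are implied by the 36.5 boundary); the function name is hypothetical.

```python
def classify_cars_total(item_ratings):
    """Sum 15 CARS item ratings and return (total, band).

    Assumption: ratings run from 1 to 4 in steps of 0.5.
    """
    if len(item_ratings) != 15:
        raise ValueError("expected 15 item ratings")
    for r in item_ratings:
        if not 1 <= r <= 4 or not float(r * 2).is_integer():
            raise ValueError(f"invalid rating: {r}")
    total = sum(item_ratings)
    if total < 30:
        band = "not autistic"
    elif total <= 36.5:
        band = "mildly to moderately autistic"
    else:
        band = "severely autistic"
    return total, band


print(classify_cars_total([2] * 15))          # (30, 'mildly to moderately autistic')
print(classify_cars_total([2.5] * 14 + [3]))  # (38.0, 'severely autistic')
```

With 15 items rated from 1 to 4, totals range from 15 to 60, so the two boundaries split that range into three bands.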

The CARS‐2 can be completed by a clinician, teacher or parent and is based on subjective observations of the child's behaviour. Ratings can draw on various sources, such as psychological testing or classroom participation, parent reports about their children, comprehensive clinical records, or a combination of these (Schopler 2010). The CARS‐2 can be used with children who are at least two years old. Administration takes between 5 and 15 minutes.

Clinical pathway

The standard clinical pathway for children, adolescents, and adults differs across health systems (see, for example, NCCMH 2013; NICE 2011; NICE 2012). In some countries, the general practitioner refers patients to specialised services after screening, and these services often use the instruments described in this review. In other countries, where patients and their families can approach doctors or other health professionals directly, a much wider range of professionals use these diagnostic instruments, often without prior screening information. Patients with developmental delay, language delay, social interaction problems, stereotyped, repetitive and restrictive behaviours, and sometimes aggressive behaviours are usually assessed to confirm or rule out a diagnosis of ASD. As a rule, one or two of the index tests of this review are administered to diagnose autism, Asperger's syndrome or atypical autism, but instruments and settings are not standardised across healthcare systems or professionals. A recent Swedish study showed that assessment by a multidisciplinary team leads to more accurate clinical diagnoses than individual clinical judgement (Westman 2013); this approach is currently recommended as best practice for the diagnostic process of ASD (see also Filipek 2000; Mahoney 1998; NCCMH 2013; NICE 2011; NICE 2012; SIGN 2007). Furthermore, it is common to combine parent/caregiver information, ideally assessed with a standardised instrument, with the results of standardised direct observation and with additional, independent information from school teachers, partners or public authorities (see, for example, Baird 2006). ASD cannot yet be diagnosed by objective methods such as biomarkers. In most countries, a diagnosis of ASD allows patients to seek and obtain ASD‐specific therapeutic interventions.

Alternative test(s)

As we aim to cover all interviews and observational diagnostic tests, questionnaires are the only remaining alternative tests (for example, Asperger Syndrome Diagnostic Scale (ASDS, Myles 2000); Autism Spectrum Disorder ‐ Diagnosis for Child (ASD‐DC, Matson 2009); Social Communication Questionnaire (SCQ, Berument 1999); Social Responsiveness Scale (SRS, Constantino 2005)). Questionnaires are often used as screening instruments because they take less time to administer than interviews and clinical observations. Recent clinical guidelines that examined the validity, benefit and applicability of screening instruments for clinical practice concluded that, given the test accuracy and the quality of the respective studies, no questionnaire could be recommended as a diagnostic test (NICE 2011; NICE 2012; SIGN 2007). These guidelines also recommended combining (1) information from parents/caregivers or spouses and self‐report with (2) direct behavioural observation; questionnaires do not provide the latter.

Rationale

The most recent comprehensive review of psychometric screening and diagnostic tests was published in 1999 (Filipek 1999). Since then, several additional studies on existing and new diagnostic instruments for autism, Asperger's syndrome, and atypical autism have been published. To date, no systematic review has assessed the clinical utility and quality criteria of test accuracy studies, compared different diagnostic tests, and performed meta‐analyses of test accuracy data. Samtani 2011 plans a systematic review of the diagnostic accuracy of some of the instruments included in this review, but only for diagnosing toddlers. Furthermore, that review differs from our approach in that it includes only the following diagnostic interviews: ADI‐R (Lord 1994), CARS (Schopler 1980), DISCO (Leekam 2002; Maljaars 2011; Wing 2002), and 3di (Skuse 2004); and the following behavioural observation methods: ADOS‐G (Lord 1989; Lord 2000) and CARS (Schopler 1980). In contrast, we will cover all diagnostic interviews and observational measures published thus far. We aim to provide practitioners with information on, and a comparison of, all existing diagnostic tools, so that clinicians can make the best evidence‐based decisions on training in and use of the respective diagnostic instruments. Because the instruments differ in administration time and in the training needed to implement them, it is very important for clinicians to be trained in, and to implement, the best and most cost‐effective diagnostic assessment in their clinical practice. In addition, we will include the most recent version of each instrument. Therefore, we will report data on the ADOS‐2 (including the new Toddler Module, Vaughan 2011), the CARS‐2 (Vaughan 2011), and the DISCO‐11 (Leekam 2002; Maljaars 2011; Wing 2002). Finally, in contrast to Samtani 2011, we will include data not only for preschool‐aged children but for all participants above one year of age, including adolescents and adults.

Given the improvements in diagnosing toddlers with the ADOS‐2, a measure that was not included in the Samtani 2011 review, we chose to cover this broad age group in our comprehensive review.

This review will provide the scientific evidence for the most appropriate choice of diagnostic instrument given the clinical setting, the age and abilities of the patient, and the question of differential diagnosis with respect to other psychiatric or (neuro)developmental disorders. In light of the broad spectrum of autism, Asperger's syndrome, atypical autism and complex differential diagnoses, this systematic review will provide clinicians, patients and parents with the latest, scientifically‐based evidence, and ultimately lead to improved diagnostic assessment and a better service for the affected individuals.

Objectives

Primary objective

Secondary objective

  • To compare the diagnostic test accuracy of clinical interviews with parents/primary caregivers or affected adults with the diagnostic test accuracy of direct observation measures. In addition, we will compare whether a combination of interview and direct observation measures improves diagnostic test accuracy.

Methods

Criteria for considering studies for this review

Types of studies

We will include studies on the test accuracy of diagnostic instruments for ASD (parent/caregiver or patient interviews, direct observation measures) with a minimum sample size of 10 participants per comparison group, to minimise the risk of bias. Studies with very small samples are likely to carry a high risk of selection bias: ASDs are heterogeneous disorders, and the smaller the sample, the higher the risk of drawing conclusions from non‐representative subsamples. The minimum sample size was chosen according to the NICE guideline (NICE 2012; see, for example, Chapter 5.3.5, Table 10). This inclusion criterion seems reasonable to minimise clinical heterogeneity and to achieve minimum precision for estimating test accuracy in each included study.
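To illustrate how imprecise accuracy estimates from very small groups are, the following Python sketch computes a 95% Wilson score interval for an observed sensitivity of 0.80 at several group sizes. The choice of the Wilson interval and the example numbers are illustrative assumptions, not part of this protocol or of the NICE guideline.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion such as sensitivity."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# Hypothetical instrument that detects 80% of cases, in ASD groups of different size.
for n in (10, 30, 100):
    lo, hi = wilson_interval(round(0.8 * n), n)
    print(f"n={n:3d}: sensitivity 0.80, 95% CI {lo:.2f}-{hi:.2f} (width {hi - lo:.2f})")
```

With 10 participants the interval runs from roughly 0.49 to 0.94; with 100 participants it narrows to roughly 0.71 to 0.87.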

We will include:

  • Cross‐sectional or prospective cohort studies including individuals suspected of having, or diagnosed with, autism, Asperger's syndrome, atypical autism or ASD, and individuals from the general population, as well as individuals suspected of having, or diagnosed with, developmental delay, intellectual disability, language delay or another psychiatric disorder, provided that all individuals are classified as ASD or non‐ASD by the same reference standard;

  • Randomised direct comparisons in which participants are randomised to one of two (or more) tests and individuals are verified by the same reference standard;

  • Case‐control studies that recruit a group of individuals with autism, Asperger's syndrome, atypical autism or ASD, and one or more groups without these disorders.

Participants

We will primarily include studies on individuals with a clinically‐derived ICD‐10 diagnosis of childhood autism, Asperger's syndrome or atypical autism (WHO 1992). Because the diagnostic criteria are very similar, we will also include diagnoses of Autistic Disorder, Asperger’s Disorder, and PDD‐NOS based on DSM‐IV‐TR criteria (APA 2000). Some instruments that are still used extensively were developed based on the ICD‐9 (WHO 1979) or DSM‐III‐R (APA 1987). If the latest version of such an instrument was validated against ICD‐9 diagnoses ("299.0 Autistic disorder", "299.80 Other specified pervasive developmental disorders" or "299.90 Unspecified pervasive developmental disorders"), against DSM‐III‐R diagnoses of Autistic Disorder and PDD‐NOS (APA 1987), or against both, these studies will also be included.

We will include studies of children (aged over 12 months) and adults with an IQ above 25 or a mental age equivalent to above 12 months (as it is currently not possible to diagnose autism, Asperger's syndrome or atypical autism in the first year of life, or to differentiate severe ID from autism in individuals with an IQ lower than 25 or a mental age less than or equal to 12 months). If possible, the population will be stratified into the following groups: children less than 2 years of age; toddlers aged 2 to 5 years; children aged 6 to approximately 12 years; adolescents aged (approximately) 13 to 18 years; and adults (older than 18 years of age).
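The inclusion rule and the planned age strata can be expressed as a small Python sketch. It assumes that age is recorded in months for eligibility and in completed years for stratification, that the "approximately" 12‐ and 18‐year boundaries are applied as hard cut‐offs, and that meeting either the IQ or the mental‐age criterion suffices; the function names are hypothetical.

```python
from typing import Optional


def is_eligible(age_months: float, iq: Optional[float] = None,
                mental_age_months: Optional[float] = None) -> bool:
    """Older than 12 months, and IQ above 25 or mental age above 12 months."""
    if age_months <= 12:
        return False
    iq_ok = iq is not None and iq > 25
    mental_age_ok = mental_age_months is not None and mental_age_months > 12
    return iq_ok or mental_age_ok


def age_stratum(age_years: int) -> str:
    """Map completed years of age to the planned strata (boundaries assumed)."""
    if age_years < 2:
        return "under 2 years"
    if age_years <= 5:
        return "toddlers (2-5 years)"
    if age_years <= 12:
        return "children (6-12 years)"
    if age_years <= 18:
        return "adolescents (13-18 years)"
    return "adults (over 18 years)"


print(is_eligible(30, iq=40), age_stratum(2))
```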

Exclusion criteria

We will exclude individuals with schizophrenia, bipolar disorder, primary neurological or neurodegenerative disorder, sensory impairments (hearing, vision), and ID without ASD.

Comparison population

We will include any kind of systematically sampled, assessed, and diagnosed comparison population. These inclusion criteria are deliberately broad and reflect the typical clinical population that presents with a possible diagnosis of autism, Asperger's syndrome or atypical autism. Three different kinds of non‐ASD groups are of interest. These groups also indicate the diagnostic level (level I: comparison with the general population; level II: comparison with individuals selected for developmental or psychiatric disorders; Filipek 1999):

  1. Healthy individuals of the same age range,

  2. Individuals of the same age range suspected of having a developmental disorder, and

  3. Individuals of the same age range with another psychiatric disorder, for example ADHD, oppositional defiant disorder or conduct disorder, anxiety disorder, schizophrenia, affective disorder, personality disorder.

Comparison groups (2) and (3) are of strong clinical relevance and need to be diagnosed according to standardised criteria (ICD‐9, WHO 1979; ICD‐10, WHO 1992; DSM‐III‐R, APA 1987; DSM‐IV‐TR, APA 2000).

In the following text, we will often speak of 'comparison groups', meaning the control group in case‐control studies and the non‐ASD group in cohort/cross‐sectional studies; the comparison group consists of individuals who do not meet criteria for an ASD diagnosis according to the reference standard applied in the study.

Index tests

We will examine the following index tests: 3di (Skuse 2004), AAA (Baron‐Cohen 2005), ABI (Cohen 1993), ADI‐R (Lord 1994), ADOS (Lord 1989; Lord 2000), ASD‐DA (Matson 2007a), ASD‐OC (Neal 2012a; Neal 2012b), ASDI (Gillberg 2001), BOS (Freeman 1978), CARS (Vaughan 2011), and DISCO‐11 (Leekam 2002; Maljaars 2011; Wing 2002). The tests are described under Index test(s) in the Background section and in Appendix 4. All cited tests were developed to diagnose autism, Asperger's syndrome, atypical autism or PDD‐NOS.

Target conditions

The target conditions are autism spectrum disorders (ASDs), comprising autism, Asperger's syndrome, atypical autism, or PDD‐NOS in infants/children (under 2 years of age), toddlers/preschool‐aged children (aged 2 to 5 years), school‐aged children (aged 6 to 12 years), adolescents (aged 13 to 18 years), and adults (older than 18 years of age). We will only include studies using one of the established classification systems (described in Appendix 2). ASDs are regarded as pervasive developmental (or neurodevelopmental) disorders, which are characterised by persistent deficits in social interaction and communication, as well as restricted, repetitive behaviours or interests.

Reference standards

The reference standard will be a clinical diagnosis of "F84.0 Childhood autism", "F84.1 Atypical autism" or "F84.5 Asperger's syndrome", according to the ICD‐10 (WHO 1992). If the instrument was developed according to DSM‐IV‐TR criteria (APA 2000), the target conditions are "299.00 Autistic Disorder", "299.80 Asperger’s Disorder", and "299.80 PDD‐NOS", which are comparable to the respective ICD‐10 diagnoses. If the latest version of the diagnostic instrument was developed based on a clinical diagnosis of "299.0 Autistic disorder" or of other specified pervasive developmental disorders, which include "299.80 Asperger's syndrome", according to the ICD‐9 (WHO 1979), or based on a clinical diagnosis of "299.00 Autistic disorder" or "299.80 PDD‐NOS" according to the DSM‐III‐R (APA 1987), then these diagnoses will also be taken as a reference standard, because criteria are comparable across these classification systems, especially for autistic disorder/autism. These are valid and well‐operationalised reference criteria, which have been used in almost all diagnostic studies on autism, Asperger's syndrome, or atypical autism/PDD‐NOS. Additionally, some studies use ADI‐R (Lord 1994) and ADOS (Lord 1989; Lord 2000) criteria as the reference standard. Because a diagnosis derived from either of these instruments implements DSM‐IV‐TR or ICD‐10 criteria (APA 2000 and WHO 1992, respectively), we will also include studies using these instruments as a reference standard. All of these reference standards are considered equivalent (see Appendix 1).

Search methods for identification of studies

Electronic searches

We developed the search strategy according to the guidelines in Chapter 7.4 of the Cochrane Handbook for Diagnostic Test Accuracy Studies (De Vet 2008). We will search for studies and relevant systematic reviews in the databases listed below. We will search Ovid MEDLINE using the strategy in Appendix 5, which we will adapt for all other databases. We will limit our searches to the years following the publication of ICD‐9 and DSM‐III‐R (1980 onwards). We will not apply language limits to the searches and we will seek translations where our resources permit. If translations cannot be obtained, studies will be listed as Awaiting Classification.

  1. Ovid MEDLINE (1946 to present).

  2. Ovid MEDLINE In‐Process & Other Non‐Indexed Citations (current issue).

  3. Embase (1980 to present; Ovid).

  4. PsycINFO (1887 to present; EBSCOhost).

  5. PsycARTICLES (1894 to present; EBSCOhost).

  6. PSYNDEX: Literature and Audiovisual Media with PSYNDEX Tests (1945 to present; EBSCOhost).

  7. Cochrane Database of Systematic Reviews (CDSR; latest issue, The Cochrane Library).

  8. Database of Abstracts of Reviews of Effects (DARE; latest issue, The Cochrane Library).

  9. ARIF (arif.bham.ac.uk; 1996 to present).

  10. Epistemonikos (epistemonikos.org; all available years).

Searching other resources

We will search the catalogues of the test publishers Hogrefe and Western Psychological Services. We will also contact the authors of the respective tests or publications to request relevant data that are not fully published, or to query errors detected in a report. Furthermore, we will search the reference lists of included studies, guidelines, and reviews found by our electronic searches for additional studies.

Data collection and analysis

Selection of studies

LV, SH, ML, MM, and MS will select studies in a two‐phase process. First, one of the named authors will screen titles and abstracts and exclude any article that does not concern diagnostic instruments for ASD. Studies for potential inclusion will then be read in full and, if necessary, discussed for definitive inclusion by at least two of the named authors; any disagreements will be resolved by discussion with a third author. The screening and selection process will be reported in a study flow diagram.

Data extraction and management

For each diagnostic instrument, we will extract aggregate data on diagnostic test accuracy from publications using a standardised data extraction form (developed by CF and KJ; copy available on request). Two authors (pairs from LV, SH, ML, MM, MS) will independently extract study data. Disagreements will be resolved by discussion and consultation with a third review author (primarily CF). The form will include at least the following items, if available from the publication:

  • General information on the study: title, authors, contact address, funding sources, language, publication status, year of publication, place(s) and year(s) in which study conducted;

  • Type of instrument assessed (for example, rating scale and rater; interview; direct observation schedule); number of items, scoring of items (dichotomous, Likert, etc.);

  • Detailed information on diagnostic criteria for autism, Asperger's syndrome, atypical autism or PDD‐NOS (reference standard); number of individuals per diagnosis included;

  • Criteria for inclusion into comparison group, number of comparison individuals included;

  • Sex, IQ, and age distribution of patient and comparison sample; and

  • Cut‐off values, sensitivity, specificity, area under the curve (AUC) of the receiver operating characteristic (ROC) curve, etc.; the original classification data (2 x 2 table) at the recommended/published cut‐off threshold will be extracted or reconstructed for further use in the meta‐analyses.
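When a publication reports only sensitivity, specificity and group sizes at the published cut‐off, the 2 x 2 table can usually be reconstructed by back‐calculation. The Python sketch below assumes that sensitivity and specificity were computed on all participants in each group and reported to two decimals; the function names, the tolerance and the example study are hypothetical, and any mismatch would be queried with the study authors.

```python
def reconstruct_2x2(sensitivity: float, specificity: float,
                    n_asd: int, n_non_asd: int) -> dict[str, int]:
    """Back-calculate TP/FN/FP/TN from reported accuracy and group sizes."""
    tp = round(sensitivity * n_asd)      # individuals with ASD classified as positive
    tn = round(specificity * n_non_asd)  # non-ASD individuals classified as negative
    return {"TP": tp, "FN": n_asd - tp, "FP": n_non_asd - tn, "TN": tn}


def consistent_with_report(table: dict[str, int], sensitivity: float,
                           specificity: float, tol: float = 0.005) -> bool:
    """Check that the reconstructed counts reproduce the reported (rounded) values.

    tol is half of the last reported decimal place (assumed two decimals).
    """
    sens = table["TP"] / (table["TP"] + table["FN"])
    spec = table["TN"] / (table["TN"] + table["FP"])
    return abs(sens - sensitivity) <= tol and abs(spec - specificity) <= tol


# Hypothetical study: 120 individuals with ASD, 80 without, sensitivity 0.85, specificity 0.79.
table = reconstruct_2x2(0.85, 0.79, 120, 80)
print(table, consistent_with_report(table, 0.85, 0.79))
```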

For missing data, we will try to contact the authors of the individual studies and ask them for the specific values.

Assessment of methodological quality

The QUADAS‐2 (Revised Tool for the Quality Assessment of Diagnostic Accuracy Studies) will be part of the data extraction form (Whiting 2011). It too will be rated independently by two authors (pairs from LV, SH, ML, MM, MS), with any disagreements resolved by discussion or by involving a third, independent review author (primarily CF). Our operationalisation of the QUADAS‐2 tool is shown in Appendix 6. We will report the results of the QUADAS‐2 assessment for each included study in a table and categorise the risk of bias of each included study as low, moderate, or high. For each index test, provided it is possible to pool diagnostic measures, we will conduct a sensitivity analysis according to these three categories (see Sensitivity analyses).

Statistical analysis and data synthesis

For each index test, we will describe in detail the patient and comparison populations of the corresponding studies. We will summarise test accuracy measures separately for the three different kinds of non‐ASD groups (as described in the "Participants" section). If possible, we will also pool test accuracy measures separately for each kind of diagnosis (autism, Asperger's syndrome, atypical autism, and PDD‐NOS). We will display the QUADAS‐2 results for each included study in a table, with our judgements of low, high or unclear risk of bias and of concerns regarding applicability for each domain.

For each index test, and for each type of non‐ASD group as well as for each type of diagnosis, we will produce a paired forest plot for sensitivity and specificity as well as a plot of sensitivity versus specificity in the ROC space. In the ROC space the circle for each pair of sensitivity and specificity will be sized according to the total number of individuals in the corresponding study.
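The per‐study quantities behind these plots can be sketched in Python: sensitivity and specificity with 95% intervals for the paired forest plot, and the ROC‐space coordinates (1 − specificity, sensitivity) with a circle whose area is proportional to the total number of individuals. The Wilson interval, the linear area scaling and the study counts are illustrative assumptions; the intervals drawn by RevMan may be computed differently.

```python
import math
from dataclasses import dataclass


@dataclass
class Study:
    name: str
    tp: int
    fp: int
    fn: int
    tn: int

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.fn + self.tn


def wilson(x: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for x successes out of n."""
    p = x / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def plot_rows(studies: list[Study], max_area: float = 400.0) -> list[dict]:
    """Forest-plot and ROC-space values for each study."""
    largest = max(s.total for s in studies)
    rows = []
    for s in studies:
        sens = s.tp / (s.tp + s.fn)
        spec = s.tn / (s.tn + s.fp)
        rows.append({
            "study": s.name,
            "sensitivity": sens, "sens_ci": wilson(s.tp, s.tp + s.fn),
            "specificity": spec, "spec_ci": wilson(s.tn, s.tn + s.fp),
            "roc_x": 1 - spec, "roc_y": sens,
            # circle area proportional to total N; the largest study gets max_area
            "marker_area": max_area * s.total / largest,
        })
    return rows


for row in plot_rows([Study("Study A", 45, 8, 5, 42), Study("Study B", 18, 6, 4, 22)]):
    print(row["study"], round(row["roc_x"], 2), round(row["roc_y"], 2), round(row["marker_area"]))
```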

We will calculate meta‐analyses of pairs of sensitivity and specificity using bivariate random‐effects models (Arends 2008; Chu 2006; Reitsma 2005). This will enable the calculation of pooled sensitivity and specificity estimates while accounting for variation within and between studies, and for potential correlation between sensitivity and specificity. The bivariate model is preferred over the HSROC (hierarchical summary receiver operating characteristic) model because only published thresholds will be investigated; the thresholds are therefore expected to be homogeneous within each meta‐analysis. If the bivariate model fails to converge, sensitivity and specificity will be pooled separately as proportions, without taking their correlation into account.
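The fallback of pooling sensitivity and specificity separately can be illustrated with a univariate random‐effects model on the logit scale. The Python sketch below uses inverse‐variance weighting, the DerSimonian–Laird estimate of the between‐study variance and a 0.5 continuity correction for zero cells; these are common choices assumed here for illustration, not necessarily the settings of the review's software, and the counts are hypothetical. Unlike the bivariate model, it ignores the correlation between sensitivity and specificity.

```python
import math


def logit_and_variance(events: int, n: int, cc: float = 0.5) -> tuple[float, float]:
    """Logit of a proportion and its approximate variance.

    A continuity correction cc is added when events is 0 or n.
    """
    e, total = float(events), float(n)
    if events == 0 or events == n:
        e, total = e + cc, total + 2 * cc
    p = e / total
    return math.log(p / (1 - p)), 1 / e + 1 / (total - e)


def expit(x: float) -> float:
    return 1 / (1 + math.exp(-x))


def pool_random_effects(events: list[int], totals: list[int], z: float = 1.96):
    """DerSimonian-Laird pooling of logit proportions: (estimate, (lower, upper), tau2)."""
    ys, vs = zip(*(logit_and_variance(e, n) for e, n in zip(events, totals)))
    w = [1 / v for v in vs]
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sw
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, ys))  # Cochran's Q
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (len(ys) - 1)) / c) if c > 0 else 0.0
    w_re = [1 / (v + tau2) for v in vs]
    y_re = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return expit(y_re), (expit(y_re - z * se), expit(y_re + z * se)), tau2


# Hypothetical ASD groups from three studies: true positives and group sizes.
sens, ci, tau2 = pool_random_effects([45, 18, 30], [50, 22, 40])
print(f"pooled sensitivity {sens:.2f} (95% CI {ci[0]:.2f}-{ci[1]:.2f}), tau^2 = {tau2:.3f}")
```

Applying the same function to true negatives and non‐ASD group sizes gives a pooled specificity.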

One objective of the present review is to compare test accuracy data of diagnostic instruments if respective comparison studies are available. For comparison of index tests, we will only include studies evaluating tests in the same individuals, or having randomised individuals undergo one or other of the tests being compared. Again, assuming homogeneous thresholds across studies, a bivariate random‐effects model will be used for comparison of index tests by including a binary covariate for test type. If more than two index tests are compared, a categorical covariate for test type will be included. Indirect comparisons of index tests as well as combinations of direct and indirect comparisons will not be addressed by statistical methods.
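The restriction to direct comparisons and the coding of test type as a covariate can be sketched in Python. The record fields ("study", "design", "test"), the design labels and the choice of reference test are hypothetical. With two tests the function yields a single binary indicator; with more tests it yields one indicator per non‐reference test (dummy coding of a categorical covariate). Fitting the bivariate model itself would still be done in dedicated software.

```python
from collections import defaultdict

DIRECT_DESIGNS = {"paired", "randomised"}  # hypothetical labels for eligible designs


def comparative_rows(records: list[dict], reference_test: str):
    """Keep direct comparisons only and add 0/1 indicators for each non-reference test."""
    by_study = defaultdict(list)
    for r in records:
        by_study[r["study"]].append(r)

    eligible = []
    for study_records in by_study.values():
        designs = {r["design"] for r in study_records}
        tests = {r["test"] for r in study_records}
        # a study contributes only if it directly compares at least two tests
        if len(designs) == 1 and designs <= DIRECT_DESIGNS and len(tests) >= 2:
            eligible.extend(study_records)

    other_tests = sorted({r["test"] for r in eligible} - {reference_test})
    rows = []
    for r in eligible:
        row = dict(r)
        for t in other_tests:
            row[f"is_{t}"] = int(r["test"] == t)
        rows.append(row)
    return rows, other_tests


# Hypothetical records: S1 applies both tests to the same individuals, S2 evaluates one test.
records = [
    {"study": "S1", "design": "paired", "test": "ADI-R", "tp": 40, "fp": 6, "fn": 10, "tn": 44},
    {"study": "S1", "design": "paired", "test": "ADOS", "tp": 44, "fp": 9, "fn": 6, "tn": 41},
    {"study": "S2", "design": "single", "test": "ADOS", "tp": 30, "fp": 5, "fn": 5, "tn": 30},
]
rows, indicators = comparative_rows(records, reference_test="ADI-R")
print(indicators, [(r["study"], r["test"], r["is_ADOS"]) for r in rows])
```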

Reliability measures will be listed individually per study. Only if clinically justified, if the same reliability measures are used, and if sufficient information is provided to estimate the standard errors, will we consider conducting a meta‐analysis (see, for example, Sun 2011 for Cohen’s kappa). We will perform all statistical analyses and produce figures using the most up‐to‐date version of Review Manager software (RevMan 2014). We will use the statistical software R (mada, DiagMeta packages) or SAS (Statistical Analysis System; version 9.3; SAS macro: metadas) to conduct statistical analyses that cannot be performed in RevMan. Where necessary, pooling sensitivity and specificity without taking their correlation into account will be done by the R package meta using the function metaprop.
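For a two‐rater, two‐category agreement table, Cohen's kappa and an approximate standard error can be computed as in the Python sketch below. The standard‐error formula is a widely used large‐sample approximation; other variance estimators exist, and the approach described by Sun 2011 may differ. The agreement counts are hypothetical.

```python
import math


def cohens_kappa(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Kappa and approximate standard error for a 2 x 2 agreement table.

    a: both raters ASD; b: rater 1 ASD, rater 2 non-ASD;
    c: rater 1 non-ASD, rater 2 ASD; d: both raters non-ASD.
    """
    n = a + b + c + d
    p_obs = (a + d) / n
    # chance agreement from the marginal totals of both raters
    p_exp = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    if p_exp == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    kappa = (p_obs - p_exp) / (1 - p_exp)
    se = math.sqrt(p_obs * (1 - p_obs) / (n * (1 - p_exp) ** 2))
    return kappa, se


kappa, se = cohens_kappa(40, 5, 5, 50)
print(f"kappa = {kappa:.2f}, SE = {se:.3f}")
```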

Investigations of heterogeneity

Potential sources of heterogeneity that could affect outcomes in this review include: differences between the study‐specific patient and control samples with regard to chronological age, gender, IQ or developmental age, severity of diagnosis, presence or absence of language delay, dimensional measures of autistic behaviour severity, other psychiatric disorders, and other medical conditions. In addition, country of origin, study type, and study quality may influence study outcome.

Poor overlap of confidence intervals for sensitivity and, separately, for specificity (visualised in paired forest plots) indicates the presence of statistical heterogeneity. Where the number of included studies allows, we plan to explore between‐study heterogeneity formally by adding to the bivariate model one covariate at a time, each representing one potential source of heterogeneity (for example, study type, percentage of males, mean IQ, or percentage of participants with high severity), in order to obtain adjusted overall sensitivity and specificity estimates.

Sensitivity analyses

When there are sufficient data, we plan to conduct sensitivity analyses to determine the impact of study quality on the robustness of the overall test accuracy measures and on the comparison of different index tests. We will focus particularly on the impact of a high risk of bias with regard to patient selection (QUADAS‐2, domain one), and on the impact of a high risk of bias with regard to knowledge of the reference standard, when interpreting the index test results (QUADAS‐2, domain two). See Appendix 6.
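The planned restriction can be sketched as building study subsets from the QUADAS‐2 judgements and then re‐running the pooled analysis on each subset. The field names and the "low"/"high"/"unclear" labels are assumptions for illustration; studies judged "unclear" are retained in this sketch. The domain keys follow the text above (domain one: patient selection; domain two: index test).

```python
def sensitivity_subsets(studies: list[dict]) -> dict[str, list[dict]]:
    """Split studies for sensitivity analyses by QUADAS-2 risk-of-bias judgements."""

    def not_high(study: dict, domain: str) -> bool:
        return study["risk_of_bias"][domain] != "high"

    return {
        "all studies": list(studies),
        "excluding high risk, patient selection": [
            s for s in studies if not_high(s, "patient_selection")
        ],
        "excluding high risk, index test": [
            s for s in studies if not_high(s, "index_test")
        ],
    }


# Hypothetical QUADAS-2 judgements for three studies.
studies = [
    {"id": "S1", "risk_of_bias": {"patient_selection": "low", "index_test": "high"}},
    {"id": "S2", "risk_of_bias": {"patient_selection": "high", "index_test": "low"}},
    {"id": "S3", "risk_of_bias": {"patient_selection": "unclear", "index_test": "low"}},
]
subsets = sensitivity_subsets(studies)
for label, group in subsets.items():
    print(label, [s["id"] for s in group])
```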