Introduction

Autism spectrum conditions (ASC; including autistic disorder, Asperger syndrome [AS] and pervasive developmental disorder not otherwise specified [PDD-NOS]) are now commonly conceptualized as dimensional, representing the extreme end of one or more continuously distributed traits in the general population (Constantino & Todd 2003). The dimensional approach to autism inspired the development of the Autism-Spectrum Quotient (AQ), a 50-item self-report questionnaire assessing autistic traits in individuals with normal intelligence (Baron-Cohen et al. 2001). Studies using the AQ in the UK (Baron-Cohen et al. 2001), Japan (Wakabayashi et al. 2006) and the Netherlands (Hoekstra et al. 2008) reported significantly higher AQ scores in participants diagnosed with ASC than in general population and student samples. AQ scores in a small ASC sample were also significantly higher than in people with social anxiety disorder or obsessive compulsive disorder (Hoekstra et al. 2008), suggesting that a high AQ score is specific to ASC rather than reflecting psychiatric problems in general. Moreover, the AQ has been found to predict a diagnosis of AS in a clinic sample (Woodbury-Smith et al. 2005). Whilst most studies, including the present report, have used the self-report adult version of the AQ (AQ-Adult), parent-report versions for adolescents (AQ-Adolescent; Baron-Cohen et al. 2006) and children (AQ-Child; Auyeung et al. 2008) also exist.

The English (Baron-Cohen et al. 2001), Japanese (Wakabayashi et al. 2006), and Dutch (Hoekstra et al. 2008) versions of the AQ-Adult all show good test-retest reliability and acceptable internal consistency. In the original version of the AQ, the 50 items were divided into 5 theoretically derived subdomains (Baron-Cohen et al. 2001) measuring Social skills, Communication, Imagination, Attention to detail, and Attention switching. Subsequent factor analytic studies in non-clinical populations confirmed that the AQ has a multifactorial structure. Austin (2005) and Hurst et al. (2007) studied the structure of the AQ-Adult in student samples using principal component analysis. Both studies suggested three factors, representing Social skills, Details/patterns, and Communication/mindreading. Auyeung et al. (2008) explored the structure of the AQ-Child and reported 4 factors, measuring Social skills, Attention to detail, Mindreading, and Imagination. In a previous report, Hoekstra et al. (2008) explored the factor structure of the AQ-Adult in a general population sample and a student sample using confirmatory factor analysis and identified 2 higher-order factors measuring Social interaction and Attention to detail. Altogether, these studies suggest that the AQ is multifactorial and encompasses at least one factor pertaining to social behaviors and one factor assessing non-social traits. The discrepancies between the findings of these studies are most likely due to differences in study samples (children, students, or general population adults), which could result in slightly different psychometric properties of the AQ, and to the use of different statistical techniques to evaluate the factor structure (principal component analysis or confirmatory factor analysis).

Together, the studies so far suggest that the AQ is a reliable instrument to quantify the autism phenotype. The promising results of the AQ have prompted its use in large population studies (Hoekstra et al. 2007a, b) and in studies of the cognitive (Bayliss et al. 2005; Lombardo et al. 2007), neural (Gomot et al. 2008), genetic (Chakrabarti et al. 2009), and hormonal (Auyeung et al. 2009) correlates of autistic traits. However, the full 50-item version of the AQ is often too lengthy to be included in large comprehensive studies. The aim of the current study was to construct a shortened version of the original 50-item AQ-Adult whilst retaining as much information as possible. Furthermore, we aimed to obtain a scale with a clear factor structure that would allow unambiguous interpretation.

Methods

Participants

The data included in this study comprised 4 independent samples. The first data set (Dutch controls reference sample, n = 1,263) consisted of a Dutch general population sample and a student group (n = 302 and 961, respectively) that have both been described in more detail elsewhere (Hoekstra et al. 2008). The students were registered at either the VU University in Amsterdam or the University of Twente in Enschede and were asked to fill out the Dutch translation of the full-scale AQ-Adult during the tea break of one of their classes. The general population sample was recruited on an information day for parents of multiples. Participants were asked to fill out the full-scale AQ on the same day or return the questionnaire to the research group by mail. Mean age was 21.19 years (SD = 3.69) for the students and 35.68 years (SD = 6.33) for the general population sample. The combined sample included 502 men and 739 women (sex unknown for 22 cases). Previous analyses of these data showed no significant age effect on AQ scores (Hoekstra et al. 2008).

The second sample (Dutch controls replication sample, n = 1,121; 485 men, 636 women; age: Mean = 45.63 years, SD = 14.74) included adults from the Dutch general population who participated in a large extended twin-family study of the influences of genes and environment on cognition and behavior. In this study, the AQ-Short was part of a larger questionnaire that was sent to participants by mail and could be returned by mail or during a test session.

The third sample (English controls sample, n = 1,838; 737 men, 1,101 women; age: Mean = 20.90 years, SD = 2.47) comprised students from the University of Cambridge. They were recruited via several routes including email, post, newspaper adverts and notices around the university, and invited to complete the full-scale AQ using an online version. Participants who reported a history of psychiatric difficulties (depression, ASC, bipolar illness, psychosis or anorexia) were excluded from the analysis.

The fourth sample encompassed individuals with a formal AS diagnosis (the English AS sample, n = 274; 156 men, 117 women, 1 sex unknown; age: Mean = 35.37 years, SD = 13.05). These participants, all volunteers registered in the Cambridge Autism Research Centre database (see www.autismresearchcentre.com), filled out the full-scale AQ online. All participants were diagnosed by experienced clinicians according to DSM-IV or ICD-10 criteria. The large majority was diagnosed with AS in adulthood.

Data from the Dutch controls reference sample were used to develop the short version of the AQ. Data from the Dutch controls replication sample and the English controls were used to verify the factor structure, whilst the AS sample was included to examine AQ-Short scores in a clinical sample.

Materials

The full-scale AQ comprises 50 descriptive statements assessing personal preferences and habits. Participants respond to each statement on a 4-point Likert scale, with answer categories “1 = definitely agree”, “2 = slightly agree”, “3 = slightly disagree” and “4 = definitely disagree”. The scoring is reversed for items on which an “agree” response is characteristic of autism (24 of the 50 items). Item scores are summed, resulting in a minimum AQ score of 50 (indicating no autistic traits) and a maximum score of 200 (full endorsement of all autistic traits).
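As a minimal sketch of the scoring rule just described, the Python function below reverses the agree-keyed items (5 minus the answer code) and sums all item scores. The list of the 24 agree-keyed items is not reproduced in this section, so it is passed in as a parameter; the key used in the usage lines (`hypothetical_agree_keyed`, items 1 to 24) is purely illustrative and is not the real AQ key.

```python
from typing import AbstractSet, Mapping


def score_aq(responses: Mapping[int, int], agree_keyed: AbstractSet[int]) -> int:
    """Sum-score AQ responses so that a higher total means more autistic traits.

    responses:   item number -> answer code (1 = definitely agree,
                 2 = slightly agree, 3 = slightly disagree, 4 = definitely disagree)
    agree_keyed: item numbers on which "agree" is characteristic of autism;
                 their codes are reversed (1 <-> 4, 2 <-> 3) before summing.
    """
    total = 0
    for item, code in responses.items():
        if code not in (1, 2, 3, 4):
            raise ValueError(f"item {item}: answer code {code} is outside 1..4")
        total += (5 - code) if item in agree_keyed else code
    return total


# Usage with a HYPOTHETICAL key (items 1-24), not the real AQ key:
hypothetical_agree_keyed = set(range(1, 25))
all_definitely_agree = {item: 1 for item in range(1, 51)}
print(score_aq(all_definitely_agree, hypothetical_agree_keyed))  # 24*4 + 26*1 = 122
```

Whatever the key, answering every item in the non-autistic direction yields 50 and answering every item in the autistic direction yields 200, matching the range stated above.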

The Dutch controls reference sample and both English samples completed the full-scale AQ; the Dutch controls replication sample completed the AQ-Short. Following the criteria set out in the original AQ paper (Baron-Cohen et al. 2001), if more than 10% of the items were missing (i.e., more than 5 items in the full-scale AQ or more than 3 in the AQ-Short), the questionnaire was considered unreliable and the data were discarded.
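The exclusion rule can be sketched as a simple keep/discard check. The thresholds (at most 5 missing items out of 50, at most 3 out of 28) are taken directly from the text; how any remaining missing items were handled when computing sum scores is not described here, so the sketch deliberately stops at the decision itself. The item identifiers in the usage lines are illustrative.

```python
from typing import Mapping, Optional, Sequence

# Maximum number of missing items tolerated per questionnaire length,
# as stated in the text (more than ~10% missing -> discard).
MAX_MISSING = {50: 5, 28: 3}


def is_usable(responses: Mapping[int, Optional[int]], item_ids: Sequence[int]) -> bool:
    """Return True if the questionnaire has few enough missing items to be retained.

    An item counts as missing if it is absent from `responses` or mapped to None.
    Raises KeyError for questionnaire lengths other than 50 or 28 items.
    """
    limit = MAX_MISSING[len(item_ids)]
    missing = sum(1 for item in item_ids if responses.get(item) is None)
    return missing <= limit


# Usage: a full-scale AQ with items 1-6 unanswered (6 missing) is discarded.
full_ids = list(range(1, 51))
answers = {i: (None if i <= 6 else 2) for i in full_ids}
print(is_usable(answers, full_ids))
```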

Procedure

The aim was to reduce the number of items of the AQ, whilst retaining as much information as possible, and ultimately obtaining a shortened questionnaire with a clear factor structure. To this end, we used the following 5-step approach on the data of the Dutch controls reference sample:

1. Inspection of the item content of all 50 items. Based on this inspection, some items were flagged as possibly problematic. For instance, items that are very similar in content and phrasing may correlate highly simply because of this overlap, and may thereby disrupt the factor structure of the larger item set.

2. Exploratory factor analysis (EFA) on all 50 items. The fit of different factor structures was evaluated and compared with the 5-domain model originally proposed by Baron-Cohen et al. (2001). If an item loaded highly (>.45) on a domain other than the one suggested by Baron-Cohen et al. (2001), the item was added to this other scale and omitted from its original domain. Items were allowed to load on multiple factors if cross-loadings were high (>.45).

3. Domain-specific EFAs on all items included in each domain. If an item flagged in step 1 was indeed causing problems (e.g., the exploratory factor structure looked very different when this item was omitted), it was removed. This step was also used to assess whether the empirically derived factors from step 2 were uni-dimensional or should be further divided into smaller subscales. The total number of items was reduced by removing items with low (<.30) factor loadings.

4. Confirmatory factor analysis (CFA) per domain. The factor structure suggested by the EFAs in step 3 was tested, and we examined whether the fit of the resulting model was sufficient or could be improved by minor adjustments, using modification indices as guidelines. Model fit was evaluated using the Comparative Fit Index (CFI) and the Root Mean Square Error of Approximation (RMSEA). The CFI should ideally be above .90, whilst the RMSEA should be below .08 (Schermelleh-Engel et al. 2003). Note that the CFI expresses the improvement in fit of the hypothesized model relative to the independence model (i.e., a model in which all items are uncorrelated). If items are ordinal and item intercorrelations are low, the independence model will not fit much worse than the hypothesized model. In that case, the relative improvement, and hence the CFI, will remain modest even if the hypothesized model describes the data well. Several item intercorrelations were <.20 in our AQ data. We therefore relaxed the CFI criterion somewhat, but retained the criterion of <.08 for the RMSEA.

5. CFA combining all factors identified under step 4. Again, model fit was evaluated using CFI and the RMSEA.
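The numeric rules in steps 2 and 3 can be sketched as follows, assuming factor loadings have already been estimated by the factor analysis software and are supplied as plain dictionaries. This only illustrates the thresholds (>.45 for reassignment or cross-loading, <.30 for removal); the content-based judgments in steps 1 and 3 are not automated, comparing loadings in absolute value is an assumption, and the loading values are invented.

```python
from typing import Dict, List

REASSIGN_THRESHOLD = 0.45  # step 2: a "high" loading (assumption: compared in absolute value)
DROP_THRESHOLD = 0.30      # step 3: loadings below this count as "low"


def assign_items(loadings: Dict[int, Dict[str, float]],
                 original_domain: Dict[int, str]) -> Dict[int, List[str]]:
    """Step 2 sketch: map each item to the factor(s) on which it loads highly.

    An item is placed on every factor with |loading| > .45, so high cross-loadings
    yield several factors and a high loading elsewhere moves the item away from its
    original domain. Items with no high loading stay in their original domain.
    """
    assigned: Dict[int, List[str]] = {}
    for item, row in loadings.items():
        high = [factor for factor, lam in row.items() if abs(lam) > REASSIGN_THRESHOLD]
        assigned[item] = high if high else [original_domain[item]]
    return assigned


def drop_low_loading(domain_loadings: Dict[int, float]) -> List[int]:
    """Step 3 sketch: keep the items whose loading on the domain factor is not below .30."""
    return [item for item, lam in domain_loadings.items() if abs(lam) >= DROP_THRESHOLD]


# Invented loadings: item 36 loads mainly on Imagination, so it moves there.
print(assign_items({36: {"Social skills": 0.20, "Imagination": 0.55}},
                   {36: "Social skills"}))
```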

All model fitting was performed in Mplus (Muthén & Muthén 2007) on the raw (ordinal) data, declaring the items as categorical and using weighted least squares estimation with mean- and variance-adjusted chi-square (WLSMV) and, for the exploratory analyses, Geomin rotation. After the AQ-Short had been constructed using the Dutch controls reference sample, its factor structure was evaluated in the Dutch controls replication sample and the English controls sample. The factor structure was not evaluated in the English AS sample because of its limited size. This latter sample was included to evaluate the correlation between the full-scale AQ and the AQ-Short in a clinical sample, and to test for group differences in mean AQ-Short scores. Lastly, test accuracy was examined by ROC analysis, and a cut-off score was derived.
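To make the CFI caveat in step 4 concrete, the sketch below uses the standard chi-square-based formulas for RMSEA and CFI. It assumes a single-group model and a plain (unadjusted) chi-square; under WLSMV, Mplus works with an adjusted chi-square, and implementations differ in details such as whether N or N - 1 is used, so this sketch will not reproduce Mplus output. The chi-square values are invented: the same model fit is judged against an independence (baseline) model for strongly and for weakly intercorrelated items.

```python
import math


def rmsea(chi2: float, df: int, n: int) -> float:
    """Root Mean Square Error of Approximation (N - 1 convention)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_m: float, df_m: int, chi2_b: float, df_b: int) -> float:
    """Comparative Fit Index: improvement of the model over the independence model."""
    d_m = max(chi2_m - df_m, 0.0)                  # misfit of the hypothesized model
    d_b = max(chi2_b - df_b, chi2_m - df_m, 0.0)   # misfit of the baseline model
    return 1.0 if d_b == 0 else 1.0 - d_m / d_b


def acceptable(cfi_value: float, rmsea_value: float,
               cfi_min: float = 0.90, rmsea_max: float = 0.08) -> bool:
    """Defaults follow the criteria cited in step 4 (the study relaxed the CFI one)."""
    return cfi_value > cfi_min and rmsea_value < rmsea_max


# Invented example: identical model fit (chi2 = 300, df = 100, N = 1,263) ...
model_rmsea = rmsea(300, 100, 1263)
# ... compared with a baseline of strongly vs weakly intercorrelated items
cfi_strong = cfi(300, 100, 5000, 120)
cfi_weak = cfi(300, 100, 1200, 120)
print(f"RMSEA = {model_rmsea:.3f}, CFI (strong) = {cfi_strong:.3f}, CFI (weak) = {cfi_weak:.3f}")
```

Here the RMSEA is about .04 in both cases, whereas the CFI drops from about .96 to about .81 simply because the weakly correlated baseline fits less badly.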

Results

Construction of the AQ-Short

The 5-step procedure described in the “Methods” section was followed to arrive at an abridged version of the AQ, using the data from the Dutch controls reference sample. Inspection of the item content (step 1) revealed three pairs of items (items 40 & 50, 29 & 49, and 17 & 38) with similar content or phrasing. An EFA on all 50 items (step 2) showed that a 5-factor model fitted reasonably well and roughly coincided with the 5 domains originally proposed by Baron-Cohen et al. (2001). This EFA also indicated that items 36 and 45 (originally part of the Social skills domain) clustered together with items from the Imagination domain. As both items concern empathic imagination (see Table 1), they fit well in the Imagination domain and were therefore moved to this factor. Similarly, item 41 loaded highly on the Attention to detail domain, and as this domain fitted the item’s content (see Table 1), the item was moved from the Imagination scale to the Attention to detail domain. Of the items flagged in step 1, items 17, 29, 40 and 49 were omitted in step 3 (domain-specific EFA), as they either disrupted the factor structure (17 and 40) or showed overall low factor loadings (29 and 49). Items 38 and 50 were retained in the Communication (38) and Imagination (50) domains.

Table 1 Item content of the 5 factors in the best fitting structure (item number in the full-scale AQ in parentheses)

Domain-specific EFA and subsequent CFA on the data from the Dutch controls reference sample (steps 3 and 4) showed that for the Social skills domain, a 1-factor model encompassing 7 items fitted reasonably well (CFI = .86, RMSEA = .06) (see Table 1 for item content). EFA and CFA showed that the original Attention switching domain was not uni-dimensional: a 2-factor solution proved more appropriate (CFI = .96, RMSEA = .05), with one factor assessing ‘Routine’ (4 items) and another measuring ‘Switching’ ability (4 items). For the Imagination domain, a 1-factor model comprising 8 items fitted well (CFI = .97, RMSEA = .05). Similarly, for the Communication domain, a 1-factor model including 8 items provided a good fit (CFI = .94, RMSEA = .05). EFA on the original Attention to detail domain showed low factor loadings for some of the items. These items were omitted, and CFA on the remaining 5 items showed a well-fitting 1-factor model (CFI = .99, RMSEA = .06) specifically assessing a fascination for ‘Numbers/patterns’.

Next, the data from the Dutch controls reference sample were used to conduct a CFA on all six newly formed subscales (step 5). The resulting model indicated that the Communication scale correlated strongly with the Social skills (r = .79) and Imagination (r = .85) scales, whilst the two latter scales correlated only moderately with each other (r = .47). Communication therefore added little information beyond Social skills and Imagination, and since our aim was to retain as much information as possible with a minimal number of items, all Communication items were eliminated from the AQ-Short. CFA on the remaining five factors showed that the factors Social skills, Routine, Switching and Imagination correlated substantially with each other (r between .43 and .74). These factors could be subsumed under a higher-order factor ‘Social behavior’. The factor Numbers/patterns correlated only modestly with this higher-order factor (factorial correlation = .20). The final factor model, depicted in Fig. 1, fitted the data of the Dutch controls reference sample well (CFI = .87, RMSEA = .06).

Fig. 1 Factor structure of the AQ-Short, including factor correlation and factor loadings as estimated in the Dutch controls reference sample

In sum, the 50-item AQ was shortened to 28 items. These 28 items can be assigned to five clearly defined factors, assessing difficulties with social skills (‘Social skills’), a preference for routine (‘Routine’), attention switching difficulties (‘Switching’), difficulties with imagination (‘Imagination’), and a fascination for numbers/patterns (‘Numbers/patterns’) (see Table 1 for item content).
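Because every item is scored 1 to 4 after reversal, the possible range of each scale follows directly from its number of items. The sketch below derives these ranges from the item counts reported above and shows how factor, higher-order and total scores could be summed. The item-to-factor mapping is given in Table 1 (not reproduced here), so `item_scale` must be supplied by the user; any mapping used for illustration is hypothetical.

```python
from typing import Dict, Mapping, Tuple

# Number of items per AQ-Short factor, as reported in the text.
SUBSCALE_SIZES = {
    "Social skills": 7,
    "Routine": 4,
    "Switching": 4,
    "Imagination": 8,
    "Numbers/patterns": 5,
}
# Lower-order factors subsumed under the higher-order Social behavior factor.
SOCIAL_BEHAVIOR = ("Social skills", "Routine", "Switching", "Imagination")


def score_range(n_items: int) -> Tuple[int, int]:
    """Minimum and maximum sum score for n items each scored 1-4."""
    return n_items * 1, n_items * 4


def scale_scores(item_scores: Mapping[int, int],
                 item_scale: Mapping[int, str]) -> Dict[str, int]:
    """Sum already-keyed item scores (1-4) into factor, higher-order and total scores."""
    scores = {name: 0 for name in SUBSCALE_SIZES}
    for item, score in item_scores.items():
        scores[item_scale[item]] += score
    scores["Social behavior"] = sum(scores[name] for name in SOCIAL_BEHAVIOR)
    scores["Total"] = scores["Social behavior"] + scores["Numbers/patterns"]
    return scores


n_social = sum(SUBSCALE_SIZES[name] for name in SOCIAL_BEHAVIOR)  # 23 items
print(score_range(sum(SUBSCALE_SIZES.values())))  # (28, 112)
print(score_range(n_social), score_range(SUBSCALE_SIZES["Numbers/patterns"]))  # (23, 92) (5, 20)
```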

Factor Structure Replication and Validation of the AQ-Short

Subsequently, the best fitting model identified in the Dutch controls reference sample was tested in the Dutch controls replication sample and in the English controls sample. Applying the hierarchical factor model to the data of the Dutch controls replication sample resulted in a reasonable model fit (CFI = .87, RMSEA = .07). Inspection of the modification indices (MI) suggested that the fit could not be appreciably improved by changing one or more of the parameters (all MI < 28). Next, the same model was fitted to the English controls sample. Here, the MI showed that the fit would improve considerably if one item (item 46, “New situations make me anxious”) was allowed to load on the factor Social skills in addition to its loading on the factor Routine. The model including this cross-loading showed acceptable fit (CFI = .86, RMSEA = .07). The differential functioning of item 46 in the Dutch and English samples might be due to a slight difference in the interpretation of the word ‘anxious’ in Dutch: the Dutch translation (‘angstig’) is closer in connotation to ‘fearful’ than to ‘worried’. This relatively strong expression of fear may explain why, in the Dutch samples, the item loaded on the Routine factor only and not on the Social skills factor. Note that both scales are part of the higher-order Social behavior factor. As in the Dutch controls reference sample, the factorial correlation between Social behavior and Numbers/patterns was only modest in the Dutch controls replication sample (r = .10) and in the English controls sample (r = .16).

The distribution of AQ-Short scores was approximately normal in all three control samples, and slightly skewed to the left in the English AS sample (see Fig. 2). In the three samples for which full-scale AQ data were available, Pearson’s correlations were calculated between the sum scores on the full-scale AQ and on the AQ-Short. The correlations were very high and significant in all samples (Dutch controls reference sample: r = .93, p < .001; English controls sample: r = .94, p < .001; English AS sample: r = .95, p < .001). Table 2 displays the internal consistency of the five scales and of the higher-order factor Social behavior in each of the independent samples. The Cronbach’s alpha values indicate acceptable to good internal consistency for the total AQ-Short (α between .77 and .86), the broad Social behavior factor (α between .79 and .86) and the Numbers/patterns factor (α between .67 and .73). The internal consistency of the Routine and Switching scales is somewhat low, probably because of the small number of items in each scale (all else being equal, α increases with the number of items).
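For readers who want to recompute the reliability coefficients, a minimal sketch of Cronbach's alpha (from item and total-score variances, complete data only) is given below, together with the Spearman-Brown formula, which makes the last point concrete: assuming comparable (parallel) items, predicted reliability rises as items are added. The α of .60 in the usage line is a hypothetical value, not one of the Table 2 estimates.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(data: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for complete data; data[r][i] is respondent r's score on item i."""
    k = len(data[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [variance([row[i] for row in data]) for i in range(k)]
    total_var = variance([sum(row) for row in data])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability when a scale is lengthened by length_factor with comparable items."""
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)


# Hypothetical 4-item scale with alpha = .60, doubled to 8 comparable items:
print(round(spearman_brown(0.60, 2), 2))  # 0.75
```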

Fig. 2 Distribution of mean AQ-Short scores in all samples

Table 2 Internal consistency coefficients (Cronbach’s alpha) of the factors making up the AQ-Short in all samples (number of items in each scale in parentheses)

Table 3 shows the mean AQ-Short scores and the scores on the factors Social behavior and Numbers/patterns in the two Dutch control samples, the English controls sample, and the English AS sample. In all three control samples, men obtained significantly higher scores than women on the total AQ-Short and on the Numbers/patterns factor. The sex difference was also significant for the Social behavior factor in the Dutch controls reference sample and the English controls sample; contrary to expectation, it was not significant in the Dutch controls replication sample. In the English AS sample, women with AS scored significantly higher on the AQ-Short than men with AS. This sex difference was mainly reflected in the Social behavior factor and was not significant for Numbers/patterns. The sex differences in this clinical sample should be interpreted with care, as selection bias may have played a role: the ratio of women to men participating in this study (1:1.3) is much higher than the typical sex ratio reported for AS/high-functioning autism (1:5.5; Fombonne 2006). As expected, people with an AS diagnosis scored significantly higher on the AQ-Short than controls (see also Fig. 2).

Table 3 Mean AQ-Short and factor scores in all samples (SD in parentheses)

The accuracy of the AQ-Short in distinguishing individuals with AS from controls was evaluated using ROC analysis. The area under the curve was .97, indicating excellent test accuracy. A score >65 had a sensitivity of .97 and a specificity of .82. With a more stringent cut-off of ≥70, the sensitivity and specificity were .94 and .91, respectively.
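The quantities reported above can be sketched in a few lines. The AUC is computed through its Mann-Whitney interpretation (the probability that a randomly chosen person with AS scores higher than a randomly chosen control, ties counting one half), which equals the area under the empirical ROC curve. The software used in the study and the way its cut-offs were chosen are not stated here, and the scores below are invented. The `inclusive` flag mirrors the distinction between the ">65" and "≥70" cut-offs; the pairwise loop is quadratic but adequate for samples of a few thousand.

```python
from typing import Sequence, Tuple


def sens_spec(cases: Sequence[float], controls: Sequence[float],
              cutoff: float, inclusive: bool = False) -> Tuple[float, float]:
    """Sensitivity and specificity when scores > cutoff (>= if inclusive) count as positive."""
    def positive(score: float) -> bool:
        return score >= cutoff if inclusive else score > cutoff
    sensitivity = sum(positive(s) for s in cases) / len(cases)
    specificity = sum(not positive(s) for s in controls) / len(controls)
    return sensitivity, specificity


def auc(cases: Sequence[float], controls: Sequence[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney statistic (ties count 1/2)."""
    wins = 0.0
    for a in cases:
        for c in controls:
            if a > c:
                wins += 1.0
            elif a == c:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Toy AQ-Short scores (invented, not study data)
as_scores = [70, 80, 66, 60]
control_scores = [50, 55, 66, 40, 72]
print(sens_spec(as_scores, control_scores, 65))                  # cut-off > 65
print(sens_spec(as_scores, control_scores, 70, inclusive=True))  # cut-off >= 70
print(auc(as_scores, control_scores))
```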

Discussion

This paper described the development and validation of an abridged version of the AQ, using data from four independent samples from two different countries. We found that the AQ can be shortened from 50 to 28 items; the resulting AQ-Short correlates very highly (r between .93 and .95) with the full-scale AQ. The AQ-Short has a clear factor structure, comprising two higher-order factors assessing broad difficulties in social functioning and a fascination for numbers and patterns. The Social behavior factor can be further decomposed into four lower-order factors assessing Social skills, Routine, Switching and Imagination. We suspect that Routine and Switching load on Social behavior simply because of the cognitive demands of social interaction: social interaction invariably requires rapid attentional switching between people (especially in a group and in conversation), and it is novel rather than scripted or routine. The factor structure of the AQ-Short is in line with results from previous factor analytic studies (see for reviews: Mandy and Skuse 2008; Happé and Ronald 2008), the majority of which identified at least one factor measuring autistic traits in the social domain and one or more factors assessing non-social traits. As in previous studies, the correlation between the social factor (Social behavior) and the non-social factor (Numbers/patterns) was only modest (r between .10 and .20), suggesting that the two are largely, but not entirely, independent.

We stress that the AQ-Short is not a diagnostic instrument. For a clinical diagnosis and detailed clinical studies, established diagnostic assessments and interviews (in which different autistic behaviors, including communication impairments, are assessed) remain essential. The aim of this study was not to replace the full-scale AQ, but rather to develop a short version of the instrument that could more easily be implemented in large-scale studies, as large epidemiological studies often cannot accommodate a 50-item measure. The AQ-Short may also be valuable as a quick screen for autistic traits in clinical settings where completing a 50-item questionnaire is too demanding. Our study suggests that a cut-off of >65 may be useful in this setting. When more time and resources are available, the full-scale AQ still has merits, particularly if one wants to examine communication difficulties.

A limitation of the current study is that the diagnoses of individuals in the English AS sample were based on DSM-IV or ICD-10 criteria rather than on standardized diagnostic assessments. There is currently no universally agreed method for making research diagnoses in high-functioning adults (the Autism Diagnostic Interview (Lord et al. 1994) and the Autism Diagnostic Observation Schedule (Lord et al. 2000) were not designed for this purpose). Nevertheless, this limitation means that the AS diagnoses could not be independently validated. A subset of the English AS sample received their diagnosis at the Cambridge clinic, where the self-reported diagnosis was confirmed using DSM-IV criteria, and others from the English AS sample have been tested in person using the ADI-R and ADOS, again confirming their self-reported diagnosis. This study did not include participants with other ASC diagnoses, such as PDD-NOS and classic autism. These limitations need to be kept in mind when interpreting our findings. Future studies should examine the performance of the AQ-Short in clinical settings, including different clinical (both ASC and non-ASC) groups. Women were slightly overrepresented in all four samples included in our study, and this might have affected the factor structure of the AQ-Short. Future studies could specifically examine sex differences in the factor structure of autistic traits.

In conclusion, like the full-scale AQ, the AQ-Short shows a continuous distribution; the total AQ-Short score and its two higher-order factors showed acceptable to good internal consistency in all samples under study; and sex and group differences were in the expected direction. These results suggest that the AQ-Short is a reliable instrument for a quick assessment of quantitative autistic traits. This abridged version of the AQ could be particularly useful in large-scale population-based studies and in clinical settings where completing the full 50-item version is too demanding.