This review examined the literature on screening tools used for autism in CALD populations. A total of 56 studies that examined 17 screening tools in CALD populations were reviewed to ascertain the accuracy, reliability, validity, and acceptability of these tools. Although a variety of tools were identified, only some were used in more than one study. Of the total number of studies, 26 studies examined the reliability, validity and/or accuracy of the screening tool. Twenty-one studies provided only validity scores and 12 studies presented only reliability measures.
The psychometric evaluations of the tools varied both in the properties evaluated and in the number of studies that assessed each screener. Performance ranged from excellent to poor in terms of sensitivity and specificity. For example, the M-CHAT was studied in several different languages, and good reliability and/or validity was found in Spanish, Turkish, Chinese, Korean, French, and Arabic populations, indicating its potential utility in these groups. The M-CHAT-R/F, the updated version of the M-CHAT, appears to perform similarly and was found to be valid and reliable in Spanish, Chinese, Portuguese, Serbian, Malian, Northern Sotho, and Albanian communities.
However, the M-CHAT’s performance was not universally robust. The tool’s sensitivity was as low as 25% in a Sinhalese population (38). Notably, this was the only study that employed the M-CHAT in a South Asian nation. Despite a “rigorous translation process”, first to Sinhalese and then back to English to ensure validity, some behaviours significant for the diagnosis of autism were not recognised as abnormal by the Sinhalese mothers. This highlights a disadvantage of applying a tool designed for one population to another population whose cultural context it might not match. The tool’s contrasting accuracy in different settings underscores a critical research gap.
On the other hand, Perera et al (88) demonstrated that a pictorial autism screening tool yielded 88% sensitivity in differentiating autistic from typically developing Sinhalese children. Hence, there may be merit in using alternative methods such as pictorial scales to identify autism in diverse populations where the evidence base supporting conventional screening tools is poor. This presents an area for further study.
The CHAT-23, a Chinese adaptation of the M-CHAT, showed moderate validity in the Chinese population (52). Interestingly, sensitivity was considerably higher (0.93) when the Arabic translation of the CHAT-23 was used in an Egyptian population (53). It should be noted that the two studies applied different thresholds for a positive screen: the Egyptian study (53) used a cut-off of 15 failed questions out of 23, whereas the Chinese study (52) used a cut-off of just 6 failed questions. This difference in thresholds may have contributed to the difference in sensitivity and limits direct comparison of the two results. Of note, Afsah et al (53) screened the population, whereas Wong et al (52) used a case-control design, which may have inflated the tool’s performance.
Importantly, the studies in this review did not consistently report the cut-off score for a positive screen, the sensitivity, or the specificity of these instruments. To standardise screening efforts, it may be important to establish and report culturally appropriate cut-off points as more studies are conducted in populations without screening histories. Further research and validation studies would be beneficial to determine the ideal cut-off for each population.
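The dependence of sensitivity and specificity on the chosen cut-off can be illustrated with a minimal sketch. The scores and diagnoses below are entirely hypothetical and are not drawn from any study in this review; the function name and variables are illustrative assumptions. The two cut-offs (6 and 15 failed items) simply echo the thresholds discussed above.

```python
from typing import List, Tuple


def sensitivity_specificity(
    failed_items: List[int], has_autism: List[bool], cutoff: int
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) when a child screens positive
    if they fail at least `cutoff` items."""
    tp = fn = tn = fp = 0
    for failed, diagnosed in zip(failed_items, has_autism):
        screen_positive = failed >= cutoff
        if diagnosed and screen_positive:
            tp += 1  # diagnosed child correctly flagged
        elif diagnosed:
            fn += 1  # diagnosed child missed by the screen
        elif screen_positive:
            fp += 1  # non-diagnosed child flagged
        else:
            tn += 1  # non-diagnosed child correctly not flagged
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical sample: number of failed items per child and reference diagnosis.
failed_items = [5, 8, 12, 16, 18, 20, 1, 2, 3, 4, 7, 9]
has_autism = [True] * 6 + [False] * 6

for cutoff in (6, 15):
    sens, spec = sensitivity_specificity(failed_items, has_autism, cutoff)
    print(f"cut-off {cutoff}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Within a single sample, raising the cut-off can only keep sensitivity the same or lower it, and keep specificity the same or raise it. Comparing sensitivity across studies that used different cut-offs therefore requires both the threshold and the resulting accuracy figures to be reported.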
Various translations of the SCQ have proven reliable across populations. Validity was demonstrated specifically in Arabic, Greek and Malian populations, and sensitivity was higher in a Chinese sample despite its larger size (n = 819) and wide age range (2–18 years). Liu et al (70) analysed subgroups by age and found the validity scores to be consistent in children younger than 4 years and in those older than 4 years. This is clinically significant as there are limited options for autism screening between 30 and 48 months of age, with the M-CHAT-R/F being the only validated tool in China at the time but limited to toddlers up to 30 months of age. Thus, the SCQ demonstrates remarkable versatility across age groups in addition to good reliability and validity. Further research would be beneficial to determine whether these psychometric properties translate to other cultures.
Given the range of autism screening tools and countries covered in this review, an accurate comparison of the screening tools cannot be made. Although the M-CHAT generally appeared to have the highest overall validity and reliability, its low sensitivity in the Sinhalese population highlights that findings cannot be generalised across CALD populations.
This is reinforced by the fact that the four screening tools in this review that were devised specifically for a particular culture, namely the IASQ and ISAA for Indian populations, the Hiva for Kurdish and Persian cohorts, and the TASQ for Taiwanese children, demonstrated excellent sensitivity (0.97–1). This indicates not only their accuracy but also the benefit of tailoring a screening tool to a particular culture. The methodologies behind the development of these tools vary. The ISAA (77) and IASQ (78) originated from a collaborative effort by a body of health professionals who devised questions based on the Indian population. Items in the TASQ (87) were based on a qualitative study in which a child psychiatrist conducted interviews with Taiwanese families. The Hiva scale (30) was based on DSM-5 criteria, but the questions were devised specifically for the Persian and Kurdish populations. Thus, although the method of tool development varied, the consistent element is that each tool was created in partnership with local stakeholders within the context of the specific population.
The performance of these tools created within their intended populations further supports the notion that the correct application of existing tools in different cultural and linguistic settings extends beyond translation. It requires a comprehensive assessment of potential mismatches in language and concepts, followed by adjustments to ensure comprehension by the target population. A review by Soto et al (7) emphasised that the objective of cultural adaptation is to achieve “functional equivalence” with the original version which, as demonstrated by the varying psychometric results in this review, is difficult to establish through translation alone.
During this review, some cultural adjustments were noted. Examples include the replacement of the name “peek-a-boo” with a description of the game for Mexican families undertaking the Mexican M-CHAT (26) and the removal of references to British rhymes in the Arabic SCQ (27). However, this review did not rigorously examine the cultural adaptation process in each study or its adherence to recommended guidelines (89). This would be a beneficial area for further research as the adaptation process may influence the quality of the psychometric outcomes. Soto et al (7) examined this area and found that the details of adaptation methods were rarely reported, with notably large variation in the processes that were reported.
Another finding of this review is a noticeable gap in the literature: few studies explored the acceptability of autism screening tools within CALD populations. The absence of acceptability studies limits our understanding of the broader societal impact of screening, and addressing it is both academically and clinically relevant to ensuring equitable and culturally competent screening (90). Without a clear understanding of how these tools are received within CALD communities, methods that are linguistically inappropriate or culturally insensitive may be used inadvertently. This may lead to inaccurate diagnoses or delays in early intervention and support. It can also result in mistrust of the healthcare system within these communities (91), exacerbating disparities.
Clinical implications
There are several clinical implications stemming from this review. Although they appear to have been useful in several studies, multiple stages of translation and explanation, together with the need for additional personnel, training and education, add a layer of complexity to the assessment process. This can increase the risk of misunderstandings between the healthcare provider and parent/carer, potentially affecting the accuracy of the diagnosis.
Moreover, the use of multiple stages within the screening and diagnosis process (e.g., follow-up interviews and phone calls) may lead to increased drop-out or decreased follow-up rates. This can potentially result in further delays in intervention. The additional steps are also relevant to policymaking as they can consume a considerable amount of time and resources, with negative financial implications.
Strengths and limitations
One of the strengths of this review lies in its inclusion of a wide range of studies from diverse languages and cultural backgrounds, each of which is distinct from the others. This allows for a more rounded understanding of the complexities and variations in autism screening among CALD children across a variety of cultures. This strength is enhanced by the use of a systematic approach with broad inclusion criteria and assessment by two independent reviewers.
This review has several limitations. First, the inclusion of a wide range of studies from diverse cultural backgrounds, while a strength, also introduces a potential limitation in terms of the heterogeneity of the data. Variations in research methodologies, sample sizes, and cultural contexts make it challenging to draw uniform conclusions and comparisons across all studies.
Of significance is that the majority of these studies exhibited a high risk of bias on quality assessment using the QUADAS-2, and several studies lacked sufficient detail in their MMAT evaluations. This raises concern regarding the credibility of the findings and whether they can be generalised to other settings and contexts. Another limitation of this review is that grey literature and reference lists were not examined, which may have allowed for more complete data on autism screening in CALD communities. Additionally, the exclusion of studies not available in English may limit the comprehensiveness of the review, potentially excluding valuable research conducted in non-English-speaking regions. Furthermore, while the review highlights the cultural diversity within the selected studies, it focused only on reliability and validity as demonstrated by psychometric parameters, rather than on the relevance of items to specific cultural contexts or, as mentioned above, the adaptation process.