Introduction

Adolescence is a period of rapid development during which youth continue to define themselves in relation to their social environment, form their self-esteem, and shape their self-concept (DuBois et al., 1998; Smetana et al., 2006). During adolescence, youth more commonly engage in activities with their peers and less with their families, seeking opportunities to expand their social networks and develop social skills (Smetana et al., 2006). Like their neurotypical (NT) peers, most autistic youth desire peer friendships and wish to engage in social activities (Bauminger & Kasari, 2003). Despite these aspirations, autistic adolescents engage less frequently in social activities than their NT counterparts (Askari et al., 2015). Although the factors impacting the social interactions of autistic adolescents are complex, their experiences are likely, at least in part, impacted by the different social communication and interaction styles inherent to a diagnosis of autism (American Psychiatric Association [APA], 2013; Bottema-Beutel et al., 2021). Restricted engagement in social activities due to social communication differences limits autistic adolescents’ opportunities to gain experience and practice social skills (Askari et al., 2015; Majnemer et al., 2015; Smetana et al., 2006), increasing their risk of social isolation and withdrawal (Bauminger & Shulman, 2003). In the long term, participation restrictions may negatively impact autistic adolescents’ mental health and adult outcomes (Ratcliff et al., 2018), hampering independent living, employment, and further education opportunities (Howlin & Magiati, 2017).

Social Skills Group Programs

To date, intervention development has focussed on designing specialised (Radley et al., 2020; Tseng et al., 2020) and generic (Naveed et al., 2019) psychosocial programs aiming to support autistic individuals in navigating the neurotypical world (Lerner et al., 2012). Social skills group programs (SSGPs) are most frequently delivered to school-aged autistic youth with average or above-average general cognitive abilities (IQ > 70). SSGPs vary in their theoretical underpinnings, content, teaching strategies, delivery modes, and intensity. Despite this variability, SSGPs commonly focus on supporting participants to develop their interpersonal skills, social knowledge, and the social performance necessary to achieve their social goals within a neurotypical world (Wolstencroft et al., 2018). Overall, SSGPs are most frequently delivered by one to three trainers in weekly 60 to 90-min sessions (12 to 16 sessions) to a small group of autistic youth (two to six individuals). Ultimately these programs aim to support participants in generalising their practised or newly acquired skills to their everyday social contexts (Reichow et al., 2010; Wolstencroft et al., 2018).

SSGPs can be delivered in a structured or semi-structured format, employing either explicit didactic, implicit performance-based teaching strategies or both (Wolstencroft et al., 2018). Given that the success of explicit programs relies on translating knowledge into behaviour, the outcomes of programs employing these teaching approaches largely depend on participants’ level of motivation and concentration (Guivarch et al., 2017). In contrast, implicit teaching approaches focus on delivering opportunities for participants to develop their social skills during immersive activities, focusing on changing behaviour rather than the overt teaching of skills.

Current Evidence

Recent decades have seen an increase in published studies evaluating the efficacy of SSGPs targeting the social skills of autistic youth. Across efficacy studies, there is considerable variability in the components underpinning these programs and the measurement frameworks employed in evaluating their efficacy. The need to understand the efficacy of these approaches more broadly has led to the publication of several systematic reviews in this field.

Recent systematic reviews synthesising the literature on SSGPs for autistic youth evaluated via randomised controlled trial (RCT) designs suggest modest treatment efficacy in the areas of social knowledge and performance (Gates et al., 2017) and a reduction in autism characteristics (Wolstencroft et al., 2018). However, these findings should be interpreted within the context that they are almost exclusively underpinned by reports from informants other than autistic youth themselves (Gates et al., 2017; Monahan et al., 2021) and the broad age range of participants in the included studies (ranging from 5 to 25 years), with only one review noting the potential moderating effect of age (developmental stage) on study outcomes (Gates et al., 2017). Notably, research on SSGPs is dominated by samples of male school-age autistic children with an IQ > 70 and of European ancestry (Jonsson et al., 2016). Further, this body of research has largely disregarded the opinions of autistic individuals in developing the content or format of these interventions (Monahan et al., 2021). Collectively, these issues call into question both the external and social validity of SSGPs.

The Current Review

Although previous reviews have contributed significantly to our understanding of the efficacy of SSGPs in increasing the social knowledge and social skills of autistic youth in navigating the neurotypical social world, some limitations remain. Firstly, despite SSGPs demonstrating some efficacy in increasing autistic youth’s knowledge of the social skills commonly utilised in the neurotypical world, there has been little consideration of program fidelity (PF), that is, whether the program is administered as initially intended (Gates et al., 2017; Tseng et al., 2020). Judging the true efficacy of SSGPs when PF is unclear or unreported is virtually impossible, given that other unaccounted-for factors may influence the intervention’s efficacy (Borrelli, 2011). To date, no review has systematically explored the degree to which SSGPs were delivered as initially intended (Borrelli, 2011). Further, previous systematic reviews only included RCTs evaluating the efficacy of SSGP compared to the waitlist or no-treatment control groups (Gates et al., 2017; Wolstencroft et al., 2018). It remains unclear whether the observed effects of these programs resulted from participants' exposure to a structured, supportive group context or the SSGPs alone (Gates et al., 2017). Evidence further suggests that SSGPs are likely more efficacious for autistic adolescents than children (Choque Olsson et al., 2017). To advance understanding of the efficacy of SSGPs in autistic adolescents, this systematic review firstly assessed the methodological quality and PF of studies evaluating the efficacy of SSGPs in improving autistic adolescents’ (aged 12 to 17 years) socialisation success within a neurotypical context via an RCT design. Subsequently, a meta-analysis of outcomes categorised as social communication and interaction skills, behavioural/emotional challenges, adaptive functioning, and autism characteristics investigated the impact of SSGPs on autistic adolescents in these specific domains. 
This review also included studies employing active controls as a means of controlling for exposure to the social context in judging the efficacy of these programs.

Method

This systematic review and meta-analysis was guided by the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement (Liberati et al., 2009). This review was registered with PROSPERO (identifier CRD42020213178) on 24 October 2020.

Eligibility Criteria

Studies evaluating a SSGP to improve the socialisation success of autistic adolescents within a neurotypical context were included in this review. Although this review focused on programs targeting autistic adolescents, studies employing samples with a broader age range (including younger children) were also included. Studies examining the efficacy of school-delivered SSGPs were excluded, constraining the heterogeneity of programs and focusing on SSGPs delivered in clinical settings. The hallmark features of school-delivered SSGP, occurring in youth’s everyday social context facilitated by classroom teachers familiar with participants, make these programs inherently different from those delivered within clinical settings (Kasari et al., 2016). SSGPs primarily focussing on parent or family outcomes, in preference to improving the socialisation success of autistic adolescents within a neurotypical context, were also excluded (Table 1).

Table 1 Inclusion and exclusion criteria and search strategy

Information Sources and Search Strategy

Six electronic databases (CINAHL, Medline, ProQuest, PsycINFO, Clarivate, and Scopus) were searched for scholarly articles published in English from 2008 until December 2018 and later updated to November 2020, describing SSGPs aiming to improve the social communication and interaction skills of autistic adolescents. Title, abstract, and keyword searches were undertaken in each database. The main keyword search terms were grouped into five categories (‘autism’, ‘social’, ‘program’, ‘adolescents’, and ‘RCT’). They were then combined with related terms via Boolean operators (Table 1) and tailored to each database. The reference lists of the identified articles were searched for further manuscripts meeting the eligibility criteria. Studies identified via study registrations and personal communication were also included in this review.

Study Selection

All citations were imported into the EndNote reference manager, and duplicates were removed. Articles were screened at the title and abstract level for eligibility against the inclusion criteria. The full texts of candidate articles were subsequently retrieved and assessed for eligibility by two reviewers (BA, MB).

Data Extraction

A data extraction form was developed for this review, guided by the Cochrane Handbook for Systematic Reviews of Interventions (Higgins & Green, 2011). Extracted data included the study design and randomisation process, sample size, inclusion/exclusion criteria, recruitment strategy, facilitators and setting characteristics, incentives, program/comparison group characteristics, assessment time points, outcome measures (primary and secondary), collection of fidelity and adverse events related to the SSGP, type of analysis and a summary of results.

Assessment of Methodological Quality and Risk of Bias

Two reviewers trained and experienced in conducting systematic reviews (BA, MB) independently rated the quality of all included articles using the Standard Quality Assessment Criteria for Quantitative Studies for quantitative and qualitative studies (Kmet et al., 2004). This 14-item checklist assesses the methodological quality of articles regarding (a) clarity of the aim and design, (b) sample size calculation, (c) control group selection, (d) randomisation process, (e) blinding to group allocation (participants, investigators collecting data, or both), (f) the robustness of outcome measures, (g) analytic methods including some estimates of variance, (h) the sufficiency of reported results, and (i) relevant conclusions. Single items were scored on a scale of 0 (not achieved) to 2 (criteria met), with total proportional scores calculated (by dividing the total raw score by the possible maximal score of the relevant items) and converted to percentage scores, enabling categorisation of articles according to their methodological quality [> 80% strong, 70–80% good, 50–70% adequate and < 50% limited methodological quality (Lee et al., 2008)]. The reviewers compared the ratings for all included studies, with discrepancies discussed until consensus was reached.
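The scoring rule described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' script: the function name `kmet_summary`, the use of `None` for non-applicable items, and the handling of scores falling exactly on a category boundary (70% treated as good, 50% as adequate) are assumptions made for the example.

```python
def kmet_summary(item_scores):
    """Summarise one study's quality ratings.

    item_scores: list of 0, 1 or 2 per checklist item, with None marking
    items judged not applicable (hypothetical convention for this sketch).
    Returns (percentage score, quality category).
    """
    applicable = [s for s in item_scores if s is not None]
    if not applicable:
        raise ValueError("at least one applicable item is required")
    if any(s not in (0, 1, 2) for s in applicable):
        raise ValueError("item scores must be 0, 1 or 2")

    # Total raw score divided by the maximum possible score of relevant items.
    percent = 100.0 * sum(applicable) / (2 * len(applicable))

    # Cut-offs as reported in the text; boundary handling is an assumption.
    if percent > 80:
        category = "strong"
    elif percent >= 70:
        category = "good"
    elif percent >= 50:
        category = "adequate"
    else:
        category = "limited"
    return percent, category
```

For example, a study rated 2 on seven items and 1 on the other seven scores 21 of a possible 28 points, that is 75%, and would be classed as good.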

The Procedure for Assessing Program Fidelity

The Treatment Fidelity Assessment and Implementation Plan was used to examine and summarise the extent to which each study delivered its program as initially planned (Borrelli, 2011). This 30-item checklist assessed strategies used in each study to ensure adherence concerning: program design (k = 7 items reflecting adherence to the underlying theoretical framework of the program); training of providers (k = 7 items assessing the standardisation of the training process); delivery (k = 9 items quantifying the level of rigour employed in assessing fidelity during the trial); receipt of the program (k = 5 items describing the participants receiving the program); and enactment of program skills (k = 2 items covering the assessment, monitoring, and improvement in participants’ performance of taught skills both within and outside the program context). Each checklist item is scored on a dichotomous scale of 1 (present) or 0 (not reported). These checklist items were subsequently used to calculate five subscale scores and one overall score (Bellg et al., 2004). Possible scores range from 0 to 1, with proportional scores of > 0.80 indicating high levels of PF (Borrelli et al., 2005). This measure has shown good reliability and validity, with programs with higher PF scores (total proportional scores) found to be more efficacious (Borrelli et al., 2005; Johnson-Kozlow et al., 2008). Two reviewers (BA, MB) independently rated all included studies, with discrepancies resolved via discussion.
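As a rough illustration of how proportional PF scores could be derived from the dichotomous ratings, the following Python sketch computes the five subscale scores and an overall score. The subscale keys, the function name `fidelity_scores`, and the choice to compute the overall score as the proportion of all 30 items rated present (rather than, say, the mean of the subscale scores) are assumptions for the example and may differ from the procedure the authors followed.

```python
# Number of checklist items per subscale, as described in the text.
SUBSCALE_SIZES = {
    "design": 7,
    "training": 7,
    "delivery": 9,
    "receipt": 5,
    "enactment": 2,
}


def fidelity_scores(ratings):
    """Compute proportional fidelity scores for one study.

    ratings: dict mapping each subscale name to a list of 0/1 item ratings
    (1 = present, 0 = not reported).
    Returns a dict with one score per subscale plus an "overall" score.
    """
    scores = {}
    total_present = 0
    for name, size in SUBSCALE_SIZES.items():
        items = ratings[name]
        if len(items) != size or any(v not in (0, 1) for v in items):
            raise ValueError(f"{name}: expected {size} ratings of 0 or 1")
        scores[name] = sum(items) / size
        total_present += sum(items)
    # Assumption: overall score = items present / all 30 items.
    scores["overall"] = total_present / sum(SUBSCALE_SIZES.values())
    return scores


def is_high_fidelity(score, threshold=0.80):
    """Scores above 0.80 are described in the text as high fidelity."""
    return score > threshold
```

Under these assumptions, a study with 16 of the 30 items rated present would receive an overall score of about 0.53, below the 0.80 threshold.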

Meta-analysis

Six meta-analyses were performed according to the meta-analytic procedures suggested by Liberati et al. (2009). Two explored the effect of SSGPs on all outcome measures used within the studies immediately after completion of the program and at 3-month follow-up. The remaining four investigated the effect of SSGPs on four outcome categories used across these studies as described below (social outcomes, behavioural/emotional challenges, adaptive functioning, and autism characteristics). Data and the script required to replicate the process are available at https://osf.io/n93pu/.

Term Parameters—Outcome Categories

The first author grouped the outcomes into four overarching categories to address the heterogeneity of outcome measures employed to assess the efficacy of SSGPs, facilitating synthesis across studies (Table 2). All authors then reviewed these categories, discussing differences in opinion. This process resulted in four agreed categories: social outcomes, behavioural/emotional challenges, adaptive functioning, and autism characteristics. Social outcomes or social communication and interaction skills defined measures assessing autistic adolescents’ social knowledge or social behaviour (when socialising within a neurotypical context). Behavioural/emotional challenges measures included measures aiming to assess autistic adolescents’ internalising and externalising behaviours, including their emotional states and emotion regulation (Achenbach & Edelbrock, 1978). Adaptive functioning was defined as multidimensional measures capturing autistic adolescents’ ability to effectively and independently cope with everyday demands (Harrison & Boney, 2002). Autism characteristics defined measures employed to diagnose autism (APA, 1994, 2013) or quantify its characteristics.

Table 2 Outcome measure categories based on the outcomes used in the included studies

Statistical Analyses

The findings of studies conducted by Schohl et al. (2014) and Van Hecke et al. (2015) were found to be from an overlapping sample. In line with the process described by Gates et al. (2017), only the study with the more complete data set was included in the meta-analysis (Schohl et al., 2014).

Some studies only reported outcome measures demonstrating significant change in a measure’s total score, subscales, or both. Studies presenting only results for subscales were excluded from the analysis, decreasing heterogeneity across included studies and improving the internal validity of the meta-analysis. Estimates of effect size with a bias correction (Hedges’ g) were calculated by dividing the mean difference of the outcome measures for both SSGPs and control groups from baseline to post-test/follow-up by the pooled standard deviation of study groups at baseline (Morris, 2008). F values or t values were used to calculate the effect sizes in studies where the means and standard deviations were not reported (Borenstein et al., 2009).
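The effect size calculation described above corresponds to the pretest-posttest-control estimator with a pooled pretest standard deviation discussed by Morris (2008). A minimal Python sketch is given below; the function and argument names are hypothetical, and the sketch is not taken from the authors' script.

```python
import math


def hedges_g_ppc(m_pre_t, m_post_t, sd_pre_t, n_t,
                 m_pre_c, m_post_c, sd_pre_c, n_c):
    """Bias-corrected pre-post-control effect size.

    *_t arguments refer to the SSGP (treatment) group and *_c to the control
    group; m = mean, sd = standard deviation at baseline, n = sample size.
    """
    df = n_t + n_c - 2
    # Pooled standard deviation of the two groups at baseline.
    sd_pre_pooled = math.sqrt(
        ((n_t - 1) * sd_pre_t ** 2 + (n_c - 1) * sd_pre_c ** 2) / df
    )
    # Small-sample bias correction factor.
    correction = 1 - 3 / (4 * df - 1)
    # Difference between the groups in mean change from baseline.
    change_diff = (m_post_t - m_pre_t) - (m_post_c - m_pre_c)
    return correction * change_diff / sd_pre_pooled
```

With two groups of 20, a baseline standard deviation of 10 in each group, and mean gains of 8 and 2 points respectively, the function returns roughly 0.59. Whether a positive value indicates improvement depends on the scoring direction of the outcome measure.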

Separate random-effects meta-analyses (as outlined under meta-analysis) were performed using RStudio Version 4.2.1 (RStudio Team, 2015) and its available packages (metafor, compute.es, and MAd; Del Re, 2013, 2015; Del Re & Hoyt, 2018). Effect sizes and variances within individual studies were aggregated for the meta-analysis process to enable a more precise estimate of the studies’ effect and account for any possible variance within and between the studies (Borenstein et al., 2009). A correlation coefficient of 0.5 was assumed between outcome measures within each category, as these correlations were not readily available (Borenstein et al., 2009). Statistical significance was set at p < 0.05, with an effect size (Hedges’ g) of < 0.2 indicating a small, 0.2–0.5 a medium, and > 0.8 a large SSGP effect (Fritz et al., 2012). Heterogeneity among effects was assessed using a restricted maximum-likelihood estimator for Tau² and Chi-Square statistics, with an inconsistency score (I²) of 25% demonstrating low, 50% moderate, and 75% high levels of heterogeneity (Higgins et al., 2003). An influence diagnostic assessment (e.g., Baujat plots) investigated how individual studies had affected heterogeneity (Enea & Plaia, 2014). Meta-regression moderator analysis was performed when 10 or more studies with high heterogeneity were included in the meta-analysis. Moderator analysis assessed whether methodological quality, PF, age group, gender, and exposure to the SSGP (as calculated in minutes) had influenced the effect sizes.
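To make the two computational steps in this paragraph concrete, the following Python sketch first aggregates correlated effect sizes within a study using the composite-variance formula given by Borenstein et al. (2009) with an assumed correlation of 0.5, and then pools study-level effects under a random-effects model. Note one deliberate substitution: the sketch uses the closed-form DerSimonian-Laird estimator of Tau² for brevity, whereas the review reports a restricted maximum-likelihood estimator, so the numbers it produces would not exactly match the published analysis. All function names are hypothetical.

```python
import math


def aggregate_within_study(effects, variances, r=0.5):
    """Combine several correlated effect sizes from one study.

    Returns (mean effect, variance of the mean), assuming a common
    correlation r between every pair of outcomes.
    """
    m = len(effects)
    mean_effect = sum(effects) / m
    total = sum(variances)
    for i in range(m):
        for j in range(m):
            if i != j:
                total += r * math.sqrt(variances[i]) * math.sqrt(variances[j])
    return mean_effect, total / m ** 2


def random_effects_dl(effects, variances):
    """Random-effects pooling with the DerSimonian-Laird Tau^2 estimator
    (a stand-in for the REML estimator used in the review)."""
    k = len(effects)
    w = [1.0 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q (chi-square heterogeneity statistic) with k - 1 df.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)
    # I^2: percentage of total variability attributed to heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return {
        "pooled": pooled,
        "se": se,
        "ci95": (pooled - 1.96 * se, pooled + 1.96 * se),
        "q": q,
        "tau2": tau2,
        "i2": i2,
    }
```

In practice, each study's outcomes would first be passed through `aggregate_within_study`, and the resulting study-level effects and variances would then be supplied to `random_effects_dl`.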

Publication Bias

Funnel plots and Egger’s test were used to estimate the possibility of publication bias by plotting the observed effect size against standard errors on the y-axis (Egger et al., 1997). A further sensitivity test was performed as a visual inspection of the funnel plots’ asymmetry alone cannot account for publication bias (Bartoš et al., 2020). If significant publication bias was present at α = 0.01 (Bartoš et al., 2020), a robust bias correction was performed to adjust the findings using JASP (https://jasp-stats.org). JASP is a free program developed to support conducting classical and Bayesian forms of meta-analysis.
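For readers unfamiliar with Egger's test, the sketch below shows its classic form (Egger et al., 1997): each study's standardised effect (effect divided by its standard error) is regressed on its precision (one divided by the standard error), and an intercept far from zero suggests funnel plot asymmetry. This is an illustrative ordinary least squares version with a hypothetical function name; the variant implemented in the authors' script may differ, and the robust bias correction performed in JASP is not reproduced here.

```python
import math


def egger_regression(effects, standard_errors):
    """Classic Egger regression for funnel plot asymmetry.

    Returns the intercept, slope and standard error of the intercept.
    The intercept divided by its standard error can be compared against a
    t distribution with n - 2 degrees of freedom.
    Requires at least three studies with differing standard errors.
    """
    n = len(effects)
    if n < 3:
        raise ValueError("at least three studies are required")
    x = [1.0 / se for se in standard_errors]                  # precision
    y = [g / se for g, se in zip(effects, standard_errors)]   # standardised effect
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residual_var = sum(
        (yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y)
    ) / (n - 2)
    se_intercept = math.sqrt(residual_var * (1.0 / n + x_bar ** 2 / sxx))
    return {"intercept": intercept, "slope": slope, "se_intercept": se_intercept}
```

When every study estimates the same effect regardless of its precision, the intercept is close to zero; when smaller studies (larger standard errors) report systematically larger effects, the intercept moves away from zero.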

Results

Search Results

Electronic database searches identified 3,337 articles. Upon removing duplicates, the titles and abstracts of 1,880 articles were reviewed, with the full text of twenty-three articles subsequently evaluated for eligibility. Seven articles did not meet the inclusion criteria for reasons including (a) not employing an RCT design (k = 2), (b) not targeting social communication and interaction skills (k = 2), (c) targeting children younger than the age range of this review (k = 1), (d) focusing on parent and family outcomes (k = 1), and (e) not being a peer-reviewed journal article (k = 1). A review of the reference lists of eligible articles and trial registries identified two further studies, resulting in 18 articles being included in the narrative synthesis.

The eligibility of the eighteen articles included in the systematic review was assessed for inclusion in the meta-analysis. When manuscripts presented insufficient data to support meta-analysis (i.e., presented results for subscales only), corresponding authors were contacted (n = 7). Authors from two studies responded and provided the requested data. Five studies were therefore excluded from the meta-analysis (Corbett et al., 2019; Matthews et al., 2018, 2020; Van Hecke et al., 2015; Vernon et al., 2018). The selection process is presented in Fig. 1.

Fig. 1 Selection of studies of social skills group training for autistic youth

Narrative Synthesis

Overall, the included studies (k = 18) evaluated seven unique manualised SSGPs delivered to autistic youth with IQ > 70, including (a) Program for the Education and Enrichment of Relationship Skills (PEERS®; k = 8; Laugeson et al., 2009; Matthews et al., 2018; Matthews et al., 2020; Rabin et al., 2018; Schohl et al., 2014; Shum et al., 2019; Van Hecke et al., 2015; Yoo et al., 2014), (b) KONTAKT® (k = 3; Choque Olsson et al., 2017; Jonsson et al., 2018), (c) SENSE Theatre® (k = 2; Corbett et al., 2016; Corbett et al., 2019), (d) Social Tools and Rules for Teens (START; k = 2; Ko et al., 2019; Vernon et al., 2018), (e) Multimodal Anxiety and Social Skills Program (MASSI; k = 1; White et al., 2013), (f) Sociodramatic Affective Relational Intervention (SDARI; k = 1; Lerner & Mikami, 2012), and (g) SOSTA-FRA (k = 1; Freitag et al., 2016). The majority of the studies were conducted in the United States of America (USA; 65%), with the remainder undertaken in Australia (Afsharnejad et al., 2021a, 2021b), China (Shum et al., 2019), Germany (Freitag et al., 2016), Israel (Rabin et al., 2018), Korea (Yoo et al., 2014) and Sweden (Choque Olsson et al., 2017; Jonsson et al., 2018).

These seven SSGPs employed varied teaching strategies, including structured (k = 4; 57%), semi-structured (k = 2; Ko et al., 2019; Vernon et al., 2018) and unstructured, performance-based approaches (k = 1; Lerner & Mikami, 2012). The majority of SSGPs were delivered to autistic youth in weekly 90-min sessions. The number of sessions varied across SSGPs, with the shortest program delivered over four sessions (Lerner & Mikami, 2012). The longest SSGP, KONTAKT®, was delivered over twenty-four sessions (Jonsson et al., 2018), being an extension of a shorter 12-session (Choque Olsson et al., 2017) and medium 16-session variants (Afsharnejad et al., 2021a, 2021b). Sessions were commonly delivered to small groups of between 3 to 10 participants. Although most studies targeted adolescents aged 11–17 years, the efficacy of KONTAKT®, SENSE Theatre, and SOSTA-FRA was evaluated with samples including both children and adolescents. All SSGPs were reportedly led by therapists/clinicians, with several involving trained and supervised coaches and NT peers. Parents were engaged in providing feedback and educated on ways to support their child during the program, with this role extending to coaching in the PEERS® and the MASSI programs. The KONTAKT®, SOSTA-FRA, START, and MASSI programs incorporated individualised goal setting or tailored planning, with the first three developing these in collaboration with autistic youth. KONTAKT®, PEERS®, SOSTA-FRA, and START, incorporated individually tailored homework assignments to support the generalisation of learnt skills to everyday contexts. Uniquely, the long (Jonsson et al., 2018) and medium (Afsharnejad et al., 2021a, 2021b) variants of KONTAKT® incorporated components supporting the in-vivo assessment of learnt skills within the sessions (participants leading a session) and the generalisation of skills to a community context (an excursion to a café). 
In assessing the efficacy of SSGPs, four studies employed active control groups (Lerner & Mikami, 2012; Matthews et al., 2018, 2020) or attempted to control for the effects of exposing autistic adolescents to a supportive social context (Afsharnejad et al., 2021a, 2021b). The remaining studies assessed the efficacy of SSGPs compared to treatment as usual or waitlist control. Only half of the included studies reported the setting where the SSGP was delivered, which included meeting rooms at community centres, clinical outpatient units, university settings and schools (after school hours). The measurement frameworks and informants used in establishing efficacy varied. A more detailed description of the included SSGPs and the studies evaluating their efficacy is provided in Appendix 1 (Tables 4, 5, 6).

Methodological Quality Analysis

Of the 18 studies, eight (41%) detailed the flow of participants through their studies in CONSORT diagrams (Afsharnejad et al., 2021a, 2021b; Choque Olsson et al., 2017; Jonsson et al., 2018; Schohl et al., 2014; Shum et al., 2019; Vernon et al., 2018; White et al., 2013). Overall, included studies demonstrated good to strong methodological quality (i.e., scores of > 75% as assessed via the Kmet Checklist). Methodological limitations mainly included (a) small sample sizes, (b) failure to conduct a blind assessment of outcomes, (c) a lack of allocation concealment, and (d) failure to describe randomisation processes or conduct intent-to-treat analysis (Accessible via https://osf.io/n93pu/).

Program Fidelity Analysis

All 18 studies but one (Rabin et al., 2018) reported assessing PF, with studies employing various methods, including assessing video recordings of randomly chosen sessions using a fidelity checklist (Afsharnejad et al., 2021a, 2021b; Choque Olsson et al., 2017; Jonsson et al., 2018; White et al., 2013) and observing a session to assess fidelity either in reference to the program manual (Shum et al., 2019; Yoo et al., 2014) or via a checklist. The included studies demonstrated overall PF scores ranging from 0.33 to 0.90 (M = 0.52, SD = 0.15), with 17% (k = 3) demonstrating a strong overall fidelity (Afsharnejad et al., 2021a, 2021b; Choque Olsson et al., 2017; Jonsson et al., 2018). An overview of the PF scores is available via https://osf.io/n93pu/.

Design

The seven items of this category assessed the degree to which studies reflected the stated theoretical constructs and mechanisms of the program. All included studies demonstrated a good program design score, ranging from 0.57 to 0.92 (M = 0.79, SD = 0.10), with 47% (k = 10) demonstrating a strong fidelity on this criterion (Afsharnejad et al., 2021a, 2021b; Choque Olsson et al., 2017; Freitag et al., 2016; Jonsson et al., 2018; Lerner & Mikami, 2012; Matthews et al., 2018, 2020; Schohl et al., 2014; White et al., 2013; Yoo et al., 2014). Fidelity was negatively affected by a lack of information in relation to contingency planning for managing implementation setbacks (such as drawing on reserve trainers) and providing insufficient detail regarding the program’s underlying theoretical constructs (Corbett et al., 2016, 2019; Ko et al., 2019; Laugeson et al., 2009; Shum et al., 2019; Van Hecke et al., 2015; Vernon et al., 2018) or facilitators’ credentials (Corbett et al., 2016, 2019; Lerner & Mikami, 2012; Rabin et al., 2018).

Providers

The seven items in this category assessed the degree to which studies provided standardised training to trainers/coaches. This category received the lowest fidelity score, with scores ranging from 0.14 to 0.71 (M = 0.36, SD = 0.19). Common fidelity limitations included failing to clearly describe the training materials provided to therapists, coaches, or peers (Freitag et al., 2016; Laugeson et al., 2009; Lerner & Mikami, 2012; Matthews et al., 2020; Rabin et al., 2018; Shum et al., 2019; Van Hecke et al., 2015; White et al., 2013), or approaches to supervision (Freitag et al., 2016; Rabin et al., 2018; Van Hecke et al., 2015). Six studies described assessing trainers’ readiness to deliver the SSGP before commencing (Afsharnejad et al., 2021a, 2021b; Choque Olsson et al., 2017; Corbett et al., 2019; Jonsson et al., 2018; White et al., 2013; Yoo et al., 2014). Three studies described their training as standardised (Afsharnejad et al., 2021a, 2021b; Choque Olsson et al., 2017; Jonsson et al., 2018). Eligibility (Corbett et al., 2019; Matthews et al., 2018), fitness to deliver the program (Corbett et al., 2019) and individualisation of the training process (Matthews et al., 2018) were reported in studies drawing on neurotypical co-leader peers. However, these studies provided limited details in relation to the training of therapists leading the groups.

Delivery

The nine items of this category assessed the degree to which studies were executed as outlined in their RCT Protocols. Delivery scores ranged from 0.11 to 0.88 (M = 0.51, SD = 0.22), with six studies (33%) demonstrating strong fidelity in this category (Afsharnejad et al., 2021a, 2021b; Choque Olsson et al., 2017; Corbett et al., 2016, 2019; Freitag et al., 2016; Jonsson et al., 2018; White et al., 2013). Across the included studies common limitations included (a) failure to specify fidelity scores a priori (e.g., adhere to delivering > 80% of components; Corbett et al., 2016; Corbett et al., 2019; White et al., 2013), (b) omitting a description of the strategies employed in delivering the programs (e.g., reinforcement, prompting; Ko et al., 2019; Laugeson et al., 2009; Lerner & Mikami, 2012; Matthews et al., 2018, 2020; Rabin et al., 2018; Schohl et al., 2014; Shum et al., 2019; Van Hecke et al., 2015; Vernon et al., 2018; White et al., 2013; Yoo et al., 2014), (c) failure to specify if scripts were used in delivering SSGP curriculum (Corbett et al., 2016, 2019; Ko et al., 2019; Vernon et al., 2018; White et al., 2013) and (d) not monitoring adverse events or nonspecific program effects (Corbett et al., 2016, 2019; Ko et al., 2019; Laugeson et al., 2009; Lerner & Mikami, 2012; Matthews et al., 2018, 2020; Rabin et al., 2018; Schohl et al., 2014; Shum et al., 2019; Van Hecke et al., 2015; Vernon et al., 2018; White et al., 2013; Yoo et al., 2014). Only the study comparing the efficacy of the KONTAKT® to an active control group (Afsharnejad et al., 2021a, 2021b) described the strategies employed to mitigate potential contamination threat between the study arms (contact amongst SSGP and control group participants).

Receipt of Program

The sum of the five items, assessing whether participants understood and acquired the skills covered in the SSGPs, demonstrated high scores ranging from 0.4 to 1.0 (M = 0.73, SD = 0.19). Twelve studies (67%) achieved strong fidelity (Afsharnejad et al., 2021a, 2021b; Choque Olsson et al., 2017; Corbett et al., 2016, 2019; Freitag et al., 2016; Jonsson et al., 2018; Ko et al., 2019; Laugeson et al., 2009; Lerner & Mikami, 2012; Matthews et al., 2018; Rabin et al., 2018; Shum et al., 2019; Vernon et al., 2018; Yoo et al., 2014). Fidelity scores were negatively impacted by the failure to report consideration of (a) cultural factors (e.g., assessing cross-cultural acceptability of the program; Corbett et al., 2016, 2019; Ko et al., 2019; Laugeson et al., 2009; Lerner & Mikami, 2012; Matthews et al., 2018, 2020; Schohl et al., 2014; Van Hecke et al., 2015; Vernon et al., 2018; White et al., 2013), (b) participants’ enactment of learnt skills (e.g., via homework assignments; Corbett et al., 2016, 2019; Lerner & Mikami, 2012; Schohl et al., 2014), (c) comprehension of session content (e.g., reviewing the session at the end; Corbett et al., 2016, 2019; White et al., 2013), or (d) the use of applied strategies during sessions to enhance comprehension (e.g., providing visual aids, workbooks or written session agenda; Matthews et al., 2020; Rabin et al., 2018; Schohl et al., 2014; Shum et al., 2019; Van Hecke et al., 2015; Vernon et al., 2018; White et al., 2013; Yoo et al., 2014).

Enactment of Program Skills

The two items assessing the enactment of program skills pertained to trainers’ assessment of participants’ skills either within or outside the SSGP sessions (Borrelli, 2011). Only two studies (11%) demonstrated strong fidelity (M = 0.23 [0.00, 1.00], SD = 0.34) under this category (Afsharnejad et al., 2021a, 2021b; Jonsson et al., 2018). Scores in this category were negatively impacted by the failure to document the assessment of participants’ performance during group sessions (Corbett et al., 2016, 2019; Freitag et al., 2016; Ko et al., 2019; Laugeson et al., 2009; Matthews et al., 2018, 2020; Rabin et al., 2018; Schohl et al., 2014; Shum et al., 2019; Van Hecke et al., 2015; Vernon et al., 2018; White et al., 2013; Yoo et al., 2014) or other contexts (Freitag et al., 2016; Laugeson et al., 2009; Lerner & Mikami, 2012; Matthews et al., 2018, 2020; Rabin et al., 2018; Schohl et al., 2014; Shum et al., 2019; Van Hecke et al., 2015; Vernon et al., 2018; Yoo et al., 2014).

Meta-analysis

Analysis of Outcomes from Baseline to Post-test

A total of 57 effect sizes (Mean = 4.38 per study, SD = 2.06, Median = 4) from the 13 studies were included in this analysis. These studies overall had strong methodological quality (M = 86.31, SD = 6.5) and modest PF (M = 0.54, SD = 0.18). In their RCT, three studies employed usual care, two active controls, and the remaining waitlist controls. Four studies reported pooled results from children and adolescents. According to Hedges’ g, the effect sizes of these studies ranged from -0.58 to 3.42 (Fig. 2 and Table 3).
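For readers less familiar with the metric, the following minimal Python sketch shows one common way to compute Hedges’ g and its sampling variance for two independent groups. It is illustrative only and is not the analysis code used in this review. The function name `hedges_g` and the example values are hypothetical, and the inputs are assumed to be each group's mean change from baseline to post-test with the corresponding standard deviation.

```python
import math


def hedges_g(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Return (g, variance of g) for an intervention vs. control comparison.

    Assumption: inputs summarise change scores (baseline to post-test)
    for each arm; this is a sketch, not the review's actual procedure.
    """
    df = n_t + n_c - 2
    # Pooled standard deviation across the two arms.
    pooled_sd = math.sqrt(((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / df)
    # Cohen's d: standardised mean difference.
    d = (mean_t - mean_c) / pooled_sd
    # Small-sample correction factor that turns d into Hedges' g.
    j = 1 - 3 / (4 * df - 1)
    g = j * d
    # Large-sample approximation of the variance of d, scaled by j squared.
    var_d = (n_t + n_c) / (n_t * n_c) + d ** 2 / (2 * (n_t + n_c))
    return g, j ** 2 * var_d


# Hypothetical example: 20 participants per arm.
g, var_g = hedges_g(mean_t=10.0, sd_t=2.0, n_t=20, mean_c=8.0, sd_c=2.0, n_c=20)
print(round(g, 3), round(var_g, 3))
```

The correction factor shrinks the estimate slightly in small samples, which is relevant here given that most included trials had small samples.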

Fig. 2

Forest plot comparison of all outcomes. Analysis was based on the aggregated score calculated from the total score of all outcomes used within each study. Positive scores indicate more significant improvement for the intervention group compared to the control group from baseline to post-test

Table 3 Social skills group training effect sizes (Hedges’ g) from baseline to post-test

The meta-analysis revealed an aggregated large overall effect for the efficacy of SSGPs for improving autistic adolescents’ outcomes from baseline to post-test (Hedges’ g = 0.96, p = 0.001, 95% CI [2.71, 4.13]). There was high heterogeneity observed in the included effect sizes. Egger’s regression test indicated no evidence of small study bias (z = 0.71, p > 0.05). No significant moderation effects of quality, PF, gender or age were found (p > 0.05).
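How a pooled estimate and its confidence interval arise from study-level effect sizes can be illustrated with a random-effects model. The sketch below assumes the DerSimonian-Laird estimator of between-study variance, which may differ from the estimator used in this review. The function name `pool_random_effects` and the example inputs are hypothetical.

```python
import math


def pool_random_effects(effects, variances):
    """Return (pooled effect, (ci_low, ci_high), tau squared).

    Assumption: DerSimonian-Laird estimator with a normal-theory 95% CI;
    a sketch for illustration, not the review's analysis code.
    """
    # Fixed-effect (inverse-variance) weights and estimate.
    w = [1 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Random-effects weights add tau squared to each study's variance.
    w_re = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se), tau2


# Hypothetical study-level Hedges' g values and their variances.
pooled, ci, tau2 = pool_random_effects([0.2, 0.9, 1.5, 0.4], [0.05, 0.08, 0.10, 0.06])
print(round(pooled, 2), tuple(round(x, 2) for x in ci), round(tau2, 2))
```

Because the interval is constructed as the pooled estimate plus or minus 1.96 standard errors, it is centred on the point estimate.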

Visual inspection indicated that some studies fell outside the funnel plot; hence, a sensitivity analysis was performed. This analysis identified three studies that were influencing the results, indicating a need for a bias correction (Afsharnejad et al., 2021a, 2021b; Lerner et al., 2012; Schohl et al., 2014). The estimated effect of SSGPs decreased from the unadjusted model to the adjusted one (Fig. 3), suggesting a moderate efficacy (μ = 0.60, p < 0.001, 95% CI [0.11, 1.08]).

Fig. 3

Adjusted and Unadjusted Models. Analysis was based on the Bias Correction model suggested by Bartos et al., 2020, adjusting for publication bias. Positive scores indicate more significant improvement for the intervention group compared to the control group from baseline to post-test

Analysis of Maintenance Effects from Baseline to Follow-Up

Five studies provided data at a follow-up time on all their outcomes. The meta-analysis of maintenance effects resulted in effect sizes ranging from 0.24 to 6.15 (Fig. 4). No significant overall maintenance effect was observed at follow-up. This finding supported the conclusion that across studies, from post-test to follow-up, autistic adolescents failed to sustain the benefits they reported directly following the completion of the SSGP (Hedges’ g = 1.43, p = 0.12, 95% CI [− 0.38, 3.23]). Findings indicated heterogeneity between effect sizes across the included studies (Q = 122.24, p < 0.001, I2 = 99.31% [98.15, 99.89]).
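The heterogeneity statistics reported here can be illustrated with a short sketch of Cochran's Q and the Q-based form of I². This is one common definition (the proportion of Q in excess of its degrees of freedom); statistical software that derives I² from an estimated between-study variance can return a different value for the same Q, so the sketch is not expected to reproduce the figures reported in this review. The function names and example values are hypothetical.

```python
def cochran_q(effects, variances):
    """Cochran's Q: inverse-variance weighted squared deviations from the
    fixed-effect pooled estimate."""
    w = [1 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    return sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))


def i_squared(q, k):
    """Q-based I squared (percent) for k studies, truncated at zero."""
    df = k - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100


# Hypothetical example with five studies.
effects = [0.2, 0.9, 1.5, 0.4, 2.1]
variances = [0.05, 0.08, 0.10, 0.06, 0.12]
q = cochran_q(effects, variances)
print(round(q, 2), round(i_squared(q, len(effects)), 1))
```

Under this definition, a Q no larger than its degrees of freedom yields an I² of zero, and values approaching 100% indicate that almost all observed variation exceeds what sampling error alone would produce.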

Fig. 4

Forest plot comparison of the social outcomes category outcome measures. Analysis was based on the aggregated score calculated from the total score of all outcome measures in the social outcome category. Positive scores indicate more significant improvements for the intervention group compared to the control group from baseline to post-test

Outcome Categories—Analysis of Outcomes from Baseline to Post-test

Social Outcomes

Data from all informants underpinned the assessment of social outcomes across the included studies. However, more than half of the reported findings drew on data obtained via adolescents’ self-reports immediately after completion of the SSGP. Social outcomes assessed included (a) measured improvements in social skills knowledge (Laugeson et al., 2009; ES = [0.36, 3.57]) as assessed by the Test of Adolescent Social Skills Knowledge (Laugeson & Frankel, 2010), (b) social skills (Laugeson et al., 2009: ES = 0.68; Vernon et al., 2018: ES = 0.21) as assessed via the Social Skills Rating System (Gresham & Elliott, 1990), and the Social Skills Improvement Scale (Gresham & Elliott, 2008), (c) friendship quality (Laugeson et al., 2009; ES = 0.48) and skills (Schohl et al., 2014: ES = not reported; Rabin et al., 2018: ES = 0.09) as assessed by the Friendship Qualities Scale (Bukowski et al., 1994) and Quality of Socialisation Questionnaire (Laugeson et al., 2009) and (d) hosted get-togethers (Laugeson et al., 2009; ES = 1.04) as assessed via the Quality of Play Questionnaire (QPQ; Frankel & Mintz, 2008). One study undertook a blind assessment of the primary outcome, adolescents’ progress towards their personally meaningful social goals. The progress was measured via goal attainment scaling (Kiresuk et al., 1994), reporting that SSGP participants made more progress towards their goals than those attending a cooking program (active control; Afsharnejad et al., 2021a, 2021b; ES = 0.35).

Six studies obtained social outcome data via observer reports indicating improvements in youth’s social skills across SSGPs. Studies employing the NEPSY-II (Korkman et al., 2007) measured outcomes via both blinded (Corbett et al., 2016) and unblinded means (Corbett et al., 2019), noting improvements in group play (ES = 0.77) immediately following participation in the SENSE Theatre® SSGP (ES = 0.75), and delayed improvements in participants’ recall of faces (ES = 0.98), engagement in cooperative play (ES = 0.58), verbal interaction (ES = 0.47) and theory of mind (ES = 0.45). Lerner and Mikami (2012) noted that upon completing SDARI, participants had significant decreases in occasions of negative social interactions (positive: ES = − 1.17; negative: ES = − 0.98) as assessed via the Social Interaction Observation System (Bauminger, 2002). One study assessing autistic youth’s social skills via the Contextual Assessment of Social Skills (Ratto et al., 2011) reported that participants were more engaged in social situations and asked more questions after completing PEERS® (Rabin et al., 2018; ES = 0.16). After attending START, autistic youth demonstrated improved social competencies (Vernon et al., 2018) as measured via the Social Motivation and Competencies Scale (Chevallier et al., 2012; ES = 0.29), asked more questions (ES = 0.13) and recognised more positive facial expressions (ES = 0.19; Ko et al., 2019).

Only one study suggested PEERS® was efficacious in improving participants’ social skills (Rabin et al., 2018; ES = 0.30), based on data from the parent proxy-reported Social Skills Improvement Scale. Teacher reports collected via the same measure, however, failed to detect any significant differences between groups.

Based on the meta-analysis, the Hedges’ g effect sizes of the nine studies providing data related to social outcomes ranged from − 0.64 to 6.80 (Fig. 5). Egger’s regression test demonstrated no evidence of publication bias (p > 0.05). Findings indicated large efficacy for SSGPs in relation to improving social outcomes from baseline to post-test, showing autistic adolescents attending SSGPs gained significantly more social skills than those in control groups (Hedges’ g = 1.91, p = 0.01; 95% CI [0.45, 3.38]). There was significant heterogeneity in effect sizes (Q = 156.81, p < 0.001, I2 = 97.66% [94.81, 99.39]).
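The logic of Egger's regression test can be sketched as follows. In its classic form, each study's standardised effect (effect divided by its standard error) is regressed on its precision (one divided by the standard error), and an intercept far from zero suggests funnel plot asymmetry. The sketch below implements that classic ordinary least squares form; the variant used in this review (which reports a z statistic) may differ. The function name `egger_intercept` and the example data are hypothetical.

```python
import math


def egger_intercept(effects, std_errors):
    """Return (intercept, SE of intercept, t statistic) from the classic
    Egger regression. Assumption: plain OLS; illustrative sketch only."""
    n = len(effects)
    if n < 3:
        raise ValueError("at least three studies are needed")
    x = [1 / se for se in std_errors]                    # precision
    y = [g / se for g, se in zip(effects, std_errors)]   # standardised effect
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance with n - 2 degrees of freedom.
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(r ** 2 for r in resid) / (n - 2)
    se_int = math.sqrt(s2 * (1 / n + mean_x ** 2 / sxx))
    t_stat = intercept / se_int if se_int > 0 else float("nan")
    return intercept, se_int, t_stat


# Hypothetical data: smaller studies (larger SE) show larger effects.
intercept, se_int, t_stat = egger_intercept(
    [0.3, 0.5, 0.9, 1.4, 0.6], [0.10, 0.15, 0.30, 0.45, 0.20]
)
print(round(intercept, 2), round(se_int, 2), round(t_stat, 2))
```

The t statistic would be compared against a t distribution with n minus 2 degrees of freedom. With fewer than ten studies, as in the outcome categories analysed here, such a test has low power, so a non-significant result offers only weak reassurance.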

Fig. 5

Forest plot comparison of the behavioural/emotional challenges category outcomes. Analysis was based on the aggregated score calculated from the total score of all outcomes in the behavioural/emotional challenges category. Positive scores indicate more significant improvements for the intervention group compared to the control group from baseline to post-test

Behavioural/Emotional Challenges

Two studies reported reduced social anxiety following participation in SSGPs as assessed via the self-reported Social Interaction Anxiety Scale (Mattick & Clarke, 1998). One study reported this change immediately following the intervention period (Schohl et al., 2014: ES = not reported) and the remaining at 3-months follow-up (Afsharnejad et al., 2021a, 2021b; ES = 0.47). One study utilising the Emotion Quotient (Baron-Cohen & Wheelwright, 2004) reported improvements in emotion regulation (ES = 0.12). A further study assessing participants' prosocial behaviour and psychopathology via parent proxy-reported Strength and Difficulties Questionnaire (Rothenberger et al., 2008) demonstrated a significant improvement in the behavioural and emotional challenges experienced by autistic youth at 3 months follow-up (ES = 0.34; Freitag et al., 2016).

The effect sizes of the eight studies contributing data in this category ranged from − 0.76 to 0.56 (Fig. 6), with Egger’s regression test finding no evidence of publication bias (p > 0.05). Attending SSGP did not significantly reduce autistic adolescents’ behavioural and emotional challenges compared to those in control groups from baseline to post-test (Hedges’ g = − 0.14, p = 0.25, 95% CI [− 0.38, 0.10]). The analysis suggested significant heterogeneity of effect sizes (Q = 18.14, p = 0.01, I2 = 65.96% [13.71, 94.32]).

Fig. 6

Forest plot comparison of the autism characteristics category outcomes. From the autism symptomology category, the included studies all had only used SRS/SRS-2. As such, the scores are indicative of autistic-like traits. Negative scores indicate more significant improvements for the intervention group compared to the control group from baseline to post-test

Autism Characteristics

About half of the included studies (k = 9) employed either the SRS (Constantino & Gruber, 2005) or SRS-2 (Constantino & Gruber, 2012) in their measurement frameworks, with six studies denoting it as their primary outcome (Choque Olsson et al., 2017; Freitag et al., 2013; Jonsson et al., 2018; Lerner & Mikami, 2012; White et al., 2013; Yoo et al., 2014). One of these studies reported that parents were blind to group allocation (Lerner & Mikami, 2012). Findings of these studies showed a significant decrease in autistic-like traits (p < 0.05) immediately after attending SSGP, with effect sizes ranging from 0.19 to 1.2. Three studies reported this change was sustained at 3-month follow-up (KONTAKT®: ES = [0.33, 0.82]; SOSTA-FRA: ES = 0.34). One study employing a large sample (n = 296) reported that female participants demonstrated a greater change in autistic-like traits than males (Choque Olsson et al., 2017).

Based on the meta-analysis, the effect sizes of the eight studies contributing to this category (all employing SRS/SRS-2) ranged from − 1.00 to 1.38 (Fig. 7), with Egger’s regression test finding no evidence of publication bias (p > 0.05). Findings suggest that overall, attending SSGP did not significantly influence the autistic characteristics of adolescents in comparison to their peers in the control groups from baseline to post-test (Hedges’ g = − 0.10, p = 0.71, 95% CI [− 0.64, 0.43]). The heterogeneity of effect sizes was significant (Q = 49.50, p < 0.001, I2 = 89.16% [73.45, 97.55]).

Fig. 7

Forest plot comparison of all outcome measures from baseline to follow-up. Analysis was based on the aggregated score calculated from the total score of all outcome measures used within each study. Positive scores indicate more significant improvement for the intervention group

Discussion

This systematic review was conducted with the express goal of advancing understanding of the methodological quality and PF of studies evaluating the efficacy of SSGPs in samples of autistic adolescents via RCT design. Overall, eighteen studies met the inclusion criteria, evaluating the efficacy of seven distinct manualised programs delivered to small groups of cognitively able autistic adolescents (aged 12 to 17 years), aiming to improve participants’ socialisation success within a neurotypical context. Despite the good to strong methodological quality of included studies, the majority demonstrated moderate to low PF. Comparable to previous reviews (Gates et al., 2017), findings of the meta-analysis suggested that SSGPs are moderately effective in supporting autistic youth in navigating the neurotypical world, particularly in developing their social communication and interaction skills.

Methodological Quality

The present review included studies not previously incorporated in a systematic review. Overall, the included studies demonstrated strong methodological quality in line with previous reviews. The confidence in their findings, though, is constrained by several notable limitations. In line with previous reviews (Gates et al., 2017; Wolstencroft et al., 2018), the present review identified that most SSGP evaluation studies are underpinned by small samples, likely inflating effect sizes and leading to overestimations of their efficacy (Bukowski et al., 1996). Further, as highlighted in a previous review (Gates et al., 2017), most relevant studies either failed to employ blind assessment of outcome measures or did not nominate blindly assessed outcomes as the primary outcome. This limitation exposes the current body of evidence to high levels of expectancy bias (Williams et al., 2012), stemming from parents’ expectancy of improvement of their child’s social skills given their investment of time and energy in participating in the trial (McMahon et al., 2013), and both adolescents’ and researchers’ expectations of improvement (Williams et al., 2012). Few studies in this review specified a primary outcome a priori, as recommended by the CONSORT guideline for conducting RCTs (Moher et al., 2010), limiting comparison across studies.

Program Fidelity

Uniquely, this review examined the PF of SSGPs delivered to autistic youth. Although the studies included in this review reported assessing the fidelity of their programs via various strategies, few reported the extent to which the delivery of the study aligned with their RCT protocols. This limited reporting and the present review's low to moderate PF scores are of concern (Borrelli et al., 2005; Harden et al., 2015). Low PF scores in an RCT can threaten the reliability and validity of the efficacy findings for SSGPs (Barton & Fettig, 2013; Craig et al., 2008; Harden et al., 2015; Santacroce et al., 2004; Wells et al., 2012). PF is of greater importance when the program is delivered within a community context, where researchers’ ability to control the context is limited, and program facilitators have varied professional backgrounds (Smith et al., 2007; Wells et al., 2012). Due to the increasing number of available SSGPs and limited information provided by the current published reports, future research should focus on assessing the efficacy of SSGPs in community settings when accounting for PF. Considering facilitators’ views or perceptions of SSGPs, and providing systematic training, monitoring and supervision of facilitators, are all strategies likely to enhance the fidelity and translation of these programs into models of service delivery (Harden et al., 2015; Mandell et al., 2013). Further exploration of PF components, such as assessing and monitoring nonspecific program effects (e.g., therapeutic alliance or SSGP-related adverse events) and the participant's enactment of the learned skills, both within and outside of the group context, can clarify whether observed positive effects are attributable to the SSGP or the facilitators running the groups (Bellg et al., 2004; Kang et al., 2021), enabling consideration of how these factors impact on participant attrition and adherence (Borrelli et al., 2005).

Meta-analysis

As reported by previous reviews (Gates et al., 2017; Wolstencroft et al., 2018), the present meta-analysis revealed a positive effect of SSGPs on social outcomes. However, this finding should be generalised with caution, given these outcomes were largely measured via adolescent self-reported gains in social knowledge (Gates et al., 2017) and included an instrument specifically developed to assess the efficacy of the PEERS® SSGP (Tseng et al., 2020).

Findings from the meta-analysis revealed a large overall effect size comparable to previous studies evaluating the efficacy of SSGPs for autistic youth (Gates et al., 2017; Wolstencroft et al., 2018). Most studies were undertaken with samples where the majority of participants were male, a factor that may moderate the effects of SSGP (Freitag et al., 2016; Lerner & Mikami, 2012; Rabin et al., 2018; Yoo et al., 2014). Research suggests that the social challenges experienced by autistic males and females differ, and they respond to SSGP differently (Dean et al., 2017), with females possibly benefitting more from attending these programs (Choque Olsson et al., 2017). Considering the higher male-to-female ratio amongst autistic youth (ABS, 2019), there are usually fewer females in SSGP groups. Limited contact with other young females and being in a male-dominated group may limit the social benefits of attending these programs for female autistic youth (Cridland et al., 2014).

Findings

Studies rarely obtained data from informants reporting on participants’ performance in everyday social contexts such as schools, with those collecting data from teachers failing to find any significant changes following participation in SSGPs (Gates et al., 2017). Future research would benefit from understanding the generalised effects of these interventions in contexts beyond program groups. There is a further limited understanding of the impact of dosage (number of sessions) on program outcomes (Gates et al., 2017).

Wolstencroft et al.’s (2018) review suggested that well-designed SSGPs improve autistic youths’ social knowledge and performance. Interestingly, the present review identified three further outcome categories commonly utilised in assessing the efficacy of SSGPs, including autism characteristics, for which no significant program effect was found (as measured by parent proxy reports). This finding contradicts those of previous systematic reviews and meta-analyses reporting a modest efficacy for SSGPs based on this category (Gates et al., 2017; Wolstencroft et al., 2018). The majority of research designs evaluating the efficacy of SSGPs continue to employ measures of autistic characteristics as their primary outcome measure. This approach may be counterproductive, given that it inadvertently promotes the view that being autistic is problematic, compounding participants’ feelings of marginalisation and difference (Gillespie-Lynch et al., 2017). This approach does not align with contemporary views of autism which focus on supporting individual needs rather than promoting the notion that individuals need to comply with the neurotypical world (Cage et al., 2018; Monahan et al., 2021). There is a clear need for future research to focus both on building autistic youth’s social competence and promoting the acceptance of neurodiverse individuals within their social contexts (Bölte et al., 2021).

Adolescence is characterised by periods of emotional instability and growth (Hare et al., 2008), with adolescents commonly spending considerable time by themselves and with their peers and less time with their families (Guivarch et al., 2017). Given that adolescence is a time when young people begin to assert their independence from their parents and that autistic adolescents are the consumers of SSGPs, measuring their perceptions of the social knowledge they gain as a result of attending these programs is important in understanding their true efficacy and impact (Gates et al., 2017). To date, autistic youths’ lived experiences and views on the content and structure of SSGPs are rarely considered (Afsharnejad et al., 2021a, 2021b; Choque Olsson et al., 2016; Lurie & Morgan, 2013; Monahan et al., 2021). Limited understanding of the social skills and norms autistic youth themselves wish to adopt and practice may result in SSGPs inadvertently promoting camouflaging, negatively impacting participants' long-term mental health (Cage et al., 2018; Cassidy et al., 2018, 2020). The efficacy of SSGPs would likely be improved by a greater understanding of the views of autistic youth on neurotypical social norms, the skills they wish to develop, and involving them more actively in co-producing the content and strategies employed in SSGPs (Björling et al., 2020; Monahan et al., 2021).

Other Findings

As found by previous reviews, the majority of SSGPs were delivered weekly by health professionals in sessions over 90 min to small groups of 3 to 10 autistic youth. The number of sessions within each program predominantly ranged from 12 to 16 sessions. Notably, SSGPs aiming to support the social skills of autistic adolescents vary in regard to their aims and approaches, which range from structured, semi-structured, and unstructured to performance-based. This variability may, in part, be in response to the preferences and goals of autistic youth and the target context and culture. Consideration of the appropriateness of SSGP for autistic youth should include an assessment of the alignment between a particular program and the target setting and culture (Marsiglia & Booth, 2015). To date, few SSGPs have incorporated adolescents' personally meaningful social goals or in-vivo activities outside their sessions' regular settings. It is plausible that providing opportunities for youth to practice their skills in everyday social situations would increase the effects associated with SSGPs (Afsharnejad et al., 2021a, 2021b).

Across studies, the limited reporting of autistic adolescents’ level of motivation for engaging in SSGPs prior to their enrolment is cause for concern, particularly given the evidence that participants' motivation strongly influences the outcomes of these programs (Chevallier et al., 2012). Despite some programs employing goal setting as a strategy to enhance the intrinsic motivation of participants to engage in SSGPs, only one study reported actively involving participants in establishing their goals. Failing to support the autonomy of autistic adolescents in identifying and pursuing their own social goals can negatively impact the outcomes of SSGPs (Hodgetts et al., 2018).

Across this body of evidence, there is limited understanding of the impact of dosage (number of sessions) on program outcomes (Gates et al., 2017). To date, only efficacy evaluations of KONTAKT® have considered the impact of dose on the outcomes of autistic youth, finding that a longer, 24-session variant demonstrated nearly twice the effect of a medium 16-session and shorter, 12-session variants (Afsharnejad et al., 2021a, 2021b; Choque Olsson et al., 2017; Jonsson et al., 2018). Although this research provides preliminary insight into the role of dosage in influencing outcomes, there is a need to further understand the efficacy of SSGPs across other cultural and service delivery contexts and the feasibility of these longer programs, given their cost (Wolstencroft et al., 2018). Given these limitations and the continued development of novel SSGPs, the external validity of the programs would be improved by efficacy evaluations with more heterogeneous (Gates et al., 2017) and community-sourced samples.

Lastly, though this review enabled insights into the design of studies published over 12 years evaluating the efficacy of SSGPs via an RCT design, it must be noted that autism-related research is rapidly evolving. As such, the SSGPs evaluated, study methods and approaches do not reflect some of the more contemporary stances in this field, such as the neurodiversity paradigm (Lord et al., 2021). More recent research has suggested that social interaction difficulties experienced by autistic individuals may be due to a mismatch in communication styles between autistic and non-autistic neurotypes (Fletcher-Watson et al., 2019). As such, questions may arise about the appropriateness of SSGPs for autistic individuals, especially given the potential for unintended and adverse effects (e.g., as a result of masking). Nevertheless, SSGPs are still frequently implemented in clinical settings. These programs likely remain helpful for some individuals to develop the skills they need to meet their social goals and communicate their wants and needs. By nature, a review can only examine what has been done in the past. Going forward, researchers should focus on developing SSGPs co-produced with autistic individuals to enhance the relevance and efficacy of such programs. Likewise, programs that support neurotypicals to better communicate with other neurotypes may also be important.

Limitations

This systematic review has several limitations. The meta-analysis results represent 72% of the 18 studies included in this systematic review, posing a risk for potential publication bias. Further, nearly half of the identified studies failed to report the total scores of outcome measures, reporting only subscale scores, with only half of the relevant body of research contributing to the meta-analysis. Variation across the measurement frameworks employed by studies (Gates et al., 2017; Wolstencroft et al., 2018) necessitated categorising outcome measures into four key areas, likely limiting the conceptual clarity of the meta-analysis and contributing to the high heterogeneity. Despite the necessity for assessing and reporting publication bias, the meta-analyses of the outcome categories and the maintenance effect, which contain fewer than 10 studies, are less reliable due to low statistical power (Dalton et al., 2016). The practice of pooling outcomes across autistic children and adolescents (Choque Olsson et al., 2017; Corbett et al., 2016, 2019; Freitag et al., 2016; Jonsson et al., 2018), combined with findings suggesting adolescents experience greater gains from attending SSGPs than children (Choque Olsson et al., 2017), raises the possibility that this meta-analysis does not solely reflect the efficacy of SSGPs in autistic adolescents. There was variability across the included studies with regard to the delivery of SSGPs, program components, participants, and data collection approaches, which is not unusual in the context of a systematic review and meta-analysis. Studies aligned in terms of their aims and fundamental principles, with the majority evaluating standardised SSGPs (Tseng et al., 2020).

The delivery of SSGPs is affected by the training and skills of program facilitators (Craig et al., 2008). Further, allocation concealment for studies evaluating the efficacy of behavioural interventions such as SSGPs can be difficult, if not impossible. Given the noted limitations in the quality of the design of RCTs reviewed, the robustness of future efficacy evaluations of the programs would be significantly improved by adhering to guidelines such as CONSORT (Boutron et al., 2008). Due to the lack of response from corresponding authors, the PF was scored based on the information provided in the published RCT manuscripts. Although it is plausible that the limited word counts allowed by publishers contributed to items being marked as “not reported” on the PF checklist, future studies should strongly consider reporting PF items.

In assessing PF, previous studies have largely relied on assessment methods with low reliability (e.g., observational methods or checklists; Borrelli, 2011). Future research should consider employing more rigorous approaches to fidelity assessment, including capturing the views of all stakeholders in relation to an SSGP and audio/video recording sessions, enabling evaluation of the reliability and validity of reported adherence (Borrelli, 2011). Social skills are complex, a fact that has likely underpinned the evident lack of consistency in the measurement frameworks employed across SSGP efficacy studies. In the present review, variability across studies concerning their choice of outcome measurements did not allow assessment of how PF influences the efficacy of an SSGP.

Few studies employed active control groups in their designs, which enable control for exposure to social context. Because the vast majority of evidence assessing the efficacy of SSGPs for autistic adolescents is underpinned by comparison to inactive control groups, the effects of SSGPs relative to other social groups remain unclear (Karlsson & Bergmark, 2015). As RCT studies are associated with high levels of missing data and noncompliance, future evaluations employing intention-to-treat approaches to data analysis can maintain the balance across study arms achieved by randomisation (Gupta, 2011) regardless of participant withdrawals (Everitt & Wessely, 2008). Given the contextual and highly individualised nature of social skills in autistic youth (Marsiglia & Booth, 2015), future research should consider employing single-subject research designs to advance understanding of the efficacy of SSGPs at the individual rather than group level.

Conclusions

This review found that despite demonstrating good to strong methodological quality, the majority of studies assessing the efficacy of SSGPs neglect clear reporting of PF, negatively impacting both their internal validity (i.e., the extent to which the implementation of the program aligned with study protocols) and external validity (i.e., the extent to which a study can be replicated and interpreted in a real-world context). Although previous reviews have concluded that SSGPs have a significant moderate effect on the outcomes of autistic youth, the present review, which categorised outcomes into four discrete groups, is the first to highlight that these effects can be largely attributed to changes in social functioning, with SSGPs having limited effect on autistic traits and behavioural and emotional challenges. The findings of this review highlight the need for existing SSGPs targeting autistic youth to consider factors affecting PF, accounting for the effect of a supportive group context and program dosage, and considering the alignment of these programs with the social goals of autistic adolescents themselves. Further investigations evaluating the efficacy of carefully conceptualised and designed SSGPs should attempt to accommodate the heterogeneity of the autism spectrum and variations in social context while attending to methodological fidelity.