The ability to briefly maintain information in memory is related to higher cognitive performance in wide-ranging areas, including spatial navigation, math, reading, decision-making, and language (Baddeley, 1996). In cognitive theories, this ability has been termed short-term memory or working memory, although the definitions of these terms have varied widely (see Cowan, 2017, for a review). The present meta-analysis distinguishes short-term memory tasks, which require the brief (15–30 s) maintenance of a limited amount of information (Atkinson & Shiffrin, 1968), from working memory tasks, which require the processing and manipulation of a limited amount of information (Baddeley & Hitch, 1974). More specifically, the present meta-analysis examines where group differences between nonimplanted Deaf signers and hearing nonsigners do and do not occur.

The topic of deafness and memory has been studied since the mid-1900s (e.g., Blair, 1957). Despite over half a century of study, the literature has presented mixed results regarding whether, compared with hearing individuals, Deaf individuals demonstrate memory deficits (e.g., Hirshorn et al., 2012), strengths (e.g., Cardin et al., 2018), or no differences (e.g., Marshall et al., 2015). This lack of consensus can be attributed to numerous factors, including differences in stimuli (e.g., faces versus words; Bettger et al., 1997, and Geraci et al., 2008, respectively), method (e.g., recall of the last seen item versus serial span; Hirshorn et al., 2012, and Boutla et al., 2004, respectively), and participant characteristics (e.g., hearing aids, cochlear implantation, and age; cf. Conway et al., 2009, Kronenberger et al., 2018, and Arfé et al., 2015, respectively). This variability in findings limits the field’s ability to assess models of memory and to advance practical recommendations (e.g., best practices in education). Furthermore, the conflicting findings have contributed to and perpetuated misconceptions about Deaf signers, including the notions that Deaf signers have poor working memory (cf. Hamilton, 2011) and that Deaf signers are “visual learners” (Marschark et al., 2017). Accordingly, the current meta-analysis begins to untangle these inconsistent findings by focusing on a well-defined set of studies. The present research seeks to establish the direction and magnitude of possible memory differences between nonimplanted Deaf signers and hearing nonsigners for two distinct types of short-term and working memory: verbal-serial memory and visuospatial-serial memory.

Moreover, the current systematic review and meta-analysis focuses on prelingually Deaf users of a sign language (e.g., British Sign Language, American Sign Language) who do not use any auditory assistance (e.g., hearing aids, cochlear implants, bone-anchored hearing aids). A related literature examines short-term and working memory processes after cochlear implantation to evaluate how changes in sound awareness affect memory (e.g., Conway et al., 2009; Kronenberger et al., 2018). By contrast, the current systematic review and meta-analysis aims to evaluate how using a sign language, rather than a spoken language, affects short-term and working memory processes. Critically, evaluating individuals with residual hearing (e.g., those who use hearing aids for amplification), individuals who became Deaf postlingually, or individuals who use other technology to access sound (e.g., cochlear implants) would not address this question, because such participants received at least some auditory input. By focusing only on nonimplanted Deaf signers in contrast to hearing nonsigners, we aim to systematically evaluate the role of deafness in memory for items in serial order, in both verbal and visual tasks.

Verbal-serial-order memory

Verbal-serial-order tasks require participants to report back items exactly as presented; the items can be words, letters, sentences, or nameable stimuli such as digits. Digit span, for example, is frequently used in measures of intelligence (e.g., the Wechsler scales; Wechsler, 1997). The classic forward-digit-span task typically starts with a two-item series; after at least one correctly reported sequence (e.g., recalling 4–9 as 4–9), the series length increases by one. This process repeats until the participant successfully completes a nine-item series or can no longer report a correct sequence. The backward-digit-span task is identical, except that participants recall the digits in reverse order (e.g., recalling 4–9 as 9–4). In terms of the distinctions guiding this review, forward digit span taps short-term memory, as it requires maintenance of information, whereas backward digit span taps working memory, as it requires both maintenance and manipulation of information (Baddeley & Hitch, 1974). In the cognitive literature, this distinction between short-term and working memory is relative rather than absolute, because both tasks require similar levels of processing (Cowan, 2017). Nevertheless, the current review discusses the two domains separately to highlight the differences in task instructions and demands.
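The adaptive procedure described above can be sketched as a small simulation. This is a hypothetical illustration, not the Wechsler administration rules: the function `digit_span`, the simulated responder, and the choice of two trials per length are assumptions for demonstration. A responder is any function that receives the presented digits and returns its report; the span is the longest length at which at least one report was fully correct.

```python
import random
from typing import Callable, List

# A responder receives the presented digits and returns the digits it reports.
Responder = Callable[[List[int]], List[int]]


def digit_span(respond: Responder, backward: bool = False, start: int = 2,
               max_len: int = 9, trials_per_length: int = 2, seed: int = 0) -> int:
    """Longest list length at which at least one report was fully correct.

    Hypothetical procedure: begin with `start` digits, move up one length
    after a correct report, and stop when every trial at a length fails
    or after `max_len` digits have been passed.
    """
    rng = random.Random(seed)  # seeded so a run is reproducible
    span = 0
    for length in range(start, max_len + 1):
        passed = False
        for _ in range(trials_per_length):
            presented = [rng.randint(0, 9) for _ in range(length)]
            # Forward recall keeps the presented order; backward reverses it.
            target = presented[::-1] if backward else presented
            if respond(presented) == target:
                passed = True
                break  # one correct report is enough to advance
        if not passed:
            break  # discontinue: no correct report at this length
        span = length
    return span


# Simulated participant who recalls up to five digits in order.
def recalls_five_forward(presented: List[int]) -> List[int]:
    return list(presented) if len(presented) <= 5 else []


print(digit_span(recalls_five_forward))  # 5
```

Because the backward condition only changes the target order, the same scoring logic serves both the forward (short-term memory) and backward (working memory) versions of the task.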

When considering verbal short-term memory (i.e., forward serial recall), nonimplanted Deaf signers of American Sign Language (ASL) often do not perform as well as hearing, nonsigning counterparts, including on tasks with printed letters (e.g., Bavelier et al., 2008), ASL letters (e.g., Boutla et al., 2004), ASL signs (Krakow & Hanson, 1985), digits (e.g., Koo et al., 2008), words (e.g., Geraci et al., 2008), and sentences (e.g., Streff et al., 1978). This pattern has been found with both Deaf children (e.g., Tomlinson-Keasey & Smith-Winberry, 1990) and adults (e.g., Bavelier et al., 2008). Effect sizes for these group differences range from small (e.g., Andin et al., 2013) to moderate (e.g., Koo et al., 2008), suggesting that as-yet unidentified factors contribute to the variation in effect magnitudes. Theorists attribute these patterns to Deaf signers’ lack of auditory experience, given that verbal sequencing is a demand inherent to spoken language that the language modality of ASL does not impose (Conway et al., 2009). In other words, Deaf signers have less experience with serial ordering than hearing nonsigners, and this difference in experience manifests as group differences on verbal short-term memory tasks.

Although previous studies seem to reach consensus on a Deaf deficit for forward-verbal-serial recall (e.g., Boutla et al., 2004), the results are less clear for backward-verbal-serial recall. Some earlier studies suggest that the backward-serial spans of Deaf signers are equivalent to those of hearing nonsigners (e.g., Wilson et al., 1997), largely because hearing nonsigners show a decrease from forward to backward span, whereas Deaf signers maintain their performance across forward and backward span tasks. These findings have emerged for both simple digit spans (e.g., Wilson et al., 1997) and complex spans, such as operation span (Andin et al., 2013). Moreover, the pattern holds for backward span in both children (e.g., Alamargot et al., 2007) and adults (Boutla et al., 2004). By contrast, some studies suggest that Deaf signers’ backward spans may be higher than those of hearing nonsigners (Hamilton, 2011; Powell & Hiatt, 1996). Thus, although the findings for verbal-backward spans (working memory) are less consistent than those for verbal-forward spans (short-term memory), the existing data do not reveal a Deaf deficit in working memory, in contrast to the consistently reported Deaf deficit in short-term memory outlined above.

Taken together, most research suggests a hearing advantage for forward verbal serial order recall (e.g., Bavelier et al., 2008), but there is no clear consensus about the magnitude of this difference or whether developmental cascades may affect its size. Furthermore, there is not yet consensus on whether group differences appear in backward verbal serial order recall, making verbal serial working memory a particularly unclear area in studies of Deaf memory.

Visuospatial-serial-order memory

Previous theorists have postulated that deprivation in one sense (e.g., hearing) could lead to enhanced skills in another (e.g., vision), as in the case of Deaf signers (for discussion, see Bavelier et al., 2006). Indeed, some studies support the notion that Deaf signers demonstrate stronger visuospatial-serial recall skills than hearing nonsigners in memory for designs (Blair, 1957) and in forward recall on the Corsi Block Tapping Test (e.g., Cardin et al., 2018; Heled & Ohayon, 2021; Hirshorn et al., 2012). The Corsi Block Tapping Test (Corsi, 1972; hereafter, Corsi) is traditionally administered as a three-dimensional task in which the experimenter and participant sit opposite each other with a board of blocks on the table between them. The experimenter taps the blocks in a particular sequence, which the participant then replicates. Corsi has inspired numerous adapted administrations, including virtual tasks on a computer or iPad (e.g., Alamargot et al., 2007; Logan et al., 1996; McFayden et al., 2023). As a visuospatial analog of verbal span tasks, forward Corsi has been used as a measure of visuospatial short-term memory and backward Corsi as a measure of visuospatial working memory (Corsi, 1972). Of note, some theorists have challenged the use of forward and backward Corsi as analogs of forward and backward verbal recall, given the heavy working memory and processing load that both conditions of the visuospatial task impose (Vandierendonck et al., 2004). For the purposes of the current review, forward and backward Corsi are treated separately to avoid mixing different dependent measures of visual span.

Despite a prevailing research narrative built on the sensory deprivation hypothesis (e.g., Hall & Bavelier, 2010), in which intact senses are strengthened as a consequence of a deprived sensory system, Marschark et al. (2016, 2017) urge researchers to question the generalization that Deaf signers have a visuospatial advantage. In addition to qualitatively evaluating whether Deaf signers identify as visual learners (Marschark et al., 2013), Marschark et al. (2016, 2017) empirically found no significant group differences between Deaf signers and hearing nonsigners on forward-visual-serial tasks, replicating prior reports (e.g., Flaherty & Moran, 2001; Marshall et al., 2015). Thus, whereas some research suggests a Deaf advantage in visual short-term memory (e.g., Hirshorn et al., 2012), recent work suggests that Deaf signers’ visuospatial short-term memory may be equivalent to that of hearing nonsigners (e.g., Marshall et al., 2015).

Fewer studies have considered backward-visuospatial-serial recall, although some administer the Corsi Block Tapping Test in both forward and backward conditions (e.g., Heled & Ohayon, 2021). For backward-visuospatial-serial tasks, results are again mixed: whereas some studies suggest group equivalence (e.g., Heled & Ohayon, 2021; Romero et al., 2014), others suggest a hearing advantage (e.g., Marshall et al., 2015).

Ultimately, although a prominent narrative has emerged in which Deaf signers have visual strengths (e.g., Hirshorn et al., 2012), that narrative does not hold when findings are considered across multiple research teams (e.g., Flaherty & Moran, 2001; Marschark et al., 2016, 2017; Marshall et al., 2015). This lack of consensus across studies warrants systematic evaluation.

Current study

Numerous studies have compared Deaf signers’ memory with hearing nonsigners’ memory and have reached variable conclusions about both the direction (e.g., Hirshorn et al., 2012 vs. Boutla et al., 2004) and the magnitude (e.g., Bavelier et al., 2006 vs. Boutla et al., 2004) of group differences. These discrepant findings have not only theoretical implications for general frameworks of memory but also equity implications for the appropriate measurement of memory in Deaf populations. The current study therefore seeks to systematically evaluate the verbal- and visual-serial-order performance of nonimplanted Deaf signers compared with hearing nonsigners. In doing so, it investigates questions about Deaf memory that individual studies have not addressed, including differences as a function of task modality (verbal, visual) and whether age moderates these effects. Accordingly, the current systematic review and meta-analysis has two primary aims, each with two components: the first is to evaluate Deaf memory for verbal-serial recall, including short-term memory (forward recall) and working memory (backward recall); the second is to evaluate Deaf memory for visuospatial-serial recall, including short-term memory (forward recall) and working memory (backward recall).

Method

Transparency and openness

We adhered to the PRISMA 2020 guidelines for systematic reviews (Page et al., 2021). All data, analysis code, and research materials (including our coding scheme) are available online (https://osf.io/vwns8/?view_only=26fd6130332d4cc0a3281a720639352e). Data were modeled in RStudio (Version 2021.09.0) using the metafor (Version 3.4-0) and metaSEM (Version 1.2.5.1) packages (Cheung, 2019; Viechtbauer & Cheung, 2010). This review project was preregistered on PROSPERO (https://www.crd.york.ac.uk/PROSPERO/display_record.php?RecordID=167987).

Eligibility criteria

The inclusion and exclusion criteria were specified in advance and documented in the PROSPERO preregistration. The current meta-analysis compares the verbal and visuospatial spans of nonimplanted Deaf signers and hearing nonsigners across the life span. Thus, studies were excluded if participants used any form of hearing assistance (e.g., cochlear implants or hearing aids), if memory span was not directly measured with a serial-recall task, or if only Deaf participants were assessed, without a hearing comparison group. Detailed inclusion and exclusion criteria are in Table 1.

Table 1 Systematic review and meta-analysis inclusion and exclusion criteria

Search strategy

The search strategy was developed by the first and final authors and was verified by a research librarian to ensure the accuracy and comprehensiveness of retrieved articles. Electronic searches for publications in English were conducted in PsycINFO and PubMed. The query string used was:

(Deaf* OR "hard of hearing" OR "hearing loss" OR "hearing disorder*") and ((corsi OR "visual short term memory" OR "visual short-term memory" OR "visuospatial short term memory" OR "visuospatial short-term memory" OR "visuospatial memory" OR "visual memory") OR ("serial recall" OR "order recall" OR "forward recall" OR "backward recall" OR "verbal working memory" OR "verbal short term memory" OR "verbal short-term memory" OR "span" OR Recall (Learning))).

To reduce bias, two authors (T.C.M. and M.K.G.A.) conducted independent searches to identify all relevant studies. Search filters were set to ‘Peer Reviewed’ and ‘English’ language. Search results were exported into Covidence, and duplicates were automatically removed based on DOI, author, and year of publication. Titles and abstracts were hand-screened by authors T.C.M. and M.K.G.A. against the inclusion and exclusion criteria. Any conflicts were resolved by consensus with a third reviewer (author K.S.M.). Based on the title/abstract review, full-text screens were conducted by authors T.C.M. and M.K.G.A., again with consensus and conflict resolution by author K.S.M. This searching and screening process was conducted twice, once in June 2021 and again in September 2022; the second search was restricted to articles published in 2021–2022 to capture any articles that may have been missed during the manuscript-writing phase. Author T.C.M. also hand-searched the reference lists of included articles to identify any other potentially eligible publications. In addition, the first author invited corresponding authors of primary publications to share additional research, published or unpublished, that met the inclusion criteria. Lastly, the first author disseminated email requests for grey or existing literature related to the current study via three cognitive LISTSERVs.

Coding

The first author developed a coding manual and data extraction process in consultation with the second and final authors (available on OSF: https://osf.io/vwns8/?view_only=26fd6130332d4cc0a3281a720639352e). The first and second authors piloted the coding procedure using eight articles, after which the coding manual and data extraction forms were revised accordingly. Data extracted from articles included study characteristics, participant demographics, methods, and outcome measures. The first, second, and third authors coded all articles using the refined coding manual and data extraction form. If selected articles had missing data or required clarification of study details, T.C.M. contacted the corresponding authors up to three times. Data presented only in visual formats (e.g., figures) were extracted using WebPlotDigitizer (Rohatgi, 2021). Data coding documents were compared using 4TOPS software and CloudyExcel; interrater reliability for continuous dependent variables was assessed using intraclass correlation. Inconsistencies among the three coders were discussed by all five authors and resolved by consensus.

Assessment of study quality

Three domains of study quality were coded: methodological quality, comprehensiveness of reporting, and hearing bias. The comprehensiveness-of-reporting and hearing-bias measures were developed in consultation with author A.M., a member of the Deaf community.

Methodological quality was assessed using an adapted version of the National Institutes of Health (NIH) Quality Assessment Tool for Case-Control Studies (NIH, 2021). Of the 12 criteria proposed by the NIH, five were changed or adjusted to reflect the use of a Deaf sample, and one was omitted because it was encapsulated by the review’s inclusion criteria (i.e., cases and controls differentiated; full criteria in Table 2). Each criterion was rated 0, 1, or “CD” (cannot determine). In accordance with Tawfik et al. (2019), total scores of 0–3 were considered poor quality, 4–7 fair, and 8–11 good.
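As a minimal sketch of the scoring rule above, the total score and its quality band could be computed as follows. This assumes that a “CD” rating earns no point, which is not specified here; `quality_score`, `quality_band`, and the criterion names in the example are hypothetical.

```python
from typing import Mapping, Union

Rating = Union[int, str]  # 1 = criterion met, 0 = not met, "CD" = cannot determine


def quality_score(ratings: Mapping[str, Rating]) -> int:
    """Count criteria rated 1; assumes a "CD" rating earns no point."""
    return sum(1 for rating in ratings.values() if rating == 1)


def quality_band(score: int) -> str:
    """Map a total score (0-11) to the poor (0-3), fair (4-7), or good (8-11) band."""
    if not 0 <= score <= 11:
        raise ValueError("score must be between 0 and 11")
    if score <= 3:
        return "poor"
    if score <= 7:
        return "fair"
    return "good"


# Hypothetical ratings for four of the eleven criteria.
ratings = {"research_question": 1, "population": 1, "recruitment": "CD", "exposure_measure": 0}
score = quality_score(ratings)
print(score, quality_band(score))  # 2 poor
```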

Table 2 Adapted quality assessment criteria from NIH’s Quality Assessment Tool for Case-Controls

For comprehensiveness of reporting, studies received up to one point for reporting each of five demographic characteristics of the included samples: age, sex, deafness onset, degree of hearing loss, and years of education. For age only, studies received a half point for reporting a single statistic and a full point for reporting two statistics (e.g., mean and standard deviation). All domains were assessed independently by the first three authors, with discrepancies resolved by consensus among all five authors. After one full round of quality coding, an additional coding training was conducted, which included revisions to the operational definitions of several codes. The first three authors then re-coded the quality and comprehensiveness codes independently, and all five authors met to reach consensus on any remaining discrepancies.
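The half-point rule for age can be expressed as a small scoring function. This is an illustrative sketch, assuming that reporting two or more age statistics earns the full point; the function name and its arguments are hypothetical.

```python
def comprehensiveness_score(age_statistics: int, sex: bool, onset: bool,
                            hearing_loss_degree: bool, education: bool) -> float:
    """Score reporting of five demographic domains (0 to 5 points)."""
    # Age: half point for one statistic, full point for two or more (e.g., M and SD).
    if age_statistics >= 2:
        age_points = 1.0
    elif age_statistics == 1:
        age_points = 0.5
    else:
        age_points = 0.0
    # Each remaining domain earns one point when reported (True counts as 1).
    return age_points + sum([sex, onset, hearing_loss_degree, education])


# A study reporting only mean age, sex, onset, and degree of hearing loss.
print(comprehensiveness_score(1, sex=True, onset=True, hearing_loss_degree=True, education=False))  # 3.5
```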

Lastly, the coders evaluated whether any member of the authorship team identified as Deaf or Hard of Hearing (scored 0 or 1) to evaluate hearing bias in publication. Authors’ Deaf identity was determined by web searches and online profile data.

Data analysis

Power

Power calculations for a random-effects model were used to determine the minimum number of studies needed for sufficient power. Prior research indicated that deafness had a small-to-moderate negative effect on verbal memory and a small positive effect on visual memory. Power calculations indicated that, with an average group size of 23 participants, 12 studies would be needed to detect a moderate effect size (~.50) with sufficient power (~.80, assuming high heterogeneity), and 20 studies would be needed to detect a small effect size (~.20) with sufficient power (Valentine et al., 2010).
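A calculation of this kind can be sketched with the normal-approximation power formula for the mean effect in a random-effects model, the general approach summarized by Valentine et al. (2010). The sketch assumes equal group sizes and a common true effect, and it expresses the between-study variance τ² as a multiple of the typical within-study variance v (a common convention uses about 0.33v, 1v, and 3v for small, moderate, and high heterogeneity). These conventions and the function name `re_power` are assumptions, not the authors’ exact inputs.

```python
from statistics import NormalDist


def re_power(d: float, n_per_group: float, k: int, tau2_ratio: float,
             alpha: float = 0.05) -> float:
    """Approximate power of a two-sided z test of the random-effects mean.

    Assumes k studies with equal group sizes and a common true effect d;
    between-study variance is tau2_ratio times the within-study variance v.
    """
    n1 = n2 = n_per_group
    # Large-sample sampling variance of a standardized mean difference.
    v = (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))
    v_star = v + tau2_ratio * v          # total variance per study
    se_mean = (v_star / k) ** 0.5        # SE of the random-effects mean
    lam = d / se_mean                    # expected z (noncentrality)
    z = NormalDist()
    c = z.inv_cdf(1 - alpha / 2)         # critical value, about 1.96
    return (1 - z.cdf(c - lam)) + z.cdf(-c - lam)


print(round(re_power(0.5, 23, 12, tau2_ratio=3.0), 2))     # 0.82
print(round(re_power(0.2, 18.75, 4, tau2_ratio=1.0), 2))   # 0.14
```

If 18.75 is read as the average per-group size, the second call (four studies, moderate heterogeneity, effect size 0.2) gives about 0.14, close to the power value reported for visuospatial working memory in the Results.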

Effect sizes

Effect sizes were calculated to represent the difference in serial-span scores between nonimplanted Deaf signers and hearing nonsigners using the standardized mean difference (Hedges’s g), which corrects the small-sample upward bias of Cohen’s d (Hedges, 1981). Effect sizes were coded so that positive values reflect superior memory performance for Deaf signers compared with hearing nonsigners and negative values reflect a Deaf disadvantage.
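The effect-size computation can be sketched with the standard pooled-SD formula and Hedges’s small-sample correction J = 1 − 3/(4·df − 1), where df = n₁ + n₂ − 2. The function name, argument order, and example numbers are illustrative; the sign convention (Deaf minus hearing) follows the coding described above, and the variance is the usual large-sample approximation.

```python
import math
from typing import Tuple


def hedges_g(m_deaf: float, sd_deaf: float, n_deaf: int,
             m_hearing: float, sd_hearing: float, n_hearing: int) -> Tuple[float, float]:
    """Return (g, var_g); positive g means Deaf signers scored higher."""
    df = n_deaf + n_hearing - 2
    # Pooled within-group standard deviation.
    s_pooled = math.sqrt(((n_deaf - 1) * sd_deaf ** 2 + (n_hearing - 1) * sd_hearing ** 2) / df)
    d = (m_deaf - m_hearing) / s_pooled
    j = 1 - 3 / (4 * df - 1)  # small-sample correction factor, J < 1
    g = j * d
    # Large-sample variance of d, rescaled by J squared for g.
    var_d = (n_deaf + n_hearing) / (n_deaf * n_hearing) + d ** 2 / (2 * (n_deaf + n_hearing))
    return g, j ** 2 * var_d


# Hypothetical spans: Deaf M = 5.0, hearing M = 6.0, both SD = 1.0, n = 10 per group.
g, var_g = hedges_g(5.0, 1.0, 10, 6.0, 1.0, 10)
print(round(g, 3), round(var_g, 3))  # -0.958 0.206
```

With small groups the correction shrinks the raw d of −1.0 toward zero, which is why g is preferred when many primary studies have modest samples.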

Meta-analyses

Because multiple effect sizes were generated from interdependent data (e.g., forward and backward effect sizes reported in the same study), a multivariate meta-analysis was conducted for each topic (e.g., visual short-term memory) with sufficient studies in the literature. For non-independent effect sizes, multivariate meta-analysis yields more precise estimates (smaller confidence intervals) than univariate meta-analysis and is the most appropriate model when sampling covariances are unknown (Cheung, 2019), as is the case with the current data. When too few studies reported one of the outcomes (i.e., visuospatial working memory), a univariate meta-analysis of the remaining outcome was conducted (Harrer et al., 2021).

Because the included studies had diverse populations and heterogeneity was anticipated, a random-effects model was used. This model accounts for heterogeneity by assuming that true effects vary across studies, estimating this between-study variance in addition to within-study sampling error, and adjusting study weights accordingly, which facilitates generalization of the findings (Borenstein et al., 2009). Heterogeneity was evaluated using the Q and I² statistics. A significant Q rejects the null hypothesis of homogeneity, indicating that the variability among effect sizes is greater than would be expected from sampling error alone (Lipsey & Wilson, 2001). The I² statistic describes the percentage of total variation across studies that is due to heterogeneity rather than chance (Higgins & Green, 2009). When heterogeneity was significant, exploratory moderator analyses were conducted to determine whether other study characteristics were systematically associated with the primary outcomes. Exploratory moderator variables were chosen a priori and included participant age, stimuli (e.g., letters, digits, words), and publication year.
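For a single outcome, the random-effects mean, Q, and I² can be sketched as below. This sketch uses the DerSimonian-Laird moment estimator of τ² because it has a closed form; it is a swapped-in simplification, and metafor’s default estimator (REML) can give somewhat different values. The multivariate model used for the verbal outcomes is not shown, and the function name is hypothetical.

```python
import math
from typing import Dict, List


def random_effects(g: List[float], v: List[float]) -> Dict[str, float]:
    """Random-effects mean with DerSimonian-Laird tau^2, plus Q and I^2."""
    k = len(g)
    w = [1 / vi for vi in v]                       # inverse-variance (fixed-effect) weights
    sw = sum(w)
    mean_fe = sum(wi * gi for wi, gi in zip(w, g)) / sw
    q = sum(wi * (gi - mean_fe) ** 2 for wi, gi in zip(w, g))
    df = k - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)                  # between-study variance
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0  # share of variation beyond chance
    w_re = [1 / (vi + tau2) for vi in v]           # random-effects weights
    mean_re = sum(wi * gi for wi, gi in zip(w_re, g)) / sum(w_re)
    se_re = math.sqrt(1 / sum(w_re))
    return {"mean": mean_re, "se": se_re,
            "ci_low": mean_re - 1.96 * se_re, "ci_high": mean_re + 1.96 * se_re,
            "Q": q, "df": df, "tau2": tau2, "I2": i2}


# Three hypothetical studies with widely spread effects and equal variances.
res = random_effects([-1.0, 0.0, 1.0], [0.1, 0.1, 0.1])
print(round(res["Q"], 2), round(res["tau2"], 2), round(res["I2"], 2))  # 20.0 0.9 0.9
```

I² is returned as a proportion, matching how values such as 0.86 are reported in the Results; Q is compared with a chi-square distribution on k − 1 degrees of freedom.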

Bias

Publication bias in the mean effects was assessed using a funnel plot of effect sizes against their standard errors, the trim-and-fill procedure (Duval & Tweedie, 2000), and Egger’s linear regression test (Egger et al., 1997). To assess whether one or more studies had a substantial statistical impact on the summary effect, a Baujat plot was used to identify studies that contributed considerably to overall heterogeneity (Baujat et al., 2002), and studies were further evaluated using influence diagnostics (Viechtbauer & Cheung, 2010), which evaluate the presence of outliers using eight influence measures. When outliers were detected, the analyses were run with and without the outlier to assess its effect on the overall findings.
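Egger’s test can be sketched in its classic form: regress each study’s standardized effect (g/SE) on its precision (1/SE) by ordinary least squares and test whether the intercept differs from zero, since a nonzero intercept indicates funnel plot asymmetry. This is an illustrative sketch with hypothetical data; metafor’s `regtest()` implements a related but differently parameterized version that reports a z statistic, and the function below returns only the intercept and its t statistic (k − 2 df), leaving the p-value to a t distribution.

```python
import math
from typing import List, Tuple


def egger_test(g: List[float], se: List[float]) -> Tuple[float, float]:
    """Classic Egger regression; returns (intercept, t statistic with k - 2 df)."""
    k = len(g)
    if k < 3:
        raise ValueError("Egger's test needs at least three studies")
    y = [gi / si for gi, si in zip(g, se)]   # standardized effects
    x = [1 / si for si in se]                # precision
    mx, my = sum(x) / k, sum(y) / k
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx              # asymmetry estimate
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(r ** 2 for r in resid) / (k - 2)  # residual variance
    se_int = math.sqrt(s2 * (1 / k + mx ** 2 / sxx))
    t = intercept / se_int if se_int > 0 else math.nan
    return intercept, t


se = [0.1, 0.2, 0.25, 0.4, 0.5]
noise = [0.01, -0.02, 0.0, 0.02, -0.01]
# Hypothetical small-study effect: less precise studies report larger g.
g_biased = [0.5 + 0.8 * s + e for s, e in zip(se, noise)]
print(egger_test(g_biased, se))
```

In this simulated example the intercept lands near the built-in small-study slope of 0.8, whereas the same studies without that slope give an intercept near zero.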

Results

Study selection

Figure 1 shows the study selection results in a PRISMA flowchart (Page et al., 2021). Database searches retrieved 1,720 records, of which 896 were removed as duplicates. Of the remaining 824 records, 700 were excluded based on title and abstract review, leaving 124 records for full-text review. All 124 articles were retrieved for full-text review; of these, 95 were excluded for reasons aligned with the inclusion and exclusion criteria (see Fig. 1 for details). After full-text review, data were extracted from the 29 remaining articles derived from database searches. Two additional articles were hand-selected from the reference lists of these 29 articles, and one additional, unpublished article was identified via grey literature searches, for a total of 32 articles to be coded and extracted.

Fig. 1

PRISMA flowchart for included and excluded studies

The interrater agreement at each level of the decision-making process was acceptable (Norcini, 1999): 90% (title and abstract screening) and 75% (full text review). For data extraction, the coder interrater reliability was excellent, r = .981.

Study characteristics

Thirty-two articles, representing 37 studies, were eligible for inclusion (see Table 3). Upon further inspection, two studies did not report any measure of variance (e.g., standard deviation, standard error) and thus could not be used to generate effect sizes, which reduced the final sample to 35 studies comprising 1,701 participants (n = 816 Deaf). Studies were published in nine countries and spanned 64 years (range: 1957–2021, M = 2004, SD = 16.07 years). Most studies were conducted with adults (n = 22) and in English (n = 28), followed by Italian (n = 5). Ten studies included both verbal and visuospatial span tasks.

Table 3 Characteristics of included studies

When verbal and visual memory are considered separately, 21 articles comprising 25 studies included a verbal serial span task (all with forward trials; n = 12 also with backward trials), for a total of 1,202 participants (n = 576 Deaf). Deaf participants had an average age of 23.47 years (SD = 9.95, range: 9.17–44.13), and hearing participants had an average age of 23.01 years (SD = 10.01, range: 9.30–46.00). The most common stimulus for verbal span was digits (n = 16), followed by letters (n = 5) and words (n = 4).

Comparatively, 20 articles comprising 20 studies included a visuospatial span task (all with forward trials; n = 4 also with backward trials), for a total of 1,080 participants (n = 510 Deaf). Deaf participants had an average age of 20.61 years (SD = 11.65, range: 7.95–44.13), and hearing participants had an average age of 20.17 years (SD = 11.42, range: 8.03–44.80). The most common stimuli for the visuospatial span tasks were Corsi blocks (n = 15); other stimuli included pictures of objects (n = 2), lights from the game Simon, Knox cubes, and nonsense forms.

Quality of evidence

Interrater agreement for the quality, comprehensiveness, and Deaf authorship codes was high (93%). Quality-of-evidence data are available in Table 4. Overall, studies were of fair quality (M = 4.5, SD = 1.58, range: 0–11): 25 studies scored in the “fair” range on the adapted NIH criteria, 10 of the 37 studies were in the “poor” range, and only two studies were in the “good” range. No study included a sample size justification or information suggesting that data collectors were blind to study hypotheses, and only one study indicated the use of concurrent controls, suggesting that these practices are not common in research with Deaf participants. Excluding these three codes (i.e., sample size justification, blinding to hypotheses, concurrent controls) did not meaningfully change the average quality score (M = 4.47, SD = 1.54) but did alter the distribution across ranges (zero studies “poor” [range 0–2], 30 studies “fair” [range 3–5], and seven studies “good” [range 6–8]).

Table 4 Study scores for quality assessment, comprehensiveness of reporting, and Deaf authorship

For comprehensiveness of reporting, the average score was 3.3 of 5 metrics (range: 0–5), with the modal study reporting on 4 of 5 domains. The least-reported domain was educational level (reported in 46% of studies). Lastly, 43% of studies had a Deaf author on their authorship team; because some articles contained multiple studies, this proportion decreased to 38% (12/32) when considered at the level of article authorship teams.

Meta-analyses

Verbal serial order

Funnel plots for forward and backward verbal serial order are available in Supplementary Figures 1 and 2. Trim-and-fill analyses for publication bias (Duval & Tweedie, 2000) imputed zero studies to the right of the mean (positive Hedges’s g effect sizes) for both forward and backward recall. However, Egger’s test (Egger et al., 1997) revealed significant funnel plot asymmetry for forward recall, z = −5.75, p < .001, suggesting publication bias favoring larger negative effect sizes (i.e., a hearing bias). Funnel plot asymmetry for backward recall was not significant, z = 1.21, p = .22, suggesting a lack of publication bias. Inspection of the forest plot identified one statistical outlier in the forward recall condition (Gozzi et al., 2011; Supplementary Figure 3), which was confirmed with the Baujat plot (Baujat et al., 2002) and flagged as an outlier on all eight influence measures (Viechtbauer & Cheung, 2010). Supplementary Figure 3 shows the full forest plot with the outlier depicted in red; Fig. 2 shows the plot without the outlier. Given the strong evidence of a statistical outlier, the meta-analysis was conducted with and without Gozzi et al. (2011). The two meta-analyses yielded statistically different estimates, χ² = 20.16, p < .001; to be conservative, the multivariate meta-analysis was conducted without Gozzi et al. (2011). No studies in the backward recall condition were identified as outliers (see Fig. 3).

Fig. 2

Forest plots of population estimates and variance for forward verbal recall

Fig. 3

Forest plots of population estimates and variance for backward verbal recall

Results of the multivariate meta-analysis suggested a significant effect of deafness on verbal-serial recall, both forward, g = −1.33, SE = 0.17, p < .001, 95% CI [−1.68, −0.98] (see Fig. 2), and backward, g = −0.66, SE = 0.11, p < .001, 95% CI [−0.89, −0.45] (see Fig. 3). Effect sizes for verbal short-term (forward) and working memory (backward) were significantly related, r = 0.89; however, effect sizes and estimates of variance differed significantly between forward and backward recall, χ²(1) = 17.89, p = .001, suggesting that deafness does not affect forward and backward verbal recall in the same way. I² values indicated that between-study heterogeneity, rather than within-study sampling error, accounted for most of the total variability in forward recall (I² = 0.86) and about half of it in backward recall (I² = 0.49), and across-study variation was significant, Q = 166.476, p < .001, suggesting that other factors may also account for significant variability in verbal serial recall performance.

Given the significant heterogeneity, separate meta-regressions were conducted to evaluate the roles of participant age, study stimuli (letters versus digits), and publication year in the population estimates for verbal short-term and working memory. The meta-regression with age indicated a significant regression coefficient for age in verbal short-term memory, B = 0.05, p = .004, 95% CI [0.017, 0.09], but not in working memory, B = 0.007, p = .65, 95% CI [−0.02, 0.03]. The significant age moderation for verbal short-term memory indicated that the effect of deafness was stronger in studies with older participants (adults compared with children/adolescents); no relation was detected for working memory. This pattern may be because the R² value for the model with group alone was already higher for working memory than for short-term memory, leaving less room for age to increase R² for backward span than for forward span. Incorporating age into the final model accounted for 34% of the variability in verbal short-term memory and 64% of the variability in verbal working memory, R² = 0.339 and 0.636, respectively. The other two meta-regressions indicated no significant effect of stimuli (verbal short-term memory: B = 0.31, p = .36, 95% CI [−0.36, 0.98]; verbal working memory: B = −0.008, p = .97, 95% CI [−0.42, 0.41]) or publication year (verbal short-term memory: B = 0.01, p = .30, 95% CI [−0.01, 0.03]; verbal working memory: B = −0.007, p = .72, 95% CI [−0.01, 0.007]).
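A single-moderator meta-regression of this kind can be sketched as weighted least squares with random-effects weights 1/(vᵢ + τ²), together with the R² measure that metafor reports: the proportional reduction in τ² when the moderator is added. This sketch assumes the residual τ² has already been estimated (metafor estimates it jointly with the coefficients, by REML by default); the function names and the example data are hypothetical.

```python
import math
from typing import List, Tuple


def meta_regression(g: List[float], v: List[float], x: List[float],
                    tau2_resid: float) -> Tuple[float, float, float]:
    """Weighted least squares with random-effects weights; returns (b0, b1, se_b1)."""
    w = [1 / (vi + tau2_resid) for vi in v]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    g_bar = sum(wi * gi for wi, gi in zip(w, g)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (gi - g_bar) for wi, xi, gi in zip(w, x, g))
    b1 = sxy / sxx                  # change in g per unit of the moderator
    b0 = g_bar - b1 * x_bar
    se_b1 = math.sqrt(1 / sxx)      # model-based SE, treating weights as known
    return b0, b1, se_b1


def pseudo_r2(tau2_null: float, tau2_resid: float) -> float:
    """Proportional reduction in between-study variance, truncated at 0."""
    if tau2_null <= 0:
        return 0.0
    return max(0.0, (tau2_null - tau2_resid) / tau2_null)


# Hypothetical mean ages and effects that rise linearly with age.
ages = [10.0, 20.0, 30.0, 40.0]
effects = [0.5 * a / 10 - 1.0 for a in ages]
print(meta_regression(effects, [0.1] * 4, ages, tau2_resid=0.0))
print(pseudo_r2(0.30, 0.10))
```

Under this definition, an R² of 0.339 means that adding the moderator reduced the estimated between-study variance by about a third.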

Visuospatial serial order

Because few studies reported on visuospatial working memory (n = 4), there was insufficient power to include backward recall as an outcome in the analyses (estimated power [1 − β] = 0.14 for moderate heterogeneity, given an average sample size of 18.75 and a proposed effect size of 0.2). Thus, a univariate meta-analysis was conducted with forward visual recall only, using the same assumptions and random-effects models described in the Method section.
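The power estimate above can be approximated with the random-effects power formula of Hedges and Pigott (2001). The sketch assumes that the average sample size of 18.75 refers to participants per group and that "moderate" heterogeneity means a between-study variance equal to the typical within-study variance (a common convention in which small and large heterogeneity are one third and three times that value). Under these assumptions it returns roughly 0.14 for k = 4 studies and an effect size of 0.2, in line with the reported value; the function name `random_effects_power` is hypothetical.

```python
from statistics import NormalDist


def random_effects_power(effect, n_per_group, k, heterogeneity="moderate", alpha=0.05):
    """Approximate two-tailed power of a random-effects meta-analysis.

    Assumes k studies, each comparing two groups of n_per_group participants.
    """
    # Typical within-study sampling variance of a standardized mean difference
    v = 2 / n_per_group + effect ** 2 / (4 * n_per_group)
    # Conventional between-study variance for small / moderate / large heterogeneity
    tau2 = {"small": v / 3, "moderate": v, "large": 3 * v}[heterogeneity]
    # Standard error of the pooled random-effects mean
    se_mean = ((v + tau2) / k) ** 0.5
    lam = effect / se_mean  # noncentrality parameter
    norm = NormalDist()
    crit = norm.inv_cdf(1 - alpha / 2)
    return 1 - norm.cdf(crit - lam) + norm.cdf(-crit - lam)


print(round(random_effects_power(0.2, 18.75, 4), 2))  # prints 0.14
```

Power rises quickly with k under this formula, which is why the four available studies were too few to model backward visuospatial recall.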

The trim-and-fill analysis for publication bias (Duval & Tweedie, 2000) imputed zero studies to the right of the mean; however, Egger’s test (Egger et al., 1997) revealed significant funnel plot asymmetry, z = −2.12, p = .03, suggesting publication bias. Visual inspection of the funnel plot (Supplementary Figure 4) indicated a bias favoring larger, negative effect sizes (negative effect sizes indicate hearing participants scoring higher than Deaf participants). Visual inspection of the forest plot then identified two potential statistical outliers (Koo et al., 2008; McDaniel, 1980), which were confirmed with the Baujat plot (Baujat et al., 2002). However, neither study reached statistical significance on any of the eight measures of statistical influence (Viechtbauer & Cheung, 2010). Because there was no strong reason for exclusion, both studies were retained in the model.
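Egger's test asks whether smaller, less precise studies report systematically different effects. The sketch below implements the classic form (Egger et al., 1997): an ordinary least-squares regression of each standardized effect (g/SE) on its precision (1/SE), where an intercept far from zero signals funnel-plot asymmetry. Software packages often default to a variant (for example, a meta-regression with the standard error as the predictor), so the reported z = −2.12 should not be expected from this sketch; the function name `egger_test` and the example data are hypothetical.

```python
def egger_test(effects, std_errors):
    """Classic Egger regression test for funnel-plot asymmetry.

    Regresses the standardized effect (g / SE) on precision (1 / SE) by
    ordinary least squares. Returns the intercept, slope, and the intercept's
    standard error; the original test compares intercept / SE with a t
    distribution on k - 2 degrees of freedom.
    """
    y = [g / se for g, se in zip(effects, std_errors)]  # standard normal deviates
    x = [1 / se for se in std_errors]                     # precision
    k = len(y)
    x_bar, y_bar = sum(x) / k, sum(y) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sigma2 = sum(r ** 2 for r in residuals) / (k - 2)   # residual variance
    se_intercept = (sigma2 * (1 / k + x_bar ** 2 / sxx)) ** 0.5
    return intercept, slope, se_intercept


# Hypothetical studies in which less precise (larger-SE) studies report more
# negative effects: g = 0.3 - 1.0 * SE
ses = [0.1, 0.2, 0.3, 0.4, 0.5]
effects = [0.3 - 1.0 * s for s in ses]
intercept, slope, _ = egger_test(effects, ses)
print(round(intercept, 3), round(slope, 3))  # prints -1.0 0.3
```

In this constructed example the negative intercept captures the small-study effect, while the slope recovers the underlying effect of 0.3.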

Results of the univariate meta-analysis indicated no significant effect of deafness on visuospatial short-term memory, g = −0.055, SE = 0.17, p = .75, 95% CI [−0.39, 0.28] (see Fig. 4). Heterogeneity estimates indicated low within-study heterogeneity, I² = 0.86, and moderate across-study variation, Q = 94.87, p < .001, suggesting that other factors may also account for significant variability in visuospatial short-term memory.
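The reported p value and confidence interval follow from the pooled estimate and its standard error under a normal (Wald) approximation. The sketch below, with a hypothetical function name `wald_summary`, reproduces the visuospatial values from g = −0.055 and SE = 0.17; for other outcomes, small discrepancies can arise from rounding of the reported estimates or from different interval methods.

```python
from statistics import NormalDist


def wald_summary(g, se, level=0.95):
    """z statistic, two-tailed p value, and Wald confidence interval for a pooled effect."""
    norm = NormalDist()
    crit = norm.inv_cdf(1 - (1 - level) / 2)   # about 1.96 for a 95% interval
    z = g / se
    p = 2 * (1 - norm.cdf(abs(z)))
    return z, p, (g - crit * se, g + crit * se)


z, p, ci = wald_summary(-0.055, 0.17)
print(round(p, 2), [round(b, 2) for b in ci])  # prints 0.75 [-0.39, 0.28]
```

Because the interval spans zero, the data are compatible with small effects in either direction, not only with an exact null.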

Fig. 4 Forest plots of population estimates and variance for forward visual recall

Because the Q statistic indicated significant heterogeneity, a meta-regression was conducted to evaluate the role of participant age in the population estimate for visuospatial short-term memory. Results indicated a significant regression coefficient for age, B = 0.041, p = .015, 95% CI [−0.019, 0.86]. The significant age moderation indicated that the effect of deafness on forward visual recall was stronger in studies with older participants (adults compared with children/adolescents). Incorporating age into the final model accounted for approximately 13% of the variability in visuospatial short-term memory, R² = 0.126.
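The R² values reported for the meta-regressions describe how much between-study heterogeneity a moderator explains. One common definition, assumed here because the exact computation is not restated in this section, is the proportional reduction in the between-study variance τ² when the moderator is added, truncated at zero. The function name `heterogeneity_r2` and the τ² values in the usage example are hypothetical.

```python
def heterogeneity_r2(tau2_without, tau2_with):
    """Proportion of between-study variance explained by moderators (pseudo-R^2).

    tau2_without: tau^2 from the model without the moderator
    tau2_with: tau^2 from the model that adds the moderator
    """
    if tau2_without <= 0:
        # No heterogeneity left to explain
        return 0.0
    # Truncate at zero: adding a moderator can increase the tau^2 estimate
    return max(0.0, (tau2_without - tau2_with) / tau2_without)


# Hypothetical tau^2 estimates before and after adding age as a moderator
print(round(heterogeneity_r2(0.40, 0.35), 3))  # prints 0.125
```

Under this definition, a model whose baseline already explains much of the variance leaves less heterogeneity for an added moderator to absorb.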

A meta-regression was also conducted with publication year as a moderator. Results indicated no significant effect of publication year, B = 0.0097, 95% CI [−0.010, 0.019], p = .394. There was not sufficient variability in visuospatial stimuli to assess for moderation.

Discussion

The current systematic review and meta-analysis evaluated the impact of Deafness on serial-order memory, specifically visual and verbal serial-order short-term (forward recall) and working (backward recall) memory. Importantly, the current meta-analysis could generate population estimates only for verbal short-term, verbal working, and visual short-term memory. Because few studies are currently available and power was insufficient, visual working memory remains an area for future study.

Results of the multivariate meta-analyses for verbal memory indicated significant effects of Deafness on both short-term and working memory, wherein hearing nonsigners performed significantly better than Deaf signers. Neither stimulus type nor publication year moderated this finding. Although forward and backward verbal serial-order effect sizes were strongly correlated, they differed significantly, suggesting that Deafness has a greater impact on short-term memory (forward recall) than on working memory (backward recall) for verbal items. Additionally, age significantly moderated verbal short-term memory: studies with adult participants reported a larger hearing advantage than studies with child/adolescent participants. Interestingly, this age moderation was not significant for verbal working memory.

Results of the univariate meta-analysis for visual short-term memory indicated no significant effect of Deafness on visual forward recall, suggesting that Deaf signers and hearing nonsigners perform comparably on forward visual serial recall tasks. Despite the lack of significant group differences, a significant moderation emerged: studies with adult participants showed a larger gap between Deaf signers and hearing nonsigners than studies with child/adolescent participants, a pattern also observed for verbal short-term memory. Although variance estimates suggest that other factors may also moderate the relationship between Deafness and visual short-term memory, publication year was not significant, and stimulus type varied too little to probe for moderation because most studies (15 of 20) used a Corsi task.

The results of the visual meta-analysis provide a key take-away message: Although previous literature has suggested that Deaf signers demonstrate weaker short-term and working memory for serially presented items (e.g., Bavelier et al., 2008; Conway et al., 2009; Kronenberger et al., 2018), the current visual results suggest that Deaf signers perform comparably to hearing nonsigners when remembering visual items in serial order and recalling them in the forward direction. Thus, the previous claim that Deaf signers have globally poorer serial-order memory is not supported. Paired with the verbal results, discussed in greater detail below, these findings give a clearer picture of the relative strengths and weaknesses of the Deaf signers studied in the current systematic review. For the routine assessment of working memory in educational and cognitive contexts, the present results suggest that the hearing bias of using verbal items, such as digits, may disproportionately disadvantage Deaf signers and underestimate their abilities.

Indeed, the results of the verbal multivariate analyses supported the direction of the group findings reported in individual studies, wherein hearing nonsigners demonstrated a significant advantage over Deaf signers in forward recall of serially presented items. Previous explanations have attributed this finding to the nature of each group's language modality, wherein spoken English relies more heavily on serial order than ASL does (e.g., Cardin et al., 2018; Conway et al., 2009). Interestingly, the hearing advantage replicated for backward recall of serially presented verbal items, although the overall effect size was significantly smaller than for forward recall. These results suggest that backward serial recall may be less affected than forward recall in people who use sign-based languages. Future studies could investigate whether a hearing advantage, including a larger one for forward than for backward recall, holds when ASL words are used for the Deaf group and spoken words for the hearing group. Moreover, it will be important to assess whether the patterns replicate on sentence span tasks that use sentence structures found in both native languages, because these tasks may have higher external validity for everyday uses of working and short-term memory. Although the multivariate meta-analysis suggested that stimulus type did not moderate this relationship, relatively few of the 25 analyzed studies used nondigit stimuli (letters, n = 5; words, n = 4), so the analysis may have been underpowered to detect moderation.

The nonsignificant stimulus moderation for verbal forward and backward serial recall was unexpected. Many previous studies (e.g., Boutla et al., 2004; Flaherty & Moran, 2001, 2004; Krakow & Hanson, 1985; Logan et al., 1996; Wilson et al., 1997) have used within-participants designs to compare spans for different verbal stimuli, such as digits, letters, and words. Although findings have been mixed, some studies (e.g., Boutla et al., 2004) demonstrated no difference between Deaf signers and hearing nonsigners on forward verbal span when native-language stimuli are used (i.e., digits read aloud for hearing nonsigners, signed numbers for Deaf signers). The lack of significant stimulus moderation in the current meta-analysis does not support these previous claims. However, multiple interactions not accounted for in the current review may be taking place. Beyond stimulus type, presentation modality and recall modality could also vary between studies, including printed stimuli and recall, verbal stimuli and recall, or a combination. Although these factors were too numerous to consider in the current meta-analysis, the modalities of presentation, recall, instructions, and task may plausibly have cascading effects on the results above and beyond the stimuli selected for each group.

Although the current results suggested no significant stimulus moderation, there was a significant effect of age on both forward verbal and forward visual serial recall. Effect sizes increased in magnitude with participant age, so the gap between hearing and Deaf participants grew larger with increasing age. In other words, in childhood and adolescence, Deaf signers are at a small disadvantage in forward serial-order recall compared with hearing nonsigners, but by adulthood, larger differences emerge. These findings also support what we term the language-use hypothesis, wherein, with more experience using a language modality (e.g., ASL or spoken English), our memory systems narrow toward that specific modality. Thus, as we age, we may become less flexible with memory in unfamiliar language modalities. This process mirrors what is known about language experience in early development, wherein perception in infancy changes from language-general to language-specific, a process called perceptual narrowing that is observed in both speech and sign (e.g., Maurer & Werker, 2014). Perhaps a similar process of perceptual narrowing related to one’s native language modality continues with age beyond infancy. Indeed, because the brain adaptively prunes irrelevant connections over development to increase efficiency, these age-related findings are consistent with previous literature suggesting that perceptual narrowing makes adults less flexible in their memory than younger children (Maurer & Werker, 2014). Future studies may also consider how cognitive aging affects serial recall in Deaf signers compared with hearing nonsigners, as none of the current studies included older adults.

Whereas the size of the Deaf–hearing difference on both verbal and visual forward recall was moderated by age, there was no age effect for verbal backward recall. In other words, the size of the Deafness effect increased with age for short-term memory but not for working memory. This finding runs counter to the pattern expected from the adult cognitive aging literature, in which age-related differences on short-term memory tasks can be explained by age-related slowing (Multhaup et al., 1996) and are small relative to the larger age-related differences in working memory (e.g., Wingfield et al., 1988). The contrast between the aging effects found in the present meta-analysis and those in the adult aging literature highlights the need for life-span research in Deaf as well as hearing samples.

The present study is the first to systematically evaluate short-term and working memory of nonimplanted Deaf signers compared with hearing nonsigners. A related systematic review and meta-analysis (Akçakaya et al., 2022) evaluated these domains in Deaf cochlear implant (CI) users to assess the impact of restored auditory access on memory processes. Although their meta-analyses contained only two to four studies per dependent variable, Akçakaya et al. reported significant effect sizes for digit span (g = −1.194), comparable to our results (g = −1.33). Their digit span backward effect was not significant (g = −0.26), which contrasts with our significant verbal working memory population estimate (g = −0.52). The backward span pattern across meta-analyses is consistent with the idea that hearing experience may affect performance (e.g., Conway et al., 2009), although the consistent forward span difference across studies is not. Importantly, although our forward span results closely mirror those reported by Akçakaya et al., our novel results underscore the importance of evaluating group differences in short-term and working memory in Deaf signers prior to cochlear implantation. Indeed, although longitudinal work would be required to evaluate this claim, comparing results from both meta-analyses may indicate that cochlear implantation does not improve short-term memory for verbal stimuli but may improve working memory for verbal stimuli.

Theoretical and practical implications

The present findings bring clarity to a literature that includes multiple discrepant findings. The first two meta-analyses revealed significant, negative effects of deafness on verbal forward and verbal backward recall, that is, verbal short-term memory and working memory, respectively. By contrast, the third meta-analysis failed to detect an effect of deafness on visuospatial forward recall, or visuospatial short-term memory. One theoretical approach to understanding the effect of deafness on cognition, the auditory scaffolding hypothesis (Conway et al., 2009), proposes that Deaf signers’ relative lack of experience with sequentially ordered language should result in a general deficit in serial-order task performance. The third meta-analysis, however, challenges this view because it reveals similar performance of Deaf signers and hearing nonsigners on visuospatial serial-order recall. The effect of deafness on forward and backward serial-order recall of verbal information nonetheless remains. With these group differences clearly established by these first two meta-analyses of span scores in Deaf signers and hearing nonsigners, next steps include more detailed analyses of the error types that the groups make across conditions. The M3 error model (Oberauer & Lewandowsky, 2019) may be a particularly helpful guide for these next steps (see McFayden et al., 2023, for detailed discussion). In addition to establishing group differences in verbal span, the first two meta-analyses also showed that, for forward recall, the size of the Deaf signer disadvantage increased as participant age increased from childhood/adolescence to adulthood. Another important next step is to extend data collection to older adulthood. Indeed, the cognitive aging literature has discussed the effect of disrupted sensory input on cognitive aging (e.g., Phillips et al., 2022). To our knowledge, no studies have compared older adults who are Deaf signers with older adults who are hearing nonsigners. Such data would contribute to current discussions in the cognitive aging literature and clarify whether the age-related increase in effect sizes continues throughout adulthood, reaches an asymptote, or follows a cubic trajectory.

The meta-analyses’ clarification of the data patterns also has practical implications. For example, there are narratives of Deaf signers being less intelligent than hearing nonsigners, which are not empirically supported by the current findings (cf. the “deaf and dumb” literature, still being published in the 21st century, as recently as 2021; Eling & Finger, 2021). The present findings counter this narrative, in part because no group difference was detected in visual short-term memory. The data are also relevant in applied settings. For example, the systematic disadvantage for Deaf signers, compared with hearing nonsigners, on verbal span tasks, coupled with no group difference on visuospatial span tasks, suggests that the best means of obtaining accurate, nonaudist measures of memory is to assess short-term memory with visuospatial tools. Implications for audist biases and equity are discussed further below.

Quality

Importantly, the quantitative results presented here should be interpreted in light of study quality estimates in the “fair” range, with several articles falling in the “poor” range, as indicated by adapted NIH criteria for case-control studies. However, because no study included data on three of the 12 domains assessed by the NIH criteria (i.e., sample size justifications, blinded hypotheses, and concurrent controls), our quality results suggest that these metrics are not routinely considered in research with Deaf participants, perhaps because of unique features of the sample (e.g., Deaf participants may be harder to recruit, so sample size calculations are not commonly relied upon). Removing these three quality domains significantly increased study quality scores, which suggests that research with Deaf communities is not as systematically rigorous as other experimental work. Because the NIH Quality Assessment checklist was created in 2013 and the included studies span more than 50 years, only 15 studies were published after the quality criteria were created. A Pearson correlation indicated no significant relationship between publication date and quality score, r = .24, p = .16 (Supplementary Figure 5), suggesting that these quality domains may stand the test of time and that significant efforts are needed to improve the quality of studies with Deaf participants.
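As a rough consistency check, the significance of a Pearson correlation can be approximated with Fisher's z transformation. Assuming that all 35 included studies entered the correlation (an assumption, since the sample size for this test is not stated), r = .24 gives p ≈ .17, close to the reported p = .16; the small gap is plausibly due to rounding of r and to the normal approximation, because the exact test uses a t distribution with n − 2 degrees of freedom. The function name `pearson_r_p` is hypothetical.

```python
import math
from statistics import NormalDist


def pearson_r_p(r, n):
    """Approximate two-tailed p value for H0: rho = 0 using Fisher's z transformation."""
    z = math.atanh(r) * math.sqrt(n - 3)  # Fisher z divided by its standard error
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Assumed n = 35 studies
print(round(pearson_r_p(0.24, 35), 2))  # prints 0.17
```

With only 35 observations, a correlation of this size is well within what chance alone could produce.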

Future quality assessments of Deaf scholarship should consider (a) creating discipline-specific quality guidelines in accordance with research ethics for work with Deaf participants (see Singleton et al., 2014, for a review) or (b) evaluating the comprehensiveness of reporting instead, as done in the current review. Data from the 35 studies indicated comprehensive reporting of age, sex, hearing status, and age of deafness onset. The least reported category was educational level (reported in 46% of studies), which authors should assess and report given the current and historical exclusion of deaf students and scholars from educational pursuits (Schick et al., 2006). Additionally, at the recommendation of a Deaf scholar on the author team (AM), two further comprehensiveness metrics were evaluated, vision screening and access to closed captioning at home, both of which were rarely reported (closed captioning: 0%; vision: 8%). Future studies should consider which demographic or health factors are relevant when working with Deaf communities, as these may relate to task design (e.g., vision may be important to address when conducting a visuospatial task) and/or may occur at higher prevalence rates than in hearing populations (Chia et al., 2006).

One important take-away from the quality assessment process was the lack of Deaf authorship in the majority of articles and studies on Deaf memory. Previous research (e.g., Singleton et al., 2014) has underscored the importance of involving Deaf and signing communities in research to address many ethical research barriers, including distrust of researchers and auditory biases in measure selection and in the reporting of results. Although rates of Deaf authorship ranged from 38% (articles) to 43% (studies), the authorship metric used in the current meta-analysis was dichotomous and does not capture other forms of bias, such as Deaf tokenism (e.g., the sense that Deaf scholars are sometimes treated as tokens on research teams; Singleton et al., 2014). Funding for research projects should target Deaf scholars conducting research in this domain, with an emphasis on Deaf perspectives driving future scholarship.

Conclusion

Nonimplanted, Deaf signers, previously conceptualized as having weaker memory for serially presented items (Wilson et al., 1997), demonstrate similar visual spans and smaller verbal spans compared with hearing nonsigners. The effect sizes for the significant group differences ranged from moderate (backward verbal recall) to large (forward verbal recall), indicating a hearing advantage. The forward-recall differences grew stronger with age, increasing from childhood through adulthood. By contrast, when Deaf signers are assessed in a modality commensurate with their primary language, specifically a visual modality, they show no significant differences in forward span compared with hearing nonsigners. These results challenge guiding theories proposing that a sign-based language disrupts serial-order processing, as evidenced by no group differences in serial visual recall. As the quality assessments of Deaf bias further indicate, our current research systems show an audist bias (Eckert & Rowley, 2013) toward hearing nonsigners, both in the design of studies with Deaf participants and in the discussion of results from an audist-centered perspective. Future research should evaluate the types of errors made during short-term and working memory tasks to further elucidate memory mechanisms, as well as extend developmental research to older adult populations.