Current theories of working memory (WM) contend that the ability to engage in non-automated, goal-directed behavior across time requires consideration of at least two separate components of the central executive (Shipstead, Lindsey, Marshall, & Engle, 2014; Unsworth & Engle, 2007a; Unsworth, Fukuda, Awh, & Vogel, 2014). One component reflects the ability to actively maintain or manipulate goal-relevant information in a temporary memory store known as immediate or primary memory (PM) (Cowan, 1995, 2001; Cowan, Elliott, Saults, Morey, Mattox, Hismjatullina, & Conway, 2005; Kane & Engle, 2002; Kane, Hambrick, Tuholski, Wilhelm, Payne, & Engle, 2004), whereas the other component reflects the ability to retrieve goal-relevant information from episodic or secondary memory (SM) back into PM after that information has been lost from PM (Mogle, Lovett, Stawski, & Sliwinski, 2008; Unsworth, 2007, 2009; Unsworth & Engle, 2007a, b; Unsworth & Spillers, 2010).

The purpose of the present study is to resolve an empirical discrepancy that has arisen in the literature regarding the relation between the PM and SM components of WM. Two empirical approaches have been used to measure the PM and SM components. One approach has been to estimate the capacities of the PM and SM components using a single task that was designed to measure both components, whereas the other approach has been to estimate the capacities of the PM and SM components using different tasks that were designed to measure each component separately. Under the former approach, the correlation between estimates of the PM and SM components has been observed to hover around zero (Engle, Tuholski, Laughlin, & Conway, 1999; Unsworth & Engle, 2007b; Unsworth, Spillers, & Brewer, 2010; Shipstead et al., 2014), whereas under the latter approach, the correlation between estimates of the PM and SM components has been observed to be positive and of moderate strength (Shipstead et al., 2014; Unsworth et al., 2014).

This empirical discrepancy in turn has had important theoretical implications for understanding the relation between WM and higher-order cognition. For instance, studies that have used the single-task approach have supported the conclusion that PM and SM contributed only unique variance to higher-level cognitive abilities (e.g., Unsworth & Engle, 2007b), while studies that have used the separate-task approach have supported the conclusion that PM and SM contributed significant shared (and unique) variance to these abilities (Unsworth et al., 2014). However, despite the growing acknowledgement that PM and SM may represent related abilities, most of the focus has remained on the distinct, “component-specific,” nature of these abilities as opposed to the related, “component-general,” nature, perhaps because the empirical status of overlapping processes has not been consistently established across different studies. The present study attempted to resolve this empirical discrepancy by examining the extent to which the weak correlation typically observed when the capacities of the PM and SM components are both estimated from a single task is due to a third variable that operates during task performance to suppress the otherwise positive and moderately strong relation between the two components.

The immediate free recall (IFR) task has been the primary task used to estimate the capacities of the PM and SM components when only a single task is used. Unsworth and Engle (2007a) demonstrated that performance on a verbal IFR task loaded just as highly on a latent WM construct as did performance on other, more traditional, measures of WM (such as operation span). However, unlike these more traditional measures, IFR allows separate estimates of PM and SM to be derived using Tulving and Colotla’s (1970) method. According to this method, estimates of PM and SM must take into consideration both input and output interference during recall; the greater the amount of interference preceding recall of an item, the more likely the item was recalled from SM as opposed to PM. Following this method, the number of items that intervene between a given item’s presentation and its recall is tallied, counting both the items presented after it (input interference) and the items recalled before it (output interference). If seven or fewer items intervene between presentation and recall of a given item, the item is considered to be recalled from PM. If more than seven items intervene between presentation and recall, the item is considered to be recalled from SM.

For instance, suppose a participant receives a 12-item list, which for the sake of illustration, we have labeled as: A, B, C, D, E, F, G, H, I, J, K, and L. Furthermore, suppose this participant then recalls: L, K, I, G, D, and H in that order. Here three items were recalled from PM: L (zero items intervened between input and output), K (two items intervened), and I (five items intervened). The remaining three items were recalled from SM: G (eight items intervened between input and output), D (12 items intervened), and H (nine items intervened). Although the threshold for PM is seven intervening items, this method tends to produce estimates of PM capacity that are less than four items because input and output interference are given equal weight, thus making it consistent with modern estimates of PM capacity (Cowan, 2001; Shipstead et al., 2014). Prior work has suggested that this method provides reliable and valid estimates of PM and SM (Craik & Birtwistle, 1971; Unsworth & Engle, 2007a; Unsworth et al., 2010; Watkins, 1974).
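The scoring rule illustrated above can be expressed as a minimal Python sketch. The function name `tulving_colotla` and its arguments are hypothetical labels introduced here for illustration; the sketch assumes that every response is a distinct item from the presented list (intrusions and repetitions are not handled), and it is not the scoring software used in any of the cited studies.

```python
def tulving_colotla(presented, recalled, threshold=7):
    """Label each recalled item as coming from PM or SM.

    For each recalled item, the number of intervening items is the number
    of items presented after it (input interference) plus the number of
    items recalled before it (output interference). Items with `threshold`
    or fewer intervening items are labeled PM; all others are labeled SM.
    Assumes every response is a distinct list item (hypothetical simplification).
    """
    position = {item: index for index, item in enumerate(presented)}
    scored = []
    for outputs_before, item in enumerate(recalled):
        inputs_after = len(presented) - 1 - position[item]
        intervening = inputs_after + outputs_before
        store = "PM" if intervening <= threshold else "SM"
        scored.append((item, intervening, store))
    return scored


# The 12-item example from the text: list A-L, recall order L, K, I, G, D, H.
presented = list("ABCDEFGHIJKL")
recalled = list("LKIGDH")
scored = tulving_colotla(presented, recalled)
pm_count = sum(1 for _, _, store in scored if store == "PM")
sm_count = sum(1 for _, _, store in scored if store == "SM")
```

Applied to the example, the sketch assigns L, K, and I to PM (0, 2, and 5 intervening items) and G, D, and H to SM (8, 12, and 9 intervening items), matching the tallies given in the text.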

One concern with using Tulving and Colotla’s (1970) method to estimate the PM and SM components of WM capacity is that the relative magnitude of these estimates can depend on an individual’s “recall initiation strategy” (Gibson, Gondoli, Flies, Dobrzenski, & Unsworth, 2010; Unsworth, Brewer, & Spillers, 2011). For instance, Unsworth et al. (2011) identified three distinct recall initiation strategies in a sample of 150 undergraduate students who completed a verbal IFR task. In this sample, approximately 45 % of the participants recalled mainly recency items (i.e., the last few items in the list), approximately 20 % of the participants recalled mainly primacy items (i.e., the first few items in the list), and approximately 35 % of the participants recalled equal proportions of both recency and primacy items (see also, Gibson et al., 2010).

Perhaps more importantly, Unsworth et al. (2011) found that individuals who favored a recency strategy tended to recall more items from PM than from SM, whereas individuals who favored a primacy strategy tended to recall more items from SM than from PM, even though the two groups did not differ in overall WM capacity (as measured by the operation span, reading span, and symmetry span tasks). Thus, differences in the order of recall can produce a trade-off in the number of items recalled from PM and SM across groups, without reflecting a difference in WM capacity per se (see also, Healey & Kahana, 2014). Note, however, that, due to the cross-sectional nature of these studies, it is currently unknown whether differences in the relative strength of PM and SM capacity are primary and drive differences in strategy preference or whether differences in strategy preference are primary and drive the relative strength of PM and SM.

Evidence suggesting that participants engage in different recall initiation strategies during performance of IFR tasks is potentially important because some researchers have suggested that estimates of PM and SM are most sensible and unambiguous when recall is based on a recency recall strategy (Unsworth & Engle, 2007a). That is, the use of a primacy recall strategy, wherein participants tend to recall items from the beginning of the list, may be problematic because it can cause estimates of SM capacity to be inflated and estimates of PM capacity to be deflated to zero when Tulving and Colotla’s (1970) method is used. Furthermore, it is not entirely clear whether primacy items that are maintained during list presentation and reported first using a primacy strategy are being recalled from SM (as they would be labeled using Tulving and Colotla’s method), or from PM. Consequently, some researchers have explicitly instructed participants to use a recency recall strategy rather than a primacy recall strategy during performance of IFR tasks when the goal was to consistently estimate the number of items recalled from PM and SM across participants (Craik & Birtwistle, 1971; Gibson, Gondoli, Ralph, & Stybel, in press; Gibson, Kronenberger, Gondoli, Johnson, Morrissey, & Steeger, 2012; Shipstead et al., 2014).

Despite the potential importance of controlling recall initiation strategies, many of the studies that have reported correlations between estimates of the PM and SM components have not done so, and likely represent the use of a mixture of different strategies across participants. Under these conditions, Engle et al. (1999) reported a correlation of −.02, Unsworth and Engle (2007a) reported a correlation of .29, and Unsworth et al. (2010) reported two correlations of .09 and −.13 (all of these IFR tasks involved lists of words). The one exception has been Shipstead et al.’s (2014) study in which participants were explicitly instructed to use a recency recall strategy during performance of two IFR tasks (one involving lists of words and the other involving lists of numbers). Despite this control, Shipstead et al. still observed small, non-significant correlations between estimates of the PM and SM components of .11 in the word IFR task and .01 in the number IFR task (note that this latter correlation was incorrectly reported as −.03 in Table 3 of the original article).

Although the experimental control of group-level recall initiation strategies apparently cannot explain the small correlations that have been observed when the capacities of the PM and SM components were both estimated from a single task, the possibility remains that finer-grained differences between individuals within a group may still be obscuring the relation between PM and SM. To investigate, the present study re-analyzed the results of two sets of studies that used Tulving and Colotla’s (1970) method to estimate the capacities of the PM and SM components from IFR tasks under conditions in which participants were instructed to use a recency recall initiation strategy. Of particular interest, the present study also measured differences in the extent to which individuals utilized this recency strategy so that these differences could be used as a more fine-grained form of statistical control.

Accordingly, a “Strength of Recency” (SOR) score was derived for each individual (see Methods for details). Higher scores reflected a stronger recency strategy whereas lower scores reflected a weaker recency strategy. As mentioned in the above description of Tulving and Colotla’s (1970) method, the greater the amount of input and output interference preceding recall of an item, the more likely the item was recalled from SM as opposed to PM. In addition, it is also important to point out that the greater the amount of input and output interference preceding recall of an item, the less likely the item will be recalled at all (Dalezman, 1976). Hence, a tradeoff in the number of items recalled from PM and SM was expected to follow from individual differences in SOR because SOR reflected differences in the amount of interference that was generated before recall of an item. Furthermore, because a recency recall initiation strategy was instructed in the present set of studies, wherein recency items were more likely to be recalled before primacy items, we expected interference to decrease the recall of primacy items (SM) more than recency items (PM).

For instance, when SOR scores were high, more recency items were recalled, resulting in potentially more interference before primacy items were recalled. In this instance, the number of items recalled from PM was expected to be relatively high and the number recalled from SM was expected to be relatively low. In contrast, when SOR scores were low, fewer recency items were recalled, resulting in potentially less interference before primacy items were recalled. In this instance, the number of items recalled from PM was expected to be lower and the number recalled from SM was expected to be higher than when SOR scores were high. In other words, SOR was predicted to be positively related to the number of items recalled from PM and negatively related to the number of items recalled from SM. Hence, if significant variation in SOR still exists when participants are instructed to use a recency recall strategy, then it is possible that the uncontrolled operation of SOR would cause the relation between PM and SM to be more negative than it otherwise would be, thereby offsetting any positive relation that may exist between these two components of WM (as suggested by studies using the separate-task approach).

In this way, SOR was thought to operate as a type of third variable that obscures the relation between PM and SM. According to MacKinnon, Krull, and Lockwood (2000), third variables can be construed as mediating or confounding variables. Typically, controlling for mediating or confounding variables decreases the magnitude of the relation between the other two variables. However, in the present context, we expected the control of SOR to increase the magnitude of the positive relation between PM and SM; in this situation, the third variable is known as a suppressing variable. Suppressing variables can either be mediating variables or confounding variables, depending on the causal relation between the three variables. However, the present study was unable to determine the extent to which SOR should be construed as a mediating variable or as a confounding variable. Fortunately, according to MacKinnon et al., the same analytic strategy can be used to assess the strength of either type of third variable (see Results for details).

In summary, the primary purpose of this article was to provide empirical evidence that the relation between estimates of the PM and SM components has been suppressed by SOR. This finding would be important because it would show that a moderately strong, positive correlation between estimates of the PM and SM components can be consistently observed regardless of whether estimates of the capacities of these two components are derived from single tasks or from separate tasks. Moreover, this conclusion would be important from a theoretical standpoint because it would support current theories that construe PM and SM as being highly related, yet distinct aspects of WM (Unsworth et al., 2014). In other words, this conclusion would motivate the distinction between component-general and component-specific processes in WM.

Experiment 1

Method

Participants

The participants were 140 adolescents between the ages of 9 and 16 years (M = 12.84 years, SD = 1.54 years) who were obtained from two previous publications (Gibson et al., in press; Gibson et al., 2012). Inclusion in the present study required participants to use an experimentally-induced recency recall initiation strategy in both verbal and spatial IFR tasks (see below for details). Seventy-seven of the adolescents were obtained from Gibson et al.’s later (in press) study. Of these 77 adolescents, 32 had study-confirmed diagnoses of attention-deficit hyperactivity disorder (ADHD), and 45 had study-confirmed absence of ADHD. Parents were instructed to withhold any ADHD medication (stimulant or non-stimulant) for at least 24 h prior to any cognitive testing. The remaining 63 adolescents were obtained from Gibson et al.’s earlier (2012) study; diagnostic status was not determined in this study. The participants in Gibson et al.’s earlier (2012) study performed the two IFR tasks before and after various WM training regimens. Thus, only the pre-training assessment was included in the present study.

IFR tasks

PM capacity and SM capacity were estimated using the same spatial and verbal IFR tasks used by Gibson et al. (2010). In the spatial IFR task, locations were depicted as gray outline squares that were filled with white. The locations used in this task were arranged in a 15 × 12 matrix of adjoining squares; there were 15 columns of adjoining squares arrayed horizontally across the screen and 12 rows of adjoining squares arrayed vertically down the screen, resulting in a total of 180 unique screen locations. Participants were presented with 15 trials of 12 different locations. Each of the 12 locations in a trial was cued in consecutive order for one second, and there was a 1-s delay before the next cue appeared. Locations were cued by temporarily changing the color of the interior from white to red. To make the task manageable, only 36 of the possible 180 squares appeared at any one time. These 36 squares were selected randomly from the set of 180 possible locations and they remained visible for three consecutive trials (3 trials × 12 cued locations = 36 locations). At the conclusion of the third trial in each set, a new set of 36 locations was randomly selected from the 180 possible locations without replacement. This sequence of three trials repeated five times for a total of 15 trials. The five sets of 36 randomly-selected squares were determined separately for each participant. Three practice trials preceded the experimental trials. Each trial was initiated by the experimenter after the participant signaled his or her readiness.
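The sampling scheme for the spatial task can be sketched as follows. This is an illustrative reconstruction rather than the original experiment software: the function name `build_spatial_trials` and its parameters are hypothetical, and the sketch assumes that each of the 36 visible squares in a set is cued exactly once across the three trials of that set, which is one reading of "3 trials × 12 cued locations = 36 locations."

```python
import random


def build_spatial_trials(n_columns=15, n_rows=12, n_sets=5,
                         trials_per_set=3, cues_per_trial=12, seed=None):
    """Generate one participant's trial list for the spatial IFR task.

    The 180 grid cells are shuffled once and divided into five
    non-overlapping sets of 36 visible squares (sampling without
    replacement). Within a set, the 36 squares are divided into three
    trials of 12 cued locations (an assumption; see lead-in).
    """
    rng = random.Random(seed)
    cells = [(col, row) for col in range(n_columns) for row in range(n_rows)]
    rng.shuffle(cells)

    set_size = trials_per_set * cues_per_trial  # 36 visible squares per set
    trials = []
    for s in range(n_sets):
        visible = cells[s * set_size:(s + 1) * set_size]
        for t in range(trials_per_set):
            cued = visible[t * cues_per_trial:(t + 1) * cues_per_trial]
            trials.append({"visible": visible, "cued": cued})
    return trials


trials = build_spatial_trials(seed=1)
all_cued = [cell for trial in trials for cell in trial["cued"]]
```

Under these assumptions, the sketch yields 15 trials of 12 cued locations, and every one of the 180 grid cells is cued exactly once per participant.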

At the conclusion of each trial, participants were prompted to use the computer mouse to recall as many locations as possible by clicking on the relevant locations. Because PM capacity and SM capacity can be influenced by participants’ order-of-report strategy, participants were told that they could recall the locations in any order, with the sole constraint that they should begin recalling locations from the end of the list first (see e.g., Gibson et al., 2011). These instructions tend to induce a recency strategy which is optimal for estimating the two capacities, and for ensuring that the estimates are based on the same strategy across participants (Craik & Birtwistle, 1971). Participants were given 30 s to recall the lists, and they were required to wait the full 30 s before proceeding to the next trial. This mandatory 30-s recall period ensured that participants could not prematurely terminate their recall in the event that they found the delay imposed by the recall period aversive (Sonuga-Barke, 2003).

In the verbal task, participants were presented with 15 lists of 12 unique high-frequency words that were randomly combined. The words were printed in 20-point font, and all words appeared white against the black background of a standard CRT monitor. Each word was presented consecutively for one second in the middle of the computer screen. Following the presentation of a single list, question marks appeared in the center of the screen prompting a response by the participant. As in the spatial IFR task, participants were told that they could recall the words in any order, with the sole constraint that they should begin recalling words from the end of the list first. In addition, participants were given 30 s to recall the lists, and they were required to wait the full 30 s before proceeding to the next trial. Participants reported their answers into a microphone, which was connected to a standard cassette recorder. Three practice trials using letter stimuli (instead of words) preceded the test trials. The word lists were presented in the same random order to all subjects.

The order and number of correct and incorrect recall responses were recorded for each participant in both the verbal and spatial IFR tasks separately. A response was scored correct if it matched one of the list items. In the verbal IFR task, a response was also scored as correct if it was a plural version of a singular list item (“boards” instead of “board” or vice-versa), or if it was a past-tense version of a present-tense list item (“shot” instead of “shoot” or vice-versa). Tulving and Colotla’s (1970) method was used to provide estimates of the number of items that were recalled from PM and SM in each task separately, and then these estimates were averaged across task to yield an average PM and SM score for each individual.

The extent to which participants used the recency order-of-report strategy was formally assessed within each IFR task separately. Two criteria were used to assess strategy (Unsworth et al., 2011), and participants were included only if they met at least one of the two criteria on both IFR tasks. First, probability-of-first-recall functions were generated for each participant from each of the verbal and spatial IFR tasks and then evaluated separately. Probability of first recall refers to the number of times the item presented at each serial position was reported first divided by the total number of trials. Probability-of-first-recall responses were averaged across the first three serial positions (i.e., the primacy positions) and across the last three serial positions (i.e., the recency positions). Participants were classified as using a recency strategy if the difference between the average recency and primacy responses was 0.10 or greater (see footnote 1 in Unsworth et al., 2011). Second, serial position functions, reflecting probability correct at each of the 12 serial positions, were also generated for each participant from each of the verbal and spatial IFR tasks and then evaluated separately. Probability correct responses were averaged across the first three serial positions and across the last three serial positions. Participants were classified as using a recency strategy if the difference between the average recency and primacy responses was 0.10 or greater (see footnote 1 in Unsworth et al., 2011).
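The two classification criteria can be sketched in Python as follows. The function names and the example data are hypothetical; only the three-position averaging windows and the 0.10 criterion come from the description above. The same difference score applies to a probability-of-first-recall function and to a serial position (accuracy) function.

```python
def first_recall_probabilities(first_positions, list_length=12):
    """Probability-of-first-recall function.

    `first_positions` holds, for each trial, the 1-based serial position
    of the item that was reported first on that trial.
    """
    n_trials = len(first_positions)
    return [first_positions.count(position) / n_trials
            for position in range(1, list_length + 1)]


def recency_minus_primacy(curve, span=3):
    """Mean of the last `span` positions minus mean of the first `span`."""
    primacy = sum(curve[:span]) / span
    recency = sum(curve[-span:]) / span
    return recency - primacy


def uses_recency_strategy(curve, criterion=0.10):
    """Apply the 0.10 difference criterion to either type of curve."""
    return recency_minus_primacy(curve) >= criterion


# Hypothetical participant: 15 trials, first response usually a late item.
first_positions = [12] * 8 + [11] * 3 + [10] * 2 + [1] * 2
pfr_curve = first_recall_probabilities(first_positions)
pfr_difference = recency_minus_primacy(pfr_curve)
pfr_recency = uses_recency_strategy(pfr_curve)

# Hypothetical flat accuracy curve: no recency advantage.
flat_curve = [0.5] * 12
flat_recency = uses_recency_strategy(flat_curve)
```

In this sketch a participant would be retained for a given task if `uses_recency_strategy` returns true for at least one of the two curves, in line with the inclusion rule described above.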

Although inclusion in the present study required use of an experimentally-induced recency recall strategy during performance of both IFR tasks, it was also expected that individuals would still differ in the extent to which they utilized this strategy within a modality, which in turn could influence the relative number of items recalled from PM and SM. In the present study, the extent to which the recency strategy was utilized by each individual was reflected by the magnitude of the difference obtained between the average recency and primacy responses obtained in the probability-of-first-recall and accuracy measures within each task modality. In addition, because the same individual might differ in the extent to which they utilized this strategy across modalities, we derived this variable separately for each modality. A composite verbal SOR score was computed by averaging the standardized difference score obtained from each of the two measures within the verbal modality. Likewise, a composite spatial SOR score was computed by averaging the standardized difference score obtained from each of the two measures within the spatial modality. In both cases, higher scores reflected a stronger recency strategy wherein the number of items recalled from PM was relatively high and the number recalled from SM was relatively low, whereas lower scores reflected a weaker recency strategy wherein the number of items recalled from PM was relatively low and the number recalled from SM was relatively high.
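The composite SOR score for one modality can be sketched as follows. The function names are hypothetical, and the sketch assumes that standardization is carried out across participants using the sample standard deviation (n − 1 in the denominator); the text does not state which standard deviation was used.

```python
from statistics import mean, stdev


def standardize(scores):
    """Convert raw scores to z scores across participants.

    Uses the sample standard deviation (an assumption, see lead-in).
    """
    center, spread = mean(scores), stdev(scores)
    return [(score - center) / spread for score in scores]


def composite_sor(pfr_differences, accuracy_differences):
    """Average the two standardized recency-minus-primacy difference scores.

    Each list holds one value per participant, in the same participant order:
    one from the probability-of-first-recall function and one from the
    serial position (accuracy) function.
    """
    z_pfr = standardize(pfr_differences)
    z_accuracy = standardize(accuracy_differences)
    return [(a + b) / 2 for a, b in zip(z_pfr, z_accuracy)]


# Hypothetical difference scores for three participants in one modality.
sor_scores = composite_sor([0.1, 0.2, 0.3], [0.3, 0.5, 0.7])
```

Because each difference score is standardized before averaging, the two measures contribute equally to the composite regardless of their raw scales.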

Results

Means, standard deviations, skewness, and kurtosis values for PM, SM, and SOR are listed in Table 1 as a function of modality. The skewness and kurtosis values were all within acceptable values. Reliability estimates for these measures could not be calculated because the individual trial data were no longer available for all of the data reported in Experiment 1. The bivariate correlations between the three variables are listed below the diagonal for each modality in Table 1, and the partial correlation between estimates of PM and SM holding SOR constant is listed above the diagonal for each modality.

Table 1 Descriptive statistics and correlations listed as a function of modality in Experiment 1. The bivariate correlations between variables are listed below the diagonal and the partial correlation between estimates of primary (PM) and secondary (SM) memory controlling for strength of recency (SOR) is listed above the diagonal

As expected, there was a small and non-significant correlation observed between estimates of PM and SM when SOR was not controlled in both the verbal (r = .10, p = .26) and spatial (r = .07, p = .42) modalities. However, there was also evidence that SOR may have obscured this relation in each modality. In the verbal modality, there was a significant positive correlation observed between the estimate of PM and SOR (r = .55, p < .0001) as well as a significant negative correlation observed between the estimate of SM and SOR (r = −.40, p < .0001). Likewise, in the spatial modality, there was also a significant positive correlation observed between the estimate of PM and SOR (r = .63, p < .0001) as well as a significant negative correlation observed between the estimate of SM and SOR (r = −.35, p < .0001). This pattern of correlations is consistent with the notion that the uncontrolled operation of SOR would cause the relation between estimates of PM and SM to be more negative than it otherwise would be.

Consistent with this interpretation, the partial correlation between estimates of PM and SM grew larger and became significant when SOR was controlled in both the verbal (r = .41, p < .0001) and spatial (r = .40, p < .0001) modalities, suggesting that the presence of SOR suppressed the positive relation between estimates of PM and SM.
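The link between the bivariate and partial correlations can be checked with the standard first-order partial correlation formula. The sketch below is illustrative only: it takes the rounded bivariate correlations reported above as inputs, so its output can differ slightly from partial correlations computed on the raw data. The function name is a hypothetical label.

```python
from math import sqrt


def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between X and Y with Z held constant.

    Standard first-order formula:
    (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2)).
    """
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# X = PM, Y = SM, Z = SOR; inputs are the rounded values reported in the text.
verbal_partial = partial_correlation(0.10, 0.55, -0.40)
spatial_partial = partial_correlation(0.07, 0.63, -0.35)
```

With these inputs the sketch returns approximately .42 for the verbal modality and .40 for the spatial modality, close to the reported partial correlations. The formula also makes the suppression pattern explicit: because PM correlates positively and SM negatively with SOR, the product of those two correlations is negative, and subtracting it raises the numerator above the bivariate correlation.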

This conclusion was also supported when regression was used to evaluate the effect of SOR as a third variable. Regardless of whether SOR is construed as a mediator or as a confounding variable, the influence of SOR on the estimates of the PM and SM components of WM can be referred to as the “indirect effect,” whereas the relation between estimates of the PM and SM components that remains when the indirect effect has been controlled can be referred to as the “direct effect,” as in standard mediation analyses (MacKinnon, Krull, & Lockwood, 2000; MacKinnon, Warsi, & Dwyer, 1995). According to MacKinnon et al. (2000), the indirect effect of SOR can be estimated by the difference between two regression coefficients: τ and τ’, where τ is the regression coefficient associated with the independent variable when it alone is used to predict the dependent variable (i.e., the “total effect”), and where τ’ is the regression coefficient associated with the independent variable when the third variable is also entered as a predictor of the dependent variable. In the following analyses, we treated the estimate of PM as the independent variable and the estimate of SM as the dependent variable, though the same pattern of results was obtained when these roles were reversed. In addition, following Sobel (1982, 1986), a 95 % confidence interval was computed around the estimate of the indirect effect.
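The logic of this analysis can be sketched as follows. The sketch makes two assumptions that are not stated in the text: that the regression coefficients are standardized, so that they can be derived from the bivariate correlations, and that the confidence interval is based on the commonly used first-order Sobel standard error. All function names are hypothetical, and the standard errors passed to `sobel_interval` are made-up values for illustration, not estimates from the present data.

```python
from math import sqrt


def total_effect(r_xy):
    """tau: standardized coefficient when X alone predicts Y."""
    return r_xy


def direct_effect(r_xy, r_xz, r_yz):
    """tau prime: standardized coefficient for X when Z is also a predictor."""
    return (r_xy - r_xz * r_yz) / (1 - r_xz ** 2)


def indirect_effect(r_xy, r_xz, r_yz):
    """Difference between the total and direct effects (tau - tau prime)."""
    return total_effect(r_xy) - direct_effect(r_xy, r_xz, r_yz)


def sobel_interval(a, b, se_a, se_b, z=1.96):
    """Confidence interval around the product a * b.

    a: coefficient for X predicting Z; b: coefficient for Z predicting Y
    with X controlled. Uses the first-order standard error
    sqrt(a^2 * se_b^2 + b^2 * se_a^2) (assumed variant, see lead-in).
    """
    estimate = a * b
    standard_error = sqrt(a ** 2 * se_b ** 2 + b ** 2 * se_a ** 2)
    return estimate - z * standard_error, estimate + z * standard_error


# X = PM, Y = SM, Z = SOR; rounded verbal-modality correlations from the text.
tau = total_effect(0.10)
tau_prime = direct_effect(0.10, 0.55, -0.40)
indirect = indirect_effect(0.10, 0.55, -0.40)

# Hypothetical coefficients and standard errors, for illustration only.
lower, upper = sobel_interval(a=0.5, b=-0.6, se_a=0.1, se_b=0.1)
```

A negative indirect effect paired with a direct effect that is larger than the total effect is the signature of suppression in this framework; an interval that excludes zero indicates that the indirect effect is statistically reliable.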

In the verbal modality, the total effect (τ) reflecting the relation between estimates of PM and SM when the indirect effect of SOR was not controlled was equal to .10 (p = .26), whereas the direct effect (τ’) reflecting the relation between estimates of PM and SM when the indirect effect of SOR was controlled was equal to .45 (p < .0001). The indirect effect of SOR was therefore estimated to be .10 − .45 = −.35 [95 % CI: −.18, −.53], a value that was significantly different than zero.

Likewise, in the spatial modality, the total effect (τ) reflecting the relation between estimates of PM and SM when the indirect effect of SOR was not controlled was equal to .07 (p = .42), whereas the direct effect (τ’) reflecting the relation between estimates of PM and SM when the indirect effect of SOR was controlled was equal to .48 (p < .0001). The indirect effect of SOR was therefore estimated to be .07 − .48 = −.41 [95 % CI: −.22, −.60], a value that was significantly different than zero.

In summary, the significant indirect effect of SOR observed in both modalities can be described as suppression because it attenuated the relation between estimates of PM and SM when it was left uncontrolled (MacKinnon et al., 2000).

Experiment 2

Experiment 1 showed that SOR suppressed the relation between estimates of PM and SM across both verbal and spatial modalities. However, this conclusion was based on a sample of adolescents, 32 of whom had a study-confirmed diagnosis of ADHD. We therefore considered it prudent to investigate the extent to which the same pattern of suppression could be observed in a sample of adults. Accordingly, we re-analyzed the results of Shipstead et al.’s (2014) study; this study was chosen because it also instructed participants to use a recency recall initiation strategy during performance of two IFR tasks: one involving words and the other involving numbers.

Method

Participants

Shipstead et al. (2014) described their participants as follows: “The data were collected as part of a general screening procedure. All participants were residents of the general community of Atlanta and between the ages of 18–30 [years]. Participants were compensated with $30 per session, or credit toward course requirements (Georgia Tech students only). In total, 273 people consented to participate in a two[-]session study. Fifty-six either did not complete both sessions or were removed for reasons including disruptive behavior, copying of to-be-remembered items, not following instructions, or because they did not meet our inclusion criteria (age; 20/50 vision). In the final sample of 215 participants, the mean age was 22.31 years (SD = 3.70). [Forty-eight percent] were female. [Sixty percent] were either attending or had graduated from college. Sessions included 1–5 participants working individually at computers” (p. 120).

IFR tasks

Shipstead et al. (2014) described their word IFR task as follows: “Participants saw a series of 12 nouns, each of which was presented for 750 ms, followed by a 250-ms delay. Following the 12th word, participants were signaled to recall as many words as possible. The end of the recall period (30 s) was signaled by a beep that was played via headphones. Due to concern that community participants might have less typing experience than college students, responses were written on a sheet of paper. Participants were not required to recall the words in any order; however, the instructions stressed that recall should begin from the end of the list. This was done to regulate recall strategies across participants. Two practice trials were followed by 10 critical trials” (p. 122). In addition, Shipstead et al. (2014) described their number IFR task as follows: “This task was the same as word free recall, with the exception that participants saw [two-] and three-digit numbers, rather than words” (p. 122). Individual SOR scores were calculated in the same manner as in Experiment 1 for each of the word and number tasks separately.

Results

Means, standard deviations, skewness, kurtosis, and reliability values for PM, SM, and SOR are listed in Table 2 as a function of task. The skewness, kurtosis, and reliability values were all within acceptable values. The bivariate correlations between the three variables are listed below the diagonal for each task in Table 2, and the partial correlation between PM and SM holding SOR constant is listed above the diagonal for each task.

Table 2 Descriptive statistics and correlations listed as a function of task in Experiment 2. The bivariate correlations between variables are listed below the diagonal and the partial correlation between estimates of primary (PM) and secondary (SM) memory controlling for strength of recency (SOR) is listed above the diagonal

As expected, there was a small and non-significant correlation observed between estimates of PM and SM when SOR was not controlled in both the word (r = .12, p = .09) and number (r = .01, p = .86) tasks. However, there was also evidence that SOR may have obscured estimates of this relation in each task. In the word task, there was a significant positive correlation observed between the estimate of PM and SOR (r = .57, p < .0001) as well as a significant negative correlation observed between the estimate of SM and SOR (r = −.19, p = .005). Likewise, in the number task, there was also a significant positive correlation observed between the estimate of PM and SOR (r = .69, p < .0001) as well as a significant negative correlation observed between the estimate of SM and SOR (r = −.33, p < .0001). As in Experiment 1, this pattern of correlations is consistent with the notion that the uncontrolled operation of SOR would cause the relation between estimates of PM and SM to be more negative than it otherwise would be.

Consistent with this interpretation, the partial correlation between estimates of PM and SM grew larger and became significant when SOR was controlled in both the word (r = .28, p < .0001) and number (r = .35, p < .0001) tasks, suggesting that the presence of SOR suppressed the positive relation between estimates of PM and SM.
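The partial correlations reported above can be recovered, to two decimals, from the bivariate correlations alone using the standard first-order partial correlation formula. The following is a minimal sketch rather than the analysis code used in the study; the function name `partial_corr` and the dictionary layout are illustrative assumptions, and the inputs are the rounded values reported in the text.

```python
import math


def partial_corr(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation between x and y, holding z constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))


# Bivariate correlations as reported in the text (rounded to two decimals).
reported = {
    "word": {"pm_sm": 0.12, "pm_sor": 0.57, "sm_sor": -0.19},
    "number": {"pm_sm": 0.01, "pm_sor": 0.69, "sm_sor": -0.33},
}

# Partial correlation between PM and SM with SOR held constant, per task.
partials = {
    task: partial_corr(r["pm_sm"], r["pm_sor"], r["sm_sor"])
    for task, r in reported.items()
}

for task, value in partials.items():
    print(f"{task}: partial r(PM, SM | SOR) = {value:.2f}")
```

Because SOR correlates positively with PM and negatively with SM, the product subtracted in the numerator is negative, which is why controlling for SOR raises the PM-SM correlation.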

This conclusion was also supported when regression was used to estimate the indirect effect of SOR. In the word task, the total effect (τ) reflecting the relation between estimates of PM and SM when the indirect effect of SOR was not controlled was equal to .12 (p = .09), whereas the direct effect (τ’) reflecting the relation between estimates of PM and SM when the indirect effect of SOR was controlled was equal to .33 (p < .0001). The indirect effect of SOR was therefore estimated to be .12 − .33 = −.21 [95 % CI: −.32, −.11], a value that was significantly different from zero.

Likewise, in the number task, the total effect (τ) reflecting the relation between estimates of PM and SM when the indirect effect of SOR was not controlled was equal to .01 (p = .86), whereas the direct effect (τ’) reflecting the relation between estimates of PM and SM when the indirect effect of SOR was controlled was equal to .46 (p < .0001). The indirect effect of SOR was therefore estimated to be .01 − .46 = −.45 [95 % CI: −.60, −.30], a value that was significantly different from zero.
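The logic of the τ − τ’ estimate can be illustrated with standardized regression coefficients computed from the reported correlations. This sketch assumes that PM served as the predictor, SM as the outcome, and SOR as the third variable; the text does not state the direction explicitly, so that assignment, like the function name `standardized_effects`, is an assumption. Because the inputs are rounded correlations, the results should only approximate the reported effects, and the sketch does not reproduce the confidence intervals.

```python
def standardized_effects(r_xy: float, r_xm: float, r_my: float):
    """Standardized total, direct, and indirect effects for predictor x,
    outcome y, and third variable m, computed from bivariate correlations."""
    total = r_xy                                      # tau: y regressed on x alone
    direct = (r_xy - r_xm * r_my) / (1 - r_xm ** 2)   # tau': x coefficient with m in the model
    indirect = total - direct                         # tau - tau'
    return total, direct, indirect


# (r_PM_SM, r_PM_SOR, r_SOR_SM) as reported in the text, rounded to two decimals.
# Assumed roles: x = PM, y = SM, m = SOR.
reported = {"word": (0.12, 0.57, -0.19), "number": (0.01, 0.69, -0.33)}

effects = {task: standardized_effects(*r) for task, r in reported.items()}

for task, (total, direct, indirect) in effects.items():
    print(f"{task}: total = {total:.2f}, direct = {direct:.2f}, indirect = {indirect:.2f}")
```

In both tasks the indirect effect is negative while the direct effect is positive and larger than the total effect, which is the signature of suppression described by MacKinnon et al. (2000).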

In summary, consistent with Experiment 1, the significant indirect effect of SOR observed in both IFR tasks of Experiment 2 can be described as suppression because it attenuated the relation between estimates of PM and SM when it was left uncontrolled (MacKinnon et al., 2000).

General discussion

A common approach to measuring the PM and SM components of WM has been to apply Tulving and Colotla’s (1970) method to IFR tasks. Prior work has suggested that this method provides reliable and valid estimates of PM and SM (Craik & Birtwistle, 1971; Unsworth & Engle, 2007a; Unsworth et al., 2010; Watkins, 1974). Although these estimates have been analyzed for a variety of purposes, one recurring finding is that the correlation between estimates of the PM and SM components tends to hover around zero (Engle et al., 1999; Unsworth & Engle, 2007b; Unsworth et al., 2010; Shipstead et al., 2014), a finding that has reinforced the notion that PM and SM represent fully distinct and dissociable components of WM (see e.g., Gibson et al., 2010; Gibson et al., in press). However, the conclusion that PM and SM represent fully distinct and dissociable components of WM has conflicted with more recent studies that have used separate attention and retrieval tasks to estimate these capacities. Under these conditions, the correlation between estimates of the PM and SM components has been observed to be positive and moderately strong (Shipstead et al., 2014; Unsworth et al., 2014).

The present study attempted to resolve this empirical discrepancy by providing evidence that individual differences in SOR (the extent to which individuals used the experimentally-induced recency recall initiation strategy during IFR tasks) could have opposite effects on the relative number of items recalled from PM and SM. Although all of the participants considered in this article were instructed to adopt a recency recall initiation strategy, individual differences in SOR were found to be positively related to the number of items recalled from PM and negatively related to the number of items recalled from SM, and this indirect effect of SOR on estimates of PM and SM was found to be significant within the analytic framework proposed by MacKinnon et al. (2000). This finding therefore extends previous research (Gibson et al., 2010; Unsworth et al., 2011) indicating the importance of group-level differences in recall initiation strategies and suggests that the variation that remains between individuals within a group can still have an important confounding influence on estimates of the PM and SM components.

Of greatest importance, the present findings also showed that the positive correlation between estimates of PM and SM grew from a small size to a moderate size, and became significant, when the indirect effect of SOR was controlled (see Tables 1 and 2). This finding is important because it reconciles research that has used the IFR task to measure the PM and SM components of WM with more recent research that has used separate attention and retrieval tasks to measure these two components. One reason the control of SOR may have been important is that it allowed us to statistically equate the amount of interference that was experienced across participants in the IFR task, particularly during recall of primacy items from SM. Indeed, although interference is also a central feature of the delayed free recall tasks used to measure SM within the separate-task approach, the amount of interference experienced during the delay is equated across participants by virtue of the fact that it is under experimental control in these tasks and therefore does not need to be statistically controlled. Regardless, the conclusion that PM and SM represent related, yet distinct, components of WM can be stated with greater confidence in light of the present findings.

From a theoretical perspective, the conclusion that estimates of PM and SM share approximately 20 % of their variance is important and can serve to motivate a distinction between component-general and component-specific processes, which in turn motivates the application of “bi-factor” models to the study of PM and SM. According to Reise (2012), “[a] bifactor structural model specifies that the covariance among a set of item responses can be accounted for by a single general factor that reflects the common variance running among all scale items and group factors that reflect additional common variance among clusters of items, typically, with highly similar content. It is assumed that the general and group factors all are orthogonal. The general factor represents the conceptually broad “target” construct an instrument was designed to measure, and the group factors represent more conceptually narrow subdomain constructs” (p. 668).

By way of an analogy, consider Baddeley’s (1986) multicomponent model of WM. According to this model, WM can be divided into “domain-specific” storage systems versus a “domain-general” central executive. The storage systems are considered to be domain-specific, because phonological and visuospatial information are thought to be encoded and stored in separate buffers. In contrast, the central executive is considered to be domain general because it controls the maintenance and manipulation of information across the two storage systems as well as their interaction with long-term memory. In this view, the PM and SM components of WM are considered to be different aspects of the central executive.

In order to access these different domain-specific and domain-general abilities, researchers have administered both verbal and spatial WM tasks, and then identified both common and unique variance associated with these tasks (see e.g., Kane et al., 2004; Rapport, Alderson, Kofler, Sarver, Bolden, & Sims, 2008). As shown in the left panel of Fig. 1, the common variance has been construed as representing the domain-general central executive whereas the unique variance has been construed as representing the domain-specific storage systems. This distinction has been important because the common variance has been found to be positively related to higher-level cognitive abilities such as fluid intelligence (Kane et al., 2004) and negatively related to clinical symptoms of inattention and hyperactivity/impulsivity (Rapport et al., 2008). In contrast, the unique variance associated with the storage systems has contributed relatively little to the variance of these important outcomes.

Fig. 1 Illustration of general and specific processes. The panel on the left depicts the distinction between domain-general and domain-specific aspects of working memory (WM), whereas the panel on the right depicts the distinction between component-general and component-specific aspects of WM. The figure is adapted from Rapport et al. (2008)

For instance, Rapport et al. (2008) derived the common and unique variances by using non-recursive regression techniques in which scores obtained from a verbal WM task were regressed on scores obtained from a spatial WM task, and conversely, scores obtained from a spatial WM task were regressed on scores obtained from a verbal WM task. The two resulting regression equations were then used to derive predicted verbal and spatial scores for each individual, and these predicted scores were averaged and used as a dependent measure of the domain-general (i.e., common variance) aspect of the central executive. In addition, residual verbal and spatial scores were also derived by subtracting the predicted scores from the actual scores, and these two residual scores were used as a dependent measure of the domain-specific (i.e., unique variance) aspect of the central executive.
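The derivation of common and unique scores described above can be sketched on simulated data as follows. The scores, sample size, and variable names are hypothetical, and ordinary least-squares regression via `np.polyfit` is assumed here as a stand-in for whatever estimation details Rapport et al. (2008) actually used; the sketch only mirrors the steps as summarized in the preceding paragraph.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical simulated scores: a shared factor plus task-specific noise.
shared = rng.normal(size=n)
verbal = shared + rng.normal(size=n)
spatial = shared + rng.normal(size=n)


def predict(y: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Fitted values from a simple least-squares regression of y on x."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return intercept + slope * x


verbal_hat = predict(verbal, spatial)    # verbal scores predicted from spatial scores
spatial_hat = predict(spatial, verbal)   # spatial scores predicted from verbal scores

common = (verbal_hat + spatial_hat) / 2  # averaged predicted scores (common variance)
verbal_unique = verbal - verbal_hat      # residual verbal scores (unique variance)
spatial_unique = spatial - spatial_hat   # residual spatial scores (unique variance)

# Each residual is uncorrelated with the predictor used to derive it.
r_verbal_check = np.corrcoef(verbal_unique, spatial)[0, 1]
r_spatial_check = np.corrcoef(spatial_unique, verbal)[0, 1]
print(f"r(verbal residual, spatial) = {r_verbal_check:.6f}")
print(f"r(spatial residual, verbal) = {r_spatial_check:.6f}")
```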

Similar to the distinction between domain-general and domain-specific processes, the distinction between component-general and component-specific processes represents both common and unique processes, though in this latter case, both types of processes are considered to be aspects of the central executive. As shown in the right panel of Fig. 1, the common aspects likely represent common control processes that are shared by both the PM and SM components (though these processes are poorly understood at present), whereas the unique aspects represent the specific mechanisms associated with the PM and SM components. In the present study, the magnitudes of partial correlations observed between estimates of the PM and SM components of WM reflecting the component-general aspect were only moderate in size and limited by the reliabilities of the various IFR tasks used to estimate their capacities. Still, although the magnitude of these partial correlations may rise with the use of more reliable instruments, the component-specific aspects would still be expected to play a critical role.

More specifically, the ability to actively maintain goal-relevant information in PM is thought to rely on at least two component-specific attention mechanisms referred to as the “focus of attention” (Cowan, 1995, 2001; Cowan et al., 2005) and “executive attention” (Kane, Bleckley, Conway, & Engle, 2001; Kane & Engle, 2002, 2003; Unsworth, Schrock, & Engle, 2004). In contrast, the ability to retrieve information from SM is thought to rely on three component-specific memory mechanisms, namely “the size of the cued search set,” “the recovery of potential targets from this set,” and “error monitoring” (Unsworth, 2007, 2009).

The conclusion that the central executive can be divided into component-general and component-specific processes is especially relevant to studies that have used the IFR task to examine potential differences in PM and SM abilities across groups. For instance, Gibson and his colleagues (Gibson et al., 2010; Gibson et al., in press) have applied the dual-component model of WM to ADHD using DSM-IV diagnostic criteria, and they have used Tulving and Colotla’s (1970) method to estimate the capacity of the PM and SM components from both verbal and spatial IFR tasks. The main findings have suggested that the retrieval of information from SM was more deficient than the maintenance of information in PM in adolescents with ADHD relative to age-matched controls. However, given that estimates of PM and SM appear to share at least some common variance, the magnitude of the group difference in PM is likely overestimated whereas the magnitude of the group difference in SM is likely underestimated.

Ideally, group (and individual) difference studies of PM and SM should isolate the component-general and component-specific aspects of these two components, and it might seem tempting to derive the common and unique variances by using Rapport et al.’s (2008) approach in which the PM component is regressed on the SM component, and the SM component is regressed on the PM component. The two resulting regression equations can be used to derive predicted PM and SM scores for each individual, and these predicted scores can be averaged and used as a dependent measure of the component-general (i.e., common variance) aspect of the central executive (see e.g., Rapport et al., 2008). Moreover, at first glance, it may seem as though the residual portions of the PM and SM scores could serve as a dependent measure of the component-specific (i.e., unique variance) aspects of the central executive and that this measure would be independent of the component-general aspects. After all, the residual portion of the PM score is orthogonal to SM, and the residual portion of the SM score is orthogonal to PM.

However, although the residual portion of the PM score is orthogonal to SM, and the residual portion of the SM score is orthogonal to PM, the two residual scores are not themselves orthogonal to each other. In fact, as shown in Appendix A, orthogonality between the two residuals is statistically impossible for any two correlated variables, X and Y, in which $-1 < \rho_{XY} < 1$ and $\rho_{XY} \neq 0$, where $\rho_{XY}$ is a parameter representing the correlation between X and Y in the population; note that the residual portions are equal to 0 when $\rho_{XY}$ is equal to −1 or 1, so the correlation between them is undefined in that case. Rather, as shown in Section D of Appendix A, when X is regressed on Y, and Y is regressed on X, the bivariate correlation between the residual portions of the X and Y scores, denoted here as $\hat{e}_X$ and $\hat{e}_Y$, must be equal to the negative of the bivariate correlation between these two variables. That is, $\rho_{XY} = -\rho_{\hat{e}_X \hat{e}_Y}$, meaning that the part of the PM score that cannot be explained by SM is inherently correlated with the part of the SM score that cannot be explained by PM; this, of course, is exactly what we were trying to avoid by deriving the residuals in the first place.
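The relation between the correlation of the residuals and the correlation of the original variables can be checked numerically. The sketch below uses simulated data with an arbitrary, hypothetical slope of 0.4 and ordinary least-squares residuals; it is an illustration of the identity, not a substitute for the proof in Appendix A.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000

# Hypothetical correlated variables standing in for PM (x) and SM (y).
x = rng.normal(size=n)
y = 0.4 * x + rng.normal(size=n)


def residuals(y: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Residuals from a simple least-squares regression of y on x."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return y - (intercept + slope * x)


e_x = residuals(x, y)  # part of x not explained by y
e_y = residuals(y, x)  # part of y not explained by x

r_xy = np.corrcoef(x, y)[0, 1]
r_resid = np.corrcoef(e_x, e_y)[0, 1]

# The two values should match in magnitude and differ in sign.
print(f"r(x, y) = {r_xy:.4f}")
print(f"r(e_x, e_y) = {r_resid:.4f}")
```

In this simulation the identity holds within the sample up to floating-point error, not just in expectation, because the least-squares residuals are built from the same sample moments that define the sample correlation.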

Thus, the component-specific aspects of PM and SM will not represent orthogonal constructs, as they were intended to be, when they are derived as residuals, and this violates the assumption underlying bi-factor models that the general and specific factors all are orthogonal (Reise, 2012). The only orthogonal relations shown in Fig. 1 are those between the residual and SM-predicted portions of PM scores, and between the residual and PM-predicted portions of SM scores. Moreover, this limitation is not unique to Rapport et al.’s (2008) analytic approach, and would also apply to other approaches, as long as the PM and SM components were correlated. Likewise, this limitation is not unique to the single-task approach, but would also apply to the separate-task approach, as long as the PM and SM components were correlated.

In conclusion, the IFR task has been commonly used to estimate the capacities of the PM and SM components of WM. Using this method, the correlation between estimates of the PM and SM components has been observed to hover around zero, suggesting that PM and SM represent fully distinct and dissociable components of WM. However, this conclusion has conflicted with more recent studies that have observed moderately strong, positive correlations between PM and SM when separate attention and retrieval tasks are used to estimate these capacities, suggesting that PM and SM represent related capacities. The present study resolved this empirical discrepancy by showing that the positive relation between estimates of PM and SM has been suppressed by SOR, which reflects differences in the extent to which individuals used the same experimentally-induced recency recall initiation strategy. As predicted, the present findings showed that the positive correlation between estimates of PM and SM grew from a small size to a moderate size, and became significant, when the indirect effect of SOR was controlled. This finding is important because it reconciles research that has used the IFR task to measure the PM and SM components of WM with more recent research that has used separate attention and retrieval tasks to measure these components. In so doing, this study has provided stronger support for the distinction between component-general and component-specific aspects of PM and SM, and as such, has justified the application of the bi-factor statistical model to the study of these components. However, this study also presented a statistical proof showing that the orthogonality assumption between component-specific aspects of PM and SM cannot be satisfied using regression techniques. Thus, the study of component-specific aspects of PM and SM will require the development of new statistical methods.