Odor working memory (WM) has received little scientific scrutiny. The main aim of this study was to find out whether we can keep information about olfactory stimuli in WM and to determine to what extent this performance is dependent on a verbal code. By using odors that are very hard to verbalize, we investigated whether odor WM was functional under circumstances in which little or no verbal information was present.

The nature of the code(s) underlying olfactory memory is an issue of controversy; the role of verbal coding, in particular, has been the focus of much argument, research, and discussion over the years (Herz & Eich, 1995; Jönsson & Olsson, 2003; Jönsson, Tchekhova, Lönner & Olsson, 2005; Larsson & Bäckman, 1997; Larsson, Öberg, & Bäckman, 2006; Lyman & McDaniel, 1986, 1990; Öberg, Larsson, & Bäckman, 2002; Olsson & Cain, 2000; Rabin & Cain, 1984; Yeshurun, Dudai, & Sobel, 2008; Zelano, Montag, Khan, & Sobel, 2009). The at times disparate conclusions of different researchers have ranged from the view that verbal codes play a nonessential role in episodic recognition (e.g., Engen, 1987; Herz & Engen, 1996) to the view that they play a central role (e.g., de Wijk, Schab, & Cain, 1995; Murphy, 1995; Murphy, Cain, Gilmore, & Skinner, 1991; Wilson & Stevenson, 2006). The majority of the evidence supports an important role for verbal codes in olfactory memory, but the relative roles that verbal and perceptual codes play have been more elusive and may vary between tasks.

To our knowledge, almost no studies have explicitly targeted olfactory WM (but see Dade, Zatorre, Evans & Jones-Gotman, 2001). Dade et al. utilized the n-back task (also used in the present study). In the n-back task, participants are continually and sequentially presented with stimuli—typically, at a fixed presentation rate—with the task being to decide whether each stimulus in the sequence matches the one that appeared n trials ago. The participants have to correctly recognize both the current stimulus as one previously presented and whether it was presented exactly n trials ago. Kane, Conway, Miura, and Colflesh (2007) argued that, as such, the n-back task has face validity as a WM task, because participants must maintain and update a dynamic rehearsal set while responding to each item. Concerns have recently been raised about the validity of the n-back task as a measure of WM, mainly due to the low correlation between n-back performance and that on other WM tasks, such as verbal WM span task performance (Kane et al., 2007), the digit span backward task (Miller, Price, Okun, Montijo & Bowers, 2009), and the reading span task (Jaeggi, Buschkuehl, Perrig, & Meier, 2010; but see Shamosh, DeYoung, Green, Reis, Johnson, Conway, Engle, Braver & Gray, 2008). This suggests that the n-back task measures a construct different from those in the other tasks. However, Schmiedek, Hildebrandt, Lövdén, Wilhelm, and Lindenberger (2009) argued that such correlations are underestimated by content-specific variance, task-specific variance, and measurement error. To handle this, they compared a set of three updating tasks (e.g., n-back) with a set of three complex span tasks (e.g., reading span). With a latent variable analysis, they found a latent correlation between a complex span factor and an updating factor that was close to perfect (r = .96), and both factors were equally predictive of performance on a test of fluid intelligence. 
In sum, Schmiedek and colleagues showed convincing evidence in favor of the n-back task as a valid measure of WM.

The n-back task has gained increased popularity lately, particularly in imaging studies (e.g., Dade et al., 2001; Kane et al., 2007; Owen, McMillan, Laird, & Bullmore, 2005). Dade et al. let 12 volunteers perform the 2-back task on six black-and-white photographic face images of men and women and six odors (peach, geraniol, eucalyptus oil, costus oil, patchouli oil, and cinnamon bark oil), with an intertrial interval of 6 s and a stimulus presentation duration of 4 s. In a discrimination task, the participants were able to correctly tell the difference between similar and highly similar odors in about 87% of the cases. Performance in the odor WM task (88%) was not reliably different from that in the face WM task (90%), suggesting a functional WM for olfactory information comparable to that for visual information. However, Dade et al. did not distinguish between the role of verbal and perceptual (olfactory) codes. Because they used mostly familiar odors that were easy to verbalize, their observed olfactory WM performance may have been carried by a verbal code.

Andrade and Donaldson (2007) compared the short-term memory (STM) performance in several conditions with different to-be-remembered stimuli. They showed that olfactory, as compared with verbal, information had little interfering impact on verbal memory (Experiment 1) and that performance in an olfactory STM task was the same with verbal interference as without any interference (Experiment 2). They noted that much of the evidence for Baddeley’s WM model (Baddeley, 1986; Baddeley & Hitch, 1974) has come from studies of tasks that are concurrent with and potentially interfere with the primary task of interest (i.e., dual-task studies). That is, if two concurrent tasks share the same resources, this will impair performance more than if they do not, because the resources are limited. These results were used as support for the idea of a separate olfactory slave system in WM (see also Yeshurun et al., 2008; Zelano et al., 2009). Furthermore, White, Hornung, Kurtz, Treisman and Sheehe (1998) found that perceptual errors were reliably more frequent than phonological errors in a free recall task of odors, which led them to conclude that odors are encoded perceptually rather than verbally. However, in an STM task, Murphy et al. (1991, Experiment 4) used a retention interval of 30 s and showed that for both the young and the elderly, a backward-counting task disrupted episodic recognition of odors. Although this indicates that semantic information is important, Murphy et al. also noted that resources are limited and that, since counting backward takes resources, performance should decrease just due to this. Also, Engen, Kuisma, and Eimas (1973) found a nominal but not statistically reliable (very few participants were tested) negative effect of backward counting on odor STM after a 3- to 30-s retention interval.

In sum, although some studies have focused on olfactory STM (see White, 1998, for a review), very few studies have included more complex memory tasks in line with the definitions of WM (but see Dade et al., 2001). The STM studies reviewed above suggest that information can be temporarily stored in an olfactory code in a dedicated olfactory system of WM, but given that so few studies exist, it is too early to draw any definite conclusions. In the present research, we further pursued the issue of a functional olfactory WM, with the focus on the relative role of verbal and perceptual codes. In two experiments, we investigated WM performance for odors in a 2-back task. Effects of verbalization of odors on the performance in this task were determined (Experiments 1 and 2). More specifically, we compared 2-back performance for odors that are both familiar and easy to name and/or verbalize with odors that are less familiar and very hard to verbalize (i.e., they do not have a veridical label or trigger very many other verbal descriptors or associations). Of particular interest is whether n-back performance is above chance level for the latter odors. Although above-chance performance for these odors would not prove the existence of a separate olfactory slave system, it would be consistent with the idea, whereas failure to show above-chance performance would speak against this notion. Finally, we also investigated odor discrimination as a limiting factor for olfactory WM performance (Experiment 2).

Standardization of the odorants

Twenty-one psychology students (16 women; age: M = 28.38 years, SD = 9.29, range = 21–51 years) at Uppsala University volunteered for the experiment. They reported having a normal sense of smell and were paid two movie vouchers for their participation. Twelve highly familiar and 12 highly unfamiliar odorants were selected for the experiment, on the basis of previous research (Møller, Hansen, Mojet, & Köster, 2010; Møller, Wulff, & Köster, 2004; Sulmont, Issanchou, & Köster, 2002). The odorants were presented in 160-ml tinted glass jars with screw lids, with a unique number printed on each lid. Each jar contained 0.5 ml of the odorant essence applied on a cotton pad, with a few exceptions in which an odorant was diluted with odorless mineral oil to achieve roughly equal intensities across the set of odorants (see the Appendix). A second cotton pad covered the first to prevent visual inspection of the stimulus materials.

All participants were tested at the same time in a rearranged classroom with good ventilation. The pilot experiment consisted of two sessions. In the first session, the participants performed a recognition task (which is not further reported here). A second session followed after a short break. In the second session, the participants sampled each of the 24 odors once and (1) judged how familiar the odor was on a category scale from 1 to 7 (1 = not at all familiar; 7 = extremely familiar) and (2) wrote down anything that came to mind when they smelled the odor. The latter was done to get a normative verbalizability score for each odor.

Scoring of the verbalization responses

Because 12 of the odors did not have a natural correct response (the odorants with chemical names in the Appendix), we used a very liberal method to score verbalizability. No response or a generic response of the kind “disgusting,” “good,” “old,” and so forth was scored as nonverbalizable. All other responses (i.e., responses that were descriptions of actual objects or associations to places) were scored as verbalizable, independently of their correctness. Each odor was then assigned a familiarity and verbalizability score based on the mean performance across participants for each odor. The 12 most verbalizable odorants were assigned to the high-verbalization category, and the other 12 to the low-verbalization category, and similarly for the high-/low-familiarity division (see the Appendix).

Experiment 1

The present experiment assessed whether a functional WM performance can be observed for odors in the 2-back task. This has been indicated once by Dade et al. (2001), but only for highly familiar odors that are relatively easy for participants to label. Of key interest here is the relative level of performance for the odors that are very hard to verbalize. Would performance be worse for these odors than for those that are easy to verbalize? Moreover, would performance for the unfamiliar and hard-to-verbalize odors be at all above chance?

Method

Participants

Nineteen participants (15 women; M = 25.74 years, SD = 6.07, range = 21–43 years) from the Department of Psychology at Uppsala University, who were not included in the pilot study, participated in the experiment. They all reported having a normal sense of smell and were paid a movie voucher for their participation.

Materials and design

Twelve odorants were used in this experiment—namely, the six least and the six most verbalizable odors from the pilot experiment (see the Appendix). Odorants were presented in 160-ml tinted glass jars with screw lids, with a unique number printed on each lid. Except for peppermint, each jar contained 0.5 ml of the odorant essence applied on a cotton pad. Peppermint was diluted with odorless mineral oil to achieve roughly equal intensities across the set of odorants (see the Appendix). A second cotton pad covered the first to prevent visual inspection of the stimulus materials.

2-back task presentation lists

Each participant received a different randomized presentation order of the odors, but with the following constraints: Each odor always appeared as a target once (i.e., it was identical to the odor that was presented 2 trials back) and as a lure twice (i.e., it was not the same odor as the one presented 2 trials back). Altogether, each list consisted of 12 targets and 24 lures. To clarify, each odorant was presented three times, two times as a lure and one time as a target, and the experiment consisted of 36 odor trials in total. Hence, the probability of an odor being a target odor was .33.
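The target and lure definitions above can be illustrated with a minimal Python sketch. The function names and the example sequence are hypothetical and are not the lists used in the experiment. Labeling the first two trials as "start" is also an assumption made here for clarity, since those trials have no 2-back item to compare against.

```python
def label_two_back(sequence):
    """Label each trial in a 2-back sequence.

    A trial is a 'target' if its odor equals the odor two trials back,
    a 'lure' if it differs, and 'start' (assumed label) if no 2-back
    odor exists yet.
    """
    labels = []
    for i, odor in enumerate(sequence):
        if i < 2:
            labels.append("start")
        elif odor == sequence[i - 2]:
            labels.append("target")
        else:
            labels.append("lure")
    return labels


def target_probability(n_targets, n_trials):
    """Proportion of trials on which the odor is a target."""
    return n_targets / n_trials


# Hypothetical six-trial sequence with placeholder odor names.
example = ["A", "B", "A", "C", "A", "C"]
print(label_two_back(example))

# 12 odors x 3 presentations = 36 trials, 12 of them targets.
print(round(target_probability(12, 36), 2))
```

The second call reproduces the arithmetic in the text: 12 targets among 36 trials gives a target probability of about .33.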

Design and analysis

The design comprised one within-group factor—verbalizability (high/low)—and three dependent variables. The dependent variables analyzed were (1) hit rate, (2) false alarm rate, and (3) A′. A′ is the nonparametric equivalent to the signal detection theory measure d′ (Pollack & Norman, 1964). The hit rate was calculated as the proportion of correct responses when the currently sampled odor in a series was the same as that two trials back (e.g., trials 2 and 4 in Fig. 1). The currently sampled stimulus/odor denotes the odor that should be compared with the odor two trials back.

Fig. 1 Five example trials of the 2-back task in Experiment 1. The experimenter presented the participants with an odor every 10 s on 36 consecutive trials. On each trial, the participants sampled the odor once and replied “yes” if they thought that the currently sampled odor was the same as the odor two trials back or “no” if they thought that it was different. In the figure, the odors on trials 3 and 5 are different from the one two trials back (i.e., lures), whereas the odor on trial 4 is identical to the odor two trials back (i.e., target)

The false alarm rate was calculated as the proportion of incorrect responses when the currently sampled odor was not identical to the one presented two trials back (e.g., trials 3 and 5 in Fig. 1). If hits (H) ≥ false alarms (F), A′ was computed as A′ = .5 + [(H − F)(1 + H − F)]/[4H(1 − F)], but if hits < false alarms, the following formula was used: A′ = .5 − [(F − H)(1 + F − H)]/[4F(1 − H)]. This is in accordance with Stanislaw and Todorov (1999, Eq. 2). Extreme values of H and F were adjusted by replacing zero values with .5/n and values of 1.0 with (n − .5)/n, where n is the number of trials. In both experiments, an alpha level of .05 was used. Missing values, if any, were handled by using case-wise deletion. Effect sizes are denoted by Cohen’s d for the t-tests. Because there cannot be any target odors for the first two trials, they were excluded from the analyses. As noted, an odor is a target odor if it is the same as the odor presented two trials back. If it is not, it is a lure.
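As an illustration of the A′ computation described above, the following Python sketch implements the two-branch formula and the adjustment of extreme rates. The function names are hypothetical. The choice to use the number of target trials as n for the hit rate and the number of lure trials as n for the false alarm rate is an assumption, since the text only states that n is the number of trials.

```python
def adjust_rate(rate, n_trials):
    """Replace extreme rates: 0 becomes .5/n and 1 becomes (n - .5)/n."""
    if rate == 0.0:
        return 0.5 / n_trials
    if rate == 1.0:
        return (n_trials - 0.5) / n_trials
    return rate


def a_prime(hit_rate, fa_rate):
    """Nonparametric sensitivity A' from a hit rate and a false alarm rate.

    Rates are expected to be adjusted already, so that neither
    denominator is zero.
    """
    h, f = hit_rate, fa_rate
    if h >= f:
        return 0.5 + ((h - f) * (1 + h - f)) / (4 * h * (1 - f))
    return 0.5 - ((f - h) * (1 + f - h)) / (4 * f * (1 - h))


def a_prime_from_counts(hits, n_targets, false_alarms, n_lures):
    """Compute A' from raw counts (assumed: n differs per rate)."""
    h = adjust_rate(hits / n_targets, n_targets)
    f = adjust_rate(false_alarms / n_lures, n_lures)
    return a_prime(h, f)


# H = .8 and F = .2 give .5 + (.6 * 1.6) / (4 * .8 * .8) = .875.
print(round(a_prime(0.8, 0.2), 3))
```

A value of .50 corresponds to chance performance, which is the reference level used in the one-sample tests reported in the Results.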

Procedure

All participants were tested individually. They first filled in a questionnaire that asked for some background data, such as age, gender, and whether they had a normal sense of smell, after which they read the instructions for the experiment, where the 2-back task procedure was explained to them. In the 2-back task, participants are continually and sequentially presented with odors, and for each odor, they should decide whether it is the same or not as the one presented two trials ago. The 2-back task is described in more detail in Fig. 1. All participants had a few 2-back practice trials before the experiment started. The experiment took, on average, 6 min, excluding instructions.

Results

To analyze working memory performance as a function of odor verbalizability, the three dependent variables—A′, hit rate, and false alarm rate—were analyzed separately. In the first analyses below, the verbalizability division was based on the currently sampled odor in the sequence—that is, the odor that should be compared with the odor two trials back. If the currently sampled odor is verbalizable (e.g., the trial 5 odor in Fig. 1, which is compared with the 2-back odor on trial 3), rejecting it as different from the 2-back odor may be easier than if it is not. This analysis ignores the verbalizability of the 2-back odor. Another way of dividing the data is presented under “Further analysis” below.

False alarm rate

First, we examined the false alarm rate. As can be seen in Table 1, there was a significant effect of verbalizability, t(18) = 2.22, d = 0.65, p = .04, with fewer false alarms for the more verbalizable than for the less verbalizable odors.

Table 1 Mean performance (SD) in the 2-back task in Experiment 1 as a function of whether the odorants belonged to the high- or low-verbalization category

Hit rate

The hit rate analysis showed a significant effect of verbalizability, t(18) = 2.92, d = 0.77, p = .01. Performance was higher for the highly verbalizable odors than for the less verbalizable odors (Table 1).

A′

There was a significant main effect of verbalizability, t(18) = 2.79, d = 0.86, p = .01, with a higher performance for the highly verbalizable odors than for the less verbalizable odors (Table 1). Importantly, the participants also performed well above chance level (i.e., .50) for the low-verbalizability odors, as tested by a one-sample t-test, t(18) = 6.42, p < .001.

Further analysis

Whereas the verbalizability of the currently sampled odor, as analyzed above, may be important, the 2-back odor may also be easier to keep in working memory, for later comparison, if it is verbalizable. Hence, here, we instead split the odors into two categories on the basis of the verbalizability of the 2-back odor only, while ignoring the verbalizability of the currently sampled odor. Deciding whether the currently sampled odor is the same as the 2-back odor should be easier the better the memory of the 2-back odor is (e.g., at trial 5, the memory of the trial 3 odor might be better if it was verbalizable). The false alarm rate analysis showed no effect of verbalizability (high: M = .18, SD = .16; low: M = .16, SD = .13), t(18) = 0.57, d = 0.14, p = .59. There was a tendency toward an effect of verbalizability when analyzing A′. The performance tended to be higher for the highly verbalizable odors (M = .87, SD = .09) than for the less verbalizable odors (M = .81, SD = .12), t(18) = 1.89, d = 0.49, p = .08. It should be noted that for the calculation of hit rates, the 2-back odor is the same as the current odor (e.g., compare trials 2 and 4 in Fig. 1). Hence, the hit rate analysis is identical to the one above, where a significant effect of verbalizability was found.

Experiment 2

The first experiment demonstrated that 2-back task performance is better for more verbalizable odors. Most notably, performance was well above chance levels even for the low-verbalizability odors. In Experiment 2, we used an even more distinct manipulation of verbalization. First, more distinct high- and low-verbalization categories were used (see the Appendix). Second, the high- and low-verbalization odors were not presented interchangeably, but in two separate but consecutive blocks (first all low-verbalization odors and then all high, or vice versa). Moreover, each odor was repeated five times instead of three, one time as a target and four times as a lure. Would above-chance performance for the low-verbalizability odors persist under these circumstances? If it did not, it would mean that we cannot keep olfactory information in working memory without accompanying verbal information.

Participants’ working memory cannot be better than their ability to discriminate between the odors. In fact, the ability to discriminate between odors must be a limiting factor in any measure of odor recognition memory. We measured discrimination among the odorants used in the 2-back task in order to assess to what degree odor discrimination limits odor WM performance. Although it is impossible to measure odor discrimination without at least some minimal load on short-term memory, we assumed that the load on short-term memory would be much less in a discrimination test than in the 2-back task.

Method

Participants

Forty participants (20 women; M = 25.50 years, SD = 6.43, range = 20–48) from the Department of Psychology at Uppsala University participated in the experiment. They all reported having a normal sense of smell and were paid a movie voucher for their participation.

Materials and design

The five least and the five most verbalizable odors (see the Appendix) from the pilot experiment were used in both the discrimination and the 2-back tasks of this experiment. They were prepared exactly as in Experiment 1.

2-back task presentation lists

The high- and low-verbalizability odors were presented in two separate blocks, with no pause in between the blocks. The presentation order of the blocks was counterbalanced across the participants. For each block, each participant received a unique randomized presentation order of the odors, but with the following constraints: Each odor always appeared as a target once (i.e., it was identical to the odor that was presented two trials back) and as a lure four times (i.e., it was not the same odor as the one presented two trials back). Altogether, each list consisted of two blocks with five targets and 20 lures in each, plus two buffer items at the beginning of each block. Capronaldehyde was shown twice at the beginning of the low-verbalizability block, and pineapple was shown twice at the beginning of the high-verbalizability block. The working memory task totaled 54 trials, including the two blocks and four buffer items. The buffer items were excluded from all analyses. The experimenter filled in all answers on an answer sheet. The probability of an odor being a target odor was .20.

Odor discrimination presentation lists

The high- and low-verbalizability odors were presented in two separate blocks, with no pause in between. Each block consisted of 15 trials, with 5 target trials (the same odor was shown twice in quick succession) and 10 trials with two different odors. To clarify, each odor within a block was compared with all other odors in that block (lures) and once with itself (target). The presentation order of each paired comparison was fully randomized for each participant, and the presentation order of the two blocks was fully counterbalanced across the participants. This session consisted of 30 trials in total, and the probability of two odors being the same was .33.
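The pairing scheme for one discrimination block can be sketched in Python as follows. The odor names are placeholders and the function name is hypothetical. The sketch only shows how five odors yield 10 different-odor pairs plus 5 same-odor pairs, and it randomizes pair order but not the order of the two odors within a pair.

```python
import random
from itertools import combinations


def discrimination_pairs(odors, rng):
    """Build one block: every odor paired with every other odor once
    (different trials) and once with itself (same trials), shuffled."""
    different = list(combinations(odors, 2))
    same = [(odor, odor) for odor in odors]
    trials = different + same
    rng.shuffle(trials)
    return trials


# Placeholder names; the actual odorants are listed in the Appendix.
block = ["odor_1", "odor_2", "odor_3", "odor_4", "odor_5"]
trials = discrimination_pairs(block, random.Random(0))

n_same = sum(1 for first, second in trials if first == second)
print(len(trials), n_same, round(n_same / len(trials), 2))
```

With five odors per block, this gives 15 trials of which 5 are same-odor trials, matching the stated probability of .33.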

Procedure

All participants first performed the odor discrimination task, followed by the 2-back task. They were first welcomed and seated near a table in front of the experimenter. The participant first filled in a short questionnaire about background data, such as age, sex, and whether they had a normal sense of smell, and was then given the instructions for the discrimination task. The instructions for the 2-back task were given after the discrimination task had been finished. Before each task started, the experimenter demonstrated the task, let the participant practice with empty jars, and ensured that the participant had understood all instructions.

Odor discrimination

The participants were tested individually and, on each trial, they were presented with two odors, with the task of telling whether they were the same or different. The participant smelled the first odor once and then, immediately after, smelled the second odor. The trials were separated by 10–15 s. The experimenter wrote down all responses. This phase lasted about 6 min, excluding instructions.

2-back task

The 2-back task was identical to that in Experiment 1, but the interstimulus interval was 10–15 s to match the intertrial interval in the discrimination task. The participants were tested individually. The experimenter wrote down all responses. This session lasted about 11 min, excluding instructions.

Results and discussion

In the first experiment, we performed two different analyses, based on two different divisions of the high- and low-verbalization categories (either the verbalizability status of the current odor or the 2-back odor). Because the high- and low-verbalizability odors were presented in two separate blocks in Experiment 2, this kind of division was not meaningful here.

Working memory

There was a clear and reliable difference in WM performance for odor sets of high and low verbalizability (see Table 2), which was true for both A′, t(39) = 6.05, d = 1.23, p < .001, and false alarms, t(39) = 8.55, d = 1.62, p < .001, as well as for hits, t(39) = 3.60, d = 0.75, p < .001. Nominally, there was a notable performance drop in the low-verbalization category in this experiment, as compared with Experiment 1.

Table 2 Group mean performance (SD) for A′, hit rate (Hits) and false alarm rate (FA) in the 2-back task and in the discrimination task in Experiment 2 for odor sets of high and low verbalizability

Discrimination

To discriminate between odor qualities is not a trivial task (Olsson & Cain, 2000; Rabin, 1988; Wise, Olsson, & Cain, 2000). Although the odorants used here were not originally chosen to challenge participants’ discrimination skills, performance across all odors was nevertheless significantly below 100%, as shown by a one-sample t-test, t(39) = 10.56, p < .001. Table 2 shows that for the high-verbalizability odors, discrimination performance was close to perfect, high enough to suggest a ceiling effect. The hit rate was high, the false alarm rate was low, and, consequently, the A′ value was high. However, for the less verbalizable odors, the participants did have discrimination problems, as evidenced by a significantly lower hit rate, t(39) = 4.33, d = 0.78, p < .001, a significantly higher false alarm rate, t(39) = 7.53, d = 1.35, p < .001, and a significantly lower A′ value, t(39) = 6.96, d = 1.46, p < .001.

Working memory as a function of discrimination

As was noted previously, measured WM performance cannot be higher than the participants’ ability to discriminate between the odors. As is evident from Table 2, this assumption was supported by the data, since discrimination performance was significantly higher than WM performance for both the high-verbalization odors, t(39) = 4.17, p < .001, and the low-verbalization odors, t(39) = 4.23, p < .001. The participants’ ability to discriminate between the odors limited WM performance, and both discrimination and WM performance were lower if the odors were hard to verbalize. A further question is whether WM performance is more dependent on a verbal code than is discrimination. If it is not, the decrease in WM performance as a function of verbalizability is due only to an increased difficulty in discriminating between the odors. In such a case, the difference in performance between high- and low-verbalizability odors should be equally large for both tasks. If WM performance is dependent not only on odor discriminability, but also on an increased difficulty in remembering less verbalizable odors, due to factors other than just discrimination, performance should be significantly more affected by odor verbalizability in the WM task than in the discrimination task. To test this, we first calculated two difference scores, one for the discrimination task and one for the WM task. For each task, we subtracted the performance, as measured with A′, for the odors that were hard to verbalize from that for the odors that were easy to verbalize. The data were then entered into a paired samples t-test, t(39) = 2.30, d = 0.48, p = .03, which showed a medium effect size and a significant difference between the two tasks. The difference in performance between high- and low-verbalizability odors was larger in the WM task (M = .20, SD = .21) than in the discrimination task (M = .12, SD = .11).
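The difference-score comparison can be sketched in Python as follows. All A′ values below are invented placeholders for four hypothetical participants and are not data from the experiment. The function names are likewise hypothetical, and the sketch computes only the paired t statistic and its degrees of freedom, not the p value or effect size.

```python
import math
import statistics


def difference_scores(high, low):
    """Per-participant A' for high- minus low-verbalizability odors."""
    return [h - l for h, l in zip(high, low)]


def paired_t(x, y):
    """Paired-samples t statistic and degrees of freedom for x versus y."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1


# Invented A' values (placeholders only).
wm_high, wm_low = [0.9, 0.9, 0.8, 0.9], [0.7, 0.6, 0.7, 0.7]
disc_high, disc_low = [0.95, 0.95, 0.95, 0.95], [0.85, 0.85, 0.85, 0.85]

wm_diff = difference_scores(wm_high, wm_low)
disc_diff = difference_scores(disc_high, disc_low)

t_value, df = paired_t(wm_diff, disc_diff)
print(round(t_value, 2), df)
```

A positive t value here means that the verbalizability effect was larger in the WM task than in the discrimination task, which is the direction reported in the text.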

To conclude, whereas both the ability to remember and the ability to discriminate differ between odors that are easy and difficult to verbalize, WM may be more sensitive to the level of verbalization than is discrimination. However, one caveat with this analysis is that we cannot be sure that the functional scales are equivalent in the two tasks—that is, whether a difference of, for example, 10% in the 2-back task means the same thing as a difference of 10% in the discrimination task. Hence, this conclusion needs further confirmation.

Item analysis

The relation between verbalization and performance in the WM task was also analyzed across items. The verbalization score from the initial standardization was used as a predictor and 2-back performance (A′) as the dependent variable. A′ was calculated for the ten odorants in this experiment. A linear regression analysis showed a strong relationship between the individual odorants’ verbalization scores and WM performance (Fig. 2). Clearly, WM performance drops as odors get harder to verbalize (r = .98, p < .001). The least verbalizable odor in our dataset had a normative verbalizability value of .24 (see the Appendix). If verbalization is set to .24, the regression equation gives a predicted 2-back performance of .65, 95% CI [.59, .72], and the lower bound of the confidence interval is still above chance level (.50).
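The item-level prediction step can be illustrated with a minimal least-squares sketch in Python. The verbalizability scores and A′ values below are invented and lie exactly on a line, so they do not reproduce the regression equation or the confidence interval reported above. The function names are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x_new):
    """Predicted A' at a given verbalizability score."""
    return intercept + slope * x_new


# Invented item-level values (placeholders only).
verbalizability = [0.2, 0.4, 0.6, 0.8, 1.0]
a_prime_scores = [0.58, 0.66, 0.74, 0.82, 0.90]

slope, intercept = fit_line(verbalizability, a_prime_scores)

# Prediction at the lowest normative verbalizability value in the text (.24).
print(round(predict(slope, intercept, 0.24), 3))
```

The comparison of interest is whether the predicted value, and the lower bound of its confidence interval, stays above the chance level of .50.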

Fig. 2 Graph of the verbalizability scores (based on the pilot experiment) for the 10 individual odorants used in Experiment 2 in relation to memory performance (A′) in the 2-back task

General discussion

In two experiments, we investigated olfactory WM. Olfactory information can be retained in the short term and continuously be updated for comparison with new probes. This is possible not only for highly verbalizable odors, as shown here and by Dade and colleagues (2001), but also, although to a lesser extent, for ones that are hard to verbalize. However, verbal information appears crucial to perform at levels seen for other modalities (cf. Dade et al., 2001). Although our results are in agreement with the view that verbal information is a central aspect of cognitive processing of odors (Wilson & Stevenson, 2006), the exact nature of how verbalization affects performance is not clear. An odor percept that is accompanied by identification (i.e., knowing what it is) and by the retrieval of a proper name or other label is more likely to be remembered than just an odor percept free of associations (e.g., Lyman & McDaniel, 1986, 1990; see Jönsson et al., 2005, for a discussion of the identification vs. naming distinction).

There is evidence that an encoded item is processed in parallel by at least two processes or is processed differently depending on whether it is identified or not. This is not unique for olfaction (e.g., Cleary, 2002). A specific version of dual-code theory has been proposed by Herz and Eich (1995) for odor processing. If verbal information is available for an odor, rather than complementing perceptual processing, processing switches from an olfactory to a semantic code. Thus, when semantic information is absent, the encoding and retention would be driven mainly by olfactory/perceptual processes. Indeed, in recent studies of long-term episodic recognition, the forgetting curves were similar for verbal information (word memory) and verbalizable odors, whereas they differed for nonverbalizable odors (Jönsson, Sikström, Willander & Larsson, 2011; Olsson, Lundgren, Soares & Johansson, 2009). Forgetting was faster for the latter odors.

Although it is fairly easy to compare different levels of verbalization of odors to study its role in olfactory processing, creating an odor set that is not influenced by verbalization at all is impossible. This makes conclusions concerning the role of a pure perceptual code elusive. In the present study, we used a more extreme division between verbalizable and nonverbalizable odors than is typically used. But even when an odor is far from identified, verbal or other information that can support retention can still be present, since most odors evoke at least some associations, verbal or other. The regression analysis across items (Fig. 2) showed a clear decrease in performance as a function of verbalization, but the performance was still above chance even for the least verbalizable odor. This indicates that the successful completion of the 2-back task is very dependent on verbalization, but it does not exclude the possibility of above-chance performance as independently carried by a perceptual code.

Some support for the notion of a perceptual code has come from Møller et al. (2004). They tested the same very unfamiliar odors as we did in an episodic recognition test and also measured odor verbalizability. They found that labels given to these highly unfamiliar odors did not aid memory performance and concluded that they had measured pure olfactory memory free from effects of verbal involvement (i.e., memory as carried by an olfactory code only). Along the same lines, Andrade and Donaldson (2007) investigated the role of both verbal and visual information in odor memory. They showed that short-term recognition of odors was selectively challenged by a concurrent odor memory task, whereas concurrent visual and verbal memory tasks had no effect on odor memory. In the framework of working memory slave systems, an identified odor can be represented visually and, if named, also by subvocal articulation (Baddeley & Hitch, 1974). Andrade and Donaldson interpreted their findings as consistent with the idea of an additional store dedicated to temporary maintenance of olfactory information. Another study, which showed that odor imagery and perception activate overlapping brain areas, is also consistent with this idea (Levy, Henkin, Lin, Hutter & Schellinger, 1999). Although the present results are consistent with the idea of an olfactory slave system, they do not in themselves prove the existence of such a system.

Apart from their theoretical relevance, our results have implications for the understanding of tasks of a more applied nature as well. When multiple food, beer, wine, or fragrance sources are evaluated, the task is similar to the n-back task, in the sense that multiple odors need to be kept in mind for comparison. Valentin, Chollet, Beal, and Patris (2007) showed that beer experts outperform novices in a recognition memory task, but to our knowledge, this has yet to be shown in a WM task.

Almost all tests of olfactory performance are limited to some degree by discrimination ability (Cain, 1979; Olsson & Cain, 2000; Schab & Cain, 1991; Wise et al., 2000). De Wijk et al. (1995) noted that “all subsequent higher order processing, including recognition memory and identification, can only be as accurate as the resolution of the sensory system” (p. 24). Indeed, participants’ ability to discriminate between odors has repeatedly been found to correlate significantly with their ability to name odors (e.g., Cain, de Wijk, Lulejian, Schiet, & See, 1998; de Wijk & Cain, 1994; Eskenazi, Cain, Novelly, & Friend, 1983; Rabin, 1988). Hence, in Experiment 2, we argued that discrimination ability should also be a limiting factor in the 2-back task. We measured the discriminability among the high- and low-verbalization odors, and as was expected, discrimination performance covaried with odor verbalizability. Performance was nearly perfect for high-verbalization odors but was significantly lower for low-verbalization odors. It is therefore plausible to assume that odor discrimination ability did not limit WM performance for the highly verbalizable odors, at least not to a substantial degree, but did so for the less verbalizable odors. In addition, Experiment 2 suggested that the difference in WM performance as a function of verbalizability cannot be attributed only to discrimination ability, because verbalizability affected WM performance more than discrimination performance. However, the latter finding has to be confirmed in future studies.

It is noteworthy that short-term processing of odors may also be interpreted in terms of an amodal account (Baddeley, 2000; Hay, Smyth, Hitch & Horton, 2007). Johnson and Miles (2009) found that memory for serial position differed depending on the sense modality that conveyed the stimulus. Olfaction was unique among modalities in showing no serial position effects. But instead of just concluding that olfactory memory was different, they also considered an amodal account (Baddeley, 2000; Hay et al., 2007). With reference to Brown, Neath, and Chater’s (2007; see also Hay et al., 2007) SIMPLE model, it can be argued that the more verbalizable an odor is, the more psychologically distinctive and, therefore, the less confusable or more discriminable it is in memory. The distinctiveness can be a consequence of higher stimulus familiarity and easier processing, and odors that are easy to verbalize are more likely to be distinctive. Similar ideas concerning distinctiveness have been discussed for short-term visual memory (Hay et al., 2007; Neath & Brown, 2006), for long-term visual memory (Rajaram, 1996), and for odor memory (Olsson et al., 2009). To conclude, although WM for odors benefits from verbalization, we cannot yet specify exactly why.
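
The distinctiveness argument can be made concrete with a minimal sketch of the discriminability rule commonly attributed to SIMPLE: the similarity between two memory representations falls off exponentially with their distance on a psychological dimension, and an item's discriminability is the inverse of its summed similarity to all items in the set. The single dimension, the positions assigned to the two odor sets, and the value of the parameter c below are purely illustrative assumptions. They are not estimates from the present data, and the sketch is not an analysis reported in this study.

```python
import math


def discriminability(positions, c=1.0):
    """SIMPLE-style discriminability of each item in a set.

    positions: locations of the items on one psychological dimension
               (hypothetical values, chosen only for illustration).
    c:         rate at which similarity declines with distance (assumed).
    """
    scores = []
    for m_i in positions:
        # Similarity to every item in the set, including the item itself
        # (self-similarity is exp(0) = 1).
        summed_similarity = sum(
            math.exp(-c * abs(m_i - m_j)) for m_j in positions
        )
        scores.append(1.0 / summed_similarity)
    return scores


# Hypothetical stand-ins: widely spaced representations for odors that are
# easy to verbalize, closely spaced ones for odors that are hard to verbalize.
spread_out = [0.0, 2.0, 4.0, 6.0]
crowded = [0.0, 0.2, 0.4, 0.6]

mean_spread = sum(discriminability(spread_out)) / len(spread_out)
mean_crowded = sum(discriminability(crowded)) / len(crowded)
```

Under these assumed positions, the widely spaced set yields the higher mean discriminability. This is the sense in which a more distinctive odor would be less confusable in memory. The sketch leaves open whether verbalization is what produces the spacing.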

Altogether, the present study demonstrates the ability to maintain information about odorants online, updating this information in service of correctly matching odors in a series to previously presented ones. This is true for nameable odors, but also, to a lesser degree, for odors that are notoriously difficult to name. This is in line with the notion of a separate olfactory slave system, but it is premature to draw firm conclusions at this point.