The formation of natural categories is critical to survival for species faced with challenges such as distinguishing food items from nonpalatable objects and prey from predators. Given the fundamental importance of natural concepts in determining the appropriate response to novel objects, it is expected that most animal species would form natural categories, despite their notoriously fuzzy boundaries (Malt, 1993; McCloskey & Glucksberg, 1978). Natural categories may be inferred from perceptual and functional attributes that are necessary, sufficient, and perhaps prototypical for a member of that category. A deep literature grounded in cognitive, comparative, and developmental psychology has established that categories vary to the extent that they are tied to singular perceptual features or abstracted from patterns of features, functions, and attributes (Fize, Cauchoix, & Fabre-Thorpe, 2011; Gelman & Medin, 1993; Herrnstein, 1990; Mervis & Rosch, 1981; Roberts & Mazmanian, 1988; Rosch, Mervis, Gray, Johnson, & Boyes-Braem, 1976; Vonk & MacDonald, 2002, 2004; Vonk & Povinelli, 2012; Zentall, Wasserman, Lazareva, Thompson, & Ratterman, 2008). To the extent that categories vary in inclusiveness, they will vary in the degree of abstraction that is required to identify category membership. For instance, a concrete-level category may include members of only a particular species, whose physical features overlap to a significant degree. The category of chimpanzee includes animals that have four limbs, dark coarse hair, dark human-like faces as adults, and so on. When contrasted with the category human, an organism need only attend to skin color or hair pattern to discriminate the categories. The category of primate, on the other hand, includes a greater number of species that vary substantially in coloration, size, and other features, such as presence of a tail, ischial callosities, and so on. It is expected that more abstract categories will be more difficult to acquire. 
A concept is said to be acquired when an organism generalizes learned responses to novel exemplars.

Although all concepts may be said to be grounded in perception (Goldstone & Barsalou, 1998), natural categories vary in the extent to which they can be perceived without further abstraction. For instance, one may form a category for marine mammals that is more inclusive than pinnipeds, because it would also include cetaceans, but members of both groups share several physical features. Further abstraction is required to determine how inclusive the category is given a particular categorization task. Although it has been argued that abstract concepts refer only to concepts of abstract relations (Herrnstein, 1990), it is clear that acquisition of broad, natural categories, such as foods, predators, and animals, requires an abstraction beyond recognition of a few shared features (Gelman & Medin, 1993). It has been suggested that the formation of superordinate categories relies less upon perceptual feature analysis and more on an understanding of how the category coheres across significant perceptual variance (Spalding & Ross, 2000). Category discriminations can be presented along a continuum of abstraction where concrete-level discriminations may be made by identifying key perceptual features that are shared between exemplars, whereas more abstract discriminations must be made on the basis of inferring a relationship among the exemplars beyond shared perceptual features (Fize et al., 2011; Roberts & Mazmanian, 1988; Vonk & MacDonald, 2002, 2004; Zentall et al., 2008). For humans, with a certain degree of biological knowledge, members of the group animal are identified by various nonperceptible features, such as the ability to breathe, feed, and reproduce (Eimas & Quinn, 1994; Hampton, 1998; Inagaki, 1989). However, when investigating categorization in nonverbal organisms, researchers are restricted to manipulating category inclusiveness with pictorial exemplars.
Thus, even at the highest levels of abstraction, categorization must necessarily depend on an analysis of shared physical features, regardless of what traits are inferred on the basis of those features. In addition, experiments have demonstrated that pigeons may use observable features such as axis of orientation, rather than abstract categories, to discriminate images of different types of animals, such as birds and mammals (Cook, Wright, & Drachman, 2012). Thus, even relatively inclusive categories may be discriminated using simple, perceptual features rather than overarching concepts. Still, manipulating levels of abstraction can elucidate the extent to which nonhumans rely on particular perceptual features, rather than inferring relatedness among more perceptually diverse stimuli.

Humans progress developmentally from forming concepts on the basis of perceptual features to only later being able to abstract out commonalities among stimuli at a more conceptual level (Sloutsky, 2010). Given the late ontogenetic development of abstraction in humans, it was once hypothesized that only our phylogenetically closest relatives would share the capacity. Several more distantly related species have now shown the capacity for abstract concept formation. For example, pigeons (Herrnstein, 1979; Herrnstein, Loveland, & Cable, 1976) have been shown to acquire concepts at a concrete level, where exemplars within the categories share many perceptual features (e.g., humans, landscapes), but they have also been shown to acquire concepts at intermediate (e.g., cars, cats, flowers) and abstract (e.g., artificial, natural) levels (Lazareva, Freiburger, & Wasserman, 2004). However, as was indicated above, other studies have also called into question the degree of abstraction that is involved in making such discriminations (Cook et al., 2012). In addition, given that very distantly related species, such as honeybees, have been shown to categorize at reasonably abstract levels (Benard, Stach, & Giurfa, 2006), the extent to which the process is an analogue to abstraction in humans has been questioned.

Lazareva, Soto, and Wasserman (2010) have suggested that even abstract concepts may be readily discriminated because, although the exemplars within categories are perceptually diverse, there is also much greater variance and less overlap between the categories. The category discrimination tasks traditionally presented to pigeons and other animals may, therefore, allow even the more abstract categories to be discriminated on a perceptual basis (Vonk & Povinelli, 2012). Thus, the degree of feature overlap both within and between categories is likely important for category discrimination (Rosch et al., 1976). Intermediate-level categories that are highly inclusive within categories but also share many features between categories (e.g., dogs vs. cats) may be more difficult to discriminate than abstract-level categories, such as animals versus foods, that are also highly inclusive within categories but share fewer features between categories. An animal’s pattern of responses with various category discriminations can therefore be informative with regard to the extent that the animal relies on between- or within-category similarity.

The recent use of touch screen technology with a wider range of species has allowed researchers to test for concepts extrapolated from pictorial stimuli in many previously untested taxonomic groups, such as carnivores (domestic dogs [Range, Aust, Steurer, & Huber, 2008] and black bears [Vonk, Jett, & Mosteller, 2012]). Range et al. demonstrated that dogs were able to form concepts at the concrete level. The dogs were reinforced for selecting images of dogs rather than landscapes. They were not tested on more abstract-level discriminations in which exemplars within categories might not share many perceptual features in common. For instance, Roberts and Mazmanian (1988) and Vonk and MacDonald (2002, 2004) varied abstraction in sets of natural stimuli such that sets of stimuli involved animals belonging to close taxonomic groups (concrete) or more global, superordinate (abstract) categories. Concrete categories included members of the same species, while intermediate categories included members of the same family or order, and the most abstract category included animals rather than nonanimals.

Roberts and Mazmanian (1988) found that pigeons and monkeys encountered the most difficulty with an intermediate level of abstraction, which roughly corresponds to the basic-level categories that human children presumably acquire first during development and form most quickly during learning (Mervis & Rosch, 1981; Rosch et al., 1976). That is, despite the prediction that animals would encounter the greatest difficulty at the most abstract level, both pigeons and squirrel monkeys could learn to accurately categorize animals and nonanimals at the most abstract level but failed to acquire the bird versus nonbird discrimination at the intermediate level even with additional training (Roberts & Mazmanian, 1988). This finding may be more consistent with Mandler’s (2000) view, supported by several studies, in which the conceptualization of superordinate categories by human infants may actually precede that of basic-level categories. Mandler’s description highlights the distinction between perceptual and conceptual categorization, a distinction not traditionally acknowledged by other researchers. In her view, basic-level categories may be categorized perceptually before superordinate categories, but with regard to conceptual categories, which are based on kind as opposed to perceptual similarity, more global, abstract categories, such as animals, foods, and so on, may emerge first. Thus, there is some disagreement as to what sorts of categorization depend upon physical similarity, depend upon language, and emerge first in the human infant. By studying concept formation in nonhuman primates, one can determine parallels in the emergence of non-language-based categorization. However, the two types of categorization (perceptual and conceptual) may be even more difficult to disentangle with nonhuman primates, given restrictions with regard to presenting visual stimuli.

Kendrick, Wright, and Cook (1990) later found that pigeons could learn an intermediate-level discrimination of birds versus mammals, if a sufficient number of exemplars (35) were presented during training. Vonk and MacDonald found that while a gorilla subject also encountered some difficulty with the intermediate level of abstraction (2002), orangutans performed well with this level, acquiring the discrimination quickly and showing above-chance transfer immediately (2004). Both species of ape showed positive transfer to novel images on the most abstract problems. These results suggest that animals may rely on between-category dissimilarity to perform the tasks. It may be the particular sets of stimuli used, and the degree of shared features within them, that determine acquisition of categories, rather than the conceptual level defined by human-described taxonomic groupings. Vonk et al. (2012) recently tested black bears on the same tasks and found that black bears could also acquire concepts at each level of abstraction. The black bear study provides a direct comparison with the current results from chimpanzees, because the procedure and stimuli were almost identical, except for tailoring the taxonomic groups to those including the subjects’ own species. These sorts of comparisons are important, because bears are grossly understudied in relation to nonhuman primates, despite their large relative brain size. Bears are an interesting test case for hypotheses about the importance of a social lifestyle for complex cognition, given that they lead relatively solitary lives. The bears showed transfer comparable to that shown by the great apes tested previously, calling into question the assumption that social living predicts cognitive complexity, such as the capacity for abstraction, at least in carnivores.

Despite the fact that chimpanzees are by far the most studied of the apes and one of the most highly tested of the nonhuman primates, relatively little experimental work has addressed natural category discrimination in this species. The few studies that have been conducted have explored spontaneous classification of natural categories (Brown & Boysen, 2000; Murai, Kosugi, Tomonaga, Tanaka, Matsuzawa, & Itakura, 2005), classification of functional categories (Tanaka, 1997), or representation of pictures (Parron, Call, & Fagot, 2008; Tanaka, 2007b). With the exception of Tanaka (2001), previous research has not explicitly examined the extent to which chimpanzees focus on perceptual features, rather than overarching concepts, in performing these discriminations. Tanaka (2001) demonstrated that chimpanzees could match images from four different categories, even when the exemplars were perceptually distinct from images used in training. Thus, it was concluded that chimpanzees could categorize items beyond perceptual similarity and, instead, were using concepts to form the basis of their categorization. However, this study used a matching task and presented comparison images from perceptually distinct categories. Tanaka did not explicitly manipulate within- and between-category similarity in his study. Therefore, it is unknown how chimpanzees’ abilities will compare with those of the other species tested so far in tasks where the degree of perceptual similarity between and within categories varies across levels of abstraction.

In the present study, we presented two adult chimpanzees with several concept discrimination tasks that ranged along a continuum of abstraction. Whereas discriminations at the concrete level can be made on the basis of a few shared perceptual features, undermining the conclusion that transfer performance is indicative of true concept formation, discriminations at the more abstract level require the ability to infer a pattern of shared features between exemplars, rather than relying on the presence or absence of single features. If an organism shows transfer at the concrete level only, it is likely that it relies on perceptual similarity between novel exemplars and training exemplars, rather than knowledge of an overarching concept. However, if organisms show transfer to novel stimuli that share few features with training stimuli, as with the more abstract problems, there is better evidence for formation of the natural category. Although it is possible to argue that, at the most abstract level discrimination of animals versus nonanimals, subjects are simply looking for features such as eyes, limbs, and so on, we attempted to use images in both the reinforced and nonreinforced categories that were not fully predictive on the basis of such features. We examined the features of photos that were chosen at high levels and photos that were chosen at low levels in an attempt to identify features used by the subjects. We also examined whether accuracy was affected by particular species’ comparisons in some of the discriminations.

Method

Subjects

Two adult male chimpanzees, Joe and Bonzo, were tested. They were approximately 16 and 12 years old and were experimentally naïve when the experiments began. Bonzo had been raised with several other chimpanzees, including his mother, before his move to the Mobile Zoo. Joe had worked in entertainment as a juvenile but had lived alone at the zoo for many years prior to the onset of the research. Joe and Bonzo were housed separately, except for a period of 2 months when an attempt was made to introduce them. The research took place in an off-exhibit area of the chimpanzees’ enclosure at the Mobile Zoo in Wilmer, AL. Testing of the animals complied with the requirements of the institutional animal care and use review board, and the zoo was compliant with USDA regulations.

Materials

A durable Panasonic Toughbook laptop computer and a 21-in. capacitive touch screen monitor welded inside a hydraulic lift cart made up the experimental apparatus. Experiments were programmed using RealBasic 2006 for Windows. Stimuli consisted of two-dimensional photographs approximately 400 × 600 downloaded from FOTOSEARCH.com and edited using Adobe Photoshop CS2. The background of the images was natural and not standardized, such that the images included both the target object and the natural environment in which the image was taken. A sample pair of images from each discrimination is presented in Fig. 1.

Fig. 1 Sample images from both correct and incorrect categories for each discrimination presented to the chimpanzees

An incorrect response was followed by an unpleasant buzzer tone and a brief time-out with a blank black screen. Correct responses were followed by a pleasant tone and a blank white screen and were paired with food reinforcement, which consisted of portions of the chimpanzees’ regular zoo diet (fruits, vegetables) and special treats, such as honey roasted peanuts, banana pellets, dried banana chips, yogurt covered raisins, sugar-free wafer cookies, and so on. Food was presented by hand.

Procedure

Subjects were tested in the indoor area of their home cages and could move freely in their home cages throughout testing sessions. Thus, participation was entirely voluntary. The computer cart was pushed up against the bars of the cage separating the human experimenter from the chimpanzee, allowing the chimpanzee access to the touch screen monitor. The chimpanzees had been trained on a two-choice discrimination where they selected images of either supermodels (Joe) or “planet of the apes” characters (Bonzo), when paired with each other, until they reached a criterion of 80 % on four consecutive 20-trial sessions without any guidance from the experimenter. Initial sessions focused on training the chimpanzees to touch an image and then to touch images from only the “correct category.” Joe required only 18 sessions to reach this criterion on the training set, while Bonzo required 62 sessions. However, Joe’s first transfer session was 65 % correct, whereas Bonzo’s was 75 % correct. Both chimpanzees were required to meet criterion on the transfer set before moving into testing. They required 16 and 26 sessions to again meet criterion.

During testing, the experimenter was centered behind the computer cart observing the chimpanzees’ responses on the laptop monitor, which was positioned directly behind the touch screen monitor and encased in a protective covering. The experimenter did not provide any cues and could not see the chimpanzees’ faces or hands during the trial. The experimenter presented the chimpanzee with a food reward, as described above, immediately following a correct response at a consistent location. Trials continued automatically until the end of a session.

This experiment followed the same two-choice discrimination procedure used by Vonk and MacDonald (2002, 2004). All subjects were rewarded for selecting images belonging to particular categories and were not rewarded for selecting images not belonging to those categories. The concrete level involved selecting photos of the members of the subjects’ own species (chimpanzees), while not selecting photos of humans. Joe was rewarded for selecting images of humans, while Bonzo was rewarded for selecting images of chimpanzees. To test whether they could learn another concrete-level discrimination where they weren’t simply biased toward choosing members of their own (or most familiar) species, they were also presented with a discrimination in which they were reinforced for selecting images of orangutans and not reinforced for selecting images of apes of any other species, including chimpanzees. Thus the S− category included both images of species that belonged to the previously reinforced categories (Bonzo) or nonreinforced categories (Joe) and those that looked more similar to chimpanzees (e.g., gorillas and bonobos). This test constituted the concrete/intermediate level of discrimination, because the S− category was more diverse in terms of its exemplars than the S+ category. As in the prior studies, at an intermediate/abstract level of abstraction, animals were reinforced for selecting photos of their own taxonomic groups (e.g., primates), while not selecting photos of animals from other taxonomic groups (a range of birds, reptiles, fish, insects, cetaceans, carnivores, rodents, etc.). In order to equate the level of abstraction and inclusiveness of both the S+ and S− categories, the chimpanzees were also presented with an intermediate-level discrimination of carnivores versus hoofstock in which both S+ and S− categories were equally cohesive, inclusive, and intermediate in level of abstraction. 
Joe was reinforced for selecting photos of carnivores, whereas Bonzo was rewarded for selecting photos of hoofstock. At the most abstract level, the chimpanzees were rewarded for selecting photos of animals versus nonanimals. The animal photos also included images of birds, reptiles, insects, amphibians, fish, and mammals. Nonanimal photos included images of landscapes, inanimate man-made objects, clothes, and food items.

Each session involved 20 trials, such that there were 40 images used in each photo set for each discrimination: 20 S+ and 20 S− images. There were three sets of images used for each level of discrimination. Care was taken to select images that were visually distinct within each of the category levels. The chimpanzee images, for example, included both young and old chimpanzees, close-up and distant shots, and images showing only faces as well as images showing entire bodies. Within the carnivore images, there was a mixture of canines, felines, bears, hyenas, and so on, to create a diverse group of images and species within that category. Most images were of individuals, but some showed pairs or groups of animals. The same was true for all other image sets.
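The composition of a single session can be illustrated with a short sketch. It is not the RealBasic program used in the study; the function name, the image labels, and the exact 10/10 split of left and right positions are assumptions made for illustration, on the reading that each of the 20 S+ and 20 S− images appears once per session and that the side of the correct image is balanced within the session.

```python
import random


def build_session(s_plus, s_minus, rng=None):
    """Pair each S+ image with a randomly chosen S- image, one pair per trial.

    Hypothetical helper: assumes equal-sized sets with an even number of
    images, and assumes "counterbalanced" means an equal number of
    left and right placements of the correct (S+) image.
    """
    if len(s_plus) != len(s_minus) or len(s_plus) % 2 != 0:
        raise ValueError("need equal, even-sized S+ and S- sets")
    if rng is None:
        rng = random.Random()
    plus = list(s_plus)
    minus = list(s_minus)
    rng.shuffle(plus)   # random order of S+ images
    rng.shuffle(minus)  # random pairing with S- images
    half = len(plus) // 2
    sides = ["left"] * half + ["right"] * half
    rng.shuffle(sides)  # balanced, but unpredictable from trial to trial
    return [
        {"s_plus": p, "s_minus": m, "correct_side": side}
        for p, m, side in zip(plus, minus, sides)
    ]


# Example with placeholder image labels (not the study's stimuli).
session = build_session(
    [f"plus_{i:02d}" for i in range(20)],
    [f"minus_{i:02d}" for i in range(20)],
    rng=random.Random(1),
)
```

Because both lists are shuffled and then zipped, no image can occur twice within a session, which is the property the design depends on.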

Side location of the correct stimulus was counterbalanced within testing sessions. Images were randomly paired and presented on each trial. No image was repeated within a session. Subjects were reinforced for every correct response. The number of sessions presented on a given day varied as a function of the subject’s attention to the task, as did inter-trial intervals which were typically less than a few seconds. If the animal was incorrect, the black screen that informed the experimenter of the response was presented briefly, and then the next trial commenced immediately. If the subject was correct, the animal waited for the experimenter to offer a food reward before responding on the next trial. There was a period of 750 ms after the next trial images were presented before a subject could make a response, so that we could ensure that they did not simply initiate a response without viewing the images. Subjects were presented with 4–12 sessions on a given test day and were tested 2 or 3 days a week over a period of several years. Subjects continued to work with a particular stimulus set until a criterion of 80 % correct or more on a session was achieved for four consecutive sessions, or 90 % or more correct was reached on two consecutive sessions. At that time, a novel stimulus set of all new photographs depicting the same category discrimination was presented. At least two sessions of transfer images were always presented immediately following criterion on a previous set, on the same test day, such that there were no gaps in time between reaching criterion on one set and being presented with the relevant transfer images. When the chimpanzees completed the final set of images within a particular discrimination, they moved on to the next discrimination. 
Joe progressed from the most concrete to the more abstract problems, as did the apes tested previously (Vonk & MacDonald, 2002, 2004), but Bonzo was presented with the concrete, concrete/intermediate, abstract, abstract/intermediate, and then intermediate/intermediate levels of discrimination. This method was adopted to guard against the possibility that abstract discriminations would be acquired more readily because of increased exposure or training in the task.
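The advancement rule (80 % or more on four consecutive sessions, or 90 % or more on two consecutive sessions) can be expressed as a small check over a subject's session scores. The sketch below is illustrative only: the function name is hypothetical, and it assumes that the session count includes the sessions on which criterion was met, which the text does not specify.

```python
def sessions_to_criterion(percent_correct):
    """Return the 1-based session number at which criterion is first met.

    Criterion, as described in the procedure: >= 80 % on four consecutive
    sessions, or >= 90 % on two consecutive sessions. Returns None if the
    criterion is never met. Counting the criterion sessions themselves
    is an assumption of this sketch.
    """
    for i in range(len(percent_correct)):
        last_two = percent_correct[max(0, i - 1): i + 1]
        last_four = percent_correct[max(0, i - 3): i + 1]
        if len(last_two) == 2 and all(score >= 90 for score in last_two):
            return i + 1
        if len(last_four) == 4 and all(score >= 80 for score in last_four):
            return i + 1
    return None


# Made-up scores, not data from the study.
print(sessions_to_criterion([60, 85, 90, 95]))   # 90 % rule met at session 4
print(sessions_to_criterion([80, 85, 80, 85]))   # 80 % rule met at session 4
print(sessions_to_criterion([50, 70, 75]))       # never met -> None
```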

Bonzo passed away after the completion of the primary discriminations; he was therefore unable to complete control tests. The control tests were presented at the end of testing in order to determine whether Joe used memory or concepts to achieve transfer. It was expected that he might perform poorly on sets of images that contained a mixture of previously seen images from all prior discriminations (mixed) if coherent category structures aided in task performance. Thus, two sets of stimuli were created. The S+ set contained images that he had previously been reinforced for choosing, with several images randomly chosen from each set of photos at each level of discrimination. That is, the images included some humans, apes, primates, carnivores, and other animals. The S− set contained images from the previous sets that had not been reinforced, such as chimpanzees, apes, nonprimates, hoofstock, and nonanimals. Because of the overlapping nature of the categories, some of the same species appeared in both S− and S+ sets, making it difficult to make the discrimination on the basis of categories. However, if Joe had simply memorized which photos had been correct or incorrect previously, he would be expected to do well at this task initially. If he was using coherent categories, rather than memory for specific images, to perform the tasks, one might expect a decrement in performance when the mixed images were first presented.

As a further test of this hypothesis, a set of images was also composed that contained novel images belonging to the same categories that had been previously reinforced (random S+). That is, images were used that depicted animals from the same categories as those previously reinforced (humans, orangutans, carnivores, primates, and animals for the S+ set and the opposite categories for the S− set). However, none of the images had been presented previously, so Joe had no reinforcement history with the specific photos used in this task. If Joe was using memory for specific images, rather than category membership, to dictate choices, he should perform randomly on first presentation of this set of images. As with the mixed test, because of the overlapping categories, members of the same category could be included in both S+ and S− sets. For example, carnivores could belong to both the S+ and S− sets because they belonged to the category carnivore (correct at the intermediate level) but also to nonprimates (incorrect at the intermediate/abstract level). Some species could belong to an S− category but be included as S+ because they were animals. Therefore, even if Joe had memorized which categories, rather than images, were “correct,” it would be difficult to be above chance on this discrimination initially. Critically, if the chimpanzee required more sessions to reach criterion on this category, relative to the coherent categories, some evidence would have been established for the use of a coherent categorical set on the testing problems.

Analyses

At each level of discrimination, each chimpanzee’s individual performance on only the very first session with novel photos was compared with chance (50 %), using binomial tests. These results appear in Table 1. Nakagawa (2004) noted that there is a much greater probability of making a type II versus a type I statistical error in the field of animal behavior and behavioral ecology as indicated by a meta-analysis on statistical power (Jennions & Møller, 2003). Similarly, in our study, it was considered more imperative to avoid the risk of falsely concluding that animals do not acquire concepts, when in fact they do (type II error), relative to an error in which we attributed concept learning to an animal that did not really acquire a concept. This was the case because there is little power to detect such effects in small sample sizes and adopting a rigorous p value would substantially diminish our ability to detect any sign of concept learning (see also Crabbe, Wahlsten, & Dudek, 1999; Field & Armenakis, 1974; Nakagawa, 2004; Perneger, 1998). Therefore, we did not apply corrections for the number of comparisons.
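For readers who wish to check individual first-session scores, the comparison with chance can be reproduced with an exact binomial test. The sketch below is not the authors' analysis script; the function name is hypothetical, and it assumes a two-sided test against a chance level of .5, which the text does not state explicitly.

```python
from math import comb


def binomial_two_sided(correct, trials=20):
    """Exact two-sided binomial p value against chance (p = .5).

    With p = .5 the distribution is symmetric, so the two-sided value is
    twice the probability of a result at least as far from trials / 2 as
    the observed one, capped at 1.
    """
    extreme = max(correct, trials - correct)
    tail = sum(comb(trials, k) for k in range(extreme, trials + 1)) / 2 ** trials
    return min(1.0, 2 * tail)


# 14 of 20 trials correct (70 %) on a first session.
print(round(binomial_two_sided(14), 2))  # 0.12
```

Under these assumptions, 70 % correct on a 20-trial session gives p of about .12, which agrees with the value reported for Bonzo's third orangutan set, so the reported tests appear to be consistent with a two-sided comparison.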

Table 1 Percentage correct on the first session of each set of photos, arranged by discrimination task, for each chimpanzee, along with p values given by binomial tests comparing performance (number of trials correct out of 20) with chance (50 %), and number of sessions to reach criterion

Furthermore, we examined information from multiple aspects of performance in order to further ascertain whether the subject was likely using a general concept versus performing randomly or by memory. The number of sessions taken to reach criterion with each set of photos also appears in Table 1. If a subject has acquired a concept, one expects to see above-chance levels of performance at first or, at least, by second transfer (sets 2 and 3 and 4) with each discrimination, whereas he should not be above chance on the first session with the training data (set 1), because this finding would indicate a spontaneous untrained preference for images belonging to that category and would not speak to acquisition of a learned category. In other words, the subjects should not know a priori, without feedback, which categories are “correct” without experience at the task, even if they are spontaneously able to discriminate the categories (Brown & Boysen, 2000; Murai et al., 2005). One should also see that the sessions required to reach criterion decrease with each set of photos depicting the same category discrimination, if the subject has indeed acquired the overarching concept being tested. Thus, examining both first-session performance against p values in binomial tests and the number of sessions required to reach criterion with each set of photos should jointly contribute to the conclusions as to whether individual subjects have indeed acquired concepts. As Crabbe et al. (1999) argued in the Web supplement to their article, astute readers will be able to determine for themselves whether these significance levels are indeed critical or whether they should be dismissed as chance. However, to further support the decision to retain the current alpha level, Field and Armenakis (1974) showed that seven significant effects at an alpha of .05, given 30 comparisons, yield only a .0006 chance of obtaining a significant difference by chance.
Thus, the risk of obtaining these results (7 significant comparisons out of 32) by chance alone is not substantial.
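The figure attributed to Field and Armenakis (1974) can be approximated as an upper-tail binomial probability: the chance of seeing at least k nominally significant tests among n when every null hypothesis is true. The sketch below is an illustration under the assumption that the comparisons are independent, which repeated tests on the same two subjects may not satisfy; the function name is hypothetical.

```python
from math import comb


def prob_at_least(k, n, alpha=0.05):
    """P(X >= k) for X ~ Binomial(n, alpha).

    Interpreted here as the probability of k or more significant results
    among n independent tests when all null hypotheses are true.
    """
    return sum(
        comb(n, i) * alpha ** i * (1 - alpha) ** (n - i)
        for i in range(k, n + 1)
    )


# Seven or more significant results among 30 comparisons at alpha = .05.
print(round(prob_at_least(7, 30), 4))  # 0.0006

# The same calculation for the 32 comparisons in the present study.
print(prob_at_least(7, 32))
```

With 32 rather than 30 comparisons the probability is slightly larger but remains well below .01 under the same independence assumption.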

Results

Repeated measures ANOVAs were conducted to determine whether performance differed as a function of level of abstraction (concrete, concrete/intermediate, intermediate, intermediate/abstract, abstract) and set (training, transfer 1, transfer 2). If the concept has been acquired, fewer trials to reach criterion should be required with each subsequent set. For Bonzo, there were no significant effects, but for Joe, there were significant effects of both level, F(3, 6) = 17.99, and set, F(2, 6) = 21.11, both ps < .01. Joe required fewer trials to reach criterion with each set. Furthermore, he required more trials to reach criterion with each increasing level of abstraction. Thus, his performance did not improve with familiarity with animal photos or the task itself.

Concrete level: chimpanzees versus humans

There was considerable variance in how many sessions were required to acquire the concrete-level discrimination of chimpanzees versus humans (see Table 1). At an individual level, binomial tests revealed that only Joe showed significant positive transfer, but only with the second set of transfer photographs (binomial test, p = .04). Both chimpanzees required fewer sessions to reach criterion with transfer relative to training images.

Concrete/intermediate level: orangutans versus other apes

The chimpanzees encountered greater difficulty in selecting images of orangutans over a mixture of other apes. Joe initially showed high levels of performance on the first session with the first two sets of photos. This could be because, at the concrete level, he was reinforced for not selecting photographs of chimpanzees, which were now included in the S− set, whereas Bonzo had previously been reinforced for selecting photos of chimpanzees, which would make this a near reverse discrimination for him. However, it is interesting to note that Joe actually chose chimpanzee photos at a higher rate than Bonzo did on the first session, despite their reinforcement histories (see Table 2). Bonzo was below chance on the first session with the first set but ultimately achieved 70 % correct on the first session with the third set of photos (this is still at chance; binomial test, p = .12). Thus, he may have been learning the concept of orangutan or learning which feature to attend to in this task (perhaps coloration, given that only orangutans were reddish/orange).
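As an illustration of how a single-session score maps onto a binomial p value, the following sketch computes an exact test. It assumes 20-trial sessions with two alternatives per trial (chance = .5) and a two-sided test obtained by doubling the smaller tail; the sidedness of the reported tests is not stated, so this is an assumption, and `binomial_p` is a hypothetical helper name.

```python
from math import comb

def binomial_p(correct, trials=20, chance=0.5, two_sided=True):
    """Exact binomial p value for a session score.
    One-sided: probability of `correct` or more hits by chance.
    Two-sided: twice the smaller tail, capped at 1 (exact when the
    distribution is symmetric, i.e., chance = 0.5)."""
    def pmf(k):
        return comb(trials, k) * chance**k * (1 - chance)**(trials - k)
    upper = sum(pmf(k) for k in range(correct, trials + 1))
    if not two_sided:
        return upper
    lower = sum(pmf(k) for k in range(0, correct + 1))
    return min(1.0, 2 * min(upper, lower))

print(round(binomial_p(14), 2))  # 70 % correct (14/20) -> 0.12
print(round(binomial_p(15), 2))  # 75 % correct (15/20) -> 0.04
print(round(binomial_p(16), 2))  # 80 % correct (16/20) -> 0.01
```

Under these assumptions, 14 of 20 correct gives p ≈ .12, which agrees with the value reported for Bonzo's 70 % session.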

Table 2 Means (with standard deviations in parentheses) for percentage of choices of images of different species (or object types) within the sets: other apes (from orangutans/other apes), primates, nonprimates, animals, and nonanimals for each subject, with choices depicted for the first session and across all sessions (overall)

Intermediate level: hoofstock versus carnivores

Joe performed above chance on the first session of carnivores versus hoofstock, indicating that he had an untrained preference for photos of carnivores. In fact, we had noticed this bias during another, unrelated experiment (Vonk & Mosteller, in press). Bonzo seemed to have this same bias, which made it difficult for him to perform this discrimination, because he was rewarded for choosing photos of hoofstock instead of carnivores. Bonzo required many sessions to reach criterion with each set of photographs and was not above chance at transfer (binomial tests, both ps = .26). These results suggest that the chimpanzees simply preferred photos of carnivores (see also Table 2) but did not acquire an overarching concept or learn the perceptual discrimination on the basis of reinforcement. Although the results suggest that they perceived the difference between carnivores and hoofstock, it is difficult to know whether they represented the differences as indicative of a conceptual category, given that they did not learn to apply their knowledge operationally in the task. Previous studies have also revealed chimpanzees’ preferences for touching photos belonging to a particular category; typically, the preference is for selecting photos of humans (Tanaka, 2003, 2007a) or food (Parron et al., 2008).

Intermediate/abstract level: primates versus nonprimates

Both chimpanzees required a large number of sessions to reach criterion with each of the sets of photographs in this discrimination. However, Joe showed significant positive transfer on the second set of transfer photos (binomial test, p = .04) and required fewer sessions to reach criterion with this last set, indicating that he acquired a concept of primate, although it was not easily acquired. Joe chose previously reinforced images of humans at a high rate but did not choose orangutan photos, which had also been reinforced previously, at a high rate. He also preferred photos of chimpanzees and gorillas, which he had not been reinforced for selecting (Table 2). Joe also erroneously selected carnivores and hoofstock at high rates. Thus, it appears that he chose photos of species that were familiar but not necessarily reinforced. Bonzo showed positive transfer with the first transfer test (p = .01), but he did not transfer to the second transfer set, and he required even more sessions to meet criterion with this last set. Thus, it is not clear whether Bonzo acquired a general concept of primate or instead generalized from the specific images presented in the first set to those in the second set of photos. However, generalization from specific images between sets 1 and 2 seems unlikely, since the species depicted in both the S+ and S− categories differed considerably between the two sets. Although Bonzo preferred images of chimpanzees and bonobos, he also preferred images of humans, which he had not been reinforced for selecting. Thus, he also seemed to prefer images that had been presented previously, although not necessarily reinforced.

Abstract level: animals versus nonanimals

Joe showed positive transfer at the second transfer test (p = .04), but Bonzo did not show positive transfer. Both chimpanzees required a large number of sessions to reach criterion at this level, although, for Joe, the number of sessions required decreased with each subsequent photo set, which is expected if the subject is acquiring the discrimination. These results suggest that the abstract-level discrimination was difficult, but not impossible, for the chimpanzees to acquire. Furthermore, although Joe chose previously presented species such as carnivores at a high rate, he also chose reptiles at a high rate. Reptiles are not perceptually similar to any of the categories that had been previously reinforced.

Control tests

Mixed

Only Joe was tested on the control tests, given Bonzo’s unfortunate premature passing. The control tests were presented more than 2 years after Joe had begun testing with the most concrete discriminations. In the mixed test, a set of 20 S+ and 20 S− images was drawn at random from the previous concrete, intermediate, and abstract discriminations, and the images were randomly paired on each trial within a session. Joe immediately chose the S+ images at above-chance levels, p = .04. The only previously reinforced images that he chose less than 75 % of the time were two photos of insects, each chosen 60 % of the time. The only incorrect images that he chose more than 25 % of the time were a photo of a pair of bonobos mating (chosen 40 % of the time) and a photo of the face of an anteater (chosen 60 % of the time). Thus, he was highly accurate with almost all photos presented during the control task.

These results are best explained if Joe remembered the reinforcement history of the individual images independent of their category membership and selected or chose not to select images on the basis of prior reinforcement. Joe’s memory for the previously reinforced stimuli was impressive given that he received the control test more than 2 years after beginning testing, such that it may have been almost 2 years since he had been tested on some of the images included in this discrimination. Given that interval, the fact that he required only five sessions to reach criterion with these images is an impressive demonstration of memory for the correct images and is consistent with findings that pigeons remember the reinforcement history of hundreds of images after long retention intervals (Cook, Levison, Gillett, & Blaisdell, 2005; Vaughan & Greene, 1984). His memory appeared superior to that of a black bear similarly tested on these control tests more than 2 years after beginning testing on analogous category discriminations (Vonk et al., 2012). That is, the bear required over 20 sessions to meet criterion and performed at chance levels initially. However, the bear also showed memory for images previously presented in that he was above chance immediately with some of the discriminations that were re-presented 2 years after he began the study. Matsuzawa (2010) has noted the possibility of a trade-off between abstraction and memory, where chimpanzees, and probably other nonhuman animals, are well equipped to remember vast amounts of detail over long periods of time, but perhaps they do not process the information to the same level of conceptual depth as do humans.

Following the control tests, as we had done with the bear, we presented Joe with the discriminations that he had learned earlier (one session with one photo set at each level of abstraction) as a further attempt to assess his memory for the prior tasks. We found that he was above chance on the concrete/intermediate task and both the intermediate and intermediate/abstract tasks (80 % correct on all three tests, p = .006, binomial tests). However, he was not above chance on the most concrete (55 %, p = .41) or the most abstract (65 %, p = .13) tasks. It could be that he was not above chance on the most concrete task because the most time had passed between the original testing on this discrimination and the retest. Joe’s difficulty with the most abstract test may reflect the difficulty he had with this test initially.

Random

To further support the conclusion that Joe was relying on memory to some degree to solve the mixed control test, we then presented him with a set of mixed images that contained stimuli from each of the categories tested in the previous discriminations but were novel, that is, images that he had not seen before. This task should be difficult, but not impossible, to perform on the basis of recalling the categories that had previously been reinforced, but impossible to perform on the basis of memory for prior stimuli. Joe was at chance on the first session (60 %, p = .50) and required 50 sessions to reach criterion.

Analyses of stimulus features

We examined each subject’s accuracy for each image in each session for all discriminations. We noted the taxonomic classification of the image (e.g., bird, fish, primate, mammal, etc.) in order to conduct analyses similar to those done by Roberts and Mazmanian (1988). We examined the percentage of choices to images depicting animals of particular taxonomic classifications on both the first session and across all sessions for each set of photos within each discrimination. This information is presented in Table 2. We then conducted Friedman tests to compare the rate of selection of images of particular taxonomic groups within each of the following categories: other apes from the orangutan/other-ape task, primates, nonprimates, animals, and nonanimals. These were the categories that included various species that may have been previously presented. We found that there were significant differences for both subjects within each category of stimuli. For other apes, Joe chose bonobos, gorillas, chimpanzees, and gibbons more than he chose siamangs, when they were incorrect, χ²(4) = 28.84. Bonzo chose gibbons the most often, but siamangs the least often, choosing the other apes at levels in between, χ²(4) = 37.61. With the primate category, Joe selected humans the most often, which he had been reinforced for selecting previously. He chose orangutans at the lowest rate, which had been reinforced in a previous task. Bonzo also chose orangutans at fairly low rates, but he chose gibbons and prosimians at lower rates, with new world monkeys being his least popular selection. He chose bonobos the most often after chimpanzees and humans, even though he had not been previously reinforced for choosing chimpanzees. Both chose the different species at significantly different rates: Joe, χ²(8) = 35.57; Bonzo, χ²(8) = 85.55. They also selected the incorrect nonprimate images at different rates, depending on the species depicted: Joe, χ²(7) = 31.71; Bonzo, χ²(7) = 58.66.
Bonzo chose birds, insects, and hoofstock the least often. He chose fish and mammals the most often. Joe chose birds and mammals the most often and reptiles, hoofstock, and fish the least often. For the animal category, both Joe and Bonzo selected the images at different rates depending on the taxonomic category, χ²(7) = 96.68 and 108.83, respectively. Both chimpanzees selected carnivores at high rates, even though Bonzo had not been presented with carnivores previously. Joe, however, chose amphibians and hoofstock at a higher rate than carnivores. Bonzo also chose hoofstock and other mammals at a high rate. Bonzo selected fish, amphibians, and reptiles at the lowest rates, whereas Joe chose fish, birds, reptiles, and insects at low rates. Thus, with the exception of Joe’s preference for amphibians, both subjects chose animals that looked similar to those that had previously been presented, whether reinforced or not. Both Bonzo and Joe chose foods the most often when they were incorrect and did so significantly more than they chose landscapes, toys, and clothing, χ²(3) = 80.21 and 70.81, respectively. We were not able to analyze differences in selection of animals of different species on the first session only, since there were too few exemplars of each category. We did not have power to detect possible differences between categories (see Table 2).

We classified individual images as having been responded to with a high degree of accuracy (high) if they were reinforced (correct) photos that were selected 75 % of the time or more or nonreinforced photos that were selected 25 % of the time or less. We classified images as having been responded to with low accuracy (low) if they were selected less than 25 % of the time (for reinforced images) or more than 75 % of the time (for nonreinforced images). We then had two independent raters, who were naive as to the purpose of the study, respond to a variety of questions similar to those posed by Roberts and Mazmanian (1988). The raters were given the following instructions:

For each picture in the folder “XX” please indicate:

  (a) How much space in the animal picture does the animal(s) take?
  (b) How colorful is the animal(s)?
  (c) How much of the animal(s)’ face is shown in the picture?
  (d) How much of the entire animal(s) is shown?
  (e) How apparent are eyes in the animal picture?
  (f) How apparent are limbs in the animal picture?
  (g) How well does this picture match your ideal picture of a member of that class of animals?

They were instructed to use a 5-point Likert-type rating scale ranging from very little to very much. We conducted bivariate correlations between the raters’ scores for each question to assess reliability. Correlations for all questions were highly significant at p < .001, except for the correlation for the typicality question (g), which was significant at p = .02, indicating a high level of rater reliability for all questions. Pearson r values appear in Table 3.

Table 3 Average scores of rater 1 for high and low stimuli with t statistic for comparison of ratings for high and low photos for each question, with rater correlations (Pearson r)

We paired the ratings for the high photos with those for the low photos for each of the sets of images, considering the correct and incorrect photos separately. We then conducted paired t-tests for each pair of ratings for each question to determine whether high photos were rated differently from low photos. Only ratings regarding prototypicality yielded a significant difference between high and low photos, t(7) = 2.99, p = .02. Interestingly, the raters found the photos that the chimpanzees responded to accurately less typical of the category than the photos that they did not respond to accurately. This finding suggests that chimpanzees did not rely only on perceptual similarity or shared features between exemplars of a category to perform the task. The lack of significant differences for the other variables suggested that raters viewed high and low photos as similar with regard to the questions we asked them (above). This finding also suggests that the chimpanzees were not reliant on particular features, such as size or presence of eyes or limbs, to perform the discriminations.
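The high/low classification that feeds these comparisons can be sketched as a simple rule. The function name `classify_accuracy` and the example selection rates are hypothetical; the thresholds follow the 75 % and 25 % cutoffs described above, and images falling between the cutoffs are treated as belonging to neither group, which is an assumption of this sketch.

```python
def classify_accuracy(selection_rate, reinforced):
    """Label one image 'high', 'low', or None (neither group).
    selection_rate: proportion of presentations on which the image was
    chosen (0.0 to 1.0); reinforced: True for S+ images, False for S-."""
    if reinforced:
        if selection_rate >= 0.75:   # correct image chosen 75 % or more
            return "high"
        if selection_rate < 0.25:    # correct image chosen under 25 %
            return "low"
    else:
        if selection_rate <= 0.25:   # incorrect image chosen 25 % or less
            return "high"
        if selection_rate > 0.75:    # incorrect image chosen over 75 %
            return "low"
    return None                      # intermediate rates: neither group

# Hypothetical selection rates, for illustration only
print(classify_accuracy(0.80, reinforced=True))   # high
print(classify_accuracy(0.60, reinforced=True))   # None
print(classify_accuracy(0.80, reinforced=False))  # low
```

Only images labeled high or low would then be passed to the raters, so that the paired comparisons contrast the two extremes.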

Discussion

Prior research has suggested that chimpanzees form concepts, rather than relying on perceptual similarity, to make categorical discriminations (Tanaka, 2001). In keeping with this finding, the chimpanzees here did show positive transfer on intermediate and abstract discriminations, in which the exemplars within the categories were perceptually diverse. However, their results were not consistent in that they often showed transfer on one transfer test, but not the other. In addition, although Joe required fewer sessions to reach criterion with subsequent sets of images representing the same category discriminations, the same was not true for Bonzo, calling into question the conclusion that Bonzo had acquired overarching concepts. It is possible that, although different species were depicted in each photo set at the intermediate and abstract levels, the chimpanzees were still able to generalize from the species shown in training to those shown in transfer and were using feature similarity, rather than general concepts, to aid in their choices. However, human raters found that the chimpanzees were not more accurate with exemplars that were more typical of their category. Indeed, they were less accurate with more typical category members! Performance on the control tasks and failure to transfer in some instances suggest that memory for individual images was a factor and may have been used initially to reach criterion on the training sets.

To further reinforce the idea that memory facilitated responding in these tasks, we found that the chimpanzees were more likely to select photographs of species that they had previously been exposed to—regardless of whether they had been reinforced for choosing these photos. Images that appeared more similar to those previously seen were selected more often. In the nonanimal category, they appeared to prefer to select food images, which may have been more familiar to them, unlike some of the other artificial objects depicted. This finding indicates a role for both familiarity and perceptual similarity underlying categorization.

Although Joe clearly outperformed Bonzo, Bonzo showed positive transfer with the first transfer test at the intermediate/abstract level. It is possible that this performance was obtained through generalization of perceptual features from specific images that were reinforced from the training set of photos, as stated above. However, this may not be a likely explanation given that, with the more abstract category discriminations, the images used in each set were so distinct. For instance, images used in the abstract category set animal may have included a frog, turtle, blue jay, whale, horse, siamang, Persian cat, wolf, and so on in training and then may have included a snake, salamander, owl, dugong, cow, donkey, fox, panda, roach, and so on in the second set of (transfer) images. This is also true for the intermediate-level discriminations such as primate/nonprimate. There were fewer images across photo sets that were similar and that would have allowed transfer on the basis of perceptual features alone as the discriminations became more abstract. Even when the same species were depicted in multiple photo sets, the images themselves differed in size of the animal, orientation, coloration, age, and so on. Although both chimpanzees did prefer to select species of primates that they had been presented with before, they also preferred some species they had not been presented with, such as old world monkeys (Bonzo) and new world monkeys (Joe). We did not find a significant difference in photos chosen at high or low rates as a function of color, size, presence of eyes, or facial features.

However, Joe required more sessions to reach criterion as the problems became more abstract, consistent with the idea that concrete-level discriminations can be more readily acquired, probably because high levels of performance can be achieved by attending to a few readily perceived features, as well as by learning an overarching concept. It is difficult to disentangle the two hypotheses for positive transfer at concrete levels (see also Marsh & MacDonald, 2008). However, at the more abstract level, it is more difficult to use simple physical features to transfer to the novel images, because the transfer sets often depicted species that had not been trained before and were not necessarily physically similar to those previously reinforced. Even with the more abstract sets, it is possible that the subjects used several features that overlapped between sets of images, such as facial features, hair, or fur, although we did not obtain evidence for the use of obvious features such as the presence of eyes or limbs. In addition, as was noted above, the chimpanzees performed less accurately with more typical exemplars, which suggests that the images they chose more accurately shared fewer features with other category members!

The pattern of findings is somewhat consistent with the results from apes (gorilla and orangutans; Vonk & MacDonald, 2002, 2004) and bears tested previously (Vonk et al., 2012) and with the findings of Roberts and Mazmanian (1988), who found that pigeons and squirrel monkeys could eventually learn the most abstract, but not an intermediate-level, discrimination. The chimpanzees took a long time to learn both the intermediate- and abstract-level discriminations. Intermediate discriminations were learned rapidly by the orangutan subjects (Vonk & MacDonald, 2004), but not by the gorilla (Vonk & MacDonald, 2002). Intermediate-level concepts correspond to the basic-level concepts that are arguably acquired first by young human children (Mervis & Rosch, 1981; Rosch et al., 1976; although see Mandler, 2000). It is of interest that only orangutans demonstrated superior performance at this level. Perhaps the orangutans simply preferred photos of primates to those of nonprimates. The chimpanzees were presented with an additional intermediate-level task: carnivores versus hoofstock.

In addition, the chimpanzees, on average, required more trials to reach criterion than did the gorilla, orangutans, and bears tested previously (Vonk & MacDonald, 2002, 2004; Vonk et al., 2012). Whereas the bears, on average, required 22–33 sessions to reach criterion on the training sets across the different tasks, the chimpanzees required, on average, 36–72 sessions to reach the same criterion. Whereas the orangutans and gorilla received 10 trials within a session (thus, fewer sessions to criterion also means many fewer trials to criterion), the bears and chimpanzees received 20-trial sessions, which, in most cases, included the exact same photographs. Also of note is the fact that the orangutans and gorilla were required to meet a criterion of only two consecutive sessions at 80 % (8/10 correct choices) before moving on to a novel set of images, whereas the bears and chimpanzees were required to meet a more stringent criterion of four consecutive sessions at 80 % correct (16/20), an average of 87.5 % correct across four consecutive sessions, or 90 % correct for two consecutive sessions. Had the less stringent criterion been adopted here as well, both bears and chimpanzees would have required fewer sessions to reach criterion. All species were tested on a similar schedule, receiving 4–16 sessions a day, 2 or 3 days a week, over a period of several years. It is interesting that the chimpanzees did not outperform the bears at the abstract concept formation task, given that it is often assumed that (1) social species and (2) those most closely related to humans should show superior capacities for abstract thought. However, if even the more abstract discriminations are performed by associating a series of features with reward, it may not be that surprising that bears could perform as well as or better than chimpanzees. 
Also, it is possible that the chimpanzees in the present study were not highly motivated to work for food rewards, since they received treats from zoo staff and visitors throughout the day and satiated more quickly than the bears.
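The criterion applied to the bears and chimpanzees can be expressed as a short check over a subject's session scores. This is a sketch under one reading of the description above, namely that the three rules are alternatives and that meeting any one of them suffices; the function name `met_criterion` and the example scores are hypothetical.

```python
def met_criterion(session_scores, trials=20):
    """Return True if the most recent sessions satisfy any of three rules:
    (1) four consecutive sessions at 80 % correct or better,
    (2) a mean of 87.5 % correct across four consecutive sessions, or
    (3) two consecutive sessions at 90 % correct or better.
    session_scores: number correct per session, oldest first."""
    pct = [100.0 * s / trials for s in session_scores]
    last4, last2 = pct[-4:], pct[-2:]
    four_at_80 = len(last4) == 4 and all(p >= 80 for p in last4)
    mean_875 = len(last4) == 4 and sum(last4) / 4 >= 87.5
    two_at_90 = len(last2) == 2 and all(p >= 90 for p in last2)
    return four_at_80 or mean_875 or two_at_90

# Hypothetical score sequences (number correct out of 20)
print(met_criterion([12, 16, 16, 16, 16]))  # True: four sessions at 80 %
print(met_criterion([19, 18, 15, 18]))      # True: four-session mean is 87.5 %
print(met_criterion([10, 18, 18]))          # True: two sessions at 90 %
print(met_criterion([16, 16, 16, 15]))      # False
```

By comparison, the criterion used with the orangutans and gorilla would reduce to two consecutive 10-trial sessions at 80 %, which a subject can satisfy after far fewer trials.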

Conclusions

These results, in conjunction with prior findings, suggest that the capacity for abstraction is not limited to species most closely related to humans or those that live in complex social groups. Chimpanzees’ performance was inferior to that of black bears tested previously, in terms of both the number of trials taken to acquire the discriminations and the level of transfer performance obtained. However, some evidence for abstract concept formation was obtained in that one chimpanzee showed transfer at each level of abstraction and required a decreasing number of trials to acquire the discriminations with subsequent photo sets. This chimpanzee also appeared to acquire the most abstract discriminations the most slowly, consistent with the idea that more abstract discriminations, in which exemplars share fewer perceptual features, should be the most difficult for nonverbal organisms to acquire. Within-category variance may have been more important than between-category variance for these chimpanzees (Lazareva et al., 2004; Lazareva et al., 2010). Although Marsh and MacDonald (2008) examined the features used by orangutans to perform concrete-level discriminations, the factors underlying more abstract-level concepts have yet to be identified. Furthermore, although Marsh and MacDonald’s research indicated that orangutans were reliant on coloration and the presence of eyes to discriminate gorillas or orangutans from other apes, such features are not independent of the overarching category structure, so true concept formation could not be ruled out in their study. In our study, we obtained no evidence that such features controlled performance. We did obtain some evidence suggesting that chimpanzees were more apt to select images that appeared perceptually similar to images that they had previously been presented with, but not necessarily those that they had been reinforced for selecting. 
Future work is needed to carefully explore nonhumans’ use of specific features, and familiarity, to aid in category discrimination tasks.