Introduction

It has been more than 25 years since the publication that launched the idea of a geometric module used in navigation (Cheng, 1986). The experiments were simple: Rats had to find food in a corner of a rectangular enclosure, a dimly lit box with roof and walls. In one experiment, one wall was white, while the other three walls were black. Distinctive panels filled each corner, with different smells emanating from two of them. In this environment, the rats often made rotational errors, searching at a location that was diagonally opposite to the target location, geometrically speaking a 180° rotation from the target around the center of the space. In a rectangular arena with featural cues on the walls, the rotational error stands in the same geometric relation to the space as the target location does (illustrated in Fig. 1). Patterns of systematic errors showed that the rats used the geometry of the rectangle to look for food, thus narrowing their search from four to two corners. However, the rats failed to further narrow their search from two corners to one, even though visual or olfactory information was available that could have distinguished one congruent corner from another.

Fig. 1

An illustration of geometric and featural cues in a hypothetical enclosed rectangular arena with no cues available outside the arena. The overhead view shows the rectangular shape as the geometric cues found in the arena. If the target location is specified solely with reference to the geometric cues, the rotational error is an equivalently good match: both locations lie near a corner with a short wall to the left and a long wall to the right. The walls differ in texture and color, symbolized by the different fill patterns; these provide featural cues. Matching the target with respect to featural cues would result in a unique match at the target location. The features are wrong at the rotational error

This surprising failure gave rise to the proposal of an encapsulated geometric module (Cheng, 1986; Gallistel, 1990). Intuitively and mathematically, geometry concerns spatial relations between points or collections of points, such as lines. The fact that a point is at a certain distance from a second point is a geometric property, a relation between points qua points. Other nonspatial properties of points have been called nongeometric, or featural. The color of the point, the smell emanating from it, and the texture of how it feels are all examples of featural properties. Given this definition, a typical isolated object such as a bush contains geometric properties in its shape, and also in the fact that it stands at a certain distance and direction from a second bush. In addition, a bush contains nongeometric or featural properties, such as its green color, the odors emanating from it, and the feel of its bark. Thus, discrete landmarks and extended surfaces or boundaries can contain both geometric and featural properties. Gallistel argued that the modularity model makes evolutionary sense, because geometric properties are likely to remain stable in the face of changes in featural properties. For example, trees retain their approximate shape and spatial relations to other trees, while they may drop leaves and be covered with snow.

In the last 2 decades, the “geometry” research enterprise has bloomed, with many variations of experiments conducted on a range of species, from humans of various ages (e.g., Hermer & Spelke, 1994, 1996; Newcombe, Ratliff, Shallcross, & Twyman, 2010; Sturz, Gurley, & Bodily, 2011) to an insect, the rain-forest ant Gigantiops destructor (Wystrach & Beugnon, 2009; Wystrach, Cheng, Sosa, & Beugnon, 2011). Questions have arisen as to when the pattern of results reported by Cheng (1986) is and is not obtained and what the pattern of results means. Some of the findings have shown that the original formulation of the geometric module is in need of revisions. For example, in some cases, geometric cues are not learned, even when they are good predictors of the target location (rats, M. Graham, Good, McGregor, & Pearce, 2006; toddlers, Lew, Gibbons, Murphy, & Bremner, 2010). And in other cases, featural cues are learned and used (pigeons, Kelly, Spetch, & Heth, 1998; toddlers, Learmonth, Nadel, & Newcombe, 2002; Learmonth, Newcombe, & Huttenlocher, 2001).

The purpose of this article is not a thorough review of the geometry literature. Reviews on the topic are plentiful, with a major review in 2005 (Cheng & Newcombe, 2005) and updates since that date (Cheng, 2008; Twyman & Newcombe, 2010; Vallortigara, 2009). Rather, we seek to review the range of explanatory frameworks proposed after the Cheng and Newcombe review. Given the continuing impact of this line of research, the 25-year mark makes a good time for a theoretical reckoning. Issues regarding modularity of mind and the evolution of intelligence across the animal kingdom give this case much general interest. In this article, we consider five key theoretical approaches and, in the course of reviewing each approach, five key issues. In each section, we first present the theoretical approach and then offer a critique in the light of its associated issue and an assessment of its current strengths and weaknesses.

The first approach we take up is modularity theory itself. Because the original proposal made by Cheng (1986) is untenable in its strictest form (Cheng, 2008; Twyman & Newcombe, 2010), we focus on considering a revised modular proposal that has been recently put forward (Lee & Spelke, 2010a, 2010b). In this context, we discuss issues concerning the basic distinction between geometry and features. In the artificial situation in the original experiments, the distinction is unproblematic, but in the natural environments in which various species have evolved and lived, the distinction is controversial, especially given recent data.

Second, we review the view-matching approach, a class of theories that has been successfully applied to insect navigation. View matching has recently been proposed to explain rotational errors as well (Stürzl, Cheung, Cheng, & Zeil, 2008). In the context of considering this proposal, we also review the issue of cross-species commonalities. An appealing aspect of the geometry literature has been its use of the same paradigm with a wide variety of species. However, we believe that there may be some degree of species specificity in the use of geometric information.

Third, we review an associative theory proposed for the learning of geometric and featural cues (Miller & Shettleworth, 2007, 2008), based on the Rescorla–Wagner model of classical conditioning (Rescorla & Wagner, 1972) and adapted for spatial learning. In this context, we focus on phenomena of cue competition. Experiments done on cue competition at first seemed to support modularity, in that absence of competition effects suggested independence of geometric and featural information. However, on further study, it turned out that cue competition or even cue facilitation are sometimes observed. Any successful theory of reorientation must explain these facts.

Fourth, we take up adaptive combination theory (Newcombe & Huttenlocher, 2006). In this context, we focus on discussing development (although previous sections also touch on findings from experiments with children). Age-related changes in how children behave in the geometric module paradigm must be explained by any successful theory. The adaptive combination approach suggests that geometric and featural cues both are encoded, given sufficient perceptual salience, and are used flexibly, according to which cues have proven most useful. In this approach, behavior is affected by experience, both in the short and in the long term.

Fifth, we examine approaches that build on data concerning the neural substrates of navigation. Relating what is known at the neural level to the behavioral findings could potentially provide insight into a plausible theoretical model. One approach is a two-factor hippocampal-striatal theory (Doeller & Burgess, 2008; Doeller, King, & Burgess, 2008). This theory proposes two fundamentally different ways of encoding spatial information, using different neuropsychological mechanisms instantiated in different brain regions. Encoding locations with respect to boundaries is said to differ from encoding locations with respect to isolated landmarks. Another approach is a neurally based two-factor computational theory (which also builds on some ideas of view matching), proposed by Sheynikhovich, Chavarriaga, Strösslin, Arleo, and Gerstner (2009).

Modularity theory

We have already presented the basic elements of the original version of modularity theory as presented in Cheng (1986) and Gallistel (1990). Hermer and Spelke (1994, 1996) endorsed this basic picture and added data from human children. They found that, as with rats, geometric information was used to reorient but featural cues, such as the color or texture of walls, were ignored, and thus they claimed that the search of very young children was guided by the innately available, automatically functioning encapsulated module previously described. They added further to the basic modularity approach by showing that humans do use features from the age of about 6 years on, and they suggested that the combinatorial properties of human language are responsible for this transition.

Given subsequent work (reviewed by Cheng & Newcombe, 2005; Twyman & Newcombe, 2010), it is clear that it is necessary to modify the strong claim that geometric cues are essential for reorientation and that these cues provide the only information used to find a hidden object after loss of bearings. Current investigators who seek to maintain a modular model to cover the range of findings on navigation by animals and young humans generally add ancillary processes or make new distinctions. Spelke and her collaborators (e.g., Lee & Spelke, 2010a; Spelke, Lee, & Izard, 2010) have presented a reconceptualization.

Lee and Spelke proposed two separate systems of navigation. One spatial system, the geometric module, represents large-scale surface layouts, and the other, a landmark system, represents small movable objects. The distinction was prompted by data showing that children often had trouble using the geometric relations of isolated landmarks (Gouteux & Spelke, 2001; Lee & Spelke, 2010b). Lee and Spelke (2010b) manipulated whether objects (landmarks) were against the enclosing surfaces (walls) of the arena. When the landmarks were isolated, children failed to use the geometric properties of the array to reorient. But when the landmarks were flush against the wall, thus interpretable as surface geometry, they succeeded.

The geometric module is said to contain information important for navigation—namely, distance and direction, but not angle or length. The first finding prompting this distinction came from Hupbach and Nadel (2005), who found that 2- and 3-year-olds failed to learn a target location in a rhombic arena, whether a distinctive featural cue was added or not. Because two of the angles in a rhombus are acute and two angles are obtuse, this finding suggested a difficulty in using angle. Although Lee, Sovrano, and Spelke (2012) found that toddlers can perform above chance in a rhombus fully enclosed by walls, in a more definitive paradigm that completely isolated angular information from distance information, they confirmed that toddlers do not use angles. Therefore, the modified modularity theory posits that the landmark system contains information about angle, which is important for identifying objects, but that the geometric system does not include angular information. On the other hand, the landmark system does not contain sense and is thus susceptible to confusing mirror reflections of objects, whereas the geometric system does contain sense.

Each system thus encodes some, but not all, of the three fundamental geometric properties of space (distance, angle, and direction). “Uniquely human symbolic systems” (Spelke et al., 2010, p. 863) can then construct a full Euclidean geometry. In this view, the coding of large spaces corresponds to some, but not all, basic geometric intuitions in the innate module, present in humans without instruction. In contrast, processing of properties like angle is not automatic but must be acquired through the use of symbol systems—notably, language.

Subsequent to the two articles published in 2010, two further restrictions on the meaning of geometry became evident. Lee et al. (2012) found that the relative lengths of isolated walls are not used to locate hidden objects but that, instead, the distances of the walls are important, perhaps based neurophysiologically on boundary vector cells (discussed further below; Burgess, Jackson, Hartley, & O’Keefe, 2000; Hartley, Burgess, Lever, Cacucci, & O’Keefe, 2000; Lever, Burton, Jeewajee, O’Keefe, & Burgess, 2009; see also Hartley, Trinkler, & Burgess, 2004, for some relevant human behavioral data). These cells (found in rats) fire most when the animal is at a particular distance from a surface. Thus, even sense seems to have a very specialized meaning—namely, farther is left/right of closer, but not longer is left/right of shorter. In addition, Lew et al. (2010) showed that geometric processes are involved only when an enclosure is symmetrical. However, on the basis of learning principles, geometric cues should be more useful, not less useful, in an asymmetric space where they specify a location unambiguously.

Critique

In the natural world, spaces totally (or even partially) enclosed by extended surfaces are rare. Open spaces are filled with prominent objects, such as trees, and extended boundaries that do not enclose, such as the shores of a river (Sutton, 2009). From this perspective, organisms should be able to encode and use geometric relations among separated landmarks. In addition, according to Gallistel’s (1990) definition of geometry, they should be able to use lengths and angles, as well as distances and directions; all of this should be observed, irrespective of the symmetry of the configuration. And yet, these predictions have not been confirmed (Gouteux & Spelke, 2001; Hupbach & Nadel, 2005; Lee et al., 2012; Lew et al., 2010), leading to the recent reformulations of the geometric module with revised definitions of geometry.

However, these results (and the reformulation) are puzzling from an evolutionary perspective because the geometry of isolated objects (e.g., trees) and their geometric arrangement are good cues for navigation, because the angles formed by surfaces make excellent cues in the natural world, and because the natural world is full of this kind of information (Sutton, 2009). Gallistel (1990) argued that the stability of geometric properties, in contrast to featural properties, makes them attractive cues for reorientation. But if stability and reliability are desiderata, then surely the geometric arrangement of isolated objects (such as trees), angles formed by both surfaces and individual objects, and asymmetric geometry all possess the desired characteristics. The redefined geometric module and the results the redefinitions are meant to explain leave a gaping evolutionary puzzle, in contradiction to Gallistel’s arguments. A geometric system without angles or lengths, applying only to symmetric spaces composed of extended surfaces, erodes a big chunk of the meaning of geometry. Overall, we believe that the basic distinction between geometry and nongeometric features has lost its original crispness and now is not clearly linked to arguments of evolutionary advantage.

Aside from the definitional issues of whether the geometric module is truly geometric and whether it is truly adaptive, there are also a number of empirical problems. First, even the revised modularity theory does not deal with evidence that has been around for over a decade, showing that use of geometry versus features varies with the size of an enclosed space. While geometric cues generally account for search in small enclosures, featural cues are also used in large enclosures and are even preferred in conflict tests (children, Learmonth et al., 2002; Learmonth et al., 2001; Learmonth, Newcombe, Sheridan, & Jones, 2008; adults, Ratliff & Newcombe, 2008; fish, Sovrano, Bisazza, & Vallortigara, 2007; chicks, Sovrano, Bisazza, & Vallortigara, 2005). One possible explanation for effects of enclosure size is that, for smaller enclosures, a greater portion of the space can be seen from a single viewing position, increasing the likelihood that the overall shape of a space will be encoded, so that its geometry is revealed, and can be used in processing (Sovrano & Vallortigara, 2006). However, this explanation assumes that the length of the walls is encoded, and Lee et al. (2012) found that the lengths were not used but, rather, the distance of the walls from the center of the space and/or from each other. Other factors, supported in work by Newcombe et al. (2010), are that larger spaces allow for more action and exploration than do smaller ones and that larger spaces involve surfaces at greater distances from the observer. Objects that can be walked around (those in the space not against the wall) may not make good directional cues. More distal landmarks are better landmarks for directional reorientation than are more proximal ones (Lew, 2011). Neurophysiologically, it may be that objects that are too near fail to engage head direction cells that are thought to be used for determining heading (Cressant, Muller, & Poucet, 1997; see Jeffery, 2010, for a review). In this light, it would be worth replicating Lee et al.’s study in a larger space than they used, which was under 3 m in dimensions. Whatever the explanation for the room size effect, it does not seem easy to square with modularity theory of either the traditional or the revised kind, and it sets yet another limit on the usefulness of the construct in a natural world in which large spaces are more common than very small ones.

Second, not only are there cases when geometric cues fail to be engaged in reorientation tasks, but also there are cases when nongeometric features are engaged in reorientation tasks. For example, Huttenlocher and Lourenco (2007) used a square enclosure that lacked informative geometric cues but had featural cues on the walls. In this task, the features on alternating walls were ordered (or scalar) cues (e.g., small figures vs. large figures) or unordered cues (e.g., blue figures vs. red figures). The enclosure had identical opposite corners, as in geometry tasks. With scalar cues, search by toddlers of 18–24 months was above chance at both the hiding corner and the geometrically identical opposite corner, although with nonscalar cues, search was random. Lourenco, Addy, and Huttenlocher (2009) provided further support for these findings, and Twyman, Newcombe, and Gould (2009) have shown analogous results for mice. These findings undermine the idea that geometric cues are necessary for success.

The findings with scalar cues strongly suggest that reorientation is explained not by a geometric module but, rather, by a more general principle: the ability to make use of ordered cues, with geometric cues of lengths of walls being only one example. For reorientation tasks with ordered cues, a left–right pattern of cues is systematically mapped onto a viewer-centric sense of left and right. The relation is directional—for example, greater value to the left side and lesser value to the right side. For nonscalar cues, in contrast, directionality is lacking, and unordered cues must be mapped onto a viewer-centric sense of left/right. In this case, discriminating between alternative orders is difficult for toddlers.

In rebuttal, Lee and Spelke (2011) suggested that toddlers may interpret scalar cues as geometric—for example, interpreting sizes of objects as a cue for distance, such that large circles produce an illusion of being closer than small circles. The scalar cues would then engage the same innate geometric module. But this modified thesis also runs into trouble, because recent research suggests that toddlers actually can reorient using cues that are both nongeometric and nonscalar. Nardini, Atkinson, and Burgess (2008) showed that toddlers sometimes do reorient with nonscalar categorical cues (blue vs. white), although the effect was weak (which may explain why Huttenlocher and Lourenco [2007] did not find a significant effect in an analogous condition). In addition, Newcombe et al. (2010) showed that 3- and 4-year-olds can use a nonscalar cue to reorient in a circle. Furthermore, research by Lyons, Huttenlocher, and Ratliff (2012), in which scalar variation is controlled, provides more definitive evidence of reorientation with cues that are unordered and vary only categorically—for example, shape (circle vs. diamond)—completely independently of scalar information (e.g., luminosity, size, density). Lyons et al. used a reorientation task in which cues predicted location reliably (unambiguously, 100 % prediction), rather than probabilistically (ambiguously, only 50 % of the time correct), as in standard reorientation tasks. The enclosure had four distinguishable corners, where identical walls were adjoining rather than opposite one another. Toddlers performed above chance on the task, even though only categorical cues were available.

In summary, it seems that the idea of a geometric module, even in its latest formulation, has trouble accounting for extant data, even restricting the corpus to studies on children. Furthermore, the theory has not tackled other phenomena we discuss in later sections, such as variations in when we observe cue competition, independence, or facilitation. While it is possible that further modifications might save the modular model, as “epicycles” to the idea get added, the idea of “geometry” starts to lose its flavor. We are, in any case, left with some puzzling data whose functional significance is unclear.

View-matching approach

View-based matching as an approach encompasses a class of models describing how animals might attempt to recover the view at the target as a strategy for navigation (Stürzl & Zeil, 2007; Zeil, Hofmann, & Chahl, 2003). Different models differ in specifying what a view consists of and how matching is achieved. At its simplest, a pixel-by-pixel panoramic matching strategy can be used as proof of concept, to show that a view-based strategy is viable in some situations (Fig. 2a). A panoramic photo in black and white is taken at the target, rendered at a resolution appropriate for the animal being modeled. Each pixel takes on a grayscale value, and mismatches between corresponding pixels in the target (remembered) view and the current view drive behavior (Stürzl & Zeil, 2007; Zeil et al., 2003).

Fig. 2

View-based matching in the natural open space (a) and in artificial arenas (b, c). a View-based matching by image differences outdoors. A panoramic picture is taken at the reference location. Panoramic pictures taken at other locations shown on the grid and aligned in the same orientation are then compared with the reference picture on a pixel-by-pixel basis. The root mean squares of the pixel differences are plotted on the z-axis. The agent or animal using this system can move in directions that reduce the mismatch, thus descending the mismatch gradient. Within the catchment area that slopes down toward the reference location, this system will find the goal. (From Zeil, J., Hofmann, M. I., & Chahl, J. S., 2003. Reprinted with permission from the Optical Society of America.) b A panoramic view in a rectangular arena in which ants were tested. In the arena, three walls were black while one wall was striped black and white. The picture was taken a little distance in front of the target goal for one group of ants that headed for the intersection of two black walls. The target direction is shown in the center, marked by a dotted line. In the view-based model used for this study, the agent/ant starts off at the center of the arena. It turns until its current view best matches the target view and heads off in that direction. (Reprinted from Wystrach, A., Cheng, K., Sosa, S., & Beugnon, G., 2011). c Results from Wystrach, Cheng, et al. (2011) that are hard to explain with models that explicitly separate geometry and features as separate elements of a representation. The numbers show percentages of choices. In training (left), all exits were rewarded, but this group of ants preferred to exit by one corner at the intersection of two black walls, whose view is shown in panel b. On occasional tests (right), the striped wall was shifted from a long wall to a short wall. One corner, at the bottom right, possessed the same geometric and local featural cues as the most chosen target, surely the best match for schemes that tally geometric and featural cues for matching. Yet the ants hardly ever chose this corner

When placed back in the space, the model agent or animal tries to recover the target view. A common matching strategy is gradient descent: The agent moves in the direction that lowers the discrepancy between the currently perceived image and the target image. Such pixel-by-pixel view matching is not considered biologically realistic but does result in rotational errors in a rectangular arena under appropriate conditions (Cheung, Stürzl, Zeil, & Cheng, 2008; Stürzl et al., 2008). Intuitively, such errors occur because diagonally opposite locations share similarities in views. The layout of where the “sky” (ceiling) and ground are, and the contour edges, for example, all line up at opposite corners. The strategy does not require any object identification or any distinction between geometry and features, unlike recent computational models created to learn geometry (Dawson, Kelly, Spetch, & Dupuis, 2010; Miller & Shettleworth, 2007, 2008; Ponticorvo & Miglino, 2010).
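
To make the proof of concept concrete, the core computation can be sketched in a few lines of Python. The fragment below is our own illustration, not the implementation published by Stürzl and Zeil (2007) or Cheung et al. (2008); the helpers view_at and neighbors, which render a panorama at a grid position and enumerate adjacent positions, are hypothetical stand-ins for a simulation environment.

```python
import numpy as np

# Illustrative sketch of pixel-by-pixel view matching. Aligned grayscale
# panoramas are compared by root-mean-square pixel difference, and the
# agent steps to whichever neighboring location most reduces the
# mismatch: gradient descent on the image-difference surface.

def mismatch(view, target_view):
    """Root-mean-square difference between two aligned panoramas."""
    diff = view.astype(float) - target_view.astype(float)
    return np.sqrt(np.mean(diff ** 2))

def step(pos, target_view, view_at, neighbors):
    """Move to the candidate position whose view best matches the target.

    view_at(pos) and neighbors(pos) are hypothetical helpers assumed to
    be supplied by the simulation environment.
    """
    candidates = list(neighbors(pos)) + [pos]
    return min(candidates, key=lambda p: mismatch(view_at(p), target_view))
```

On such a scheme, rotational errors arise whenever the mismatch surface has a second minimum at the diagonally opposite corner, as it does in a featureless rectangular arena.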

We believe that the data suggest that view-based matching may characterize the search behavior of ants. A recently studied species in the geometry literature is an ant, Gigantiops destructor, whose natural habitat lies in the rainforests of South America (Wystrach & Beugnon, 2009; Wystrach et al., 2011). Ants are phylogenetically distant from all the other species tested so far in that they are not of the Chordate lineage. Ants have long been models for the study of navigation (for reviews, see Cheng, Narendra, Sommer, & Wehner, 2009; T. S. Collett & Collett, 2002; Ronacher, 2008; Wehner, 2003, 2009). View-based learning has often been stressed in ant navigation and, in fact, in insect navigation more generally (Cartwright & Collett, 1982, 1983). For example, Australian desert ants (Melophorus bagoti) have been shown to use the panoramic skyline for a directional cue (P. Graham & Cheng, 2009a, 2009b). The skyline is a record of the elevations of the tops of terrestrial objects. A model based on a skyline representation has had some success as a basis for navigating to home in the real world of the ant (Baddeley, Graham, Husbands, & Philippides, 2012; Philippides, Baddeley, Cheng, & Graham, 2011).

Using the ant species with the biggest and most acute eyes, Gigantiops destructor, Wystrach and Beugnon (2009; Wystrach, 2009) performed “geometry” experiments in the lab on ants motivated to home after finding an item of food (a fly). The ants entered a rectangular arena at the center via a tube and had to find one of the corners to exit, after which they were placed back in their nest. When all exits led home (nondifferential conditioning), the ants spontaneously stuck to one of the two diagonals at far above chance levels, but they did not discriminate between the two panels at the opposite ends of the diagonal. When forced to choose one of the corners in order to exit (differential conditioning), however, they readily learned to use the correct features. Interestingly, in differential conditioning, the ants would often start off toward the diagonally opposite corner (the rotational error) but would correct themselves and turn back before trying to enter the blocked (wrong) corner.

The authors interpreted their data in terms of view matching, but these results could just as well be accounted for in terms of the extraction and encoding of geometric and featural cues. However, in further studies on Gigantiops using nondifferential conditioning and including transformations effected on the training setup, Wystrach, Cheng, et al. (2011) found results that were better interpreted in terms of view matching than models that separate geometric and featural cues. The results as a set could not be accounted for by models that separate featural and geometric cues and then match corners on the basis of shared cues (e.g., Dawson et al., 2010; Ponticorvo & Miglino, 2010). A weighting scheme for features and geometry that works for one set of results would not account for another set, leaving large systematic errors.

In the most telling case, three walls were black, while one long wall consisted of black and white stripes (Fig. 2b, c). While every corner provided an exit home, the majority of ants nevertheless stuck stereotypically to one of the four corners (each distinguishable by the configuration of features). Some ants preferred to head to a corner with two black walls intersecting there (e.g., a corner with a long black wall to the right and a short black wall to the left). The ants were then given a test in which the features were displaced, with the stripes being moved to a short wall. This transformation still left one corner with a long black wall to the right and a short black wall to the left. This corner contained the correct geometric properties and the correct local featural properties. By any model that computes over geometric and featural properties, it ought to be the most chosen. Surprisingly, the ants chose it only 3 % of the time, with the vast majority of choices at the diagonally opposite corner that contained a “wrong” feature. Formal modeling that separated geometric and featural properties failed to come close to accounting for this pattern. And yet, their pixel-by-pixel view-matching model did a reasonable job.

The view-matching model tested by Wystrach, Cheng, et al. (2011) is far from a perfect account, although by model selection standards (Akaike information criterion), it did a better job than any model separating geometry and features. Pixel-by-pixel matching is biologically unrealistic, and the use of better visual parameters for matching might produce better results. In line with the rest of the literature on insect navigation, one would certainly be tempted to propose view-based matching as a basis for the “geometry” task in ants, rather than invoke the separate computation and encoding of geometric and featural properties.

Does view matching work for species other than ants? As the enterprise of research on geometry expanded, more and more species have been placed into rectangular arenas and, sometimes, other shapes of spaces, with and without distinguishing features. Thus, it is natural to ask what, if any, species differences may be found. Chicks (Vallortigara, Zanforlin, & Pasti, 1990), pigeons (Kelly et al., 1998), and fish (redtail splitfins, Sovrano, Bisazza, & Vallortigara, 2002, 2003; goldfish, Vargas, López, Salas, & Thinus-Blanc, 2004) furnished a key datum suggesting species differences: Unlike for rats (Cheng, 1986) and rhesus monkeys (Gouteux, Thinus-Blanc, & Vauclair, 2001), systematic rotational errors were not found in these species (Cheng & Newcombe, 2005). When errors appeared, they were about equally distributed across the wrong corners. Features were readily learned and used. It was thought, however, that the geometry of space was nevertheless learned as well, because when all useful featural cues were removed, either in training or on a test, leaving a uniformly colored rectangle, the birds and fish still managed to search at one of the two geometrically correct corners.

A substantial number of results and publications on reorientation in domestic chicks have now appeared from Vallortigara’s lab (Vallortigara, 2009), heralding substantial advances since Cheng and Newcombe’s (2005) review. Some of the latest results have suggested a view-matching process in chicks (Pecchia & Vallortigara, 2010a, 2010b). The experiments used discrete landmarks to define both geometry and features. These landmarks stood in the middle of an arena, so that the shapes that they defined did not contain continuous walls. In both studies, as in many past studies, chicks had no problem solving the task when a set of distinctive features defined the target location. With identical landmarks making up the corners of a rectangle, however, the chicks failed to learn the task. That is, they failed to choose one diagonal over the other (Pecchia & Vallortigara, 2010a). Another experiment, however, restricted the access to the feeders that also served as landmarks: Only one quadrant of each was open for access (Pecchia & Vallortigara, 2010b). Under these conditions, the chicks managed to learn the “geometry” (Fig. 3). Having a fixed direction of access was crucial. When the direction of access to the feeders varied from trial to trial, the chicks once again failed to learn the task. The authors interpreted the results as supporting view matching. The fixed direction of approach to the feeder allowed the chicks to associate the feeder with a particular view, which was difficult if the chicks could access the feeder from any direction.

Fig. 3

An illustration of an experimental setup used by Pecchia and Vallortigara (2010b), whose results were used to support the hypothesis of view-based matching in chicks. Four round feeders were set up in a rectangular array in the middle of a circular arena. Each feeder (little circles with gap) had four openings, three of which were covered by transparent sheets, symbolized by gray bars. Two of the feeders occupying one diagonal were correct for each chick. The side that provided access to the feeder could thus remain fixed from trial to trial or vary at random. When the access direction was fixed, the chicks learned the task. But when the access direction varied from trial to trial, the chicks failed to learn the task

Some other results from chicks could also be interpreted in terms of view matching. Feature learning was interpreted as finding and matching cues common to all views experienced at the target location (Pecchia & Vallortigara, 2010b). Differences in behavior in larger versus smaller arenas were also subjected to a view-matching interpretation (Sovrano & Vallortigara, 2006). In a larger rectangular arena, chicks tended to match the features (and not geometry) when the feature wall was switched (e.g., from a short wall to a long wall). In a smaller arena, in contrast, the chicks were more likely to match the geometry, now at a corner where the featural cues were mismatched. Large and small spaces have different horizontal dimensions but the same height. The chick’s eyes see a large segment of the panorama, but the range falls well short of 360°. From a fixed distance to the target corner, the chicks see more of the geometry in a small arena, where they can take in three corners while viewing from much of the space. In a larger arena, the overall geometry is less apparent, with the chick mostly seeing two walls converging on a corner. Mismatches in features (e.g., which color is on the right and which on the left) become prominent under such circumstances. The results from Pecchia and Vallortigara are intriguing and suggestive of view matching. However, without any formal modeling and comparison of different models that do and do not extract geometry and features, along the lines that Wystrach, Cheng, et al. (2011) treated data from ants, we are less than fully confident that the chicks are view matching. More empirical data and more formal modeling are both needed.

Critique

We do not think that view-matching theory can be taken as a universal explanation for geometry findings. In particular, the theory may not extend to mammals, especially primates. Most notably, results from several experiments on human children suggest that they do not use view-matching strategies, or at least not primarily. One finding is that children 20–24 months of age are able to solve “geometry” tasks when views change radically—for instance, from being inside an enclosure to approaching corners from outside the enclosure (Huttenlocher & Vasilyeva, 2003). After training to locate a corner within an enclosure, looking from the outside would provide a poor match to any of the corners on a view-matching basis, at least on intuitive grounds. Huttenlocher and Vasilyeva suggested a more abstract geometric representation of the space, with geometric properties identifiable inside or outside of the arena.

An explicit attempt has recently been made to show that children can solve a spatial problem that cannot, in principle, be solved by view matching (Nardini, Thomas, Knowland, Braddick, & Atkinson, 2009). In the critical condition, the location of a key landmark had to be inferred because it was hidden from view, behind another landmark. Children 6–8 years of age solved the task, although 4- and 5-year-olds did not. When a distinctive feature allowed the possibility of view matching, children of all ages (4–8 years) solved the task. We reiterate, however, that toddlers could find the correct corner from outside of the enclosure when they learned the target location from inside it (Huttenlocher & Vasilyeva, 2003), suggesting that they too could do more than view matching under some circumstances.

Lee and Spelke (2011) also found evidence, using rectangular spaces, against any view-matching theory. Their experiments showed that subtle geometric cues (very low enclosure walls or subtle curved hills) allowed 38- to 51-month-old children to choose the correct diagonal of the rectangle that the cues defined. But the children failed when given prominent high-contrast visual cues, such as four tall posts or a sheet on the ground. The latter cues provided salient high-contrast edges, which, in turn, made for far more prominent view differences between correct and incorrect corners.

Why the contrast between ants and children? Differences in visual systems may go some way toward explaining differences in spatial cognition. Primates possess frontal vision with a high-resolution fovea, whereas ants possess wide-field, low-resolution vision (e.g., ~300° span in Melophorus bagoti, with ~4° resolution; Schwarz, Narendra, & Zeil, 2011). Primates also possess a dedicated stream for object perception, the ventral stream (Goodale & Milner, 1992; Milner & Goodale, 1995; Mishkin, Ungerleider, & Macko, 1983), which is lacking in ants and most other animals. From the viewpoint of sensory ecology, it makes some sense for ants to capitalize on wide-field, coarse-grained representations such as skylines, while primates can capitalize on objects and beacons delivered by their visual systems.

In one recent study on the Australian desert ant M. bagoti (Wystrach, Beugnon, & Cheng, 2011), the ants were provided with a giant 3 × 2 m beacon made of black cloth, placed right behind their nest from the viewpoint of a feeder provisioning them with cookie crumbs. The beacon was obvious and salient to humans working there. And yet the experimental results showed that this informative beacon was not extracted or treated as a beacon by the ants. When they were moved to a distant test field at which the rest of the scenery differed, the ants failed to home toward the beacon, even when they were released only a couple of meters in front of the beacon. Other results suggested that the beacon formed part of a much larger panorama that the ants were using to guide their journey home. Much work remains to be done to figure out just what aspects of the views drive navigation in ants and other insects. A pixel-by-pixel matching strategy is unrealistic, but insect perceptual systems deliver a palette of parameters that might be used for navigation, from skyline heights to contour edges to total contour length.

Comparatively, the much-studied rat constitutes an interesting case in sensory ecology. The rat’s vision spans a large angular range, like that of ants. Its resolution is also more like that of ants than that of primates. Prusky, Harker, Douglas, and Whishaw (2002) found that wild rats and some domesticated strains have a threshold of spatial resolution around 1 cycle per degree. This is comparable to the resolution of the Gigantiops ants that Wystrach and Beugnon (2009) tested in arenas—the best eyes found in ants—and an order of magnitude poorer than that found in humans, whose resolution is measured in minutes of arc (Cavonius & Schumacher, 1966). We could find no publications on the ventral stream in rats, but a division in processing object and spatial information similar to that for humans has at least been suggested for rats (Knierim, Lee, & Hargreaves, 2006). Of pathways coming into the hippocampus, the medial entorhinal cortex is said to provide spatial information, whereas the lateral entorhinal cortex is said to provide nonspatial (object) information. To what extent rats use individually identified objects as landmarks, as opposed to entire panoramas, remains an open question. With a wide visual span, however, we might expect panoramic view matching to play a significant role in rat navigation, and the two-factor computational theory reviewed later does place view matching at the core of navigation in rats. It would be worthwhile to test rats with transformations of visual scenes in explicit attempts to test view-matching theories.

The primate object-perception stream may not, however, wholly replace scene analysis for navigational purposes. Scene analysis mediated by the parahippocampal and retrosplenial areas is thought to play a crucial role in human navigation, although the object-perception stream contributes to recognizing landmarks (Epstein, 2008). Much work remains to be done to work out just how views contribute to human navigation and what a view consists of. Given the different visual Umwelts of primates and insects, views are different experiences and contribute different packages of information to different taxa.

View matching is a new idea in the geometry literature, and a host of results in diverse animals have not been examined from this perspective. Cue competition is one case in point. A view-matching model needs to be supplemented with a learning model to have a chance of predicting cue competition results. A typical view-matching model (e.g., Cheung et al., 2008) specifies only how a model performs in a particular situation, with the target view, however it is specified, already learned. Some other theoretical assumptions, and consequent modeling, are needed to predict how learning one target view affects the learning of other target views. The views to learn for matching are typically taken to be holistic and panoramic—in this sense, configural in nature. Thus, configural theories of learning, such as those proposed by Pearce (1994), might be most amenable to this kind of modeling. The suggestive data on view matching in chicks need to be followed up with formal modeling. And a host of other vertebrate species have simply not been examined with respect to view matching. In an age of digital cameras, to which commercially available panoramic lenses may be attached, obtaining panoramic views in experimental setups is easy. Coming up with realistic models based on the views is a challenge, but we deem it worth doing to find evidence for and against this idea (for applications to ant navigation, see Baddeley et al., 2012; P. Graham & Cheng, 2009a; Philippides et al., 2011).

Associative theory

Rescorla and Wagner (1972) described a theory of cue competition in classical conditioning experiments. There are two common kinds of cue competition. When cues are redundant with each other and they are presented sequentially, the model predicts blocking (in which training with the first cue prevents learning of a second, redundant cue added later). When redundant cues are presented simultaneously, the model predicts overshadowing (reduced learning of each cue relative to learning of either cue when presented alone). In the case of reorientation experiments, there are often redundant cues, such as a landmark in a geometrically defined corner. When unique features are available that mark where something is, there is no logical need to also encode location with respect to a geometric surround (and vice versa). Do we see cue competition (blocking or overshadowing) in these situations?
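
The mechanics behind these predictions are easy to state: in the Rescorla–Wagner model, all cues present on a trial share a single prediction error, so redundant cues compete for a fixed asymptote of associative strength. The sketch below is our own minimal rendering in Python, with illustrative (not fitted) parameter values.

```python
# Minimal Rescorla-Wagner sketch. All cues present on a trial share one
# prediction error, so redundant cues compete for a fixed asymptote of
# associative strength (lam). Parameter values are illustrative only.

def rw_trial(V, present, alpha, beta=0.5, lam=1.0):
    """One reinforced trial: update the strength of every cue present."""
    error = lam - sum(V[c] for c in present)   # shared prediction error
    for c in present:
        V[c] += alpha[c] * beta * error

alpha = {"geometry": 0.3, "feature": 0.3}      # cue saliences

# Overshadowing: redundant cues trained in compound from the outset.
V = {"geometry": 0.0, "feature": 0.0}
for _ in range(100):
    rw_trial(V, ["geometry", "feature"], alpha)
print(V)   # each cue ends near lam / 2, less than either trained alone

# Blocking: pretrain geometry alone, then add the redundant feature.
V = {"geometry": 0.0, "feature": 0.0}
for _ in range(100):
    rw_trial(V, ["geometry"], alpha)
for _ in range(100):
    rw_trial(V, ["geometry", "feature"], alpha)
print(V)   # geometry sits near lam; the feature has gained almost nothing
```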

One initially attractive argument in favor of modularity was based on the absence of cue competition between geometry and features (e.g., Wall, Botly, Black, & Shettleworth, 2004). This lack of interaction seemed to favor the existence of independent systems. However, both empirical and theoretical problems with this argument arose. Empirically, experimenters have actually sometimes observed cue competition (e.g., Gray, Bloomfield, Ferrey, Spetch, & Sturdy, 2005) and have also observed enhancement of learning of one type of information in the presence of the other (e.g., M. Graham et al., 2006; Pearce, Ward-Robinson, Good, Fussell, & Aydin, 2001). Theoretically, it is not clear that we should expect cue competition even within a single spatial learning system and, hence, not clear that independence would diagnose the existence of two systems. For example, a highly salient stimulus in a system might not suffer cue competition from other stimuli used by that system. Something to this effect has been reported in the realm of classical conditioning (Denniston, Miller, & Matute, 1996). Empirically, from the perspective of cognitive map theory, Hardt, Hupbach, and Nadel (2009) have found that human adults do not show blocking phenomena between two sets of landmarks, even when one set is redundant to the other, in conditions that encourage exploration of the environment. Since no one would suggest that one arbitrary set of landmarks is modularly separable from another set of landmarks, their data make the point that blocking or its absence will not reliably adjudicate the issue of modularity.

However, even if cue competition does not adjudicate modularity, either empirically or theoretically, the complex pattern of both the existence and the absence of cue competition, and even of its opposite (facilitation), does cry out for explanation. What variations in experimental paradigms can explain the variety of findings? To address this puzzling set of data, without postulating a special status for geometric information, Miller and Shettleworth (2007) proposed a model based on the Rescorla–Wagner model, adapted for spatial learning tasks that are operant in nature. That is, in spatial learning studies, participants typically choose where to search for food, hidden platforms, and so on, presumably on the basis of their experiences of success and failure, rather than being offered a fixed menu of exposures to stimuli, as in classical conditioning. In dealing with operant situations, modelers must define the probability with which certain cues—that is, particular locations—will be sampled.

The core insight of the Miller–Shettleworth (MS) model is that participants learn about multiple cues when they encounter success (or failure) at a given location. For example, they may learn that a food reward was present in a corner where a long wall is to the left of a short wall and also learn that a food reward was present where there is a striped panel. If a striped panel predicts a reward with 100 % accuracy, the marked corner will rapidly be visited more often. Learning of the correct geometry in a rectangular enclosure will then be better than in a situation in which there is no stable predictive feature and in which geometry predicts the correct corner only 50 % of the time. Depending on the task parameters, one may see either actual enhancement or merely the absence of blocking. Furthermore, in modeling the cases in which blocking or overshadowing have been observed, Miller and Shettleworth noted that such effects have occurred in experiments done in water tanks, where multiple corners are visited until success is achieved (as opposed to searches for food, where the search is often terminated after the first choice). When multiple corners are visited, a feature in a correct (or in an incorrect) corner can affect the probabilities with which corners are visited, creating blocking or overshadowing effects.
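
As a rough rendering of this operant logic (ours, not Miller and Shettleworth's published equations), the Python sketch below derives choice probabilities from the summed associative strengths of the cues at each corner and updates only the cues at the corner actually visited; the corner and cue names are invented for illustration.

```python
import random

# Simplified sketch of the operant logic of the Miller-Shettleworth
# model (not their exact equations). Each corner of a rectangular arena
# carries a bundle of cues; the probability of visiting a corner tracks
# the summed strength of its cues; and only the cues at the visited
# corner are updated, with lam = 1 for reward and 0 for nonreward.

CORNERS = {
    "target":     ["correct_geometry", "feature"],  # feature marks the goal
    "rotational": ["correct_geometry"],
    "near":       ["wrong_geometry"],
    "far":        ["wrong_geometry"],
}
V = {cue: 0.25 for cues in CORNERS.values() for cue in cues}
ALPHA, BETA = 0.2, 0.5

def choose_corner():
    # Clamping at zero echoes Miller and Shettleworth's (2008) correction
    # against negative choice probabilities.
    weights = [max(0.0, sum(V[c] for c in cues)) for cues in CORNERS.values()]
    return random.choices(list(CORNERS), weights=weights)[0]

for _ in range(200):
    corner = choose_corner()
    lam = 1.0 if corner == "target" else 0.0
    error = lam - sum(V[c] for c in CORNERS[corner])
    for c in CORNERS[corner]:
        V[c] += ALPHA * BETA * error

print(V)   # the fully valid feature outstrips correct geometry, which is
           # rewarded at only one of its two corners
```

Because the feature pays off on every visit while the correct geometry pays off on only half of its visits, the feature accrues more strength, which in turn concentrates sampling on the marked corner and hence multiplies opportunities to learn the geometry there: the dynamic that can yield facilitation rather than competition.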

Although generally favorable to the MS model, Dawson, Kelly, Spetch, and Dupuis (2008) discovered a theoretical problem with it—namely, that the MS equations require that all elements have positive values and yet it is possible for associative strength to be negative (i.e., inhibitory). They suggested an alternative approach to the MS style of model, involving use of a simple neural network. However, Miller and Shettleworth (2008) advocated an alternative remedy, a simple correction to the way in which associative strength is calculated that prevents negative probabilities, or probabilities greater than 1. They showed that the revised model did at least as well as their original one in simulating the results of experiments.

The ability to predict and model a very complex pattern of findings is a strong attraction of the MS model. An additional positive aspect of the MS model is that it includes both a term for cue salience, hence allowing for the modeling of the differential effects of factors such as the size of a feature (shown to be important for monkeys by Gouteux et al., 2001), and a term for learning rate, hence potentially allowing for modeling differences between species or between ages. In fact, Miller (2009) has since extended the MS model to address some of these phenomena in detail. One phenomenon is that older children are more likely than younger ones to use features even in small enclosures, and Miller modeled this finding by adjusting the parameter of learning rate. Another phenomenon in the geometric module literature is that the size of an enclosure matters, with features easier to use in larger enclosures (e.g., Learmonth et al., 2002) and more likely to be chosen in conflict experiments (e.g., Ratliff & Newcombe, 2008). Miller modeled the effects of enclosure size by changing the salience of geometry or features (or both).

Critique

There are theoretical and conceptual, as well as empirical, challenges to the MS model. One issue derives from the model’s characterization of geometry, which is simply lumped together as one element. Yet geometric properties are clearly composed of a host of cues, with some used and others not used, as discussed in the section on modularity theory. Furthermore, some evidence suggests that distance and directional components may be independently calculated (bees, Cheng, 1998; pigeons, Cheng, 1994; Clark’s nutcrackers, Kelly, Kamil, & Cheng, 2010), perhaps making up different elements. An elemental model such as the MS model could break down “geometry” (and features) into any number of pieces, but that theoretical work still needs to be done. Other elements in addition to geometry and features may need to be added. For example, the slope of the arena floor has been shown to be an important cue, often more important than geometric cues in conflict situations for pigeons (Nardi & Bingman, 2009; Nardi, Nitsch, & Bingman, 2010), and slope is also used by humans, although possibly less consistently (Nardi, Newcombe, & Shipley, 2011). The cast of elemental pieces is crucial to predicting how the agent would behave in various kinds of transformed spaces—for example, when a rectangle is elongated or made into a square or trapezoid. As an alternative strategy, an associative model based on configural cues, in the spirit of Pearce (1994), might use abstract holistic geometric representations, with the possibility that such a characterization can generate predictions regarding transfers to different spaces. But such a model has not been developed.

A second issue is whether the model works in an ad hoc way. Changing salience values or learning rates to model known findings is quite different from independently deriving salience values or learning rates and predicting novel results. The use of adjustments in learning rates to explain developmental change is particularly questionable. Studies with young children involve very few trials, often only four trials, and hence changing the learning rate as a function of age may model the data only in cases where there are more trials than could ever be realistically obtained or than are needed; Miller (2009) used 30 trials in his model of age differences. In fact, changes in feature use by young children are obtained with very few trials (Learmonth et al., 2008; Twyman, Friedman, & Spetch, 2007), suggesting activation, rather than learning, or else extremely rapid learning.

Adaptive combination theory

Spatial memory and judgments are typically based on a variety of cues, and there is evidence that these cues are combined in a Bayesian fashion (Cheng, Shettleworth, Huttenlocher, & Rieser, 2007; Huttenlocher, Hedges, & Duncan, 1991). This idea can be applied to the data on use of geometric and featural cues. In contrast to modularity theory, adaptive combination theory proposes that both geometric and featural cues can be used for reorientation in a fashion that depends on a combination of cue weights, with the weights determined by factors such as the perceptual salience of the cues (which affects their initial encoding), the reliability of the memory traces (i.e., subjective uncertainty, which is related to the variability of estimates), and the validity of that kind of cue given prior experience (Newcombe & Huttenlocher, 2006; Newcombe & Ratliff, 2007). Learning of the relative cue validity of various kinds of feature and geometric cues takes place over developmental time, so that children of different ages bring different expectations into the experimental rooms with them. However, because seeing distal landmarks while searching for objects requires upright locomotion, the necessary experience might not start to accumulate until children become confident walkers, which generally occurs at some point in the first few months after the first birthday. The flexibility of adaptive combination theory allows one to explain several aspects of the “geometry” data.
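
On a standard reading of such Bayesian combination (e.g., Cheng et al., 2007), each cue contributes a location estimate weighted by its reliability. In symbols (our notation), for two cues with Gaussian uncertainty, the combined estimate is the inverse-variance weighted average

\[
\hat{x} = w_G \hat{x}_G + w_F \hat{x}_F, \qquad
w_G = \frac{1/\sigma_G^2}{1/\sigma_G^2 + 1/\sigma_F^2}, \qquad
w_F = 1 - w_G,
\]

where \(\hat{x}_G\) and \(\hat{x}_F\) are the locations indicated by the geometric and featural cues and \(\sigma_G^2\) and \(\sigma_F^2\) are the variances of those estimates. Enclosure size, perceptual salience, and prior experience can all be read as shifting the effective variances, and hence the weights, with no modular boundary between cue types.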

First, as we have seen, the dominance of geometric information over feature use has turned out to depend critically on the size of the enclosure, with geometry more likely to be used in small spaces and features more likely to be used in large spaces, for a wide variety of species and ages. These data cannot be explained by any interesting version of modularity theory, because an adaptive module should operate across variations in scale and should especially operate in large spaces. It is true that there might be a module that applies only to very small enclosures, but it is hard to see how such a module would be central to survival and reproduction in any plausible environment of evolutionary adaptation. By contrast, the changing relative use of geometric and feature cues based on the scale of space as a function of their cue validity is an integral part of adaptive combination theory. We have already discussed why cue validity might change with enclosure size, including factors such as the ease of encoding relative magnitudes, the possibility of action, and how far away the distal information is. In addition, use of geometry not only varies with enclosure size, but also is crucially dependent on disorientation. An oft-neglected fact is that geometry does not affect spatial behavior when organisms remain oriented, as shown both in the original work in the 1980s and also more recently by Knight, Hayman, Ginzberg, and Jeffery (2011), who studied head direction cells in the rat. In fact, Knight et al. noted that their data are best accounted for by a Bayesian integration process.

Second, adaptive combination predicts effects of experience, over both the short and the long term. Such experience effects are not predicted by modularity theory; modules are generally characterized as inflexible and impermeable. Adaptive combination theory tackles the effects of experience head on, suggesting that gaining information about cue validity would be an important determinant of the use of features and geometry. There are several training experiments that provide support for the effects of short-term experience. Twyman et al. (2007) gave children practice using a feature for reorientation in an equilateral triangle with three differently colored walls (i.e., no useful geometry). After only four practice trials of this kind, 4- and 5-year-old children used the feature wall to reorient in the small rectangular spaces used by Hermer and Spelke (1994, 1996), in which children of this age generally rely exclusively on geometric cues. The short training period was effective in either the presence (a rectangle) or absence (an equilateral triangle) of relevant geometric information. Along similar lines, four trials of experience in a larger rectangular enclosure have been found to lead to young children’s use of features in the smaller enclosure (Learmonth et al., 2008).

Ratliff and Newcombe (2008) demonstrated similar effects of brief experience for adults. Participants were asked to perform a reorientation task in either a small or a large room. Then some of them switched rooms halfway through the experiment. People who had started in the large room (where features are salient) relied more heavily on the feature cue than did people who spent all trials in a small room. In contrast, individuals who had started in the small room (where geometry is salient) began to use feature information when moved to the larger room; in fact, they performed no differently than individuals who had remained in the large room for all trials. It seems likely that successful search based on using the feature in the large space increased the relative salience of the feature cue; this change was reflected when the same task was performed in the smaller space.

Short-term experience also matters for pigeons. Kelly and Spetch (2004) trained pigeons on a reorientation task presented on a computer screen. Some of the pigeons were initially trained with geometry, and others with features. The pigeons then experienced training with both kinds of cues. The pigeons with the geometry pretraining relied on both geometric and feature cues, while the pigeons with the feature pretraining mainly relied on the feature cues. It is not clear why one group used both cues while the other group used only the trained cue, but the difference between groups is important.

These experiments with children, adults, and pigeons all indicate that reorientation is a flexible system updated on the basis of prior experiences. Is the same true for experiences over a longer period of time and earlier in development? A series of rearing experiments has examined this question, beginning with a study of wild-caught mountain chickadees (Gray et al., 2005). Wild-caught birds are likely to have experienced rich feature information in their natural habitat—in this case, forested areas near streams and mountains, which clearly do not include uniform rectangular enclosures. The wild-caught mountain chickadees relied more heavily on feature cues than did lab-reared birds. However, when the reorientation abilities of wild-caught and lab-reared black-capped chickadees were examined, few differences were found (Batty, Bloomfield, Spetch, & Sturdy, 2009). It is unclear whether these divergent results reflect differences in the experiences of black-capped and mountain chickadees or intrinsic differences between the two species.

An alternative approach is to manipulate the rearing environment in the lab, a strategy that has been used with chicks, fish, and mice. For chicks, rearing in a circular environment (lacking relevant geometry) rather than a rectangular one does not seem to change the relative use of featural and geometric cues (Chiandetti & Vallortigara, 2008, 2010). However, we note that only 2 days elapsed before training began and that chicks are a precocial species with perhaps less of a sensitive period. In experiments with other species, a different pattern has emerged. Convict fish reared in circular environments relied more heavily on feature cues than did fish reared in rectangular ones (Brown, Spetch, & Hurd, 2007). Similarly, mice raised in a feature-rich environment lacking in geometry (a circle with one half painted white and the other half blue) differed from mice reared in a geometrically rich environment (a rectangular enclosure with a triangular nest box). Although there were no significant differences in the acquisition of geometric information alone, the circular-reared mice were faster to learn a feature panel task. Additionally, and crucially, on a test of incidental geometry encoding (a rectangle with a feature panel marking the correct location), the rectangular-reared mice had encoded the geometry, while the circular-reared mice had not (Twyman, Newcombe, & Gould, 2013).

Critique

Adaptive combination theory needs to be quantitatively specified to be rigorously evaluated. For example, consider a recent criticism of the approach from Lee and Spelke (2011). They found that children reoriented in accord with edges that were raised off the floor yet had low visual contrast, but did not use flat rectangles with prominent brightness contours. However, unlike view-matching theory, adaptive combination is not committed to brightness contrast as a determinant of salience or cue validity, and in the real world, bumps are a greater impediment to navigation than brightness contrasts. Children would have had ample opportunity to learn these facts in homes that include different colors of floor surfaces, rugs of varying thicknesses, shallow, unmarked steps to a patio, and so forth. Nevertheless, a priori specification of these issues is needed for rigorous evaluation. In sum, future work needs to specify the parameters more rigorously in a well-defined model, determine whether the model can explain phenomena of cue competition and variations across species and paradigms, and test novel predictions.

Additionally, adaptive combination theory should consider whether age-related change in the underlying neural substrates for navigation (reviewed in more detail in the next section) should be included in the model, rather than just the effects of experience. Sutton, Joanisse, and Newcombe (2010) and Sutton, Twyman, Joanisse, and Newcombe (2013) showed in fMRI experiments with human adults that the hippocampus is involved in geometric processing and in binding together features and geometry. The hippocampus is now known to undergo age-related change through at least 5 years (Gogtay et al., 2006). However, there are few data bearing directly on the possibility of a link between hippocampal maturation and behavior in the reorientation paradigm. Lakusta, Dessalegn, and Landau (2010) found that individuals with Williams syndrome, which is associated with abnormalities in both the hippocampus and parietal cortex, behaved oddly in the reorientation paradigm. With no features present, they searched randomly, failing to use geometry, although they did better with a colored wall added. However, the parietal damage in Williams syndrome may be as important as the hippocampal abnormalities in accounting for this pattern. Experiments with nonhuman animals would help to fill this gap.

Neurally based theories

Most of the research on the geometric module hypothesis has been behavioral. It may seem, however, that a more direct and natural way to evaluate the existence of a geometric module would be to determine whether there is a neural substrate that processes geometric information but fails to process other kinds of spatial information. Especially compelling evidence would involve direct control of behavior in the reorientation paradigm by such an area, without any later processing that integrated its output with the output of other spatially relevant areas.

Cheng and Newcombe (2005) reviewed two candidate neural mechanisms for processing geometric information, and the first part of this section updates what we have learned about those mechanisms. We then examine two more overarching theories, both of which are two-factor theories involving the hippocampus and the striatum. Thus, before turning to either theory in detail, we offer an overview of those mechanisms, which have been discussed for some time, beginning with O’Keefe and Nadel (1978).

Two candidate mechanisms for processing geometry

The first candidate discussed by Cheng and Newcombe (2005) was the boundary vector cell, whose existence at that time was merely hypothetical (Barry et al., 2006; Burgess et al., 2000; Hartley et al., 2000). The basic idea is that a combination of distances from walls, signaled by the conjunction of two or more such boundary vector cells (perhaps AND-gated), defines a place for a place cell (O’Keefe, 1976; O’Keefe & Dostrovsky, 1971). Place cells in rats, found in the hippocampus proper, fire the most when the rat is at a particular place in the environment, irrespective of how it got there or which way it is facing. Cells that signal a particular distance from a boundary have since been located.
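
To make the conjunction idea concrete, here is a toy sketch in which each hypothetical boundary vector cell fires as a Gaussian function of the animal’s distance from one wall, and a place field emerges where the product (a soft AND-gate) of two such cells is high. The arena dimensions, tuning widths, preferred distances, and function names are all our own illustrative choices, not recorded physiology.

```python
import numpy as np

# Toy sketch of the boundary vector cell (BVC) idea: each cell fires as a
# Gaussian function of the animal's distance from a wall in a preferred
# allocentric direction; a place cell fires where the conjunction (here a
# product, i.e., a soft AND-gate) of several BVCs is high. All parameters
# are illustrative.

ARENA_W, ARENA_H = 1.0, 0.6   # hypothetical rectangular arena, in metres

def bvc_response(x, y, wall, preferred_dist, sigma=0.05):
    """Firing of a BVC tuned to `preferred_dist` from one of four walls."""
    dist = {"west": x, "east": ARENA_W - x,
            "south": y, "north": ARENA_H - y}[wall]
    return np.exp(-((dist - preferred_dist) ** 2) / (2 * sigma ** 2))

def place_cell(x, y):
    """Conjunction of two BVCs defines a place field near (0.3, 0.2)."""
    return (bvc_response(x, y, "west", 0.3) *
            bvc_response(x, y, "south", 0.2))

# Sample the arena on a grid and report where the place cell peaks.
xs, ys = np.meshgrid(np.linspace(0, ARENA_W, 101), np.linspace(0, ARENA_H, 61))
rates = place_cell(xs, ys)
i, j = np.unravel_index(np.argmax(rates), rates.shape)
print(f"peak firing at x={xs[i, j]:.2f}, y={ys[i, j]:.2f}")  # ~ (0.30, 0.20)
```

Note that if the allocentric reference directions were misassigned by 180° (as when head direction cells misalign), the same conjunction would pick out the diagonally opposite location, which is one way of seeing the rotational error.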

Solstad, Boccara, Kropff, Moser, and Moser (2008) recorded border cells in the entorhinal cortex of the rat that fired when the rat was a specified distance from the walls of a square enclosure. The walls could be low as well as high, and the cells also fired when a boundary was defined by a drop rather than a rise in elevation. Fascinating as these border cells are, however, most of the border cells discovered by Solstad et al. fired right at a boundary; very few fired at any appreciable distance from it. Solstad et al. interpreted border cells much as their name suggests, as representing the borders surrounding a space.

Far more promising is a report with “boundary vector cells” in its title (Lever et al., 2009). These cells were found in the subiculum of the hippocampal formation. While most of these cells fire maximally when the rat is close to a boundary, a good number fire maximally at some distance from a boundary. As with border cells, the boundaries can be edges with a drop-off, as well as walls. And like border cells, these boundary vector cells often keep their constancy in the face of environmental changes, including changes in the color or material of walls, changes in the shape of the space, or changes from a wall to a drop-off. Although the subiculum is usually considered downstream from the hippocampus, Lever et al. argued that a loop from the subiculum to the entorhinal cortex to the hippocampus has been identified, so that the boundary vector cells may be upstream from the place cells in the hippocampus.

The constancy in the face of environmental changes seems to abstract the geometric property of distance from a boundary, but it is a double-edged sword when it comes to a geometric module. Abstracting distance from a boundary is good, but insensitivity to the shape of a space is not good for building a geometric module. The place cells in any case do not show such constancy. They “remap” in response to all kinds of environmental changes (Jeffery & Anderson, 2003; Jeffery, Gilbert, Burton, & Strudwick, 2003; Lever et al., 2009)—that is, the locations where place cells fire maximally change. Identification of a system that abstracts out the geometric properties posited in Cheng (1986) remains elusive.

The second candidate for a neural mechanism for geometric processing reviewed by Cheng and Newcombe (2005) was the parahippocampal place area (PPA). The PPA had been found to preferentially process environmental surrounds, including both buildings and natural environments. However, even in 2005, the PPA did not seem to have quite the right properties for encoding geometric information for reorientation. Most notably, the information it encodes seems to be viewpoint specific, in accord with the extraction of a view for matching, rather than the extraction of the geometric layout. Epstein (2008) has since reviewed the evidence on the navigational relevance of the PPA and of another brain area, the retrosplenial cortex (RSC). He argued that it is the RSC that specifies the relation of a scene to other scenes and, hence, to the wider world. He suggested that the viewpoint-specific geometric representations recognized by the PPA might be the starting point of reorientation when an organism is lost, with subsequent processing by the RSC required to link that starting point to other locations—that is, to complete the act of reorientation. On this view, the PPA processes geometric information, but it is not sufficient for reorientation on its own, and it is neither modular nor the complete source of the information termed “geometric” in the behavioral paradigm.

Two-factor theories: locale and taxon systems

Other neurally inspired theories build on decades-old ideas regarding locale and taxon spatial learning in rats. Locale and taxon systems of navigation were first proposed by O’Keefe and Nadel (1978) to characterize two different modes of navigation in rodents. Loosely, the locale system is map based, and the taxon system is route based. The hippocampus and its place cells (O’Keefe, 1976; O’Keefe & Dostrovsky, 1971) play the major role in the map-like navigation of the locale system. Taxon behaviors are more closely tied to particular stimulus characteristics (going to a beacon or following the shore of a lake) and motor outputs (turning to the right by 90°) (O’Keefe & Nadel, 1978) and depend more on the striatum (Packard & McGaugh, 1996; White & McDonald, 2002).

Place cells are found in mammals other than rats (humans, Ekstrom et al., 2003; bats, Ulanovsky & Moss, 2007) and in birds (pigeons, Siegel, Nitz, & Bingman, 2005), so that locale systems may be widespread among vertebrate animals. However, map-like representations are thought to be absent in ants (Cheng, 2012; Cruse & Wehner, 2011). The existence of a locale system and its nature might thus mark substantial differences in spatial representation across taxa. Place cells are now known to be supported (i.e., fed information) by head direction cells (Taube, 2007; Taube, Muller, & Ranck, 1990a, 1990b) and grid cells (Fyhn, Molden, Witter, Moser, & Moser, 2004; Hafting, Fyhn, Molden, Moser, & Moser, 2005), as well as by border cells (Solstad et al., 2008) and boundary vector cells (Lever et al., 2009).

The taxon system is instantiated in the dorsal striatum. The two systems thus form independent ways of learning, with the taxon system operating only when the learning conditions are stereotyped (i.e., have the same start location from trial to trial). It is interesting to note that both locale and taxon systems are founded on views. The taxon system is based directly on views, whereas the locale system is based on views indirectly, with place cells playing the key mediating role between views and behavior.

A two-factor hippocampal-striatal theory

One major proposal regarding processing of geometric information has emerged in two related articles that build on these ideas of a hippocampal (or locale) system and a striatal (or taxon) system (Doeller & Burgess, 2008; Doeller et al., 2008). The first article is purely behavioral, and the second reports an fMRI study. Together, the articles provide evidence for the hypothesis that the right posterior hippocampus encodes geometric information incidentally and without requiring reinforcement, while a striatal system encodes landmark information in a fashion governed by associative reinforcement. With regard to the first system, Doeller et al. (2008) wrote specifically that “the distinct incidental hippocampal processing of boundaries is suggestive of a ‘geometric module’” (p. 5915).

Both experiments were conducted in a virtual reality environment, in which human participants were asked to learn the locations of objects in a circular arena surrounded by a wall and containing local landmarks (resembling traffic pylons). Outside the wall, extra-maze orientation cues such as mountains appeared so that participants knew which way they were facing, but these cues were rendered at infinity so that distance from them could not be used to locate objects. Doeller and Burgess (2008) found that the boundary overshadowed landmark learning, whereas landmarks did not overshadow learning in relation to the boundary. These data support the idea that the boundary system is not associative, while the landmark system is. Along similar lines, Doeller and Burgess found that blocking effects occurred with landmarks but not with boundaries. Doeller et al. (2008), using fMRI and the same paradigm, found that boundary learning activated the right posterior hippocampus and landmark learning activated the right dorsal striatum. They also found evidence that, when the two systems are similarly active, so that neither can dominate behavior, ventromedial prefrontal involvement may occur to adjudicate between competing outputs.
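
The asymmetry between the two systems can be caricatured with a Rescorla-Wagner-style simulation. This is our schematic reading of the hypothesis, not the authors’ model: the landmark obeys an error-correcting rule whose prediction error is shared with the boundary, while the boundary accrues strength incidentally, ignoring what the landmark predicts. All learning parameters are arbitrary.

```python
# Schematic simulation, not the authors' model: landmark learning follows a
# Rescorla-Wagner error-correcting rule (and so competes with other cues),
# while boundary learning accrues incidentally at a fixed rate, insensitive
# to what other cues already predict. Parameters are arbitrary.

def train(n_trials=40, alpha_lm=0.2, alpha_bd=0.2, lam=1.0, compound=True):
    v_lm, v_bd = 0.0, 0.0
    for _ in range(n_trials):
        if compound:
            # Associative landmark update: the error term is reduced by
            # whatever the boundary already predicts -> overshadowing.
            v_lm += alpha_lm * (lam - (v_lm + v_bd))
            # Incidental boundary update: ignores the landmark's prediction.
            v_bd += alpha_bd * (lam - v_bd)
        else:
            v_lm += alpha_lm * (lam - v_lm)
    return v_lm, v_bd

v_lm_alone, _ = train(compound=False)
v_lm_comp, v_bd_comp = train(compound=True)
print(f"landmark trained alone:        {v_lm_alone:.2f}")  # near asymptote
print(f"landmark trained with boundary: {v_lm_comp:.2f}")  # overshadowed (low)
print(f"boundary trained with landmark: {v_bd_comp:.2f}")  # unaffected
```

The point of the caricature is only the asymmetry: a cue that learns by error correction can be overshadowed, whereas a cue that learns incidentally cannot.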

Critique

These data are elegant and thought-provoking, but the “suggestive” link to the geometric module should not be pushed too far. Indeed, the authors pointed out differences. Doeller and Burgess (2008) noted that boundary learning does not dominate, in that learning rates with the two kinds of cues are similar (p. 5913), while Doeller et al. (2008) pointed out that the system is used for “determining location rather than orientation” (pp. 5918–5919), because the participants were not disoriented. There are other issues as well. First, as we saw in the section on the MS associative model, blocking and overshadowing effects are quite variable across experiments, and in fact, we have sometimes seen facilitation effects. How would the hippocampal-striatal model handle these complex phenomena? Second, while the time course of landmark and boundary learning in these experiments was the same, as noted by Doeller and Burgess, it is puzzling that reinforcement learning did not take longer. Work on rats shows that they learn “place-based” behavior on a maze before they learn “response-based” behavior (Packard & McGaugh, 1996). Why were the time courses identical in this study? Third, investigators have found individual differences in reliance on a hippocampal and a striatal system (Bohbot, Iaria, & Petrides, 2004; Bohbot, Lerch, Thorndycraft, Iaria, & Zijdenbos, 2007; Iaria, Petrides, Dagher, Pike, & Bohbot, 2003; Schinazi, Nardi, Newcombe, Shipley, & Epstein, 2013), including an apparent trade-off between the two. If the hippocampal system operates in an incidental and obligatory fashion, why does it vary across individuals and appear to be responsive to experience (Lerch et al., 2011)?

There are also several aspects of the experimental design and the virtual environment that suggest that caution is required in extending the Doeller et al. (2008) model to the geometric module studies and debate. Most notably, purely geometric cues are minimally helpful in the virtual reality used in their studies, because the enclosure is circular and features and geometry are bound together in landmarks that have a defined shape and volume. Thus, at the least, one would have to exclude isolated landmarks from the geometric module, a tack that Spelke and colleagues have actually taken (reviewed above). Furthermore, given that the participants were not disoriented, it is unclear to what extent the results bear on the very specific claim that geometry is used for reorientation. In an fMRI study that did use the geometric module paradigm, Sutton et al. (2010) found that processing in either a square or a rectangular room with one colored wall recruited hippocampal activation, as compared with a condition in an all-gray rectangular room. These data suggest a very different picture from the Doeller et al. approach; it would appear that the processing of featural information, the binding of features and geometry, or both are hippocampal tasks, instead of or in addition to processing of geometry or boundaries per se. The hippocampal system might play a far more reconstructive role in learning and memory than envisioned in the two-factor theory, as actually proposed by Burgess with other colleagues (Byrne, Becker, & Burgess, 2007; see also Bird & Burgess, 2008).

Much more work remains to be done using fMRI and virtual environments to elucidate these phenomena. One can imagine a variety of relevant environments, varying in the characteristics of the enclosures (circular and various kinds of geometric shapes) and varying in the nature of landmarks (how many, how distal, how distinctive, whether or not “pasted on” to the enclosures as are colored walls, and so forth). One can also imagine that work with nonhuman animals using single-cell recording techniques in these environments would be illuminating. There is a rich field of inquiry here.

A neurally based two-factor computational theory

A different neurally inspired model that also builds on ideas of the locale and taxon systems is a formal computational model proposed to explain some of the phenomena observed in research on the geometric module, as well as other data on spatial learning, such as behavior in the Morris swimming pool (Sheynikhovich et al., 2009). Sheynikhovich et al.’s taxon system is based directly on views, in that representations of views are used to direct behavior. Many kinds of visual characteristics may be encoded for navigation, but the proposed views are based on edge contours. A panoramic field of oriented edges forms the basis for view matching (Fig. 4). In the model, views are associated with hypothetical view cells, which fire preferentially to particular views in particular orientations. Views can be associated with rewards in spatial learning. This associative process is based on reinforcement learning and is subject to cue competition. The process of encoding a view, however, is nonassociative, takes place incidentally, and is not subject to cue competition. Views can drive matching processes directly, as we have reviewed above, and they can be linked to motor behaviors (e.g., turn right 90°, head to this part of the view).

Fig. 4
figure 4

Hypothesized view matching in rats, from Sheynikhovich et al. (2009). a A panoramic image from a location inside a square arena with a view of the surrounding room. b The edge-based view hypothesized to be encoded by the rat. The lines show the orientations of contrast edges, with the length of the line showing the strength of the edge orientation. The inset shows an example of the oriented Gabor filters for extracting edge information (not to scale). This one is sensitive to a vertical contrast edge. c The “performance” of a hypothetical view cell that functions to encode this view seen in a particular orientation. The cell fires the most when the rat is at the correct location facing the correct direction for the view. (Reprinted from Sheynikhovich, D., Chavarriaga, R., Strösslin, T., Arleo, A., & Gerstner, W., 2009)
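
As a concrete illustration of the edge-based encoding in Fig. 4, the sketch below applies a small bank of oriented Gabor filters to a toy panoramic image and summarizes the view as rectified edge-energy maps, one per orientation; a mismatch score can then be computed between two such encodings. The filter parameters, the toy image, and the mismatch measure are our own illustrative choices, not those of Sheynikhovich et al. (2009).

```python
import numpy as np
from scipy.signal import convolve2d

# Minimal sketch of an edge-based view code in the spirit of Sheynikhovich
# et al. (2009): a bank of oriented Gabor filters is applied to a panoramic
# image, and the view is summarized by the rectified response energy of each
# edge orientation at each position. Parameters are illustrative.

def gabor(theta, size=15, sigma=3.0, wavelength=6.0):
    """Odd-phase Gabor filter sensitive to edges of orientation `theta`."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr ** 2 + yr ** 2) / (2 * sigma ** 2))
    return envelope * np.sin(2 * np.pi * xr / wavelength)

def edge_view(panorama, n_orientations=4):
    """Stack of rectified filter responses: one edge map per orientation."""
    thetas = np.linspace(0, np.pi, n_orientations, endpoint=False)
    return np.stack([np.abs(convolve2d(panorama, gabor(t), mode="same"))
                     for t in thetas])

def view_mismatch(view_a, view_b):
    """Sum of squared differences between two edge-based views."""
    return float(np.sum((view_a - view_b) ** 2))

# Toy panorama: a single vertical luminance step (e.g., the edge of a wall).
panorama = np.zeros((64, 256))
panorama[:, 128:] = 1.0
view = edge_view(panorama)
# The vertically tuned filter responds most; the horizontal one barely at all.
print("response energy per orientation:", view.reshape(4, -1).sum(axis=1))
```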

The taxon system can learn both geometric and featural cues, although in the model, these cues are not separated but are both contained implicitly in the edge-based views. They are different characteristics of the same view. It is the taxon system that accounts for the success of rats in using features as well as geometry in a reference memory task (Cheng, 1986). Crucially, the taxon system could work in those experiments because the rats were always released at the center of the arena, facing random directions. Had the rats been started at multiple locations, as they were in the working memory experiment, the model would predict learning by the locale system, with a higher prevalence of rotational errors, results more closely resembling those of Cheng’s (1986) working memory experiment. These predictions are easy to test, but the suggested experiments have not yet been done.

The locale system is hypothesized to be based on the “performance” of place cells, much in the spirit of O’Keefe and Nadel (1978). A place cell’s firing is said to be based on integrating multiple views available at a place, together with information from grid cells and head direction cells. We should point out, however, that views are not necessary for driving place cells, since blind rats have functioning place cells (Save, Cressant, Thinus-Blanc, & Poucet, 1998). In using visual cues, a view alignment process akin to figuring out which way one is facing needs to take place. In symmetrical spaces such as a rectangular arena, systematic misalignments (by 180° in this case) can take place and cause a rotational error. Head direction cells are known to misalign in symmetric spaces (Golob, Stackman, Wong, & Taube, 2001), although intriguingly, the “mistakes” of the head direction cells in that study did not match the mistakes made by the rats at above-chance levels. Basically, a local minimum in view matching is found at the rotational error, as is the case with view matching based on pixel-by-pixel matching (Cheung et al., 2008; Stürzl et al., 2008). Once misaligned, the error is not corrected. This prediction is consistent with the nature of rotational errors found in Cheng (1986): Although not reported, the rats that made the error never corrected themselves even though they had plenty of time to do so, since each test lasted 2 min no matter what the rats did.
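
The local-minimum claim can be demonstrated with a toy panorama. In the sketch below, with all dimensions arbitrary, the stored view is the set of distances to the walls of a rectangular arena as seen from its center; because that panorama repeats every 180°, a pixel-by-pixel mismatch computed over candidate headings bottoms out at both 0° and 180°, the second minimum corresponding to the rotational error.

```python
import numpy as np

# Minimal sketch of why pixel-by-pixel view matching yields a second,
# rotational minimum in a rectangular arena: from the center, the panorama
# of distances to the walls is identical under a 180-degree rotation, so the
# mismatch function has equal minima at 0 and 180 degrees. Dimensions are
# arbitrary.

def wall_distance_panorama(width=2.0, height=1.0, n=360):
    """Distance from the arena center to the wall in each viewing direction."""
    angles = np.deg2rad(np.arange(n))
    dx, dy = np.cos(angles), np.sin(angles)
    # Distance to the first wall hit along each ray from the center.
    tx = (width / 2) / np.maximum(np.abs(dx), 1e-12)
    ty = (height / 2) / np.maximum(np.abs(dy), 1e-12)
    return np.minimum(tx, ty)

stored = wall_distance_panorama()

def mismatch(heading_deg):
    """Squared difference between the stored view and a rotated view."""
    rotated = np.roll(stored, heading_deg)
    return float(np.sum((stored - rotated) ** 2))

errors = np.array([mismatch(h) for h in range(360)])
best = np.flatnonzero(np.isclose(errors, errors.min()))
print("headings with minimal mismatch:", best)  # [0, 180]
```

The two minima are exactly equal here only because the toy panorama is perfectly symmetric; added salient edges would deepen the true minimum relative to the rotational one, which connects to the prediction discussed below.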

The locale representation is map-like, again in the spirit of O’Keefe and Nadel (1978), because different places are linked by vectors supplied by path integration. Locale place learning is a matter of associating the firing of a place cell with reward. This kind of learning too is reinforcement learning based on associative principles and subject to cue competition, just like the taxon system. Building the map system from views, however, takes place incidentally, much in the spirit of Tolman (1948).

This conception of a locale system based on place cells driven by views has limits in accomplishing cognitive mapping. In particular, it would have problems piecing together different bits of views seen sequentially but not simultaneously. Benhamou (1996) reported evidence to this effect, which he interpreted as a lack of cognitive mapping in rats. Rats had the task of finding a hidden platform in a round swimming pool surrounded by a rich array of cues in a typical lab room. The view, however, was restricted during training by a three-quarter circular “marquee” that allowed only 90° of view from the target. Different training phases provided different 90° views. The rats had actually seen all the landmarks, because they were started from outside the marquee during training. The question was whether they could figure out where the platform was with a new view, a location they had to infer (map out) by integrating previous partial views of the surround. The rats generally failed. Because no place cells are associated with the new view on a test, Sheynikhovich et al.’s (2009) locale system would also fail this task (D. Sheynikhovich, personal communication, April 2011). However, in considering this question, we also need to keep possible cross-species differences in mind. There is evidence suggesting that our own species is capable of piecing together separately viewed pieces of spatial information (Schinazi et al., 2013).

Critique

We note that, if the model is correct in essence, there is no separation of geometry and features. Both kinds of information are contained in edge-based views. However, this lack of distinction may be a virtue of the model, to the extent that it can predict all of the “geometry” results. The model does seem to predict the basic pattern of results found in Cheng’s (1986) study, and it is consistent with a good deal of neurophysiological data on grid cells, head direction cells, and place cells, as well as with behavioral data from the swimming pool that we have not reviewed. The model has the virtue of being clear, and it makes predictions that are easy to test.

However, there is much more work to be done to test the range of application of the model. We have mentioned that an experiment using the reference memory paradigm of Cheng (1986), but starting the rats from multiple locations, should result in a higher level of rotational errors, because the locale system is subject to view misalignments. Another prediction comes from the fact that view misalignments are based on a comparison of edge-based views, one particular kind of feature. Cheng (1986) made features more obvious in one experiment by making one entire long wall white while leaving the other three walls black. From an edge-based perspective, this change added no salient edges, and consistent with the model, rotational errors continued apace. A straightforward prediction that can easily be tested is that with more salient edges in the rectangular arena, rotational errors should diminish. Thus, if a long wall had a white patch in its middle, that change would add two salient edges and should reduce, if not eliminate, rotational errors, even in the working memory paradigm.

In addition, the model has not been applied to phenomena of cue competition, species commonalities or differences, or development. Sheynikhovich et al. (2009) pointed out that predicting whether cue competition will be found is a complex business, depending on the mix of incidental view learning and associative learning, and they did not provide an account of these phenomena, leaving that task for the future. The model is clearly focused on rodents and makes no pretense of accounting for all taxa of animals. Even among vertebrate animals, different neurally based models might be necessary. For example, birds show hemispheric specializations in using geometric and nongeometric cues, patterns that differ across species (Vallortigara, Pagni, & Sovrano, 2004; Wilzeck, Prior, & Kelly, 2009). Sheynikhovich et al.’s model involves learning and could perhaps be extended to model change over short-term experience, but it has not made any attempt to deal with development over the longer term.

Future directions

In the course of this review, we have identified several lacunae in the data base and suggested various directions that future research needs to take. Here, we would like to highlight two overall themes.

Relative magnitude as a starting point?

Several domains were proposed by Spelke and Kinzler (2007) as having core knowledge components. They include core knowledge about objects, agents, numbers, and in-group membership (vs. out-groups). The geometric module was said to constitute core knowledge in the spatial domain. As we reviewed, the redefinitions of geometry under the assault of recent data substantially dent the classic, formal definition of geometry. We were left with the puzzle that some highly useful geometric cues seem to be omitted from core knowledge: the geometry of individual, separate objects; asymmetric geometry; and angles formed by objects and surfaces. The remaining package makes little functional sense to us.

It is time to consider that core spatial knowledge, if any, might be something more abstract and open, such as a propensity to represent magnitudes of space (perhaps magnitudes across multiple dimensions of experience; e.g., Cheng, Spetch, & Miceli, 1996; Lourenco & Longo, 2010; Walsh, 2003; for a review, see Lourenco & Longo, 2011), and extract and compare the useful geometric properties. This idea builds on findings of early metric coding of distance in infants and toddlers (Newcombe, Huttenlocher, & Learmonth, 1999; Newcombe, Sluzenski, & Huttenlocher, 2005). It is consistent with the finding that scalar magnitudes are easier to use for reorientation than are nonscalar magnitudes (Huttenlocher & Lourenco, 2007; Lourenco et al., 2009; Twyman et al., 2009), and it makes contact with a growing literature on the use of relative magnitude for spatial judgments (Duffy, Huttenlocher, & Levine, 2005; Huttenlocher, Duffy, & Levine, 2002) and in spatial scaling tasks (Huttenlocher, Newcombe, & Vasilyeva, 1999; Huttenlocher, Vasilyeva, Newcombe, & Duffy, 2008). As these articles make clear, the use of relative magnitude changes in many ways in the course of development, partly as a result of experience with navigation, but also as a result of many other factors, including learning to count and measure, and having experiences related to quantity such as sharing food. Dimensions such as length, distance, number, and area get differentiated from each other, and their appropriate scopes of application become better delineated; absolute as well as relative magnitude can be appreciated.

Situating the geometry debate in broader models of spatial cognition

The “geometry” literature often seems to have a life of its own outside of the broader study of spatial cognition and navigation. Some models, for example, are formulated specifically for learning geometry and features (Dawson et al., 2010; Miller & Shettleworth, 2007, 2008; Ponticorvo & Miglino, 2010). However, if the geometric module, in any version, is not part of core spatial knowledge, then perhaps it should not be a separate “cottage industry.” Predictions about how an animal performs in rectangular arenas may be derived better from more general models of spatial cognition (Lew, 2011). In this regard, the model proposed by Sheynikhovich et al. (2009) is exemplary in situating geometry experiments in a broader context. Performance in the geometry tasks is predicted from a formal model that incorporates data from neurophysiology (grid cells, head direction cells, and place cells), as well as from other tasks; in the article, various performance patterns in the Morris swimming pool were also predicted. The model is comprehensive enough that it can be applied to other tasks to generate predictions with suitable choice of parameters. Formal models of this kind have the advantage of being able to make interesting predictions that can be empirically tested.

What was missing from Sheynikhovich et al.’s (2009) model was a treatment of cue competition. Cue competition formed a major impetus for Miller and Shettleworth’s (2007, 2008) model. In this light, we suggest that it would be profitable to add some learning theory components, explicit in the MS model, to a neurophysiologically inspired model such as Sheynikhovich et al.’s. Sheynikhovich et al.’s model does not contain “geometric” and “featural” properties, so the “carving” of cues differs from that found in the MS model. Thus, the MS model cannot simply be slotted in. What we have in mind, rather, is incorporating a learning process into a neurophysiological model, one that might allow the modeler to predict the phenomena of cue competition that have played such a large role of late in the geometry literature. With the holistic configurations represented in views playing a key role, the model is likely to need some configural theory of learning, perhaps in the spirit of those proposed by Pearce (1994). Additionally, or instead, a comprehensive model could incorporate Bayesian principles of the kind advocated for the spatial domain at large (Cheng et al., 2007) and shown to characterize many phenomena in spatial memory and its development (for a review, see Holden & Newcombe, 2013).

Conclusion

As should be clear by now, none of the five theories we have reviewed provides a satisfying account of all of the many phenomena now documented concerning search after disorientation. Modular theory, even in its revised form, has sacrificed the elegance of its original formulation; view-matching theory may explain the behavior of only some species; and associative learning, adaptive combination theory, and the two-factor neurally inspired models have been applied to only some of the extant data. In addition, adaptive combination theory needs to be more precisely quantified. The challenge is clear—namely, to formulate a precise model that accounts for data about what information is and is not used, when, and by whom, including phenomena of cross-species commonalities and differences, cue competition, and development, preferably in the context of overall approaches to navigation and spatial representation in paradigms other than reorientation, and preferably accompanied by specification of the neural bases of the behaviors. It is a tall order, but the endeavor should be aided by the fact that the data base is extremely rich, given the considerable interest in this set of questions over a quarter century.