For some years now, comparative psychologists have tried to determine whether metacognitive abilities (cognition about cognition) can be attributed to animals. The question remains open (see reviews in Beran, 2019). Kononowicz et al. (2022) adopt a cogent approach to this problem. Instead of asking whether rats have metacognitive abilities, they make the case that rats monitor their errors during a timing task. Whether this is sophisticated enough to qualify as metacognition, or as a precursor to it, is up to the reader. But what do the authors mean by error monitoring, and do their data make a convincing case that rats actually monitor their errors?

Let us look at the procedure and the data first. Rats were trained to press a lever and then, in order to be reinforced, either to keep holding it down for at least 3.2 s (HOLD group) or to press it again at least 3.2 s later (PRESS group). The authors measured the time between the first lever press and either the moment the rat released the lever (HOLD group) or the moment it pressed it again (PRESS group). They called this measure time production (TP). In both conditions, the rats learned the task and produced TPs slightly longer than 3.2 s, although the standard deviation of the TPs was slightly lower in the HOLD group. Moreover, the TPs displayed the signature property of interval timing: the standard deviation of the TP was proportional to its mean (the scalar property). The contingencies of reinforcement were then made more complex. After producing a TP, the rats were given a choice between two ports. Choosing one port was reinforced with two food pellets (2p port) if the rat had produced a short TP (defined as a TP between 3.2 s and the mode of the TP distribution), while choosing the other port was reinforced with one pellet (1p port) if the rat had produced a long TP (defined as a TP longer than the mode of the TP distribution). In both groups, the rats were more likely to pick the 2p port after a short TP and the 1p port after a long TP, although the PRESS group was slightly more accurate than the HOLD group (accuracy was 58% for the HOLD group and 64% for the PRESS group).
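To make the choice-phase contingency concrete, here is a minimal Python sketch of the payoff rule as described above. The function name pellets, the mode value of 3.8 s, and the treatment of a TP exactly equal to the mode as short are my own assumptions for illustration; none of them is taken from Kononowicz et al. (2022).

```python
# Hypothetical sketch of the choice-phase contingency described above.
# The boundary conventions (<= vs <) and the mode value are assumptions.
CRITERION = 3.2  # minimum time production (s) required for any reinforcement


def pellets(tp: float, port: str, mode: float) -> int:
    """Pellets delivered for a time production tp (s) and a port choice."""
    if port not in ("2p", "1p"):
        raise ValueError(f"unknown port: {port!r}")
    if tp < CRITERION:
        return 0  # TP below criterion: no reinforcement at either port
    short_tp = tp <= mode  # "short" TP: between the criterion and the mode
    if port == "2p":
        return 2 if short_tp else 0  # 2p port pays only after a short TP
    return 1 if not short_tp else 0  # 1p port pays only after a long TP


MODE = 3.8  # hypothetical mode of one rat's TP distribution (s)
for tp in (3.0, 3.5, 4.5):
    print(tp, pellets(tp, "2p", MODE), pellets(tp, "1p", MODE))
# 3.0 0 0
# 3.5 2 0
# 4.5 0 1
```

Under this rule, a rat can do better than always picking the same port only if its choice carries some information about the TP it has just produced.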

The short description above skipped over some procedural details (like the use of forced-choice and free-choice trials) and some features of the data (there were sequential dependencies in the TPs, notably with the current TP being influenced by the preceding ones). In the end, it does not matter because, however you look at them, the procedure and the data are solid. The question is whether they warrant the conclusion that rats monitor their errors. The data alone cannot settle this: we need a model to answer the question, if only to understand more precisely what is meant by error monitoring in this context. Unfortunately, the authors do not provide an explicit model of their data. I will try to offer my own, which, I think, does not betray the authors’ intentions.

Let us assume that the behavior of the rat is based on an internal decision variable D. If D is above a threshold T1, the rat releases the lever (HOLD group) or presses it again (PRESS group). Then, if D is between T1 and a second threshold T2 (T1 < T2), the rat chooses the 2p port, which is reinforced after short TPs; if D is above T2, it chooses the 1p port, which is reinforced after long TPs. This framework is general enough to fit many models of timing, from scalar expectancy theory (SET; Gibbon et al., 1984) to associative models such as Machado et al.’s (2009) learning-to-time model (LeT) or Jozefowiez et al.’s (2009) behavioral economic model (BEM). The difference between these models would lie in the definition of D.
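The two-threshold framework can be written as a simple mapping from D onto the rat's options. The sketch below only encodes T1 and T2; it says nothing about how D is computed (which is where the models differ) or about when, within a trial, D is read out for the port choice. The function name decide, the threshold values and the inclusive/exclusive boundaries are illustrative assumptions.

```python
def decide(d: float, t1: float, t2: float) -> str:
    """Map a value of the decision variable D onto the rat's options.

    Returns "wait" (keep holding / do not press again yet), "2p" or "1p".
    """
    if not t1 < t2:
        raise ValueError("the framework assumes T1 < T2")
    if d <= t1:
        return "wait"  # D has not crossed T1: no lever response yet
    if d <= t2:
        return "2p"    # T1 < D <= T2: treated as a short TP
    return "1p"        # D > T2: treated as a long TP


# Hypothetical thresholds, e.g. on a ratio scale where T1 = 1.
for d in (0.9, 1.1, 1.3):
    print(d, decide(d, t1=1.0, t2=1.2))
# 0.9 wait
# 1.1 2p
# 1.3 1p
```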

What would SET’s account look like? We could take inspiration from the way it explains performance in the peak procedure (Gibbon et al., 1984). When the rat presses the lever for the first time, pulses emitted by an internal pacemaker start accumulating in short-term memory, providing the animal with a representation f(t) of the time elapsed since the lever press. If reinforcement occurs, the animal stores in long-term memory the number of pulses f(t*) that had accumulated when it either released the lever or pressed it again. At the beginning of each trial, it retrieves a sample f(t*) from this long-term memory distribution of reinforced times. In such a framework, D would be a comparison between f(t) and f(t*), computed either according to a ratio rule, D = f(t)/f(t*) (in which case T1 would be equal to 1), or according to a difference rule, D = f(t) - f(t*) (in which case T1 would be equal to 0). The value of T2 would be a function of the contingencies of reinforcement. I believe that the authors’ view of the process underlying their data is very similar to this SET-based account. From that perspective, what they mean by error monitoring is that the rats compute the discrepancy between f(t) and f(t*) and can use it to guide their behavior.
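A minimal sketch of this SET-style account follows, under simplifying assumptions that are mine, not the authors': the pacemaker runs at a constant, noise-free rate (PACEMAKER_RATE, a hypothetical value), so that f(t) = PACEMAKER_RATE * t; long-term memory is a short list of made-up pulse counts; and one of them is drawn at random at the start of a trial. It shows that, for a given sample f(t*), the ratio rule with T1 = 1 and the difference rule with T1 = 0 produce the same TP, namely the first time step at which f(t) exceeds f(t*).

```python
import math
import random

# Hypothetical parameters: not taken from Kononowicz et al. (2022).
PACEMAKER_RATE = 10.0  # pulses per second, assumed constant (no clock noise)
T1 = {"ratio": 1.0, "difference": 0.0}  # response threshold for each rule


def decision_variable(f_t: float, f_star: float, rule: str) -> float:
    """SET-style comparison of the current count f(t) with a remembered f(t*)."""
    if rule == "ratio":
        return f_t / f_star
    if rule == "difference":
        return f_t - f_star
    raise ValueError(f"unknown rule: {rule!r}")


def time_production(f_star: float, rule: str, dt: float = 0.01, t_max: float = 20.0) -> float:
    """First time step (s) at which D exceeds T1, with f(t) = PACEMAKER_RATE * t."""
    for k in range(1, int(round(t_max / dt)) + 1):
        t = k * dt
        if decision_variable(PACEMAKER_RATE * t, f_star, rule) > T1[rule]:
            return t
    return math.nan  # threshold never reached within t_max


# Long-term memory: pulse counts stored on reinforced trials (made-up values).
memory = [33.0, 35.0, 38.0]
rng = random.Random(1)
f_star = rng.choice(memory)  # one sample retrieved at the start of a trial
print(f_star, time_production(f_star, "ratio"), time_production(f_star, "difference"))
```

Since f(t*) > 0, f(t)/f(t*) > 1 holds exactly when f(t) - f(t*) > 0, which is why the two rules produce the same TP in this sketch.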

Hence, the conclusion that the authors’ data are indicative of error monitoring in rats is only as strong as the evidence that the SET account is correct. But alternative accounts are possible and, at this point, there is no reason to prefer the SET account over them. For instance, one could assume that D is simply f(t) (the short-term memory representation of the time elapsed since the rat pressed the lever), while the values of T1 and T2 are set by the payoff for each decision the rat can make (choosing the 1p port or the 2p port). This would be mathematically equivalent to the SET account. Associative theories of timing such as LeT and BEM, for instance, would provide such an account. Thus, while the data reported by the authors are certainly interesting, notably because they demonstrate that rats can use their own behavior as a time marker, I do not believe they warrant the conclusion that rats monitor their errors.
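The equivalence can be illustrated for the simple case where f(t*) is held fixed rather than sampled anew on each trial (an assumption of this sketch). Thresholds T1 and T2 on the SET ratio D = f(t)/f(t*) then classify every clock reading exactly as thresholds T1 * f(t*) and T2 * f(t*) placed directly on f(t) would. All numeric values are hypothetical; f(t*) is set to 32 only so that the scaling is exact in floating point.

```python
def classify(d: float, t1: float, t2: float) -> str:
    """Same two-threshold rule for both accounts: 'wait', '2p' or '1p'."""
    if d <= t1:
        return "wait"
    if d <= t2:
        return "2p"
    return "1p"


F_STAR = 32.0               # hypothetical remembered pulse count f(t*), held fixed
SET_T1, SET_T2 = 1.0, 1.15  # hypothetical thresholds on the ratio f(t) / f(t*)

# Associative-style account: D = f(t), with thresholds placed directly on the
# clock reading (in that account they would be tuned by the payoffs instead).
ASSOC_T1, ASSOC_T2 = SET_T1 * F_STAR, SET_T2 * F_STAR

readings = [0.5 * k for k in range(101)]  # clock readings f(t) from 0 to 50 pulses
set_choices = [classify(f / F_STAR, SET_T1, SET_T2) for f in readings]
assoc_choices = [classify(f, ASSOC_T1, ASSOC_T2) for f in readings]
print(set_choices == assoc_choices)  # True: identical decisions for every reading
```

In this simplified case, the two accounts differ in how they interpret the thresholds (a comparison with a remembered time vs. a payoff-tuned cut on elapsed time), not in the choices they predict.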

Would there be a way to distinguish between the SET and the associative accounts? Suppose the animal is trained on two tasks concurrently. In Task 1, it has to wait at least I1 s after pressing the lever to be reinforced, while in Task 2 it has to wait at least I2 s. The error-monitoring contingency is then introduced in Task 1 until performance stabilizes, and only afterward in Task 2. Should we observe any transfer of performance from Task 1 to Task 2? According to an associative account, we should not, as the values of the thresholds T1 and T2 need to be readjusted for the new interval. By contrast, because of the scalar property, the SET account predicts that the rats should perform as well in Task 2 as in Task 1, despite having had no training with the error-monitoring contingency in Task 2.
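The contrasting predictions can be illustrated with a small simulation. It is a sketch under assumptions that are all mine: TPs and the rat's internal reading of them have Gaussian scalar noise with made-up coefficients of variation; clock readings are expressed in seconds for simplicity; the TP mode scales with the target interval; and SET's sampling of f(t*) from memory is replaced by a fixed value (the typical TP for that interval). The SET-style model places T2 at the mode on the ratio scale, while the associative model keeps, in Task 2, the threshold in clock units it learned in Task 1 (that is, before any readjustment). The intervals I1 = 3.2 s and I2 = 6.4 s are hypothetical.

```python
import random

# Hypothetical parameters, chosen only for illustration:
#   TP = interval * s,   s ~ N(MU, CV_TP)     (scalar variability of productions)
#   reading f = TP * e,  e ~ N(1, CV_READ)    (scalar noise in the internal clock)
MU, CV_TP, CV_READ = 1.15, 0.15, 0.10


def accuracy(interval: float, account: str, t2: float, n: int = 20000, seed: int = 0) -> float:
    """Proportion of correct port choices among trials with TP >= interval."""
    rng = random.Random(seed)
    mode = MU * interval  # assumes the TP mode scales with the interval
    correct = total = 0
    for _ in range(n):
        tp = interval * rng.gauss(MU, CV_TP)
        if tp < interval:
            continue  # unreinforced whatever the port: not scored
        f = tp * rng.gauss(1.0, CV_READ)  # noisy internal reading of this TP (s)
        if account == "set":
            d = f / (MU * interval)  # D = f(t) / f(t*), f(t*) fixed at the typical TP
        elif account == "assoc":
            d = f                    # D = f(t), threshold in raw clock units
        else:
            raise ValueError(f"unknown account: {account!r}")
        correct += (d <= t2) == (tp <= mode)  # 2p after short TP, 1p after long TP
        total += 1
    return correct / total


I1, I2 = 3.2, 6.4  # hypothetical target intervals for Task 1 and Task 2
# SET: T2 sits at the mode on the ratio scale (D = 1) in both tasks.
# Associative: T2 is the Task 1 mode in clock units, not yet readjusted for Task 2.
for name, account, t2 in (("SET", "set", 1.0), ("associative", "assoc", MU * I1)):
    print(name, round(accuracy(I1, account, t2), 3), round(accuracy(I2, account, t2), 3))
```

With these made-up values, the SET-style model scores the same in both tasks, whereas the associative model, still using its Task 1 threshold, chooses the 1p port on nearly every Task 2 trial, so its accuracy falls to the proportion of long TPs.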

In conclusion, it is very rare in psychology for data to speak for themselves, as their meaning changes depending on the assumptions one makes when looking at them. As far as I can tell, Kononowicz et al.’s conclusion that their data demonstrate error monitoring in rats relies on a specific set of assumptions whose validity is not universally accepted. Making those assumptions explicit in the form of a computational model, and testing this model against alternative models relying on different assumptions, would help determine whether Kononowicz et al.’s conclusions are warranted.