Elsevier

Behavioural Brain Research

Volume 154, Issue 1, 23 September 2004, Pages 165-169

Research report
Differential reward outcome learning in adult humans

https://doi.org/10.1016/j.bbr.2004.02.023

Abstract

Differential reward outcome learning provides a unique reward outcome for each condition of a conditional discrimination task, and this increases the rate at which these tasks can be learned. The present experiment aims to show that irrespective of any difference in learning rate, conditional discrimination tasks and differential reward outcome tasks are solved using different strategies. Two groups of 30 adult subjects were taught a series of conditional visual discriminations. One group received different reward outcomes for each condition, whilst the other group received the same reward outcome for each condition. Both groups learned the visual discriminations at the same rate. Subjects were then taught a new conditional cue with a subset of previously learnt discrimination problems and then required to transfer the newly learnt instruction cue to the remaining discrimination problems that had been previously learnt. Although both groups appeared to transfer the newly learnt cue at the same rate, the subjects performing the differential reward outcome task learnt to a criterion level of performance with fewer errors. The results are discussed in relation to evidence from monkeys indicating different neural mechanisms underlying the learning of both tasks.

Introduction

Differential reward outcome (DRO) tasks have been used for over 30 years to examine how unique reward outcomes affect the way in which subjects learn about the different conditions of a task. The original study with rats [1] demonstrated that introducing differential rewards in a conditional discrimination task facilitated learning. In a traditional conditional discrimination experiment, a conditional cue dictates which of two choice objects is rewarded on a given trial. However, the reward for a correct response is the same no matter which conditional cue has been presented, and therefore no matter which choice object has been responded to. Because of this single reward outcome, no direct association can be learnt between the conditional cue and the reward outcome, or between the choice object and the reward outcome, as the reward is equally likely to occur in the presence of either cue and either choice object. For example, consider a conditional discrimination of the form “in the presence of cue A choose object X” and “in the presence of cue B choose object Y”. In this task neither A nor B predicts reward by itself; each predicts a reward outcome only if followed by the appropriate choice of object X or Y, respectively. Similarly, objects X and Y are equally associated with the reward outcome, but each is rewarded only if preceded by the appropriate cue. Therefore, the task can only be solved by forming configural associations between the cue and choice objects. For example, the configuration of cue A and object X is always rewarded, whilst the configuration of cue A and object Y is never rewarded. Using this configural link between cue and choice object is therefore the only way to solve the conditional discrimination.
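The contingencies described above can be made concrete with a minimal sketch. It assumes, purely for illustration, that every cue and choice object combination occurs equally often (as it would under random responding); the function and variable names are hypothetical and are not taken from the experiment.

```python
from itertools import product

# Common-outcome conditional discrimination: cue A -> object X, cue B -> object Y.
CORRECT = {"A": "X", "B": "Y"}

def rewarded(cue, choice):
    """Return 1 if the choice is correct for the cue, else 0 (same reward for both cues)."""
    return int(CORRECT[cue] == choice)

# Assumption: each cue-choice combination is sampled equally often.
trials = list(product("AB", "XY"))

def mean_reward(position, value):
    """Average reward over trials whose cue (position 0) or object (position 1) equals value."""
    outcomes = [rewarded(c, o) for c, o in trials if (c, o)[position] == value]
    return sum(outcomes) / len(outcomes)

by_cue = {cue: mean_reward(0, cue) for cue in "AB"}
by_object = {obj: mean_reward(1, obj) for obj in "XY"}
by_config = {(c, o): rewarded(c, o) for c, o in trials}

print(by_cue)     # {'A': 0.5, 'B': 0.5}
print(by_object)  # {'X': 0.5, 'Y': 0.5}
print(by_config)  # {('A', 'X'): 1, ('A', 'Y'): 0, ('B', 'X'): 0, ('B', 'Y'): 1}
```

Under this assumption, neither a cue nor an object on its own predicts reward; only the cue and object configuration does.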

In the DRO task, however, there is no need for configural associations between cue and object to solve the task. DRO tasks differ from conditional discrimination tasks in having a reward outcome that is unique for each conditional cue. A DRO task can therefore be described in terms analogous to the conditional discrimination above: “in the presence of cue A choose object X and get reward type 1” and “in the presence of cue B choose object Y and get reward type 2”. This simple alteration in the reward outcomes means that the task can be solved without configural associations. Now cue A is only ever associated with reward type 1, as is choice object X. Cue A and choice object X therefore both form direct associations with reward type 1, whilst cue B and choice object Y both form direct associations with reward type 2. Cue A would thus establish an expectation of reward type 1, which would in turn activate the association with choice object X. The task can therefore be solved through object–reward associations alone, and this increases the rate at which such a discrimination can be learnt [1].
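The proposed route from cue to choice via a shared reward expectation can be sketched as two lookup tables. This is only an illustration of the logic in the text, not a model fitted to data; the dictionaries and the `choose` function are hypothetical names.

```python
# Hypothetical direct associations formed in a DRO task.
CUE_OUTCOME = {"A": "reward type 1", "B": "reward type 2"}
OBJECT_OUTCOME = {"X": "reward type 1", "Y": "reward type 2"}

def choose(cue, objects):
    """Pick the object whose associated outcome matches the outcome the cue predicts."""
    expected = CUE_OUTCOME[cue]  # the cue establishes a reward expectation
    matches = [obj for obj in objects if OBJECT_OUTCOME[obj] == expected]
    return matches[0] if matches else None

print(choose("A", ["X", "Y"]))  # X
print(choose("B", ["X", "Y"]))  # Y
```

No cue and object pair is stored anywhere in this sketch: the choice follows from the two separate associations with the reward outcome.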

DRO learning has been seen in a variety of animals including rats [1], [2], pigeons [3], [4], horses [5] and monkeys [6]. However, in humans, the enhanced rate of learning from differential reward outcomes (called the differential outcomes effect; DOE) has been reported in children [7], but has never been investigated in adults.

Increased rate of learning for DRO tasks is not the only difference between these tasks and conditional discrimination tasks. Recent evidence from monkeys has demonstrated that the way in which the frontal and inferior temporal cortices interact differs when animals learn a DRO task as compared to a conditional discrimination [6]. Crossed unilateral lesions of frontal cortex in one hemisphere and inferior temporal cortex in the opposite hemisphere prevent all unilateral communication between these two cortical areas. This combination of lesions prevents all learning of conditional discriminations in monkeys (e.g. [8], [9]). However, the same pattern of lesions has no effect on the performance of the DRO task [6]. In this respect, DRO learning resembles object–reward association learning, which is also independent of frontal interaction with the inferior temporal cortex [8], [9]. This supports the explanation put forward earlier that DRO tasks are solved using object–reward associations (which do not require the influence of frontal cortex) rather than configural associations (which do require the influence of frontal cortex) [6], [9], [10].

If conditional discriminations rely on configural learning whilst the learning of DRO tasks relies on object–reward association learning, then it should be possible to demonstrate that the two types of task are learnt differently in adult humans, even though there may be no demonstrable increase in the rate of learning through the DOE. The present experiment aimed to demonstrate that conditional tasks and DRO tasks in humans rely on configural learning and object–reward association learning, respectively. Two groups of subjects were taught the same conditional discrimination, one group with the same reward outcome in each condition and one group with different reward outcomes (delayed or immediate reward) for each condition of the task. They were taught a number of choice problems with the same conditional cues, and then in a later stage of the experiment half of the learnt choice cues were presented in the presence of a new cue. Once learnt, the new cue was presented with the remaining choice objects. If a task is learnt using configural associations between cue and choice object, then the subjects should require at least one experience of every cue–object configuration before being able to solve the task. In contrast, if the task can be solved by object–reward association learning, then once a new cue has been learnt it should be possible to generalise the reward expectation of that cue to the remaining choice objects, even though they have not been paired together before. Therefore, in transferring the new cue to the learnt, but not previously paired, choice objects, subjects learning the DRO task should perform better than subjects learning a conditional discrimination.
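The transfer prediction can be illustrated with a minimal sketch contrasting the two hypothesised strategies. All labels are hypothetical stand-ins (cues A, B and C, objects X1, Y1, X2 and Y2, reward types 1 and 2) rather than the stimuli or reward assignments used in the experiment, and the number of problems is reduced to two for brevity.

```python
def configural_choice(rewarded_pairs, cue, objects):
    """Choose only if a rewarded cue-object configuration has already been experienced."""
    for obj in objects:
        if (cue, obj) in rewarded_pairs:
            return obj
    return None  # no stored configuration: the learner has to guess

def outcome_choice(cue_outcome, object_outcome, cue, objects):
    """Choose the object that shares the reward outcome the cue predicts."""
    expected = cue_outcome.get(cue)
    if expected is None:
        return None
    for obj in objects:
        if object_outcome.get(obj) == expected:
            return obj
    return None

# Initial training with cues A and B on two choice problems.
rewarded_pairs = {("A", "X1"), ("B", "Y1"), ("A", "X2"), ("B", "Y2")}
cue_outcome = {"A": 1, "B": 2}
object_outcome = {"X1": 1, "Y1": 2, "X2": 1, "Y2": 2}

# New cue C is trained on problem 1 only.
rewarded_pairs.add(("C", "X1"))
cue_outcome["C"] = 1

# Transfer test: cue C with the objects of problem 2, never paired with C before.
configural_result = configural_choice(rewarded_pairs, "C", ["X2", "Y2"])
outcome_result = outcome_choice(cue_outcome, object_outcome, "C", ["X2", "Y2"])

print(configural_result)  # None
print(outcome_result)     # X2
```

In this sketch the configural learner has nothing stored for the new pairing and must guess on first exposure, whereas the outcome-based learner can select the correct object immediately. This mirrors the prediction that the DRO group should make fewer errors at transfer.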

Subjects

There were 60 subjects split into two groups of 30 subjects each. There were an equal number of male and female subjects in each group, and all subjects were undergraduate students at the University of Nottingham, aged 18 to 22 years, with normal or corrected-to-normal vision.

Apparatus

Subjects were tested using an automated apparatus and responded by touching an infrared touchscreen (28 cm × 22 cm). Stimuli were presented on the touchscreen as outlined below. Each stimulus measured

Results

The performance of the group performing the DRO task and the group performing the conditional discrimination task is shown in Fig. 2 for each stage of the experiment.

For learning in the first three stages of the experiment (learning of six problems with the first cue pair in three stages), a two-way ANOVA of stage (1–3) by task (conditional or DRO) was performed. This analysis was carried out on the average number of errors to criterion per choice problem (one choice problem in stage 1, two choice problems

Discussion

As outlined in the introduction, a DRO task in animals and human children shows a differential outcome effect [3], [5], [7] in that the unique reward outcome associated with each condition speeds up learning as compared to a conditional task in which there is an ambiguous reward outcome. In the current experiment it can be seen that there is no evidence of a differential outcome effect in adult humans in that there is no difference in the speed of learning new choice problems for the DRO task

Acknowledgements

I would like to thank Chris Vincent for his assistance with the programming of the task.
