Research report
Differential reward outcome learning in adult humans
Introduction
Differential reward outcome (DRO) tasks have been used for over 30 years to examine how unique reward outcomes affect the way subjects learn about the different conditions of a task. The original study with rats [1] demonstrated that introducing differential rewards in a conditional discrimination task facilitated learning. In a traditional conditional discrimination experiment, a conditional cue dictates which of two choice objects is rewarded on a given trial. However, the reward for a correct response is the same no matter which conditional cue has been presented, and therefore no matter which choice object has been responded to. Because of this single reward outcome, no direct association can be learnt between the conditional cue and the reward outcome, or between the choice object and the reward outcome, as the reward is equally likely to occur in the presence of both cues and both choice objects. For example, consider a conditional discrimination of the form “in the presence of cue A choose object X” and “in the presence of cue B choose object Y”. In this task neither A nor B predicts reward by itself; each predicts a reward outcome only if followed by the appropriate choice of object X or Y, respectively. Similarly, objects X and Y are equally associated with the reward outcome, but each is rewarded only if preceded by the appropriate cue. The task can therefore be solved only by forming configural associations between the cue and choice objects: the configuration of cue A and object X is always rewarded, whilst the configuration of cue A and object Y is never rewarded.
In the DRO task, however, configural associations between cue and object are not needed to solve the task. DRO tasks differ from conditional discrimination tasks in having a reward outcome that is unique to each conditional cue. A DRO task analogous to the conditional discrimination described above would follow the pattern “in the presence of cue A choose object X and get reward type 1” and “in the presence of cue B choose object Y and get reward type 2”. This simple alteration in the reward outcomes means that the task can be solved without configural associations. Now cue A is only ever associated with reward type 1, as is choice object X. Cue A and choice object X therefore both form direct associations with reward type 1, whilst cue B and choice object Y both form direct associations with reward type 2. Cue A would thus establish an expectation of reward type 1, which would in turn activate the association with choice object X. The task can therefore be solved through object–reward associations alone, which increases the rate at which such a discrimination can be learnt [1].
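The contrast between the two task structures can be made concrete with a minimal sketch. It is an illustration only, not the procedure used in the study: the labels `A`, `B`, `X`, `Y` and the reward names follow the examples in the text, while the function `objects_predicted_by_reward` and the dictionary layout are hypothetical. The function asks which choice objects share a reward outcome with the presented cue, which is the information a learner relying purely on element–reward associations would have.

```python
# Rewarded (cue, object) configurations and the reward each one yields.
# Labels follow the text; the data layout is an assumption for illustration.
CONDITIONAL = {("A", "X"): "reward", ("B", "Y"): "reward"}
DRO = {("A", "X"): "reward type 1", ("B", "Y"): "reward type 2"}


def objects_predicted_by_reward(task, cue):
    """Return the objects that share a reward outcome with the given cue."""
    cue_rewards = {}
    object_rewards = {}
    for (c, obj), reward in task.items():
        cue_rewards.setdefault(c, set()).add(reward)
        object_rewards.setdefault(obj, set()).add(reward)
    expected = cue_rewards[cue]
    # Keep every object whose associated rewards overlap the cue's expectation.
    return sorted(
        obj for obj, rewards in object_rewards.items() if rewards & expected
    )


print(objects_predicted_by_reward(CONDITIONAL, "A"))  # both objects: ambiguous
print(objects_predicted_by_reward(DRO, "A"))          # a single object
```

With a single reward outcome the reward expectation singles out neither object, so only the cue–object configuration can disambiguate the choice. With differential outcomes the expectation raised by the cue identifies exactly one object.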
DRO learning has been seen in a variety of animals including rats [1], [2], pigeons [3], [4], horses [5] and monkeys [6]. However, in humans, the enhanced rate of learning from differential reward outcomes (called the differential outcomes effect; DOE) has been reported in children [7], but has never been investigated in adults.
An increased rate of learning is not the only difference between DRO tasks and conditional discrimination tasks. Recent evidence from monkeys has demonstrated that the way in which the frontal and inferior temporal cortices interact differs when animals learn a DRO task as compared to a conditional discrimination [6]. Crossed unilateral lesions of frontal cortex in one hemisphere and inferior temporal cortex in the opposite hemisphere prevent all intrahemispheric communication between these two cortical areas. This combination of lesions prevents all learning of conditional discriminations in monkeys (e.g. [8], [9]). However, the same pattern of lesions has no effect on the performance of the DRO task [6]. In this respect, DRO learning resembles object–reward association learning, which is also independent of frontal interaction with the inferior temporal cortex [8], [9]. This supports the explanation put forward earlier that DRO tasks are solved using object–reward associations (which do not require the influence of frontal cortex) rather than configural associations (which do require the influence of frontal cortex) [6], [9], [10].
If conditional discriminations rely on configural learning whilst the learning of DRO tasks relies on object–reward association learning, then it should be possible to demonstrate that the two types of task are learnt differently in adult humans, even though there may be no demonstrable increase in the rate of learning through the DOE. The present experiment aimed to demonstrate that conditional tasks and DRO tasks in humans rely on configural learning and object–reward association learning, respectively. Two groups of subjects were taught the same conditional discrimination, one group with the same reward outcome in each condition and one group with different reward outcomes (delayed or immediate reward) for each condition of the task. They were taught a number of choice problems with the same conditional cues, and then, in a later stage of the experiment, half of the learnt choice objects were presented in the presence of a new cue. Once this had been learnt, the new cue was presented with the remaining choice objects. If a task is learnt using configural associations between cue and choice object, then the subjects should require at least one experience of every cue–object configuration before being able to solve the task. In contrast, if the task can be solved by object–reward association learning, then once a new cue has been learnt it should be possible to generalise the reward expectation of that cue to the remaining choice objects, even though they have not been paired together before. Therefore, when the new cue is transferred to the learnt, but not previously paired, choice objects, subjects learning the DRO task should perform better than subjects learning a conditional discrimination.
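The transfer prediction can be sketched as two toy decision rules. This is a hedged illustration under stated assumptions, not a model from the study: the cue and object names (`A` to `D`, `X1`, `Y1`, `X2`, `Y2`), the mapping of new cues to the immediate and delayed rewards, and both function names are hypothetical. A return value of `None` stands for "no basis for choice", that is, the learner has to guess.

```python
def configural_choice(experienced, cue, objects):
    """Choose only if this exact rewarded cue-object pair has been experienced."""
    matches = [obj for obj in objects if (cue, obj) in experienced]
    return matches[0] if len(matches) == 1 else None


def reward_mediated_choice(cue_reward, object_reward, cue, objects):
    """Choose the object whose reward matches the reward the cue predicts."""
    expected = cue_reward[cue]
    matches = [obj for obj in objects if object_reward[obj] == expected]
    return matches[0] if len(matches) == 1 else None


# Rewarded configurations experienced before the transfer test (assumed design:
# old cues A/B with both problems, new cues C/D with problem 1 only).
experienced = {
    ("A", "X1"), ("B", "Y1"), ("A", "X2"), ("B", "Y2"),
    ("C", "X1"), ("D", "Y1"),
}

# DRO group: each condition has its own reward outcome.
dro_cue = {"A": "immediate", "B": "delayed", "C": "immediate", "D": "delayed"}
dro_obj = {"X1": "immediate", "Y1": "delayed", "X2": "immediate", "Y2": "delayed"}

# Conditional group: one reward outcome throughout.
same_cue = {cue: "same" for cue in dro_cue}
same_obj = {obj: "same" for obj in dro_obj}

transfer_pair = ("X2", "Y2")  # objects never yet paired with new cue C
print(configural_choice(experienced, "C", transfer_pair))
print(reward_mediated_choice(dro_cue, dro_obj, "C", transfer_pair))
print(reward_mediated_choice(same_cue, same_obj, "C", transfer_pair))
```

Under these assumptions only the reward-mediated rule with differential outcomes yields a determinate choice on the first transfer trial; the configural rule, and the reward-mediated rule with a single outcome, both leave the learner guessing.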
Subjects
There were 60 subjects, split into two groups of 30. There was an equal number of male and female subjects in each group. All subjects were undergraduate students at the University of Nottingham, aged 18 to 22 years, with normal or corrected-to-normal vision.
Apparatus
Subjects were tested using an automated apparatus and responded by touching an infrared touchscreen (28 cm × 22 cm). Stimuli were presented on the touchscreen as outlined below. Each stimulus measured
Results
The performance of the groups learning the DRO task and the conditional discrimination task is shown in Fig. 2 for each stage of the experiment.
Learning in the first three stages of the experiment (learning of six problems with the first cue pair in three stages) was analysed with a two-way ANOVA of stage (1–3) by task (conditional or DRO). This analysis was carried out on the average number of errors to criterion per choice problem (one choice problem in stage 1, two choice problems
Discussion
As outlined in the introduction, a DRO task in animals and human children shows a differential outcome effect [3], [5], [7] in that the unique reward outcome associated with each condition speeds up learning as compared to a conditional task in which there is an ambiguous reward outcome. In the current experiment it can be seen that there is no evidence of a differential outcome effect in adult humans in that there is no difference in the speed of learning new choice problems for the DRO task
Acknowledgements
I would like to thank Chris Vincent for his assistance with the programming of the task.
References (11)
Are expectancies based upon different positive reinforcing events discriminably different? Learn. Motiv. (1970)
et al. Insights into the nature of fronto-temporal interactions from a biconditional discrimination task in the monkey. Behav. Brain Res. (2002)
et al. The differential outcome effect as a useful tool to improve conditional discrimination learning in children. Learn. Motiv. (2001)
et al. Memory after frontal-temporal disconnection in monkeys: conditional and nonconditional tasks. Neuropsychologia (1998)
et al. Delay of reinforcement in instrumental discrimination learning in rats. J. Comp. Physiol. Psychol. (1972)