Elsevier

Cognitive Systems Research

Volume 52, December 2018, Pages 816-827

Joint action with a virtual robotic vs. human agent

https://doi.org/10.1016/j.cogsys.2018.09.017

Abstract

Prior research has revealed that when performing joint action tasks with a human co-actor, we automatically form representations not only of our own action, but also of the action of the co-actor we are interacting with, creating an action discrimination problem. Studies suggest that these processes are affected by the human vs. non-human nature of the agent with whom the task is shared. In two experiments, we measured the Joint Simon Effect (JSE) as an index of action discrimination, using a virtual version of the joint go/no-go task in which the task was shared with a virtual robotic vs. human hand. Furthermore, both experiments tested whether the JSE was affected by sensorimotor experience during which the participant manipulated the virtual robotic hand via an exoskeleton (vs. passive observation of movements of the virtual robotic hand). Experiment 2 replicated Experiment 1, except that prior to the joint action task, participants were informed about the robotic vs. human nature of the two virtual hands (no such information was given in Experiment 1). Both experiments demonstrated a significant JSE, which did not differ between robotic and human partners. Further analysis indicated that the JSE obtained in the robotic condition was not modified after manipulating the virtual robotic hand. These results suggest that the human vs. non-human appearance of the partner is not a determinant of joint action performance in virtual settings.

Introduction

There are numerous situations in daily life where people have to perform joint actions with others in order to accomplish a task successfully (dancing, carrying heavy objects, etc.). Because machines, robots, and virtual agents play a growing role in professional and private life, it is particularly important to develop a better understanding of the mechanisms underlying the way we share a task with non-human agents or devices (Broadbent, 2017). In fact, it is now well documented that cognitive processes involved in the perception of actions performed by our conspecifics differ from those involved in the perception of non-human agents such as robots (Press, 2011). This raises the question of whether we share a task with a virtual non-human agent, such as a robot, in the same way as we share that task with a virtual human partner. Furthermore, since humans will be increasingly exposed (physically or virtually) to robots in the near future and thus potentially involved in more and more interactions with those agents, be they real or virtual, another crucial question investigated in the present work is whether experiences and interactions with virtual robotic agents can influence the way we represent their actions when jointly performing a task.

Joint actions refer to “any form of social interaction whereby two or more individuals coordinate their actions in space and time to bring about a change in the environment” (Sebanz, Bekkering, & Knoblich, 2006, p. 70). Over the last ten years, the mechanisms and determinants of joint action have been the focus of several cognitive psychology experiments (for reviews, see Dolk et al., 2014, Sebanz et al., 2006). Joint action studies have revealed that individuals not only form representations of their own response, but also represent the response of the person they are interacting with, even when this is not necessary (Dolk et al., 2013, Hommel et al., 2009, Sebanz et al., 2003). This phenomenon can be assessed through the so-called Joint Simon Effect (JSE) (Hommel et al., 2009, Sebanz et al., 2003).

The JSE has been demonstrated using a modified version of the standard Simon task adapted to a shared task situation (e.g., Hommel et al., 2009, Sebanz et al., 2003, Sebanz et al., 2006). In a classic example of the standard Simon task, two colored target stimuli (e.g., orange and blue) are associated with two response keys (left and right). The target stimuli are presented on the left or right side of a screen and the participant is required to respond to the identity of the stimulus, irrespective of its location. The classical Simon effect (Simon & Rudell, 1967) refers to a spatial compatibility effect such that participants are faster to respond when the stimulus and the response locations match (compatible trials) compared to when they do not match (incompatible trials), even if the stimulus location is not relevant for the task. For example, if the participant responds to the blue target with the left key, left key responses will be faster when the blue dot appears on the left side compared to when it appears on the right side. The Simon effect corresponds to the Reaction Time (RT) difference between incompatible and compatible trials. This compatibility effect is thought to arise because the spatial stimulus feature automatically activates the spatially corresponding response, which, when the two locations mismatch, can interfere with the correct response.
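
The RT difference that defines the Simon effect can be sketched in a few lines of Python. This is a minimal illustration and not the authors' analysis code: the trial records, the RT values, and the function name `simon_effect` are all invented for the example.

```python
from statistics import mean

# Hypothetical trials: (stimulus_side, response_side, reaction_time_ms).
# All values are invented for illustration only.
trials = [
    ("left", "left", 400),
    ("right", "left", 430),
    ("left", "left", 410),
    ("right", "left", 440),
]


def simon_effect(trials):
    """Mean RT on incompatible trials minus mean RT on compatible trials."""
    # Compatible: stimulus location matches response location.
    compatible = [rt for stim, resp, rt in trials if stim == resp]
    # Incompatible: stimulus location and response location differ.
    incompatible = [rt for stim, resp, rt in trials if stim != resp]
    return mean(incompatible) - mean(compatible)


effect = simon_effect(trials)
print(effect)
```

A positive value means slower responses on incompatible trials, which is the direction of the classical effect.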

A seminal study by Sebanz et al. (2003) used a go/no-go version of the Simon task which was performed alone or in the context of joint action. Participants responded to only one of the two stimuli (i.e. in a ‘go/no-go’ way). When they performed the task alone (i.e. individual go/no-go), no Simon effect was present, which confirms that the effect stems from the conflict between two alternative responses. However, when the task was performed with another individual responding to the alternative stimulus (i.e. joint go/no-go), the Simon effect was re-established. This Simon effect measured in the joint go/no-go task is called the JSE (Hommel et al., 2009, Sebanz et al., 2003, Sebanz et al., 2006).
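
The trial logic of the go/no-go design can be sketched as follows, under stated assumptions. The function name `classify_trial`, its parameters, and the color and side assignments are hypothetical and are not taken from Sebanz et al. (2003). The sketch assumes that compatibility is coded relative to the side on which the actor responds.

```python
def classify_trial(stimulus_color, stimulus_side, my_color, my_side):
    """Describe one go/no-go trial from a single actor's point of view."""
    # Go trial: the stimulus shows the color assigned to this actor.
    go = stimulus_color == my_color
    # Compatible trial: the stimulus appears on the actor's own response side.
    compatible = stimulus_side == my_side
    return {"go": go, "compatible": compatible}


# Hypothetical assignment: this actor responds to blue with the right key.
result = classify_trial("blue", "left", my_color="blue", my_side="right")
print(result)
```

In this scheme the compatibility effect would be computed over go trials only, since no response time exists on no-go trials.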

The presence of the JSE in the ‘joint go/no-go’ situation raises the following question: how can the stimulus location induce the activation of a conflicting response, given that only one response is available and that stimulus location is not relevant for the task? Several accounts have been proposed to explain the JSE. Among others, Dolk et al. (2014) proposed the referential coding account, which postulates that the JSE is due to response conflict induced by the presence of another (real or virtual) responding agent.

This referential coding hypothesis is rooted in the Theory of Event Coding (Hommel, 2009, Hommel et al., 2001), which is itself an extension of the ideomotor theory (Greenwald, 1970, James, 1890). According to the ideomotor theory, actions are represented by the code of their sensory consequences (Hommel, 2009, James, 1890, Shin et al., 2010). In line with this, the Theory of Event Coding suggests that action representation consists in a network of codes representing the characteristics of all the perceived effects of action (e.g., heard or felt location, the direction and the speed of the action, the involved effector, concerned objects, etc.) (Hommel et al., 2001). Within this frame, action selection consists in the activation of codes of the desired effects (i.e. perceptual consequences of action). An important point of this theory is that our own actions and those of others are represented in the same way, i.e. through the codes of their sensory effects (Hommel et al., 2001, Prinz, 2005). In other words, when performing a joint task, representations of the sensory consequences of our own action, but also representations of the consequences of the partner’s action, are activated (Dolk et al., 2013, Klempova and Liepelt, 2016). Representing the partner’s action would then create a conflict of the same type as the one created when we represent several possible responses. More precisely, the activation of different representations creates a discrimination problem, and in order to select the appropriate action, the agent has to select the representation of the desired effect among those that are activated. Most of the effects (effector, speed, action's sound) are shared between the two possible events (i.e. left or right response). The discrimination problem can be resolved by focusing on characteristics that discriminate the relevant action from the irrelevant one (Memelink & Hommel, 2013).

In the case of the Simon task, such a characteristic is the spatial response location. Participants are therefore likely to include spatial features in response codes (i.e. to code their response as left or right), and consequently, every event sharing this attribute (such as a stimulus on the left) can interact with response selection, creating the Simon effect (for a related account see also Dittrich et al., 2017, Dittrich et al., 2013). Finally, according to this account, the more similar the perceived events are, the greater the discrimination problem is, which therefore requires giving more weight to the discriminative dimension (Dolk et al., 2014). In contrast, a decrease in similarity reduces the response discrimination problem, hence resulting in a decrease or absence of JSE. This in turn may explain why an absence of JSE has been found when the task was shared with a non-human agent, such as a wooden hand (Tsai & Brass, 2007). A recent formulation of the referential coding account further outlined the role of the social context in shaping response codes (Prinz, 2015). A critical claim is that because the task is embedded in a social context, agent-related features have a prominent role, such that response codes include not only the agents’ actions but also the agents who respond. Accordingly, any feature in which the agents differ from each other is included in their respective response code, as these distinctive features can help solve the discrimination problem, hence reducing the JSE. Conversely, the more similar the agents are, the greater the weight given to other distinctive features included in the response codes, such as spatial features, hence favoring the JSE. Importantly, these agent-related features go beyond the physical domain, including features such as interpersonal closeness (e.g. a friend vs. a stranger) and likeability (e.g. friendly vs. unfriendly).

This integrative account can explain how social factors decreasing the distance between self and others can increase the JSE (e.g., Colzato et al., 2012, Quintard et al., 2018). It also has important implications for the understanding of joint task performance with a non-human agent.

Both for theoretical reasons and through potential implications regarding interactions between human and non-biological agents, studies have investigated the impact of the humanness of the partner on joint action performance.

The first study dealing with this issue was reported by Tsai and Brass (2007). They used a virtual version of the joint go/no-go task, in which participants controlled, via a button press, a virtual human hand that was displayed on the right side of a screen. This virtual human hand pressed a button whenever the participant pressed the response button. On the left side of the screen, another virtual hand was displayed. This hand, materializing the co-actor, was controlled by the computer program and was either a human or a wooden hand. A significant JSE was obtained when the task was shared with the human hand but not when it was shared with the wooden hand. Importantly, in this virtual version of the joint go/no-go task, the human and the wooden hand displayed on the screen performed the same action. Thus, this finding indicates that the human vs. non-human visual appearance of the partner is a crucial determinant of the JSE.

Stenzel et al. (2012) investigated the influence of participants’ beliefs on the JSE measured in a task shared with a real humanoid robot. When participants believed that they were interacting with a robot whose behavior was biologically inspired, a JSE was present. In contrast, when the robot’s behavior was described as functioning in a purely deterministic manner, the JSE was absent. Both conditions were perceptually identical and differed only in participants’ beliefs about the functional principle of the robot. Therefore, it is not only the surface characteristics of the co-actor (human vs. non-human appearance) which could be decisive for the occurrence of the JSE (Tsai & Brass, 2007), but also the perceived intentionality of this agent. In the same vein, Müller et al. (2011) tested the presence of a JSE with a co-actor materialized by a wooden vs. human hand in the virtual version of the joint go/no-go task (Tsai & Brass, 2007). Moreover, prior to the task, participants watched either a video of ‘Pinocchio’ or a video about humans. When participants watched the ‘human video’ prior to the task, the results replicated those of Tsai and Brass (2007): a JSE was present when sharing the task with a virtual human hand, but no significant JSE was obtained with a virtual wooden hand. However, when participants watched a video of ‘Pinocchio’ prior to the task, the JSE was present in the wooden-hand condition. Thus, these findings fit well with recent accounts of the JSE (see previous section), suggesting that not only response features, but also both physical and social features of the partners are important determinants of the JSE (Prinz, 2015). In line with this, recent work suggests that whether a robot is perceived as a “social entity” (determined by its appearance, motion or observer's beliefs) is a crucial determinant of human-robot interactions (Shen, Kose-Bagci, Saunders, & Dautenhahn, 2011).

Recently, conflicting results have been reported by Stenzel and Liepelt (2016). In a series of experiments using the virtual version of the joint go/no-go task, a significant JSE was obtained and its size did not differ between human and non-human partners. In particular, a JSE was obtained with a co-actor materialized by a wooden hand. This finding thus conflicts with the absence of JSE with a wooden hand reported by Tsai and Brass (2007).

The reason for such discrepancies is unclear. However, co-representation of action is a highly flexible process that can be modulated by various factors. In light of the results reviewed above, and considering recent theoretical views of the JSE (Dolk et al., 2014, Prinz, 2015), whether or not a JSE is present when co-acting with a non-human agent seems to result from a complex interplay between physical cues defining the (non)humanness of the agent and participants’ beliefs, attitudes, and knowledge about this agent (Müller et al., 2011, Prinz, 2015, Stenzel et al., 2012).

The aim of this work was to extend our knowledge about joint action with a human vs. a non-human virtual agent. Our first goal was to further investigate whether the human vs. non-human appearance of the (virtual) partner can influence the JSE. More specifically, we used a virtual version of the joint go/no-go task (Stenzel and Liepelt, 2016, Tsai and Brass, 2007), in which the task was shared either with a human or a robotic hand. Recently, Stenzel and Liepelt (2016, Experiment 4) demonstrated a significant JSE with a wooden hand. It is unclear whether this finding can be extended to another kind of non-human virtual agent such as a robotic hand. Indeed, people are increasingly exposed to robots (in real life or through media) and acquire knowledge or beliefs about these agents that may influence social-cognitive processes involved in task sharing (Prinz, 2015). For instance, if people consider robots as non-autonomous agents or machines, this would reduce the JSE, or conversely, if they consider them as autonomous, intentional agents, this would favor the JSE. The results obtained with a virtual robotic hand may thus differ from those previously reported by Stenzel and Liepelt (2016) with a virtual wooden hand. Furthermore, whether the size of the JSE with a robotic partner would be the same as with a human partner remains uncertain (Tsai and Brass, 2007, Tsai et al., 2011, Tsai et al., 2008).

In addition, we explored the influence of experiencing a similarity between one’s own action and that of the virtual robotic agent on subsequent joint task performance. Most relevant for the present study, it has been proposed that the detection of equivalence between the observed actions and the actions we produce ourselves could allow us to attribute to others the characteristics that underlie our own actions (Meltzoff, 2005, Meltzoff, 2007a). According to the ‘Like Me’ hypothesis, an influential theory of social-cognitive development (Meltzoff, 2005, Meltzoff, 2007b, Meltzoff and Moore, 1995), young children’s ability to imitate allows them to understand the similarity between their actions and those of others. It is suggested that through understanding that my actions are driven by mental states, I can understand from others’ actions that they are also like me in having mental states. From this perspective, it is therefore through the similarity between actions of self and others that we can perceive others as intentional agents with mental states. Moreover, studies in adults highlight the positive effects (such as an increase in perceived interpersonal closeness) triggered by the perception of actions that match our own actions (Hale & Hamilton, 2016).

On the basis of this evidence, we tested whether the JSE was affected by prior performance of a procedure in which the participant controlled the virtual robotic hand via an exoskeleton. The control of the robotic hand via the exoskeleton corresponded to a sensorimotor experience in which the consequence of the movement produced by the participant was the visual perception of a similar movement of the robotic hand. We suggest that experiencing this correspondence between one’s own movement and that of the robotic hand should increase the perceived similarity with this virtual robotic agent (Meltzoff, 2005, Meltzoff, 2007a, Press, 2011). According to the referential coding approach (Dolk et al., 2014, Prinz, 2015), a closer similarity between participant and virtual robotic partner should increase the response conflict in subsequent joint task performance, leading to a larger JSE. This type of visuo-motor experience was contrasted with a purely visual experience of the robotic hand actions.


Participants

Fifty students from the University of Poitiers took part in this experiment, in exchange for course credit. The participants (15 males), who ranged in age from 18 to 29 years (Mage = 20.1 years, SDage = 2.7 years), were right-handed. All

Participants

Sixty-eight new right-handed participants were recruited (17 males). Participants were randomly assigned to the Active (34 participants, 7 males) or Passive group (34 participants, 10 males). Due to a technical problem with the exoskeleton, one participant from the Active group was excluded. The remaining participants ranged in age from 18 to 23 years (Mage = 19.8 years, SDage = 1.48 years). All had normal or corrected-to-normal vision, and were naive with respect to the purpose of the

General discussion

The first aim of this study was to investigate whether the JSE with a virtual robotic partner differed from that with a virtual human partner. Our second aim was to test the influence of two types of experience on the JSE. In the Passive condition, participants observed movements of the virtual robotic hand. In the Active condition, participants experienced the control of the virtual robotic hand via an exoskeleton.

In both experiments, we found a reliable JSE, irrespective of the nature of the

Compliance with ethical standards

Conflict of interest: The authors declare that they have no conflict of interest.

Ethical approval: All procedures performed were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.

Informed consent: Informed consent was obtained from all individual participants included in the study.

Acknowledgements

We would like to thank Yves Almecija for his help in the construction of the stimulus materials.

References (42)

  • K. Dautenhahn (2007). Socially intelligent robots: Dimensions of human robot interaction. Philosophical Transactions of the Royal Society B: Biological Sciences.
  • K. Dittrich et al. (2017). The joint flanker effect and the joint Simon effect: On the comparability of processes underlying joint compatibility effects. Quarterly Journal of Experimental Psychology.
  • K. Dittrich et al. (2013). Keys and seats: Spatial response coding underlying the joint spatial compatibility effect. Attention, Perception, and Psychophysics.
  • T. Dolk et al. (2014). The joint Simon effect: A review and theoretical integration. Frontiers in Psychology.
  • T. Dolk et al. (2013). The (not so) social Simon effect: A referential coding account. Journal of Experimental Psychology: Human Perception and Performance.
  • A.G. Greenwald (1970). Sensory feedback mechanisms in performance control: With special reference to the ideo-motor mechanism. Psychological Review.
  • J. Hale et al. (2016). Cognitive mechanisms for responding to mimicry from others. Neuroscience and Biobehavioral Reviews.
  • B. Hommel (2009). Action control according to TEC (theory of event coding). Psychological Research.
  • B. Hommel et al. (2009). How social are task representations? Psychological Science.
  • B. Hommel et al. (2001). The Theory of Event Coding (TEC): A framework for perception and action planning. Behavioral and Brain Sciences.
  • W. James (1890). The principles of psychology.