1 Introduction

The research area of social robotics is developing rapidly as social robots are employed for increasingly diverse tasks. Next to social robots that assist people in medical settings [1–3], social robots can also help improve a user’s spatial cognition through pointing gestures [4–7], improve the imitation skills of children with autism [8–11], or greet and communicate with people [12–15]. Crucially, social robots are very often created to influence the humans they interact with to some extent. For example, social robots are developed to help a person walk better (i.e., actual behavioral change), to warn a person about an impending danger (i.e., attitudinal change), or to help a person learn better (i.e., a change in cognitive processing). Therefore, we argue that one of the most important current challenges in research on social robotics is to develop social robots that can effectively influence human behavior or attitudes. For this, research needs to develop a thorough understanding of how social robots can effectively influence their users (e.g., by generating standards for influencing strategies and levels of influencing power). Recent research has started to investigate the fundamentals of persuasive robotics [16–20]. For example, [17, 20] suggested that social robots can be very effective persuaders when they employ the persuasive strategy of giving feedback, especially negative, social feedback (e.g., expressing social disapproval by saying “Your energy consumption is terrible” while showing a frowning facial expression [20]). Just like humans, social robots might make use of a variety of different types of social persuasive strategies [21].

Recently, studies have started to investigate the persuasiveness of social robots that employ persuasive strategies. However, these studies were limited to persuasive robots that use a single type of persuasive strategy. For example, [22] showed that participants complied more with the suggestions of a social robot when it used nonverbal cues than when it did not use these cues.

In the current research, we investigate the crucial, basic question of whether social robots become more persuasive when they combine multiple types of persuasive strategies. The current manuscript builds on our earlier work [23] by incorporating the literature more thoroughly and by extending the theoretical discussion of the findings. That is, we investigate whether a social robot that uses multiple types of persuasive strategy is more persuasive than a social robot that uses only one type. This question is in accordance with social agency theory [24], which suggests that (artificial) agents that combine multiple persuasive strategies become more persuasive. This theory proposes that social cues can activate a social conversation schema in users, causing people to behave as if they were in a conversation with another person [25]. This proposal is in line with earlier theories on human interactions with technology, such as the Media Equation hypothesis [26]. Accordingly, one could argue that, in general, combining more persuasive strategies should lead to an increase in persuasion. Based on this reasoning, we investigated whether a robot that uses two persuasive strategies is more persuasive than a robot that uses only one.

Earlier research indicated that in human face-to-face communication and persuasion, two crucial persuasive strategies that increase persuasion are gazing ([27]; for a general overview see [28]) and gestures ([25]; for a general overview see [29, 30]). These persuasive strategies might be effective by steering the persuadee’s attention and by increasing the comprehensibility of the persuader [27, 31]. As the Media Equation hypothesis and related research suggest [26], in many forms of human–computer and human–robot interaction, people tend to respond to and interact with technology (e.g., robots or computers) as they would with another person. For example, when a computer addresses a person in a polite manner, the person is prone to treat the computer politely as well [26]. Likewise, people are quick to ascribe personality characteristics (e.g., aggressive, smart, friendly, male) to robots that show certain behavior (see e.g., [32]). Therefore, gestures and gazing behavior are likely to be highly important for robots that attempt to persuade humans. The current research investigated the individual and combined influence of gazing and gestures on a storytelling robot’s persuasiveness.

A robot’s gazing behavior (i.e., looking at a participant’s face) can influence its persuasiveness and various variables related to persuasion. Shinozawa et al. [33] demonstrated that a robot was more persuasive (i.e., more effective at influencing users’ decisions) when it moved its eyes or its head in the direction of the interaction partner than when it did not show this behavior. A number of studies investigated the role of gazing in retention. Mutlu et al. [34] showed that participants who were gazed at by a robot storyteller remembered the story better than participants who were gazed at less. Such results indicate that a robot’s gazing behavior influences information retention by its human interaction partners. In a more recent study by Dijk et al. [5], only an effect of iconic co-speech gestures on message retention was observed, and no effect of gazing. One explanation is that ‘looking at another person’ is as important a gaze cue as looking at the addressee. This cue was absent in Dijk et al.’s study [5] (the robot looked away in an arbitrary direction), whereas in Mutlu et al.’s study [34] the robot’s gazing at the other listener may have reduced the attention of the participant who was not being gazed at.

Gazing and the direction of head movements have been suggested to be crucial for achieving likeability, co-presence, and spatial direction [35]. The sense of being involved in a conversation with an artificial conversation partner was found to depend on the ability to see what the artificial conversation partner is looking at [36]. Previous work [37] showed that a higher level of co-presence was achieved when participants could control the head movements and eye gaze of avatars in a virtual environment, compared to conditions in which head movements and gaze were not possible.

In addition to gazing, a robot’s gestures can influence its persuasiveness (see e.g., [38]), as well as various variables related to persuasion (see e.g., [39]). For example, a robot’s pointing gestures can improve a human observer’s comprehension of spatial information and have been found to help an observer identify objects [40, 41]. In [42], it was shown that gestures such as nodding, clapping, hugging, expressing anger, walking, and flying can be understood by a human observer whether performed by a robot or by a human. In general, speech-related gestures influence evaluations of and judgments about the speaker more positively than gestures that are not speech-related or speech without gestures [5, 25].

It thus seems plausible that gazing and gestures would substantially improve the persuasive power of a robot that delivers a persuasive message. However, although gazing and gestures support various aspects of human–robot communication (such as memory, understanding, and co-presence), their combined effects on persuasiveness have not been tested directly. Therefore, the core question of the current research is: will a robot that uses two persuasive strategies (i.e., both gazing and gestures) be more persuasive than a robot that uses only one persuasive strategy (i.e., either gazing or gestures)? The current research might (a) find evidence for the assumption of social agency theory [24] that persuasive strategies have additive effects, (b) replicate earlier research [33] by providing evidence for the persuasive power of robot gazing, and (c) provide evidence for the persuasive power of gestures used by a robot.

To construct the robot’s gestures and gazing, we employed the methodology of earlier research [34] and videotaped a storyteller. This storyteller was asked to tell the same story as the one told by the robot in our experiment to a third person, and to accompany the story with persuasive gestures. The robot was programmed to mimic the human storyteller’s gaze and gestures. Across the four conditions of the experiment, we manipulated how the robot told the participant a persuasive story about lying. In half of the conditions, the robot showed gestures that accompanied the story, whereas in the other half it did not. Moreover, in half of the conditions, the robot told the persuasive story while gazing at the participant part of the time, whereas in the other half it did not gaze at the participant.

We let the robot tell a story in which the main character lies, with dramatic consequences. To measure persuasion, we evaluated participants’ ratings of the lying character. We expected that a story about lying told by a storytelling robot would be more persuasive when that robot (i) employs human-like gazing behavior and (ii) employs human-like gestures. Moreover, we expected that (iii) the combination of gazing behavior and gestures would increase persuasion significantly more than either cue alone.

2 Method

2.1 Participants and Study Design

Sixty-four participants (33 female and 31 male) took part in this experiment. Participants were students or researchers of the National University of Singapore or their friends or family. Their ages ranged from 13 to 32 years (M \(=\) 22.50,  SD \(=\) 2.63). They were recruited by email, flyers, and social media. They received a compensation of $8 for their participation in the experiment. We employed a 2 (gazing: present vs. absent) \(\times \) 2 (gestures: present vs. absent) between-participants design.

2.2 Materials and Measures

As the persuasive story, we used one of Aesop’s fables, “The Boy Who Cried Wolf” [43]. The story is about a shepherd boy who, for fun, tricks the people of a nearby village into believing that a wolf is attacking his flock of sheep. When the wolf finally does come, none of the villagers believe him, and his whole flock is eaten by the wolf. To develop the robot’s library of movements, we recruited a professional stage actor to tell the story with both gestures and gaze movements. The actor told the story while seated on a high chair, which kept him from moving about. The actor told the story to two listeners, identical to the experimental setup, in which two participants were present at the same time (see Fig. 2). We employed this two-listener setup (both for recording the original motions and in the experiment itself) because in a test recording the actor used only very limited gaze shifts when telling the persuasive story to a single listener. We argue that the persuasive (moral) story we used elicits few gaze shifts when a storyteller tells it to only one listener; the two-listener setup required the actor to shift his gaze more often.

The actor’s narration (gestures and gazing) was recorded on video. We then programmed the actor’s movements onto a robot (Nao, Aldebaran Robotics). The actor’s movements were categorized into 21 different gestures and 8 different gaze behaviors (e.g., looking at listener 1 or 2, or at a point in between the listeners). In the gazing conditions, the robot’s gaze behavior was programmed such that it looked at both participants for an equal amount of time. This was done by making the robot shift its gaze at approximately (but not precisely) the same moments in the story at which the actor shifted his gaze. In the non-gazing conditions, the robot looked at a point right between the two participants without moving its gaze.
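To illustrate how such condition scripts can be realized, the following minimal sketch shows one way to schedule the robot’s gaze shifts, assuming the NAOqi Python SDK for the Nao robot; the IP address, head angles, timestamps, and story text are hypothetical placeholders, not the actual values used in the experiment.

# Minimal sketch of the gaze scheduling described above, assuming the
# NAOqi Python SDK; addresses, angles, and timestamps are placeholders.
import time
from naoqi import ALProxy

ROBOT_IP, ROBOT_PORT = "192.168.1.10", 9559  # hypothetical address

motion = ALProxy("ALMotion", ROBOT_IP, ROBOT_PORT)
tts = ALProxy("ALTextToSpeech", ROBOT_IP, ROBOT_PORT)

# HeadYaw angles (radians) for the two listeners and the point in between.
GAZE_TARGETS = {"listener_1": 0.35, "listener_2": -0.35, "midpoint": 0.0}

# (seconds from story onset, gaze target) pairs derived from the actor video.
GAZE_SCHEDULE = [(0.0, "listener_1"), (4.2, "listener_2"),
                 (9.8, "midpoint"), (12.5, "listener_1")]

def tell_story(story_text, gazing):
    """Speak the story; in the gazing condition, shift the head between the
    two listeners at roughly the moments the actor shifted his gaze."""
    tts.post.say(story_text)  # post.say() is non-blocking in NAOqi
    if not gazing:
        # Non-gazing condition: fixate the point between the two listeners.
        motion.setAngles("HeadYaw", GAZE_TARGETS["midpoint"], 0.15)
        return
    start = time.time()
    for offset, target in GAZE_SCHEDULE:
        time.sleep(max(0.0, offset - (time.time() - start)))
        motion.setAngles("HeadYaw", GAZE_TARGETS[target], 0.15)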

Note that, by common definitions of gaze [36, 44], the selected robot has two LED lights as eyes and cannot show directional focus other than by moving its head in a specific direction. However, it can be argued that the robot’s head movements qualify as gaze movements. To control for any misinterpretation of gazing, we included several questions to evaluate whether participants interpreted the robotic gaze as we intended. Figure 1 shows examples of storytelling as performed by a human and by the robot. The gestures included, for example, shouting (both hands placed close to the mouth) and running (both arms swinging). The total script for the robot ran for less than 3 min.

Fig. 1 Two examples (a and b) of gestures as performed by the human storyteller and by the robot

We asked the participants nine questions [45] to assess their attitude toward lying. Participants made judgments on 7-point scales from \(-\)3 (negative) through 0 (neutral) to \(+\)3 (positive). More specifically, the ratings ranged from “bad” to “good”, “negative” to “positive”, “unfriendly” to “friendly”, “dislikable” to “likable”, “unpleasant” to “pleasant”, “not nice” to “nice”, “disagreeable” to “agreeable”, “wrong” to “right”, and “incorrect” to “correct”. The nine answers were averaged to construct a reliable measure of persuasion (Cronbach’s alpha \(=\) .7). In all four conditions, the distribution of the measure was normal.
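As an aside, this aggregation and reliability check are straightforward to reproduce; the sketch below computes Cronbach’s alpha from its standard formula for a hypothetical (participants \(\times \) items) response matrix filled with randomly generated placeholder ratings.

import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_participants, n_items) matrix:
    alpha = k / (k - 1) * (1 - sum of item variances / variance of sum)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)       # per-item variances
    total_var = items.sum(axis=1).var(ddof=1)   # variance of the sum score
    return (k / (k - 1.0)) * (1.0 - item_vars.sum() / total_var)

# Placeholder responses: 64 participants x 9 items on the -3..+3 scale.
rng = np.random.default_rng(42)
responses = rng.integers(-3, 4, size=(64, 9))
persuasion = responses.mean(axis=1)  # one averaged persuasion score each
print("alpha = %.2f" % cronbach_alpha(responses))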

2.3 Procedure

On the day of the experiment, the experimenters met the participants in the lobby of the building where the lab was situated. Participants were escorted to the meeting room, where they were briefed about the experiment. The robot was placed at a set location. Two participants were seated at the same time in each session (Fig. 2), so that the robot could avert its gaze from one participant and focus its attention on the other. Participants did not know that they had been assigned to one of four experimental conditions, and the two participants in an interaction session were always in the same condition. The condition determined whether the robot provided gaze and/or gestural cues during the storytelling.

Fig. 2 Participants A and B listen to Nao’s story as a pair

After the robot had told the story, the participants answered the questions evaluating the extent to which they were persuaded that lying is bad. The participants then completed the Godspeed questionnaire [46] to evaluate the robot. In summary, this questionnaire measures users’ perception of robots in terms of anthropomorphism, animacy, likeability, perceived intelligence, and perceived safety. Three additional questions about the robot and the story were asked: “Did the robot ever look at you during the story?”, “Did you perceive the robot as male or female?”, and “Did you understand the story?”.

3 Results

In the conditions in which the robot gazed at the participants, 75 % of them said that the robot had gazed at them, while the remaining 25 % did not. In the conditions in which the robot did not gaze at the participants, 25 % of them nevertheless said that the robot had gazed at them. We excluded from further analyses the data of all participants whose reported perception of the robot’s gaze did not match their condition. The remaining participants were 24 males and 24 females between the ages of 13 and 32 (M \(=\) 22.48,  SD \(=\) 2.51). Importantly, analyses that used all 64 participants showed the same pattern of results. We found no significant effects of participant gender on persuasion or likeability, neither independently nor in interaction with our two manipulations (gazing and gestures); all \(F\) values were less than 1.

3.1 Persuasion

To evaluate the amount of persuasion by the robot, we analyzed participants’ evaluations of the story character who lied. The mean of the answers to the nine evaluation questions about the lying character was analyzed using a 2 (gazing: absent vs. present) \(\times \) 2 (gestures: absent vs. present) ANOVA, in which both factors were manipulated between participants. Results showed that participants at whom the storytelling robot gazed evaluated the lying story character more negatively (M \(=\) 0.26,  SD \(=\) 0.59) than participants at whom the storytelling robot did not gaze (M \(=\) 0.61,  SD \(=\) 0.61),  \(F(1, 44) =\) 5.07,  \(p =\) 0.03. Confirming our first hypothesis, these results present evidence that participants at whom the storytelling robot gazed were persuaded more strongly by its persuasive message (that it is wrong to lie).

This analysis presented no evidence supporting our second hypothesis: participants who had been told the persuasive message by a robot using gestures were not persuaded more or less (and did not evaluate the lying story character more negatively or more positively) than participants who had been told the persuasive message by a robot not using gestures, \(F < 1\).

In line with our third hypothesis, the current results showed an interaction between gazing and gestures, \(F(1, 44) =\) 11.82, \(p =\) 0.001. Table 1 presents an overview of the means and standard deviations in all four cells. When the robot did not gaze at the participants, participants evaluated the lying story character more negatively when the robot had not used gestures (M \(=\) 0.27, SD \(=\) 0.29) than when it had used gestures (M \(=\) 0.95, SD \(=\) 0.66), \(F(1, 45) =\) 8.82, \(p =\) 0.01. Conversely, when the robot gazed at the participants, they evaluated the lying character somewhat more positively when the robot did not use gestures (M \(=\) 0.45, SD \(=\) 0.59) than when it did (M \(=\) 0.07, SD \(=\) 0.56), although this simple effect did not reach significance, \(F(1, 45) =\) 2.84, \(p =\) 0.10.

Table 1 Participants’ evaluations of the lying character in the story (the extent of persuasion) by gazing (absent vs. present) and gestures (absent vs. present)
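For completeness, the factorial analysis reported above corresponds to a standard 2 \(\times \) 2 between-participants ANOVA; the sketch below shows one way to run it with statsmodels, using randomly generated placeholder scores rather than the actual data.

import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Placeholder data: 12 retained participants per cell of the 2 x 2 design.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "gazing":   np.repeat(["absent", "present"], 24),
    "gestures": np.tile(np.repeat(["absent", "present"], 12), 2),
})
df["evaluation"] = rng.normal(0.5, 0.6, size=len(df))  # mean of nine items

# 2 (gazing) x 2 (gestures) between-participants ANOVA on the evaluation
# of the lying story character; the interaction term tests hypothesis 3.
model = ols("evaluation ~ C(gazing) * C(gestures)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))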

In a final analysis, we investigated the effects of our manipulations on participants’ general evaluation of the robot, that is, on the Godspeed questionnaire [46] (see Fig. 3).

Fig. 3 Results of the Godspeed questionnaire for each condition. Error bars indicate the standard error of the mean

First, we performed a reliability analysis on the items of each dimension of the Godspeed questionnaire. Cronbach’s alpha was above 0.67 for all dimensions (anthropomorphism: 0.678, animacy: 0.787, likeability: 0.88, perceived intelligence: 0.681) except perceived safety (0.245). After removal of the third item of the perceived safety dimension, Cronbach’s alpha was 0.744. A multivariate ANOVA with the averages of the Godspeed dimensions as dependent variables and gazing and gestures as independent factors revealed no significant effects (gestures: \(F(5, 56) =\) 1.355, \(p =\) 0.255; gazing: \(F(5, 56) =\) 1.64, \(p =\) 0.164; gestures \(\times \) gazing: \(F(5, 56) =\) 1.353, \(p =\) 0.256). These findings suggest that our manipulations of gazing and gestures had no effect on participants’ evaluations of the robot, even though they did affect the robot’s persuasiveness.
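The multivariate test can be sketched along the same lines; the example below assumes one averaged column per Godspeed dimension (hypothetical column names, placeholder values) and uses statsmodels’ MANOVA.

import numpy as np
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

# Placeholder data frame: one row per participant, with an averaged score
# per Godspeed dimension and the two manipulations.
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "gazing":   np.repeat(["absent", "present"], 32),
    "gestures": np.tile(np.repeat(["absent", "present"], 16), 2),
})
dimensions = ["anthropomorphism", "animacy", "likeability",
              "intelligence", "safety"]
for dim in dimensions:
    df[dim] = rng.normal(3.0, 0.7, size=len(df))  # placeholder 1-5 ratings

# Multivariate test of gazing, gestures, and their interaction on all
# five Godspeed dimensions at once.
mv = MANOVA.from_formula(
    " + ".join(dimensions) + " ~ C(gazing) * C(gestures)", data=df)
print(mv.mv_test())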

4 Discussion

In the current research, we argued that, in line with the Media Equation hypothesis [26] and social agency theory [24], gazing and gestures are crucial persuasive strategies that should additively increase the persuasiveness of a robot, comparable to the effects these strategies have in human–human persuasion. We investigated this assumption by measuring the separate and combined influence of the robot gazing at the participant and the robot using persuasive gestures on a storytelling robot’s persuasiveness. In our lab experiment, a robot told participants a persuasive (moral) story about the aversive consequences of lying. The robot used or did not use gestures, and gazed or did not gaze at the participant, while telling its persuasive story. Results showed that only gazing independently led to lower evaluations of the lying character, suggesting increased persuasiveness. This finding is in line with our first hypothesis and replicates earlier research on the persuasive effects that gazing can have in communication between humans [27]. When the robot used persuasive gestures, the lying character was evaluated more negatively only when the robot also used the persuasive strategy of gazing; when the robot did not gaze, the lying character was evaluated more positively when the robot used persuasive gestures. In other words, using gestures is more persuasive when the robot is looking at the listener, but less persuasive when it is looking at another person.

Thus, our second hypothesis (that the persuasiveness of the robot would be stronger when it used gestures while telling a persuasive message) was not supported. Earlier research on human–human interaction presented evidence that using gestures increases persuasiveness [31], and likewise, research on human–robot interaction [22] suggested that a robot’s persuasiveness can be increased when it uses nonverbal, bodily cues (a set of cues that included gestures). Related research suggested that robot gestures (and gaze) can increase message retention [34]. Our findings do not replicate these effects of gestures on persuasiveness. Although we copied the timing and form of the gestures from the actor to the robot as faithfully as possible, it is still possible that the gestures were not properly recognized. Because of the morphological differences between a human body and the robot, the fluidity and form of the robot’s gestures may have been compromised. As a consequence, the robot’s gestures may not have been recognized correctly by its human interaction partners, which would have prevented them from increasing the robot’s persuasiveness.

Furthermore, the current results showed an interaction of gestures and gazing. In line with our third hypothesis, we found that a social robot can be persuasive when it employs gestures, but only when it also gazes at the listener while telling its persuasive message. When the robot did not gaze at the listener, its persuasiveness decreased when it used gestures. Thus, a robot that uses persuasive gestures can be an effective persuader (just as a human who uses gestures can be), but only if the robot gazes at the human. The reason might be found in human perceptions of the robot: when the robot does not look at its human interaction partner while telling a persuasive story accompanied by persuasive gestures, the person might get the impression that the robot is not addressing him or her, and might not experience himself or herself as the target of the robot’s persuasive attempts. In other words, the participant may not have activated the relevant social conversation schemas strongly enough to be persuaded [24]. Indeed, this could also explain why gazing improved memory for a story in [34] but not in [5]: in the first study, the robot could look at another person, reducing the impression of being addressed, whereas in the second study no other person was present, so looking away did not affect the impression of being addressed. We argue that looking at another person is a cue signaling that the story is not meant for the person in question. Indeed, when the robot told a participant a persuasive message accompanied by persuasive gestures while not gazing at that participant, it might have created the impression that it was trying to persuade another person, namely the other participant present in the experimental setup.

On the surface, this latter finding goes against social agency theory [24], which would argue that when an artificial social agent employs more social cues, it activates more human social interaction schemata and might become more persuasive. Extending and nuancing such theories, our research suggests that the addition of social cues might also lead to the activation of qualitatively different social interaction schemata (e.g., “it must be trying to persuade that other person”) instead of a quantitative accumulation of separate social cues and their related social interaction schemata.

Interestingly, earlier studies [17] presented evidence that an artificial agent trying to persuade users became more persuasive both through voice alone (uttering persuasive messages vs. conveying the same messages through changes of a lighting source) and through embodiment alone (an actual robot embodiment as the source of that voice or lighting vs. a computer case as the source). At the same time, that earlier research also showed that an artificial agent combining both embodiment and voice was not more persuasive than one using only one of these two persuasive strategies. Thus, this earlier study presented no evidence for additive effects of social cues on an artificial social agent’s persuasive power. These earlier findings might be explained using a core consideration of the Media Equation hypothesis: a single social cue (in this case, voice alone or embodiment alone) emitted by an artificial agent may suffice to trigger a complete set of social responses in the human user. Given the current findings, we argue that future research could investigate the separate roles of social cues (e.g., embodiment, voice, gazing, or gestures) and social persuasion strategies (using those social cues to persuade), the ways in which their persuasive power might change when they are combined with each other, and how such combinations influence the persuasiveness of robots as well as perceptions of robot agency and level of anthropomorphism.

In conclusion, the current research makes clear that employing one persuasive strategy (e.g., gesturing) in combination with another (e.g., gazing) does not necessarily combine each strategy’s separate persuasive power. In fact, using (a specific set of) persuasive gestures while refraining from gazing at the participant diminished persuasiveness, whereas using the same set of persuasive gestures while gazing at the participant did not. Because our analysis of this interaction compared the effects of the same gestures under different conditions of gazing, the specific characteristics of the gesture set we used are less relevant for that comparison. We argue that future research should address the persuasiveness of robotic gestures and the fit and alignment between robot gestures, user expectations, and message contents. Future research might also investigate how combinations of different social cues affect the persuasiveness of social robots in different task settings, with different designs of social robots, and for different user target groups. We argue that the psychological characteristics of users are relatively stable (across different situations and different users), and that the current research presents evidence for a psychological mechanism (concerning the fit and misfit of social cues) that future research might also find for other combinations of social cues, other users, and other robots with which users interact.

Thereby, the current study adds to our knowledge of how social robots can be made more persuasive. In this endeavor, future research should also investigate the personalization of persuasion (see e.g., [47]). A persuasive robot might, for example, assess a user’s personality type and tailor its persuasive attempts to fit that person. Or the robot might measure the user’s physiological responses (e.g., related to arousal) to detect, for example, evidence of reactance [48] caused by its persuasive attempts. This would make persuasive robotic systems scalable and fundamentally flexible, as these robots are supposed to interact with different people whose varying characteristics the robot might only discover during the interaction.

The current research also provides a persuasive story that a robot can successfully use to persuade its users; future research might profit from this and use this or comparable persuasive messages to study interactions between the persuasive effects of other persuasive strategies. Certainly, stories with different characteristics will have different persuasive effects. It is currently unclear what these characteristics are and whether the results of the current research would also be obtained with other persuasive stories. We argue that future research specifically aimed at investigating the persuasive effects of gestures might use a less strongly persuasive story, because that might allow the persuasive effects of gestures to emerge and be detected more easily.

5 Conclusion

Would a robot that combines several persuasive strategies (i.e., gestures and gazing in the current research) be a more effective persuader than a robot that employs a single persuasive strategy (e.g., gestures only or gazing only)? The results of the current study confirm the idea that persuasive strategies can influence and change each other’s persuasive power, at least for artificial agents (here, a robot). This influence might be comparable to the way in which persuasive strategies influence and change each other’s persuasive power in human–human persuasion [26]. Importantly, however, we found that the influence of these persuasive processes on each other is not purely additive: a robot that used gestures while looking at another person became less persuasive. Interestingly, this phenomenon is also found in human–human interaction [31]. When the robot looked at the persuadee, this research replicated earlier findings that gazing behavior by a robot can have persuasive effects [33].

To conclude, we found that robots can become more persuasive when they look at the person they aim to persuade. Gestures strengthen this effect, but only when the robot is looking at that person; when the robot looks away toward another person, gestures make the robot less persuasive. Our results show that combining multiple social cues does not automatically increase persuasiveness and that results from human–human interaction do not automatically carry over to human–robot interaction, especially when multiple actors are involved. It seems safe to say that the effects of nonverbal social cues on the persuasiveness of verbal communication are still far from fully understood.