1 Introduction

The idea of human-like machines that free mankind from labor has been addressed extensively for decades in literature, movies and research. The transfer of human-like features to non-living objects like machines that accompanies this idea is referred to as anthropomorphism and has been widely adopted in different forms and contexts in human–robot interaction (HRI) [1]. The main advantage of this multifaceted design approach is the activation of human-human interaction schemes to form appropriate expectations [2], improve coordination strategies [3] or increase empathy [4] towards robotic interaction partners. In particular, current research in social HRI consistently shows that anthropomorphic design can facilitate the building of meaningful relationships to achieve a more fluent and socially situated interaction [5]. Even though the term robot seems to be intuitively associated with anthropomorphism, the question arises whether this is really desirable in all working environments.

In work-related interactions, anthropomorphism of robots, implemented via appearance, communication or movement style and robot description [1], can lead to an underestimation of the functional character and perceived value of the robot for task fulfillment [6,7,8].

Moreover, facial features [9] or anthropomorphic personality descriptions [10] can reinforce existing gender stereotypes of different work domains.

In conclusion, anthropomorphism might not be universally beneficial, as it can elicit a violation of the formed expectations [11] as well as a promotion of gender stereotypes [9, 10] in specific domains.

1.1 Anthropomorphism in different contexts

The utility of anthropomorphism as a supporting design feature seems to depend on the context in which robots are used [6, 8]. Whereas positive effects emerge in social HRI, anthropomorphic features can mask the functionality of robots and decrease trust in them in task-related interactions [7, 8]. These context-dependent effects seem to mirror people’s preferences for robot design in specific domains.

One of the most essential triggers of anthropomorphism is the visual appearance of a robot [12]. Goetz et al. [13] were among the first to investigate whether anthropomorphism determines which jobs robots are preferred for. The results showed that participants preferred more anthropomorphic robots for jobs that required more sociability, whereas more machine-like robots were favored for jobs with less sociability. Thus, this study indicates that the appearance of the robot should match the nature of the task. Złotowski et al. [14] investigated the relationship between anthropomorphism and specific tasks in more detail for the social and service domain. The results showed that a more anthropomorphic robot that was easily distinguishable from humans was preferred for occupations in which social skills played a key role (e.g., child-minding). The machine-like robot with an unfamiliar but not threatening appearance was perceived as optimal for “dull and dirty tasks” (e.g., cleaning).

The results of these studies are a first indication that the preferred degree of anthropomorphism varies between application domains. Nonetheless, the results are hardly generalizable because the robotic appearance was manipulated on multiple dimensions such as age and gender [13] or threatening and non-threatening [14]. Furthermore, the existing research focuses on specific tasks in the social (and partially service) domain rather than on the differences between global application domains such as industrial, service and social HRI. Building on this earlier research [13, 14], we assume that robot design should match the sociability required in the respective domain, and thus that robots with different levels of anthropomorphism are preferred in different domains.

H1:

In the industrial domain robots with a low level of anthropomorphism are preferred.

H2:

In the service domain robots with a medium level of anthropomorphism are preferred.

H3:

In the social domain robots with a high level of anthropomorphism are preferred.

In addition to explicit preference judgments, differences should also show up in more indirect measures. Socio-psychological findings suggest that faster responses occur when concepts are more strongly associated [15, 16]. Therefore, we assume that, in each domain, the level of anthropomorphism associated with that domain is chosen faster than the other levels.

H4:

In the industrial domain robots with a low level of anthropomorphism are chosen most quickly.

H5:

In the service domain robots with a medium level of anthropomorphism are chosen most quickly.

H6:

In the social domain robots with a high level of anthropomorphism are chosen most quickly.

Additionally, the preferred level of anthropomorphism might not be as explicitly associated in the service domain as in the industrial or social domain. Because this domain includes many diverse tasks that vary in their sociability [14], we assume longer response times in this domain compared to the others.

H7:

The overall response time is longer in the service domain than in the social and industrial domains.

1.2 Gender stereotypes

The industrial and social domain in particular are associated with gender-stereotypical occupational roles [17]. Basfirinci et al. [18], for example, used a scenario technique in which participants had to assign names to persons described only by their occupational role. Whereas most occupations were implicitly perceived as masculine, the job of a nurse was perceived as feminine. The results illustrate that implicit occupational stereotypes still exist to date. In order to prevent further task-specific consolidation as well as the transfer of stereotypes to robots, anthropomorphic design choices should be made with caution [9]. Tay et al. [17] provide empirical support for this claim. Their study revealed that a robot with female features was more accepted and more positively evaluated in healthcare, while a robot with male features was preferred as a security guard. Additionally, a study by Ladwig and Ferstl [19] showed that humanoid robots are implicitly assigned a gender. However, the transfer of human stereotypes to robots has not been found in all cases. Rea et al. [20], for example, did not find an effect of the robot’s gender on its perceived suitability for several stereotypically male or female tasks, though they note themselves that their gender manipulation, which changed only the pronoun, may not have been sufficient. In line with that, other research that gendered robots only via name and voice did not find main effects of robot gender on the acceptance of the robot for stereotypically female and male tasks [21].

Overall, research on gender stereotypes in robots reports inconsistent findings [22]. It is therefore necessary to conduct further systematic research.

The industrial context mainly includes technical tasks traditionally associated with men rather than women [9, 23]. In the service context, stereotypes may be less strong, but as Ladwig and Ferstl [19] point out, robots are generally given male names more often. In contrast, as studies on healthcare stereotypes suggest [17, 18], the gender predominantly associated with the social domain is female. Hence, we assume that the perception of robots as male or female is triggered by the gender stereotype prevalent for jobs in the respective domain.

H8:

In the industrial domain, male names are chosen more frequently for robots than female names.

H9:

In the service domain, male names are chosen more frequently for robots than female names.

H10:

In the social domain, female names are chosen more frequently for robots than male names.

As stereotypes are a sensitive issue that is associated with response biases like social desirability [15, 24], this study additionally assessed implicit associations at the end of the experiment via the implicit association test (IAT) to tap unintentional and uncontrolled responses [25]. To this end, we investigated associations between the application fields and robots that already looked female or male. The IAT is an instrument developed by Greenwald and colleagues [15] for measuring implicit attitudes. It is now used in diverse scientific fields including HRI [26,27,28] to uncover underlying, implicit stereotypes and associations [15, 23]. Though the IAT has been criticized for poor psychometric properties [16, 29, 30], it is still accepted as a valid measure of the relative association strength between constructs [30, 31]. Based on the expected association of gender and context, it is assumed that the association strength is higher for stereotypically matched domains and robot appearances.

H11:

A strong association between the industrial context and male robots exists.

H12:

A strong association between the social context and female robots exists.

Overall, the study investigated the influence of the application field (industrial vs. service vs. social) on the preference for the degree of anthropomorphism and the ascribed gender of a robot, in order to allow a cross-domain comparison of robot preferences and existing gender stereotypes.

2 Method

2.1 Participants

Due to the pandemic-related constraints on laboratory research, the study was performed as an online study. Based on an a priori power analysis using G*Power [32] with a power (1 − β) set at 0.90 and α = 0.05, the targeted sample size was 117. Of the 169 participants who started the study, 36 dropped out before finishing it. This drop-out rate of 21.3% is comparable to other online studies in German-speaking regions [33]. Of the 133 subjects who completed the study, six had to be excluded because of invalid scores in the IAT. Furthermore, eight subjects were excluded after a visual examination of response time outliers, resulting in a final sample of 119 participants. The sample consisted of 73 participants who were recruited via the local university participant pool and received course credit, as well as 46 participants who were recruited from the platform Prolific and received a small monetary compensation for participation, which was calculated on the basis of the German minimum wage (3.36 € for 20 minutes). The same inclusion criteria regarding age (18 to 50 years), German nationality and German as first language were used for both recruitment approaches. Besides those criteria, no restrictions were placed on participation. As a result, the sample represents various domains, allowing more extensive insights into the expectations of general users in terms of preferences and public stereotypes.
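The participant flow reported above can be retraced with a few lines of arithmetic. All figures are taken from the text; the variable names are introduced here purely for illustration.

```python
# Participant flow as reported in the text (variable names are illustrative).
started = 169
dropped_out = 36
excluded_invalid_iat = 6       # invalid scores in the IAT
excluded_rt_outliers = 8       # response time outliers (visual examination)

completed = started - dropped_out
final_sample = completed - excluded_invalid_iat - excluded_rt_outliers
dropout_rate_percent = 100 * dropped_out / started

# Recruitment sources of the final sample.
recruited_pool = 73
recruited_prolific = 46

print(completed, final_sample, round(dropout_rate_percent, 1))  # 133 119 21.3
print(recruited_pool + recruited_prolific == final_sample)      # True
```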

Participants from the two sampling strategies did not differ in terms of age (MPool = 26.62, SDPool = 4.62, MProlific = 28.61, SDProlific = 7.54, p = 0.112), gender (Pool 59% female, Prolific 43% female, p = 0.146) or the control variable tendency to anthropomorphize (MPool = 43.29, SDPool = 11.62, MProlific = 42.15, SDProlific = 11.76, p = 0.608). Taken together, participants of both recruitment strategies were on average 27.38 years old (SD = 5.97) and 53% of them identified themselves as female. Moreover, participants were asked about their profession and how they would classify their own professional background. Most of the participants were students (57.98%) or employees (32.77%). The sample included people with backgrounds in the industrial domain (n = 36), in the service domain (n = 20), in the social domain (n = 36) and in other domains (n = 27).

2.2 Task and materials

2.2.1 Context descriptions

The context descriptions were generic textual representations of context-specific joint human–robot interactions (Table 1). The description of the industrial field of application included a robotic assistance that supported workers with assembling products, moving objects from one workstation to another and placing parts in designated areas. In the service context the robotic system delivered goods to a respective destination, sorted parcels into designated areas, cleaned work surfaces and supported employees in potential customer care. The social context described a robot that supported workers in the care of other people on an organizational, social and emotional level and could be used for social interactions such as sport exercises.

Table 1 Translated context descriptions for the industrial, service, and social domain [originally presented in German, accessible via https://osf.io/6zq9e/ (OSF)]

2.2.2 Robot stimuli

The ABOT (Anthropomorphic roBOT) Database was used to select robots with different degrees of anthropomorphism. This database contains over 250 standardized images of existing robots that differ in their anthropomorphic features. Every robot has a score ranging from zero to 100 that indicates its degree of anthropomorphism [34]. This overall score comprises four dimensions of robot appearance features that were identified with a principal component analysis: surface features (e.g., skin, gender, hair, eyebrows), facial features (head, face, eyes, mouth), body-manipulators (e.g., legs, arms, torso) and mechanical locomotion (treads, wheels). Following the approach of previous research [13, 35, 36], three different degrees of anthropomorphism represented by the overall score were considered for the study (low, medium, high). For every anthropomorphism degree, three robots were chosen to minimize carryover effects within each domain, as each context description was presented three times. Apart from differences in perceived anthropomorphism, all robots had similar color schemes, similar abilities based on their appearance and no obvious gender cues like hairstyle [37] or body proportion [38].

See Fig. 1 for examples of low, medium and high anthropomorphic robots. The scores within each category were comparable, whereas the scores between the low (M = 9.14, SD = 0.56), medium (M = 23.06, SD = 0.54) and high (M = 49.2, SD = 1.82) level of anthropomorphism differed substantially. Robots with extremely high perceived anthropomorphism were deliberately not selected. On the one hand, these often already have an assigned gender or at least gender-specific cues (like long hair or wearing a dress). On the other hand, robots that are too anthropomorphic might in general be perceived negatively and generate a feeling of uncanniness [39].

Fig. 1 Examples of the robots used in this study with low (row 1), medium (row 2), and high (row 3) degrees of anthropomorphism

2.2.3 Implicit association test

The IAT is a computer-based discrimination task, in which subjects are asked to classify individual stimuli representing concepts or attributes as quickly as possible into four different categories by pressing two possible answer keys [16]. For the four categories suitable stimuli that are easily categorizable have to be selected [16]. Typically, IAT stimuli [15] are represented by words, but images or symbols can be used as well [16]. Because gender categories with regard to robots are difficult to realize verbally, the stimuli were implemented using images of robots with typically male and female associated features. Categories in the IAT are usually represented by eight stimuli each [15]. Therefore, eight images of male and female looking robots as well as eight images of an industrial and a social context were selected. The robot stimuli were mostly derived from the ABOT database while the context stimuli were extracted from free stock image databanks (see Fig. 2 for examples of the robot and context stimuli). A pre-test was conducted to find the most suitable robot stimuli. Eighteen participants (12 female) with a mean age of 34.44 (SD = 15.35) years, evaluated 20 stimuli with regard to the perceived gender of the robot on a scale from zero (male) to 100 (female). The mean scores for every robot were calculated and respectively the eight most male (scores between 4.9 and 30.3) and most female (scores between 66.2 and 99.4) looking robots were selected. The final stimuli can be accessed at the Open Science Framework (OSF) via https://osf.io/6zq9e/.
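The selection step of the pre-test (rank the candidate robots by mean gender rating and keep the eight most male and eight most female looking ones) can be sketched as follows. The function name, the robot identifiers and the scores are hypothetical placeholders; the actual ratings are not reproduced here.

```python
def select_gendered_stimuli(mean_scores, n_per_category=8):
    """Pick the most male and most female looking robots.

    mean_scores maps a robot identifier to its mean rating on a scale
    from 0 (male) to 100 (female). Returns (most_male, most_female).
    """
    if 2 * n_per_category > len(mean_scores):
        raise ValueError("not enough stimuli for two non-overlapping sets")
    # Sort identifiers from most male (lowest score) to most female (highest).
    ranked = sorted(mean_scores, key=mean_scores.get)
    most_male = ranked[:n_per_category]
    most_female = ranked[-n_per_category:][::-1]  # highest score first
    return most_male, most_female


# Hypothetical ratings for 20 candidate robots (not the study's data).
hypothetical_scores = {f"robot_{i:02d}": 5.0 * i for i in range(20)}
male_set, female_set = select_gendered_stimuli(hypothetical_scores)
print(male_set)
print(female_set)
```

With 20 candidates and eight robots per category, the four robots with the most ambiguous ratings are discarded.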

Fig. 2 Components of the IAT stimuli with a robot rated as male, a robot rated as female, a context picture of the industrial and social domain (left to right)

For the analysis of the IAT, the improved D-score was calculated according to Greenwald et al. [43]. This score is the difference between the average response times of the two combined stages of the IAT, divided by the standard deviation of the respondent’s response times across both combined stages [16]. In one combined stage, “social” and “female robot” share one answer key and “industry” and “male robot” share the other; in the other combined stage, this pairing is reversed (social + male robot; industry + female robot). For the exact procedure see Greenwald et al. [43].
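The ratio described above can be illustrated with a short Python sketch. It is a deliberately simplified version and not the full improved scoring algorithm of Greenwald et al. [43], which includes further steps such as the treatment of errors and extreme latencies. The function name, the example latencies and the sign convention (incongruent minus congruent, so that positive values indicate a stronger association for the stereotypically matched pairing) are assumptions made for this illustration.

```python
from statistics import mean, stdev


def d_score(congruent_ms, incongruent_ms):
    """Simplified IAT D-score for one respondent.

    congruent_ms:   latencies from the stage pairing industry + male robot
                    and social + female robot
    incongruent_ms: latencies from the stage with the reversed pairing
    """
    all_latencies = list(congruent_ms) + list(incongruent_ms)
    # Standard deviation over the latencies of both combined stages.
    inclusive_sd = stdev(all_latencies)
    if inclusive_sd == 0:
        raise ValueError("latencies must vary to compute a D-score")
    return (mean(incongruent_ms) - mean(congruent_ms)) / inclusive_sd


# Made-up latencies in milliseconds, for illustration only.
example = d_score([600, 700, 800], [900, 1000, 1100])
print(round(example, 2))
```

A positive value means the respondent was faster in the congruent stage; swapping the two arguments flips the sign.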

2.3 Dependent measures

2.3.1 Control measures

Though all robots had a specific score from the ABOT database, qualifying them as stimuli with low, medium or high anthropomorphism, it was still necessary to verify that the participants did perceive the differences in anthropomorphic robot design. A single item was therefore used as a manipulation check to assess the perceived anthropomorphism for each robot. The nine robots were displayed in a randomized order and participants had to indicate the human-likeness of each robot on a scale ranging from 0 (not at all human-like) to 100 (completely human-like). The scale was chosen to enable a comparison with the ABOT score which ranges from zero to 100, too.

Furthermore, to control for confounding effects on the participants’ responses, the individual tendency to anthropomorphize was measured. Research has shown that the tendency to anthropomorphize non-human entities is not universal [40, 41]. To assess stable individual differences in this tendency, the individual differences in anthropomorphism questionnaire (IDAQ) by Waytz et al. [41] was used in the study.

2.3.2 Preferred degree of anthropomorphism

The main dependent variable to assess the preference for differently anthropomorphic robots was the frequency with which the different degrees of anthropomorphism were chosen with regard to each context. In addition to the frequencies of the chosen robots, the response latency (in milliseconds) of every selection was measured.

2.3.3 Gender attribution: naming frequencies

In order to examine gender associations in the application contexts, a naming technique was used that was derived from previous research [18, 19]. After the selection of a robot in a specific context, the participants were asked to give the robot a name. This open format was used in order not to impose answer options on the participants. Further, it allowed participants to give not just traditional male or female names but any name they could imagine, such as neutral or technical ones, a tendency that Keay [42] observed in the naming of robots for robot competitions. For the analysis, the names had to be coded into categories. For this purpose, the categories employed in earlier research [41] were modified and applied here. The categories used were female, male, nickname (including names of unknown gender, popular robot names, typical animal names) and functional (including technical and mechanical qualities). In a first rating round, three raters coded the names into the categories independently of each other. In a second round, the raters discussed and resolved ambiguities together. All three raters were associated with the department and two of the raters are authors of the paper (E. Roesler & L. Naendrup-Poell). The inter-rater reliability of the coded names after the first iteration was κFleiss = 0.74. After the diverging categories had been discussed, agreement was reached in almost all cases, resulting in a nearly perfect inter-rater reliability (κFleiss = 0.96). In cases where no absolute agreement could be achieved, the category chosen by two thirds of the raters was selected.
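For readers unfamiliar with the agreement measure, the following sketch shows how Fleiss' kappa can be computed for categorical codings from a fixed number of raters. It follows the standard textbook definition; the function name and the example codings are made up for illustration and do not reproduce the names or ratings of the study.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each coded by the same number of raters.

    ratings: list of lists, one inner list of category labels per item.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    category_totals = Counter()
    observed = 0.0
    for item in ratings:
        counts = Counter(item)
        category_totals.update(counts)
        # Proportion of agreeing rater pairs for this item.
        agreeing = sum(c * c for c in counts.values()) - n_raters
        observed += agreeing / (n_raters * (n_raters - 1))
    observed /= n_items

    # Agreement expected by chance from the overall category proportions.
    total = n_items * n_raters
    expected = sum((c / total) ** 2 for c in category_totals.values())
    if expected == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (observed - expected) / (1 - expected)


# Made-up codings of four names by three raters.
example_codings = [
    ["male", "male", "male"],
    ["female", "female", "male"],
    ["functional", "functional", "functional"],
    ["functional", "functional", "nickname"],
]
print(round(fleiss_kappa(example_codings), 2))
```

Values close to 1 indicate near-perfect agreement, whereas values around 0 indicate agreement at chance level.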

2.3.4 Gender attribution: implicit association

In this work, the automatic semantic association of the concepts “industry” and “social” with the concepts “male robot” and “female robot” was investigated. For the analysis, the so-called improved D-score was calculated. The D-score represents an index of the relative strength of association and is based on the response time difference between congruent and incongruent category pairings [43]. It is assumed that response times are faster when two strongly associated concepts share an answer key (congruent pairing: industry and male robot or social and female robot) compared to less strongly associated concepts (incongruent pairing: industry and female robot or social and male robot) [15].

2.4 Study design and procedure

The study was conducted as an online survey using the platform SoSci Survey. Participants completed the study on their private computers without the presence of the experimenter.

First, participants were informed about the general terms and conditions. Afterwards, the procedure of the study was presented, and they were instructed that all the robots shown in this study are equipped with the same functional capabilities.

Subsequently, participants read one of three different context descriptions, with the order of the descriptions randomized for every subject. The descriptions of the industrial, service and social domain represented the levels of the factor “application field”. Since every participant read every domain description, the study had a one-factorial within-subjects design. After reading a context description, participants were asked to decide which of three depicted robots they would prefer in this context. The displayed robots varied in their degree of anthropomorphism with three different levels: low, medium and high anthropomorphism (Fig. 1). After selecting a robot, subjects were asked to provide a name for the robot. This procedure was repeated nine times in total, three times for each application domain.

Thereafter the implicit association test was conducted. Participants were instructed that they had to do a categorization task and were then presented with the standardized instruction of the IAT [15, 16]. After the IAT, participants had to indicate how anthropomorphic they perceived the nine robots they had seen before as a manipulation check. Then the IDAQ [41] with fifteen items rated on an eleven-point scale from zero “not at all” to ten “very much” and several socio-demographic questions were filled in. The entire experiment lasted 15–20 min.

3 Results

3.1 Control measures

As intended, the low anthropomorphic robots were also rated least anthropomorphic (M = 18.07, SD = 18.79), medium anthropomorphic robots more anthropomorphic (M = 44.17, SD = 22.94) and high anthropomorphic robots most anthropomorphic (M = 79.06, SD = 17.51). A robust one-way repeated measures ANOVA and pairwise post-hoc tests based on trimmed means revealed that all differences were significant, F(1.96,139.13) = 667.34, p < 0.001. To control for individual differences in the tendency to anthropomorphize, the IDAQ was assessed and an individual sum score for every participant was computed. To examine whether the anthropomorphism tendency had an influence on the choice of anthropomorphic robots, it was analyzed whether within the three fields of application, individuals with a higher or lower IDAQ score were more likely to choose certain anthropomorphic robots. The only significant correlation was found in the social context, which, however, was negative, r = −0.275, p = 0.003. Accordingly, participants with a higher tendency to anthropomorphize tended to choose robots with a lower degree of anthropomorphism in the social domain.

3.2 Preferred degree of anthropomorphism

Overall, the preferred level of anthropomorphism varied substantially between the three contexts (Fig. 3).

Fig. 3 Frequencies of robot selection (in case of anthropomorphism: low vs. medium vs. high) in percentage for the three context descriptions (industrial vs. service vs. social)

Whereas a clear majority of participants preferred robots with a low level of anthropomorphism in the industrial context (58.82%) and robots with high anthropomorphism in the social context (84.03%), no clear preference emerged in the service context, with 37.53%, 34.45% and 28.01% of participants preferring robots with a medium, low and high level of anthropomorphism, respectively. Non-parametric Friedman tests of differences among repeated measures were used for the analysis to account for the relatedness of the groups as well as the ordinal scale level of the anthropomorphism degree. The analysis revealed a significant difference in the industrial context (χ²(2, N = 119) = 80.26, p < 0.001). Post-hoc Wilcoxon signed-rank tests indicated that participants significantly preferred robots with a low level of anthropomorphism over robots with medium (32.77%, r = 0.37) and high (8.4%, r = 0.71) levels of anthropomorphism, as well as robots with a medium level of anthropomorphism over robots with a high (r = 0.5) level of anthropomorphism in the industrial context. The significant difference between degrees of anthropomorphism in the service domain (χ²(2, N = 119) = 7.06, p = 0.029) revealed only a preference for robots with a medium level of anthropomorphism over robots with a high (r = 0.16) level of anthropomorphism. In contrast to the industrial context, the significant differences in the social domain (χ²(2, N = 119) = 164.62, p < 0.001) showed that robots with a high level of anthropomorphism were favored over robots with low (7.28%, r = 0.84) and medium (8.63%, r = 0.85) levels of anthropomorphism.
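To make the test statistic concrete, the following sketch computes the Friedman chi-square with the usual correction for ties, together with the p-value for two degrees of freedom. The data layout is an assumption: each row holds one participant's counts of how often the low, medium and high anthropomorphic robot was chosen across the three trials of a context, which the text does not spell out. Function names and example rows are illustrative only.

```python
from math import exp


def average_ranks(values):
    """Rank values in ascending order; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for idx in order[pos:end + 1]:
            ranks[idx] = shared
        pos = end + 1
    return ranks


def friedman_chi_square(rows):
    """Tie-corrected Friedman statistic for n participants and k conditions."""
    n, k = len(rows), len(rows[0])
    rank_sums = [0.0] * k
    tie_term = 0
    for row in rows:
        for j, rank in enumerate(average_ranks(row)):
            rank_sums[j] += rank
        for value in set(row):
            t = row.count(value)
            tie_term += t ** 3 - t
    q = 12 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)
    correction = 1 - tie_term / (n * k * (k * k - 1))
    if correction == 0:
        raise ValueError("all rows are completely tied")
    return q / correction


def p_value_df2(chi_square):
    """Upper-tail probability of a chi-square distribution with 2 df."""
    return exp(-chi_square / 2)


# Made-up choice counts (low, medium, high) for four participants.
example_rows = [[3, 0, 0], [2, 1, 0], [3, 0, 0], [1, 2, 0]]
statistic = friedman_chi_square(example_rows)
print(round(statistic, 2), round(p_value_df2(statistic), 3))
```

With three conditions the test has two degrees of freedom, for which the upper-tail probability has the closed form exp(−χ²/2). Applied to the service-domain statistic reported above, `p_value_df2(7.06)` gives approximately 0.029, matching the reported p-value.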

Besides the selection frequencies, the response latencies of these choices (Fig. 4) were analyzed via one-factorial ANOVAs separately for every context.

Fig. 4 Response times of robot selection (in case of anthropomorphism: low vs. medium vs. high) in ms for the three contexts (industrial vs. service vs. social)

In the industrial domain, the analysis revealed a significant main effect of anthropomorphism (F(2,135.87) = 3.63, p = 0.029). Post hoc tests using Bonferroni correction for multiple comparisons showed that the response time was significantly faster (p = 0.029) when a robot with a low level of anthropomorphism (M = 4759 ms, SE = 343 ms) was chosen than when a robot with a medium level of anthropomorphism (M = 6076 ms, SE = 398 ms) was chosen. No significant differences were found for the choice of the robot with a high level of anthropomorphism (M = 5041 ms, SE = 740 ms) compared to both the low (p = 1) and medium (p = 0.629) anthropomorphic robot conditions. The analysis of the response latency in the service context showed no significant differences (F(2,143.83) = 0.17, p = 0.848) in response times between the low (M = 5418 ms, SE = 444 ms), medium (M = 5526 ms, SE = 381 ms) and high (M = 5758 ms, SE = 443 ms) degree of anthropomorphism.

However, the selection latencies differed significantly (F(2,120.58) = 8.38, p < 0.001) in the social context. Participants selected the robots with a high (M = 3782 ms, SE = 246 ms) level of anthropomorphism faster than robots with a medium (M = 5704 ms, SE = 524 ms, p = 0.003) and a low (M = 5679 ms, SE = 630 ms, p = 0.016) level of anthropomorphism. Across all three contexts, the analysis revealed a significant main effect of context (F(2,442.7) = 9.44, p < 0.001). Post hoc tests using Bonferroni correction for multiple comparisons showed that the response time in the social domain (M = 4319 ms, SE = 271 ms) was significantly faster than the response times in the industrial (M = 5266 ms, SE = 253 ms, p = 0.007) and service (M = 5620 ms, SE = 251 ms, p < 0.001) domain.

3.3 Gender associations

The naming of the selected robots (Fig. 5) showed a preference across all domains for functional robot names like “industrial helper”, “liftbot” or “helpbot”. More precisely, the analysis of the industrial context revealed significant differences between the selected names (χ²(3, N = 119) = 85.32, p < 0.001). Bonferroni-corrected post hoc pairwise comparisons showed that female names (2.54%) were chosen significantly less often than male names (16.67%, p = 0.003), nicknames (20.9%, p < 0.001) and functional (59.89%, p < 0.001) names. Furthermore, functional names were given significantly more often than male names (p < 0.001) and nicknames (p < 0.001).

Fig. 5 Frequencies of chosen names (female vs. male vs. nickname vs. functional) in percentage for the three contexts (industrial vs. service vs. social)

The results were comparable in the service domain (χ²(3, N = 119) = 52.69, p < 0.001), as female names (5.37%) were again chosen significantly less often than male names (18.08%, p = 0.03), nicknames (25.42%, p < 0.001) and functional (51.13%, p < 0.001) names. Additionally, functional names were again given significantly more often than male names (p < 0.001) and nicknames (p = 0.008).

Surprisingly and against the expected stereotypes, the differences between gender associations showed a similar pattern in the social domain (χ²(3, N = 119) = 19.91, p < 0.001). Again, female names (9.04%) were chosen significantly less often than male names (27.97%, p = 0.004), nicknames (25.42%, p = 0.015) and functional names (37.57%, p < 0.001).

3.4 Implicit associations

On average, participants had a stronger association between the concepts industry + male robot and social + female robot than vice versa (industry + female robot, social + male robot), which is apparent from the positive improved D-score (M = 0.82, SD = 0.38). This effect did not differ significantly (t(114.76) = 0.49, p = 0.623) between participants identifying themselves as female (M = 0.84, SD = 0.42) and male (M = 0.81, SD = 0.32). Additionally, the IAT scores correlated with the chosen gendered name neither in the industrial (r = 0.05, p = 0.70) nor in the social (r = 0.15, p = 0.106) context.

4 Discussion

The aim of this study was to examine the effect of context on the preferred degree of anthropomorphism and ascribed gender in HRI. Therefore, the selection of robots with different degrees of anthropomorphism as well as the naming of those robots in three different contexts were analyzed.

The assumed differences in the preferred level of anthropomorphism between the contexts were mostly supported by the results. As expected in H1, participants showed a clear preference for a robot with a low level of anthropomorphism in the industrial context. This result is in line with previous research indicating that low levels of anthropomorphism are preferred for robots performing “dull and dirty tasks” [14]. Previous results already suggest that anthropomorphic robots might not always be beneficial for task performance in the industrial domain [7, 8]. Going beyond these results, our research further suggests that the anthropomorphic design of robots does not even seem to be desired in the industrial domain. This is especially interesting as the current implementations of robots in this domain show a contrary trend. In particular, collaborative robots (as characterized in our context description) are equipped with anthropomorphic features (e.g., Sawyer/Baxter from HAHN Group or the Workerbot from Pi4 robotics) on the assumption that users prefer such designs. This assumption is not supported by our findings. Moreover, it is questionable whether this design approach is effective, as those anthropomorphic features are implemented in a task-irrelevant manner (e.g., randomly moving eyes) [8].

In line with H3 and previous research [13, 14], the pattern was reversed in the social domain, in which robots with the highest level of anthropomorphism were significantly preferred over low and medium anthropomorphic robots. The results for both the social and the industrial domain support the matching hypothesis [13, 14], which states that robot design should match the sociability required in the respective domain.

However, the preferences in the service domain were less pronounced than expected in H2: only the medium anthropomorphic robots were preferred significantly over the high anthropomorphic ones. One possible reason for the ambiguous preferences in the service domain might be that the written vignette allowed for a wide range of interpretation.

The service domain includes many diverse tasks that vary in their sociability [14]. Participants might have developed different ideas about the robots’ scope of duties and the required sociability.

The preference for anthropomorphism in different domains might therefore have been moderated by the expected sociability of the task. Future research is thus needed to specify the preferred degree of anthropomorphism further by taking both the domain and the specific task sociability into account.

In addition, the association of anthropomorphism with the sociability of domains is supported by the analysis of the response time data. It was assumed that faster responses occur when concepts are more strongly associated [15, 16]. Whereas participants reacted fastest to the low anthropomorphic robot in the industrial domain (H4), the fastest response times in the social domain were found for the robot with a high degree of anthropomorphism (H6). Again, our expectation in the service domain was not met (H5), as no clear differences in response times occurred in this domain.

Moreover, the overall response times were faster in the social domain (where robots with a high degree of anthropomorphism were predominantly chosen) than in the service and industrial domains. This result only partially supports H7. We assumed that the choice in both the social and the industrial domain would be faster than in the service domain, as the association with a particular degree of anthropomorphism was expected to be stronger in those domains. For the social domain there indeed seems to be a strong association between the social field of application and highly anthropomorphic robots, which is in line with the fast response times reported in different studies [15, 16]. In the industrial domain the association with low anthropomorphism might not be as clearly present as expected, which is additionally supported by the higher variability of the choice frequency in the industrial domain compared to the social domain.

In summary, anthropomorphism is associated with and preferred in the social domain, but not in the industrial one. This result emphasizes that the transfer of features that facilitate social HRI [3] might not always be beneficial in other contexts.

However, it is important to keep in mind that the online study limited the perception of the robots to a visual depiction only. Important interactional factors of HRI associated with physical embodiment, such as movements, sounds or communication, could therefore not be experienced [1]. As commonly emphasized in HRI research [20, 26], there is a strong need for real interaction studies and actual field studies, as the generalizability of online studies is not always clear-cut due to their low external validity. Studies with embodied robots are therefore required to further investigate people’s preferences when they actually interact with a robot, and whether environmental aspects like the social and organizational context influence these preferences. Furthermore, it is necessary to systematically investigate the influence of other dimensions of anthropomorphism. Anthropomorphism in this study was implemented via the appearance of the robot, but it can go beyond appearance by including movements, communication or the context [1].

Additionally, the negative correlation between the individual tendency to anthropomorphize [41] and the selection of robots in the social domain is noteworthy. It could be presumed that individuals with a higher tendency to anthropomorphize robots perceived robots in general as more anthropomorphic, so that even the less anthropomorphic robots were perceived as suitable in a social context. This assumption needs to be addressed in future research in order to supplement general design recommendations with knowledge about interindividual preferences [40, 41].

Nonetheless, the results enable a first cross-domain comparison. Whereas earlier research investigated the preference for robot anthropomorphism for specific tasks and primarily in the social and service domains [13, 14], the current study examined the preferred level of anthropomorphism in situational descriptions of these domains and, additionally, the industrial domain. Here, it is important to state that the context descriptions clearly presented all robots with similar capabilities and the same interactional aim of collaboration. This standardized approach with regard to robot capabilities and task relevance further supports the importance of the application domain as a central influencing factor for HRI. Overall, the results strengthen the assumption that the preferred morphology of the robot depends on the expected social functions needed in the different domains [13]. Anthropomorphic robots are preferred for the social domain, in which tasks are more associated with human communication behavior, and low anthropomorphic robots are favored in the industrial domain, in which tasks are more associated with physical demands.

Surprisingly and contrary to previous research [9, 17, 18, 23], the analysis of gender associations revealed that a clear majority of participants chose a functional name for the robots in all contexts. Furthermore, and in line with H8 and H9, participants who chose a gendered name preferred male names over female names in the service and industrial contexts. Unexpectedly and contrary to H10, male names were also chosen more frequently than female ones in the social domain. This latter result indicates a generally stronger association of technology with masculinity than with femininity [22]. Future research needs to investigate this assumption of a possible robot-masculinity bias by comparing the ascribed gender of robots with and without explicit gender cues. Notably, however, gendered human names (either male or female) for robots were considered more often in the social (37%) than in the service (23.45%) and industrial (19.12%) domains. This is in line with the clearly preferred anthropomorphic robot design in the social domain.

Nevertheless, the association that is prevalent in all contexts seems to be technical or functional rather than gendered. This is an interesting finding, as it challenges the existing trend to gender robots [44], a trend motivated by the assumption that gendering is socio-economically profitable [22]. Gendering robots according to human stereotypes reinforces those stereotypes and is thus an ethically questionable approach. The IAT results in this work indicate that the assumed (H11 and H12) stereotypical gender-occupation associations already exist in HRI, which is alarming. However, it is important to state that the participants of this study were mainly young adults with different professional backgrounds. On the one hand, this enables insights into a broader public preference for anthropomorphism and stereotypes in HRI. On the other hand, further research is needed to investigate whether those results hold true for professionals in the respective domains.

The pressing question of whether such a stereotypical fit between gender and task is even necessary was addressed in a study by Dufour and Nihan [19]. They revealed that stereotypical judgment and perception of robots could be mostly diminished by providing the technical features and skills of the robot. Moreover, Bryant et al. [45] showed that a gender-neutral robot brings no disadvantages in comparison to a gendered one. Those insights, along with the results of this work indicating that the prevalent association is a functional one irrespective of the application field, support designing robots as functional and ungendered entities. Therefore, framing robots functionally and emphasizing their technical features, for example by not giving them gendered human names, should be the favored alternative, since it raises no ethical concerns and does not perpetuate stereotypes.

5 Conclusions

Overall, the results of the study are in line with previous research on anthropomorphism preferences in different occupational fields, showing that anthropomorphism is desired in social but not in industrial domains. Our study therefore strengthens the body of research that points out that anthropomorphic features are not desired across all domains by the general public.

Furthermore, the results suggest that a functional association of robots mainly prevails in public perception. However, if a gender association occurs, it is, regardless of the context, predominantly an association of masculinity with robots and not, as expected, a replication of context-specific gender stereotypes. Nonetheless, the predominantly functional naming of the robots in our study, which were presented without explicit gender cues, shows that ungendered morphology can lead to mainly functional associations.

Robot design and framing that focus on functionalities rather than on specific gendered anthropomorphic attributes can therefore be used deliberately to avoid stereotypes. Both aspects, the possibility of encouraging less biased HRI and the finding that anthropomorphic features are not a universal requirement, can inform design approaches in HRI.