Our major question in both studies was how differences in the morphological features of physically embodied robots affect the perceived capabilities of a robot and the accompanying assessment. We built upon our previously developed framework, which initially addressed differences in the perception and evaluation of artificial entities with different embodiments [13], and expanded this idea to the overall assessment of robots and related variables. Therefore, we investigated how body-related capabilities are determined by body features incorporated in varying robot morphologies. The results of our two large online studies reveal that the capabilities humans infer from the mere appearance of robots in photographs depend on morphological features of the robots’ design. As Phillips et al. [27] argued, the presence of these features sets certain expectations, for example, that a robot with body manipulators (e.g., grippers) is designed for the purpose of grasping and manipulating objects. Furthermore, our findings show that expectations and attributions evoked by the morphological appearance explain differences in the assessment of robots in terms of technology acceptance, but also on dimensions relevant for person perception, i.e., warmth, discomfort, and competence. The following sections summarize and interpret the findings gathered from the different robot clusters.
6.1 Which Morphological Features Determine Capability Perceptions in the Clusters?
Cluster 1 - Caricatured robots with eyes. Their abstract, comic-like appearance might explain why robots in this cluster were rated the least capable on all dimensions. The comic-like morphologies might trigger toy-related mental models and thus lower people’s expectations for a robot’s capabilities in general. The absence of manipulators and of legs or wheels explains why they were perceived as incapable of touching, transporting objects, and moving. Although the majority of the robots possess eye-like features, they score low on (Shared) Perception and Interpretation as well as on Nonverbal Expressiveness. This might be due to the aforementioned associations with toys, which are predominantly static and incapable of perception. However, the majority of the robots in this cluster are in fact quite expressive, e.g., Keepon can move and make sounds. These dynamic features of the robots were of course not discernible from the pictures, especially when participants had not encountered the robots in real life or on video before, which is a limitation of the static approach. Corporeality was lower than in the other clusters, but still moderate (3.18 on a 6-point scale). It seems plausible that abstract appearances that resemble comic characters are rated as less “real”.
Cluster 2 - Full body, humanoid, and android robots. Robots with a full humanoid body, including legs to walk, arms and hands to grasp, a torso, and a head with facial features, were rated the most sophisticated regarding all capabilities. Although only the upper body of the android robots was depicted in three cases, the robots were still assumed to be highly capable. The same could be observed for anthropomorphic, expressive robot heads (C4). It seems that humans assume high capabilities based solely on the sophisticated design of the visible parts. Robots with a humanoid or android shape were also perceived as the most (physically) human-like. Similarity to the human shape thus appears to be an indicator of high capabilities: if it looks like a human, it must be capable of doing what humans do. Eliciting such high capability expectations through design might backfire if the robots’ actual capabilities do not match their perceived ones. The robots were further rated as the “realest” robots (Corporeality), which suggests that humans’ notion of what makes an entity corporeal is not necessarily the distinction between being real (robot) and not being real (virtual character). Instead, it seems related to the degree of realism of an entity’s kind, e.g., human-like entities are more real than functional objects, which are in turn still more real than cartoon-like entities, at least with regard to robot embodiment.
Cluster 3 - Anthropomorphic but functional robots with grippers. While robots in this cluster were perceived as equally capable of tactile interaction and mobility, and as equally corporeal, as the full body, humanoid, and android robots (C2), they were evaluated as less capable of shared perception and expressiveness. A comparison of the robots’ heads shows that robots in C2 possess more surface features and more sophisticated faces (in the case of the androids), whereas the majority of the robots in C3 have only eyes, and some eyes and a mouth. Hence, possessing more body features that allow for perception and resemble human faces increases the perceived capability for shared perception and interpretation. This is important information, since not all facial features of robots are designed for perception. Nevertheless, they do trigger expectations. Conversely, sensors that allow for visual or auditory perception might be invisible to humans but still incorporated in a robot. If it is important that users have knowledge about a robot’s capabilities, it might be worthwhile to consider features that are related to humans’ perceptions of the indicated capabilities, e.g., microphones in ear-like features, or speakers in mouth-like features of a robot.
Cluster 4 - Anthropomorphic, expressive robot heads. These robots are rated moderately high on the dimensions that can be related to facial features, i.e., shared perception and nonverbal expressiveness. Despite the lack of a torso, arms, legs, or wheels, their mobility and corporeality were still rated as moderate, though low compared to the other clusters, except for the caricatured robots (C1). This supports the assumption that not only the presence of a feature (e.g., eyes) determines a capability; beyond that, the human-likeness of the feature’s design seems to produce a halo effect and might affect judgments of other capabilities. Therefore, expressive human-like robot heads (C4) were rated as more capable of perceiving their environment than comic-like robots with eye-like features (C1). Moderate, rather than high, ratings in (Nonverbal) Expressiveness could be a result of the robots’ restriction in gestures, since the items for expressiveness cover gestures as well. However, it is surprising that the same robots were rated as capable of moving and manipulating objects although they do not possess a torso or legs. It seems as if the sophisticated surface looks of these robots trigger mental completion processes: human-like heads belong to full bodies, even though the bodies are not present in the pictures. The same holds for a portrait picture of a human, where it is clear that the person has a body, although it is not visible.
Cluster 5 - Mobile robots with facial features but no manipulators. While the robots in C5 look quite similar to those in C3, they are rated lower regarding movement, touch, and expressiveness. This is surprising, since most of them are capable of moving. However, their bodies do not resemble a human shape: they are animal-like or abstractly formed, move on a pedestal, or it is unclear from the picture whether the robot has wheels or not (e.g., Karotz). Together with the abstractness of some facial features, this could also explain the low ratings on (Nonverbal) Expressiveness. It has to be noted that the perception of these robots might change in a real interaction, where features that seem static in the picture are actually quite expressive (e.g., the mouth and eyebrows of iCat). This positive discrepancy between the expected low capabilities and the actually higher ones might result in highly positive evaluations of these robots in actual encounters, provided that an initial impression has been formed based on the static appearance beforehand.
Cluster 6 - Functional robots with grippers or wheels. The function of these robots can be derived more easily from their appearance, e.g., grasping and manipulating objects. The robots are either wheeled (Roomba, Clocky) or possess manipulators (Kuka, Franka Emika), resulting in overall moderately high ratings on Tactile Interaction and Mobility. They were perceived as restricted in nonverbal expressiveness since they possess neither facial features nor a body that could show gestures. Also, their ability to perceive the world was considered low, perhaps due to the lack of mammalian-like features indicative of perceptual abilities. Of course, many of these robots have sensors that allow for perception, e.g., collision detection. However, due to the absence of visible features, these capabilities are not as salient for these morphologies as for the others in our study. Higher ratings in corporeality compared to more cartoon- or toy-like robots as in C1 can be explained as above, i.e., industrial robots or vacuum cleaner robots can be regarded as tools that are more “real” than fictional cartoon characters. In conclusion, humans have rather low expectations of functional-looking robots in terms of social capabilities. However, if these robots are to become more interactive and integrated into social contexts, it is a socio-technical challenge to equip them with cues that allow human users to form adequate mental models of their capabilities, which might go beyond pick-and-place.
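To make the notion of morphology-based clusters concrete, the following minimal sketch shows how robots coded as binary vectors of visible features could be grouped by hierarchical clustering. The feature set, the codings, and the method are invented for illustration and are not the procedure used in our studies.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Hypothetical binary coding of visible morphological features
# (1 = present, 0 = absent); values are illustrative guesses,
# not the coding used in our studies.
features = ["eyes", "mouth", "grippers", "legs", "wheels", "torso"]
robots = {
    "Keepon":   [1, 0, 0, 0, 0, 0],
    "NAO":      [1, 1, 1, 1, 0, 1],
    "Geminoid": [1, 1, 1, 1, 0, 1],
    "iCat":     [1, 1, 0, 0, 0, 0],
    "Roomba":   [0, 0, 0, 0, 1, 0],
    "Kuka":     [0, 0, 1, 0, 0, 0],
}
X = np.array(list(robots.values()), dtype=bool)

# Jaccard distance suits presence/absence data; average linkage builds
# the tree, which is then cut into a fixed number of clusters.
Z = linkage(X, method="average", metric="jaccard")
for name, label in zip(robots, fcluster(Z, t=3, criterion="maxclust")):
    print(f"{name}: cluster {label}")
```

Under this toy coding, robots sharing full humanoid feature sets (NAO, Geminoid) group together, while purely functional morphologies (Roomba, Kuka) fall into separate groups, mirroring the kind of structure discussed above.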
6.3 Limitations
First, making judgments about the assessment of robots on the basis of static pictures is a major limitation of the present work. However, the short exposure to a picture was the most controllable way to expose participants to a large variety of robots. Nonetheless, we have to admit that the exposure to a picture of a robot on a screen does not equal standing in front of a robot, even if it is not moving at all. Also, with regard to the robotic heads (C4) that were rated as capable of moving and touching, it is vital to mention that such misconceptions would not arise in actual encounters, where the absence of a torso would be inevitably visible to the viewer. Hence, direct comparisons of capability ratings based on pictures and on live exposure to the same robot will be highly informative in the future. In addition, comparisons of merely observing a co-present robot versus directly interacting with it should be taken into account. As summarized above, humans seem to assume high capabilities based on the design visible in the pictures. Whether these (high) expectations endure live encounters remains an open question. Furthermore, initial expectations of a robot’s capabilities can easily change based on interaction experiences. For example, if a robot with visible features that imply vision (e.g., eyes) does not respond to motion in the environment, this should change perceptual expectations. Eliciting high capability expectations through design can hence backfire if the robots’ actual capabilities do not match their perceived ones (e.g., [26]). On the one hand, this suggests that the evocation of certain capability expectations through a robot’s morphology should be taken seriously. On the other hand, it suggests that actual interaction experience can overwrite initial impressions in a good (“better than expected”) as well as a bad sense.

Second, conducting online surveys with MTurk comes with limitations. We took several steps to ensure data quality (check questions at several points in the survey, in-depth analyses of answers). However, potential inattentiveness of respondents remains a problem. Furthermore, the rating of the robots was based on static pictures that did not reveal the size, sound, or movements of the robots. Because of this, the actual co-presence of the robots, which is one key feature of physically embodied robots [18, 37], was not given. Although we believe that no actual interaction with the robots is necessary to answer the question of whether morphology (which can be regarded as a stable and static feature) impacts the capabilities inferred from robot appearance, it remains unclear whether the findings still apply to the same extent in dynamic settings (video or actual interaction).

Third, some findings from the presented studies suggest rethinking the combination of the theoretically separated capabilities Tactile Interaction and Mobility into one sub-scale. This becomes especially visible with regard to the high ratings in Tactile Interaction and Mobility assigned to robots in Cluster 5, which do not possess body manipulators to touch or carry objects; however, these robots have wheels or four legs that make them mobile (see the sketch at the end of this subsection).

Last, the clusters in their current form subsume very different categories of robots with similar morphological features. For instance, Cluster 2 consists of full body, humanoid, and android robots such as NAO and Geminoid (cf. Figure 5), which might evoke quite different reactions. Hence, further subdivisions, e.g., into humanoid and android robots, could allow for more fine-grained comparisons within the presented clusters.
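One way to probe the sub-scale question raised above is to compare the internal consistency of the combined scale with that of its two candidate parts; a confirmatory factor analysis contrasting a one-factor and a two-factor model would be the stricter test. Below is a minimal Python sketch; the item names, ratings, and effect sizes are invented for illustration and do not correspond to the actual EmCorp items.

```python
import numpy as np
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha for a set of item columns (rows = respondents)."""
    k = items.shape[1]
    return (k / (k - 1)) * (1 - items.var(ddof=1).sum() / items.sum(axis=1).var(ddof=1))

# Simulate two weakly related latent traits (mobility, tactile interaction)
# and three noisy items for each; all names and numbers are invented.
rng = np.random.default_rng(1)
n = 300
mob_trait = rng.normal(size=n)
tac_trait = 0.3 * mob_trait + rng.normal(size=n)  # traits correlate only weakly
mobility = pd.DataFrame({f"mob_{i}": mob_trait + 0.5 * rng.normal(size=n) for i in range(1, 4)})
tactile = pd.DataFrame({f"tac_{i}": tac_trait + 0.5 * rng.normal(size=n) for i in range(1, 4)})

print(f"alpha, combined 6-item sub-scale: {cronbach_alpha(pd.concat([mobility, tactile], axis=1)):.2f}")
print(f"alpha, Mobility items only:       {cronbach_alpha(mobility):.2f}")
print(f"alpha, Tactile items only:        {cronbach_alpha(tactile):.2f}")
```

If each item set coheres strongly on its own while the combined scale’s consistency is dragged down by weak cross-correlations, splitting the sub-scale is supported.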
6.4 Contributions and Outlook
Our findings expand previous work on robot perception by adding perceived capabilities as an explanatory variable to untangle the assessment of artificial entities with varying morphology. Our results reveal that initial exposure to visual cues incorporated in robots’ morphology triggers certain expectations about their body-related capabilities, i.e., the capabilities to move in space, to touch objects, to express themselves, to share perceptions, and to be corporeal. These capability-related expectations further explain why robots with different morphologies receive varying assessments in terms of acceptance but also socially relevant evaluations (an illustrative analysis sketch is given at the end of this section). This knowledge, on the one hand, helps researchers to better understand why visible morphological features of artificial entities or social robots trigger varying evaluations. On the other hand, the results are relevant to engineers and designers who aim to build robots with morphologies that match human expectations of the robots’ capabilities. Regarding previous work, our results suggest that conflicting findings could to some extent stem from different capability attributions evoked by the different morphologies of the robots used (e.g., humanoid, full body robots [
8,
12,
15,
16] or zoomorphic, toy-like robots [
14,
17,
19]). As HRI researchers, we are aware that viewing pictures of robots is different from experiencing the co-presence of a social robot, its size, its movement, and the accompanying motor sound. An important open question for future research is thus: What role do morphological differences play in the assessment of robots during actual HRI? Do visible static features significantly alter how humans appraise a robot and how they react towards it? Or do static features become less salient and thus less important during live HRI, when attention is directed towards the task and the performance of a robot? Are initial impressions, as we tried to infer from the ratings of pictures, actually consequential for humans’ decision to approach or avoid a robot in real life? Can these initial expectations predict how people will behave in front of a robot? An important step towards answering these questions will be the systematic variation of morphological features in live interactions. This can be realized through comparative studies that utilize different robots, e.g., robots from different clusters as presented here, or through variations of the visible features of one robot, e.g., by covering parts of a robot (cf. [5]) or dismounting grippers (if possible). Virtual and augmented reality applications further seem to be a fruitful test bed to study the impact of morphological differences in live interactions.

In addition, research on the role of robot identity and its relationship to possessing a single or multiple bodies suggests that people are able to recognize the same robot identity within a new body if certain cues such as the eyes or the voice are kept constant [20]. More research in this realm is necessary to understand whether morphological cues associated with capabilities are affected by dynamically changing robot identities. For example, it seems plausible to assume that the same robot identity in another body knows (cognitive capability) the same information, whereas it is not plausible to assume that it will be able to transport objects if the new body is not equipped with manipulators (physical capability). How these discrepancies might affect the overall assessment of a robot should be considered in future work.

Furthermore, it remains open whether perceived capabilities, such as those related to embodiment (EmCorp sub-scales), are stable perceptions, or whether they can change over time. As subsumed in the framework (Figure 1), it can be expected that contextual factors and enabled behaviors in live interactions will render certain capabilities more salient. For instance, performance shortcomings such as dropping a cup might result in lowered ratings of the robot’s capability for tactile interaction, although it was initially assumed to be high due to the presence of grippers. The same can be expected for a robot whose eyes do not include vision sensors that would allow it to react to visual stimuli. Moreover, contextual factors can render certain capabilities more important than others. In a task that includes the manipulation of physical objects, like the Towers of Hanoi task [cf. 14, 37], shared perception and reaching out to manipulate objects are more relevant than nonverbal expressiveness. Thus, an industrial robot such as the Kuka gripper might be perceived as more capable for the task than a robot with a highly realistic face but no manipulators (e.g., Flobi). Future studies can expand this line of research by including capabilities beyond body-related ones, e.g., cognitive or communicative capabilities, which might also be linked to visible features of social robots.
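To make the explanatory-variable logic referenced above concrete: morphology shapes perceived capabilities, which in turn shape assessment, a classic mediation structure. The following minimal sketch illustrates such an analysis on simulated data; the variable names, effect sizes, and simple OLS-based procedure are assumptions for illustration, not our actual analysis.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data for illustration only: a morphology contrast (e.g., gripper
# present vs. absent), a perceived capability rating, and an acceptance score.
rng = np.random.default_rng(0)
n = 300
morphology = rng.integers(0, 2, n)                   # 0/1 feature contrast
capability = 2.0 * morphology + rng.normal(0, 1, n)  # feature -> expectation
acceptance = 1.5 * capability + rng.normal(0, 1, n)  # expectation -> assessment
df = pd.DataFrame({"morphology": morphology, "capability": capability,
                   "acceptance": acceptance})

# Baron & Kenny style steps: total effect, effect on the mediator,
# and the direct effect once the mediator is controlled for.
total = smf.ols("acceptance ~ morphology", df).fit()
a_path = smf.ols("capability ~ morphology", df).fit()
full = smf.ols("acceptance ~ morphology + capability", df).fit()

print("total effect c:  ", round(total.params["morphology"], 2))
print("a path:          ", round(a_path.params["morphology"], 2))
print("direct effect c':", round(full.params["morphology"], 2))
print("b path:          ", round(full.params["capability"], 2))
```

In this simulation the direct effect of morphology shrinks towards zero once perceived capability is controlled for, which is the pattern one would expect if capability expectations mediate the link between morphology and assessment.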