1 Introduction
The plasticity of the human brain allows it to recognize objects as body parts and manipulate them intuitively under certain conditions. This cognitive ability has led to the development of embodied systems that enhance intuitive manipulation and immersion, such as teleoperated robotic arms or virtual avatars. These systems have proven invaluable; however, there remains untapped potential in flexibility, adaptability to different environments, and overall robustness. For example, it is difficult for conventional embodied systems to increase their levels of embodiment by dynamically changing their size to suit the individual, changing their shape to one suited to a certain action, or adapting for use on a tabletop or in an environment with obstacles. To address these challenges, we introduce the concept of Embodied Swarm Robots.
Swarm robots consist of many small robots of similar types, like a school of fish or a swarm of insects. Unlike conventional robotic arms, swarm robots are expected to cooperate to achieve effects greater than any individual could alone. This unique quality endows them with robustness, flexibility, and scalability, enabling tasks such as navigation through narrow paths [56], pattern formation [1], self-assembly [45], and the collective transport of objects that are larger and more complex than themselves [33]. Thus, swarm robots have a wide range of applications, such as environmental monitoring, space exploration, agriculture, emergency rescue, warehouses, industrial plants, entertainment, surveillance, and maintenance [60].
Imagine a human body composed of swarms of robots, offering unprecedented adaptability. By using swarm characteristics, individuals can dynamically shape and equip their bodies to suit specific tasks and situations. For example, a human with a swarm body can effortlessly traverse narrow paths accessible to the constituent robots of the swarm. Moreover, if some individual robots within a swarm are lost, the remaining individuals compensate for the loss of functionality. Embodying swarm robots holds promise for combining intuitive and immersive interactions with adaptability. However, the realization of swarm robot body systems remains challenging because of our limited understanding of the conditions that support the sense of embodiment and the subsequent system design considerations.
Although previous studies explored the embodiment of robot arms and virtual avatars, the embodiment of swarm robots introduces additional complexities. For example, the size, density, and position distribution of robots, as well as the algorithms assigning the robots to the positions, can influence the levels of embodiment. Unlike robot arms, in which each moving link is constrained by joints, relative positions (i.e., position distributions) of swarm robots can change without geometrical constraints other than collisions, which gives them flexibility. Thus, when representing a hand in a particular posture, various position distributions are possible, such as placing robots at the joint positions of the hand or placing them such that they are equally distributed within the hand shape. In addition, the algorithm assigning robots to this position distribution influences the embodiment. When moving a hand, the hand state (i.e., posture and position) changes dynamically. Therefore, the robots need to follow each hand state while constantly assigning themselves to the position distribution for the current hand posture. One possible assignment method is to statically assign a particular robot to a particular position on a hand, whereas another method is to dynamically update the assignments so that the total sum of the travel distances is minimized. Therefore, identifying an appropriate algorithm for position distribution generation and assignment to follow the dynamically changing body states, as well as the size and density of robots, is crucial for the successful embodiment of swarm robots.
Similar to the many previous studies on embodiment introduced in section 2, this study investigated the embodiment of swarm robots by focusing on the hand, the body part with which people most frequently interact with the environment. In addition, we focused on tabletop swarm robots because we intended to explore various everyday interactions that people have with their hands at a table. To evaluate the level of embodiment, we measured the sense of body ownership and agency [12]. The factors examined were robot size, density, position distribution generation algorithm, and assignment algorithm. VR and real-world robot experiments were conducted. In the VR experiment, all the aforementioned factors were explored using simulated swarm robots; this shows that swarm robots can be embodied and provides various insights into their embodiment under ideal swarm robot behavior. However, actual robots may behave differently from those in VR environments. To check whether swarm robots can be embodied and whether similar embodiment characteristics are obtained in the real world, a similar embodiment experiment was conducted with physical swarm robots. Based on the results of the VR and real-world experiments, we demonstrated the characteristics of swarm robot embodiment under ideal and actual robot behaviors, as well as design considerations for embodied swarm robot systems. By comparing these results, we discuss how the characteristics of a real-world system affect the embodiment of swarm robots, thereby providing useful information for the future design of new embodied swarm robot systems. Our contributions encompass:
(1)
Proposing a framework for the embodiment of swarm robots in the hand.
(2)
Identifying, through a series of VR and physical experiments, an algorithm that determines the position distribution relative to the hand skeleton and dynamically assigns the positions to robots, which enhanced the sense of body ownership, sense of agency, and overall usability compared with the other conditions.
(3)
Suggesting practical implementations and applications for embodied swarm robots that integrate these findings.
3 Framework for Swarm Robot Embodiment
We first introduce our framework for representing a hand with swarm robots in real time. Because our body moves dynamically, a swarm of robots must navigate toward dynamic destinations associated with each timestep rather than a single static goal. We refer to these dynamic destinations as subgoals that correspond to specific timesteps. Assuming that the hand skeleton coordinates are tracked at regular intervals, we define the robot destination coordinates for each hand state as subgoal positions. If each robot could move to its subgoal position instantly, we could focus solely on how to represent each hand state with subgoal positions.
However, swarm robots often do not reach the subgoal positions before the subgoals are updated. Therefore, it is necessary to control the swarm robots to follow the subgoal positions so that their movements represent the movement of the body part. Thus, to realize embodied motion with swarm robots, we require a new framework that simultaneously realizes both proper hand representation and smooth collective following by the swarm robots.
Common steps to control swarm robots are obtaining a subgoal formation (a collection of subgoal positions), assigning the subgoal positions to the robots (assignment), and computing a path for each robot-subgoal pair (path planning). Inspired by this, we took the following three steps (see the sketch after this list) to achieve embodied movements of swarm robots, as shown in Figure 2:
(1)
generate a subgoal formation based on the current hand position and shape;
(2)
assign the generated subgoal positions to the robots;
(3)
obtain local paths and move the robots accordingly to avoid collisions with each other and obstacles.
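To make the pipeline concrete, the following minimal sketch shows how the three steps compose into one control tick. This is not our actual implementation; the three callables are hypothetical stand-ins for the components detailed in sections 3.1 through 3.3.

```python
import numpy as np

def control_step(hand_state, robot_positions, dt,
                 generate_subgoals, assign, rvo_step):
    """One control tick composing the three framework steps.

    The three callables are stand-ins for sections 3.1-3.3:
    generate_subgoals(hand_state) -> (n, 2) subgoal positions,
    assign(robot_positions, subgoals) -> permutation of subgoal indices,
    rvo_step(robot_positions, targets, dt) -> (n, 2) safe velocities.
    """
    subgoals = generate_subgoals(hand_state)        # (1) formation generation
    order = assign(robot_positions, subgoals)       # (2) subgoal assignment
    velocities = rvo_step(robot_positions, subgoals[order], dt)  # (3) local planning
    return robot_positions + velocities * dt        # advance one timestep
```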
3.1 Subgoal Formation Generation
Step (1) sets the subgoal formation for the swarm robots. Ideally, the robots occupy these positions; hence, this step affects the visual representation of the hand as well as the usability and interaction opportunities.
Because this step is similar to virtual body representation, we took inspiration from that research. We explored abstract virtual body representation approaches that are applicable to two-dimensional representation because we applied them to tabletop swarm robots. Consequently, we identified two main approaches: point- and silhouette-based [74]. The point-based approach represents the body with a number of points corresponding to certain points on the body, whereas the silhouette-based approach represents a body with a solid shape in a single color. In other words, the former focuses on specific points, and the latter focuses on overall shapes. Based on these approaches, we designed two algorithms to obtain a set of subgoal positions from the hand position, orientation, and shape, shown in Figure 3.
3.1.1 Bone-Based Subgoal Formation Generation.
Using the point-based approach, a set of points on the hand is determined, and the subgoal positions move according to the relative positions and movements of those points. The set of points on the hand must be fixed with respect to the skeleton; thus, the bones of the hand were used as references for the tracked points, and we refer to this as the bone-based algorithm. Because the swarm robots we examined only moved on the horizontal plane, the bone positions were projected onto the horizontal plane before calculating the subgoal positions. For example, if the bone at the tip of the index finger is set as the subgoal, the robot moves to that bone's position projected onto the horizontal plane. The subgoal positions relative to the bones are predetermined (e.g., Figure 6 in our study).
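As a concrete illustration, here is a minimal sketch of the bone-based generation. The choice of tracked bones, the vertical axis, and the per-bone offsets are assumptions made for illustration; in our study they are fixed at design time as in Figure 6.

```python
import numpy as np

def bone_based_subgoals(bones, offsets):
    """Project tracked bone positions onto the tabletop and apply the
    predetermined per-bone planar offsets.

    bones: dict of bone name -> (x, y, z), with z assumed vertical here;
    offsets: dict of bone name -> 2D offset. Both are design-time choices.
    """
    subgoals = []
    for name, (x, y, z) in bones.items():
        planar = np.array([x, y])          # drop the vertical component
        subgoals.append(planar + offsets.get(name, np.zeros(2)))
    return np.array(subgoals)
```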
3.1.2 Silhouette-Based Subgoal Formation Generation.
Using the silhouette-based approach, the subgoal formation should reflect the overall shape of the hand. To achieve this, a sensed hand skin mesh was obtained, and its vertices were clustered based on the number of required subgoal positions using the k-means algorithm [43]. The subgoal positions were set to the vertex positions closest to the centroid of each cluster, rather than the centroids themselves, ensuring that they remained within the hand outline; a centroid may fall outside the outline, especially when a cluster spans multiple fingers, includes gaps between fingers, or lies around the proximal phalanges. We call this the silhouette-based algorithm.
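A minimal sketch of this procedure, assuming SciPy's k-means and a mesh already projected onto the tabletop plane:

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def silhouette_subgoals(vertices_2d, n_robots, seed=0):
    """Cluster projected hand-mesh vertices with k-means and snap each
    centroid to its nearest vertex, keeping subgoals on the hand."""
    centroids, _ = kmeans2(vertices_2d, n_robots, minit='++', seed=seed)
    subgoals = np.empty_like(centroids)
    for k, c in enumerate(centroids):
        # A centroid may fall outside the outline (e.g., between fingers);
        # the nearest mesh vertex is guaranteed to lie on the hand.
        nearest = np.argmin(np.linalg.norm(vertices_2d - c, axis=1))
        subgoals[k] = vertices_2d[nearest]
    return subgoals
```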
We decided to examine both bone- and silhouette-based algorithms because they performed differently during our preliminary testing with two-dimensional hand representations using swarm robots. The bone-based algorithm can offer more predictable robot movements to the user than the silhouette-based algorithm because the subgoal positions are always fixed to certain locations of the hand. The silhouette-based algorithm constantly updates the subgoal positions on the hand. Therefore, it does not guarantee that the subgoal positions are on the parts the user expects. A potential advantage of the silhouette-based algorithm is its adaptability to various hand signs. For example, when the user closes their hand, in the bone-based algorithm, the distance between the bones when projected onto the horizontal plane becomes so small that the subgoal positions become close, and the robots may collide. However, in the silhouette-based algorithm, the subgoal positions are rarely too close to each other because clustering is conducted for that hand shape.
3.2 Subgoal Position Assignment
Once the subgoal positions are obtained, they are assigned to the robots. Two assignment methods are considered: static and dynamic. The static method always assigns a specific subgoal position to the same robot. In dynamic assignment, the subgoal positions are constantly reassigned to realize smoother transitions from one hand shape to another. This problem is defined as an assignment with a variable subgoal formation, which can be formulated as a linear sum assignment problem [7] and solved using the Hungarian algorithm [32].
The bone-static algorithm offers more predictable robot movements than the other algorithms because the same robots always follow the same parts of the hand owing to the fixed subgoal formation and assignment. By contrast, dynamic assignment can avoid potential collisions between the robots. For example, when the user turns the hand from facing upward to facing downward, dynamic assignment reassigns the robots so that they do not have to flip their positions.
When the subgoal positions are constantly generated, and the current subgoal positions cannot be mapped to the past ones (i.e., when the silhouette-based subgoal formation is used), the static assignment cannot be applied because the same subgoal position does not exist at the next instant. Therefore, the possible subgoal formation generation and assignment algorithms are: bone-static, bone-dynamic, and silhouette-dynamic.
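A minimal sketch of the dynamic assignment follows. The paper formulates it as a linear sum assignment [7] solved with the Hungarian algorithm [32]; the sketch uses SciPy's `linear_sum_assignment`, which solves the same problem.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def dynamic_assignment(robot_positions, subgoals):
    """Reassign subgoals at each update so the total travel distance
    over all robot-subgoal pairs is minimized."""
    cost = np.linalg.norm(
        robot_positions[:, None, :] - subgoals[None, :, :], axis=-1)
    _, cols = linear_sum_assignment(cost)
    return cols  # cols[i] is the subgoal index assigned to robot i

# Static assignment, by contrast, is just the identity mapping:
# robot i always follows subgoal i, fixed when the formation is designed.
```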
3.3 Robot Control with a Local Path Planner
Once robot-subgoal pairs are determined, path planning and the accompanying robot control are executed. To plan paths and move multiple robots to their assigned subgoal positions, the Reciprocal Velocity Obstacles (RVO) algorithm [73] was used. The RVO algorithm is an extension of the Velocity Obstacle concept, which offers navigation among passively moving objects by treating them as obstacles in velocity space. The RVO algorithm incorporates the assumption that other actively moving objects perform collision avoidance behavior similar to the Velocity Obstacle, realizing navigation among both passively and actively moving objects. More precisely, the RVO algorithm with nonholonomic constraints [64] was used because most swarm robots, including the tabletop swarm robots we use, are nonholonomic.
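A full RVO implementation is beyond the scope of a sketch, but the nonholonomic step it feeds can be illustrated by converting a desired planar velocity into differential-drive wheel speeds. The gain `k_turn` and the geometry are illustrative assumptions, not values from the paper; the 0.4 m/s cap matches the 400 mm/s wheel speed mentioned later in the study.

```python
import numpy as np

def to_wheel_speeds(v_des, heading, wheel_base, k_turn=2.0, v_max=0.4):
    """Map a desired 2D velocity (e.g., an RVO output) onto left/right
    wheel speeds of a differential-drive (nonholonomic) robot."""
    err = np.arctan2(v_des[1], v_des[0]) - heading
    err = (err + np.pi) % (2 * np.pi) - np.pi           # wrap to [-pi, pi)
    v = np.linalg.norm(v_des) * max(np.cos(err), 0.0)   # slow when misaligned
    omega = k_turn * err                                # rotate toward the goal
    left = np.clip(v - omega * wheel_base / 2, -v_max, v_max)
    right = np.clip(v + omega * wheel_base / 2, -v_max, v_max)
    return left, right
```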
4 Embodiment Experiment in VR
We first conducted a VR experiment to examine how various factors, including robot size, affect the levels of embodiment. Then, to validate our findings in the real world, we conducted another embodiment experiment with fewer factors based on the VR results. This experiment aimed to determine when swarm robots give users a sense of body ownership and agency. Our experiment, designed based on prior embodiment studies, explored how specific swarm robot parameters affect body ownership, agency, and task load. Participants interacted with virtual swarm robots, varying in size, density, and control algorithms. After each trial, they answered questionnaires evaluating the sense of body ownership, sense of agency, and cognitive load.
4.1 Participants
The experiment involved 10 participants (6 males, 4 females; average age: 24.20 ± 2.57 SD). Participants were sourced from a recruitment post on social media. All participants were right-handed with normal or corrected vision and were unaware of the experiment’s purpose. Half of the participants had minimal VR experience, while the other half had extensive experience. Participants signed a consent form regarding the experiment and were compensated with approximately $16 in Amazon gift cards. The ethics review board approved the experiment.
4.2 Apparatus and Setup
The experiment program, implemented using Unity, simulates and visualizes swarm robots and runs on a Windows-based computer. A Meta Quest 2 HMD tracked participants' hands and provided visual feedback based on the Unity visualization. Participants wore noise-canceling headphones playing white noise to block external sounds. An iPad collected the post-trial questionnaire responses. In VR, the robots appeared on a table in front of the user as cylindrical bodies, a common form in HCI research [34]. The subgoal formations are located on the table surface and are not translated in the horizontal direction from the participant's actual hand; i.e., there was no horizontal spatial discrepancy between the participant's hand and the robots' subgoal formation.
4.3 Experiment Design
We examined parameters (independent variables) specific to swarm robots. The potential design parameters include the robots' latency, speed, acceleration, size, color, shape, density, and control algorithm. As it is not feasible to examine all of them, we focused on the parameters that are more unique to swarm robots: size, density, and control algorithm. We excluded latency because the effect of delay on embodiment is not unique to swarm robots and has been investigated in work on visual-motor synchronicity [62]. In a system with program-robot communication, we should consider the effect on embodiment of the total delay: the delay caused by the robots' movement performance plus the delay caused by the communication. The first delay was excluded from this experiment by setting the robots' wheel speed to 400 mm/s, at which we did not observe noticeable latency in the pilot study. The second delay was excluded by removing the latency in the program-robot communication.
We used a factorial design with three factors: two levels of robot size, three levels of density, and three levels of control algorithm. The independent variables examined were robot size (30 mm and 20 mm), density (sparse, medium, and dense), and subgoal position generation and assignment algorithm (bone-static, bone-dynamic, and silhouette-dynamic). All variables were within-participants.
The experiment was conducted in the virtual environment shown in Figure 4. Participants were tasked with guiding the swarm robots to a target sheet while making specific hand signs with their right hand. The targets were green hand-shaped sheets with the variations shown in Figure 5. Participants were instructed to change their hand signs to the target shapes when reaching the targets. A variety of hand signs was provided so that participants could explore the pros and cons of the subgoal position generation and assignment algorithms in various situations. For example, the silhouette-based algorithm may produce fewer collisions than the bone-based algorithm for the rock and scissors hand signs, but the two may not differ much for the paper hand sign. In preliminary testing, the static and dynamic subgoal assignment algorithms resulted in very different robot behaviors when the hand was flipped over (i.e., the static assignment sometimes caused the robots to get stuck, while the dynamic assignment did not). Therefore, the reversed paper hand sign was also provided.
Participants were instructed to move the swarm robots representing their right hand to the purple starting area at the beginning of each task by moving their right hand. Once all the robots had stayed in the starting area for two seconds, a green hand-shaped reaching target appeared either at the left-front or at the right-front of the starting area. Specifically, the target appeared 300 mm to the rear and 173 mm to the side from the center of the starting area. The participants were instructed to reach for the target with the hand sign and to fit the robots in the target area as fast as possible. The target disappeared five seconds after the task started. The participants then returned their hands and robots to the starting area.
Each trial consisted of eight tasks, i.e., 4 hand signs × 2 reaching positions. The order of the hand signs and positions was randomized. At the end of the eighth task, a text instruction asked the participants to take off the HMD.
4.3.1 Robot’s Size.
The robot’s size influences the visual feedback of the swarm robots. Two robot size conditions were prepared: 20 and 30 mm diameter. The bigger size was set to 30 mm using the size of Zooids [
34], an open-source swarm robot often used in HCI research, as a reference. The smaller size was set to 20 mm because it is close to the average adult human finger width [
6].
4.3.2 Robot’s Density.
The robot’s density also affects the visual feedback of the swarm. As the density increases, the spatial resolution of the visual representation increases. However, this may complicate the representation and make control difficult as the number of robots increases. Three density levels were considered: sparse, medium, and dense.
We set the number of robots for each size and density condition based on preliminary testing. The results are summarized in Table 1. Through preliminary testing of different densities, we concluded that at least six robots, representing each finger and the palm, were required to represent a hand regardless of the robot's size. Thus, we set the number of robots to six for the sparse condition for both the 20 mm and 30 mm robots. For the dense condition, the number of robots was set to twelve for the 30 mm robots because that is the maximum number of 30 mm robots that fit in an average adult hand. To achieve the same density, the number of 20 mm robots was set to 27. In the medium condition, the number of 20 mm robots was set to 18, the average of the numbers in the sparse and dense conditions. For the 30 mm robots, the number in the medium condition was set to eight to achieve the same density as the 20 mm robots in the medium condition.
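The equal-density counts in Table 1 follow directly from the robots' footprint areas: for the covered area to match, n × d² must be constant, so the 20 mm counts are the 30 mm counts scaled by (30/20)². A quick check of the numbers above:

```python
scale = (30 / 20) ** 2   # area ratio of a 30 mm robot to a 20 mm robot
print(12 * scale)        # dense:  27.0 -> 27 robots at 20 mm
print(8 * scale)         # medium: 18.0 -> 18 robots at 20 mm
# sparse is pinned at six robots for both sizes (five fingers + palm).
```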
The subgoal positions relative to the hand bones need to be manually determined. Thus, we designed the subgoal position distributions shown in Figure 6 through the following steps. For the sparse condition, we allocated subgoal positions to all the fingertips and the palm. For the denser conditions, we increased the number of subgoal positions on the palm and distributed them around the palm. If there were more subgoal positions, they were allocated to the proximal interphalangeal joints, then to the metacarpophalangeal joints, and then to the palm and the wrist.
4.3.3 Subgoal Formation Generation and Assignment Algorithm.
As explained in section 3.1 and section 3.2, we examined three combinations of subgoal formation generation and assignment algorithms: bone-static, bone-dynamic, and silhouette-dynamic. The subgoal formation generation algorithm affects not only the visual feedback of the swarm but also the maneuverability of the subgoal formation. Users of the bone-based algorithm can predict the subgoal formation's movements more precisely than those of the silhouette-based algorithm. This is because the silhouette-based algorithm continuously updates the correspondence between the hand and the subgoal formation, so a given hand movement does not necessarily result in the same subgoal formation movement.
In addition, the assignment algorithm affects the robot’s movements during hand shape transitions and hand movements. It may further influence the user’s understanding of the correspondence between the hand and robot movements.
4.4 Measurements
We used a questionnaire to assess participants' sense of body ownership and agency. The questions related to the sense of body ownership and agency in the virtual embodiment questionnaire [55] were modified and used. The questionnaire consisted of eight items in two subsets of questions for the sense of body ownership and agency, as shown in Table 2. Each response was scored on a seven-point Likert scale (1 = strongly disagree; 7 = strongly agree). The scores for the sense of body ownership and agency were calculated by taking the average of the corresponding four questions, as suggested by Roth et al. [55]. In addition, task load was measured using the NASA TLX questionnaire [14]. The pairwise comparisons of the NASA TLX factors were performed only after the first trial.
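Scoring is then a simple average per subscale. A minimal sketch, assuming the eight responses are ordered with the four body ownership items first (the actual item order follows Table 2):

```python
import numpy as np

def embodiment_scores(responses):
    """Average the two four-item subsets of the modified questionnaire."""
    r = np.asarray(responses, dtype=float)   # eight 7-point Likert ratings
    return {"body_ownership": r[:4].mean(), "agency": r[4:].mean()}
```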
4.5 Procedure
Practice trials were conducted before the experiment to reduce learning effects. Alcohol disinfection was performed on the experimental apparatus and the hands of the experimenter and participant. Participants took a seat and were briefed on the experiment, procedures, data handling, risks, and rights, and were instructed to sign a consent form if they agreed. The participants put on the HMD and, if they had no problems with the fit, completed two practice trials (one with the 30 mm, sparse, bone-dynamic settings and another with the 30 mm, dense, bone-dynamic settings). After the practice trials, the participants reviewed the questionnaires, and if they did not understand a question, the experimenter provided an explanation.
After the practice trials, the main experiment started. Participants wore the HMD while making the four hand signs in Figure 5 in a random order, making each hand sign twice: once toward the left side of the table and once toward the right side. They then removed the HMD and answered the questionnaires on an iPad. The participants repeated this cycle of hand-sign tasks and questionnaire responses a total of 18 times. To control for learning effects, the order of the experimental conditions (robot size, density, and subgoal position generation and assignment algorithm) was randomized. To control for the interference effect of arm fatigue, the participants were asked to confirm that they were not fatigued before each task. A five-minute break was provided after the ninth trial. After the 18th questionnaire response, the participants filled out a demographic questionnaire, and a semi-structured interview was conducted for approximately five to ten minutes. The entire experiment lasted approximately 90 min.
4.6 Results
The body ownership score, agency score, and task load index were calculated for three factors: robot size, density, and subgoal formation generation and assignment algorithm. The results are shown in Figure 7, Figure 8, and Figure 9. As these were nonparametric data, we performed an Aligned Rank Transform (ART) [75] followed by a three-factor repeated-measures ANOVA with Holm correction for each subscale (i.e., body ownership, agency, and task load) to investigate the main effects and interactions.
4.6.1 Body Ownership.
ANOVA revealed significant main effects of density (F(2, 153) = 4.00, p = .020, \(\eta _{p}^{2}=.05\)) and subgoal formation generation and assignment algorithm (F(2, 153) = 15.28, p = .000, \(\eta _{p}^{2}=.17\)) on the body ownership score. There was a trend toward an interaction between size and density, but no significant interaction was found between any of the factors. Therefore, using the Holm-corrected ART-C [9], contrast tests were performed on the size, density, and subgoal formation generation and assignment algorithm factors. The contrast test on the size factor revealed no significant difference in body ownership score between the 20 mm and 30 mm conditions. The sparse condition led to a significantly higher body ownership score than the dense condition (p = .016, Cohen's d = 0.516). Finally, the bone-dynamic condition resulted in a significantly higher body ownership score than the other two (bone-dynamic vs. silhouette-dynamic: p = .000, Cohen's d = 1.002; bone-dynamic vs. bone-static: p = .033, Cohen's d = 0.393), and bone-static resulted in a significantly higher body ownership score than silhouette-dynamic (p = .002, Cohen's d = 0.609).
4.6.2 Agency.
ANOVA revealed significant main effects of size (F(1, 153) = 5.92, p = .016, \(\eta _{p}^{2}=.04\)) and subgoal formation generation and assignment algorithm (F(2, 153) = 5.07, p = .007, \(\eta _{p}^{2}=.06\)) on the agency score. No significant interactions were observed between any of the factors. Therefore, using the Holm-corrected ART-C, contrast tests were performed on the size, density, and subgoal formation generation and assignment algorithm factors. The contrast test on the size factor showed that the 20 mm condition led to a significantly higher agency score than the 30 mm condition (p = .016, Cohen's d = 0.363). In addition, the bone-dynamic algorithm resulted in a significantly higher agency score than the silhouette-dynamic algorithm (p = .006, Cohen's d = 0.576). No significant differences in agency score were observed for the density factor.
4.6.3 Cognitive Load.
The ANOVA revealed a significant main effect of subgoal formation generation and assignment algorithm on cognitive load (F(2, 153) = 6.11, p = .002, \(\eta _{p}^{2}=.07\)). A significant interaction between density and the algorithm was also found (F(4, 153) = 3.03, p = .019, \(\eta _{p}^{2}=.07\)). Therefore, multiple comparisons by ART-C (Holm-corrected) were performed. No significant differences were found among the groups, but there were trends toward differences between silhouette-dynamic + sparse and bone-static + medium, as well as between bone-static + medium and bone-static + sparse. In addition, the difference-of-differences test for the density-algorithm interaction showed significant differences between silhouette-dynamic − bone-static across dense − sparse (p = .039, Cohen's d = 1.278) and between silhouette-dynamic − bone-static across medium − sparse (p = .036, Cohen's d = 1.307).
4.6.4 Semi-Structured Interview.
During the interview at the end of the experiment, a few common items were reported. Nine participants reported that the swarm robots felt like their hand at least once, and the remaining participant reported that the robots felt like they were following their hand. This participant noted that they often felt the robots moved late relative to their hand, which led to the sensation that the robots were following it. Eight other participants also noted that the swarm robots seemed slower under certain conditions and that increasing their speed would lead to a stronger embodiment.
Seven participants noted the importance of fingertips in perceiving the swarm robots as a hand. They reported that when a robot was positioned at each fingertip, they tended to recognize and move the swarm as a hand. In addition, six stated that the hand-like appearance was lost when the robots collided or vibrated against each other while moving, and when they did not fit well into the hand shape, coalesced around the palm, or were misaligned with the hand position.
All participants also mentioned the impact of the size of the robot. Two stated that the larger robot felt more like a hand or that they felt in control, while five stated that the smaller robots felt more like a hand or fingers.
Nine participants mentioned the influence of robot density. Eight stated that the lower density was more likely to induce embodiment or offer higher maneuverability, but three of these stated that the higher density felt like a hand when the robots were small, i.e., 20 mm. Another participant felt that the swarm robots were the hand regardless of density.
4.7 Discussion
In a VR psychophysical experiment, we studied how swarm robot factors like size, density, and algorithm impact embodiment. We focused on the sense of body ownership, sense of agency, and task load in the experimental results.
4.7.1 Embodiment Across Different Sizes of Swarm Robots.
We first compared the embodiment scores for each size condition against the neutral level (i.e., a rating of 4, the neutral response on the 7-point Likert scale) to evaluate the level of embodiment. This neutral level was the null hypothesis of the tests. The body ownership scores were significantly higher than the neutral level for both 30 mm (p = .018, Cohen's d = 0.413) and 20 mm (p = .039, Cohen's d = 0.351), and the same was true for the agency scores (30 mm: p = .000, Cohen's d = 1.920; 20 mm: p = .000, Cohen's d = 2.172). The Cohen's d values for the body ownership score indicate small to medium effect sizes, and those for the agency score indicate large effect sizes. Thus, the questionnaire responses imply that the participants felt a higher level of embodiment for both 30 mm and 20 mm robots than the neutral state, in which participants neither deny nor affirm that they feel the robots as their body parts or as if they are in control of the robots. Although these results alone do not robustly show that Swarm Body was embodied in both conditions, they are consistent with the interview responses in which many participants reported feeling the swarm robots as their hands at least once. Therefore, Swarm Body is likely to be embodied to some extent for both 30 mm and 20 mm robots given an appropriate algorithm and density.
When comparing the size conditions, no significant differences were found between the 20 mm and 30 mm conditions in the body ownership score and task load index, while the agency score was significantly higher for 20 mm than for 30 mm with a small to medium effect size (Cohen's d = 0.363). A possible explanation is that larger perceived movements of the robots when moving fingers led to a higher sense of agency: the movement of a robot relative to its size is larger for a smaller robot, so the movements of the 20 mm robots could be perceived as larger than those of the 30 mm robots. The reported preferences for robot size also varied in the interviews. Therefore, the impact of robot size on embodiment and the resulting cognitive load is likely to vary among individuals, yet smaller robots tend to provide a higher sense of agency.
4.7.2 Sparse Swarm Robots Achieve a Higher Sense of Body Ownership.
The body ownership score was significantly higher in the sparse condition than in the dense condition with a medium effect size (Cohen's d = 0.516), suggesting that swarm robots with lower density are more likely to be felt as a hand. This is consistent with participants' reports that the lower-density swarm robots felt more like hands. This may be due to the more frequent collisions and vibratory movements of robots, their deviations from the hand shape, and their getting stuck around the palm in the higher-density conditions, which reduced the sense of embodiment as described by the participants. Another possible reason is that participants could better tell which robots represented the fingertips in the sparse condition; i.e., it offers a better understanding of the finger-robot correspondence through a simpler representation. This might result in a higher sense of body ownership, as discussed in section 4.7.3.
The body ownership score for the sparse condition was also significantly higher than the neutral level (p = .004, Cohen's d = 0.639). The agency scores were significantly higher than the neutral level in all conditions (dense: p = .000, Cohen's d = 1.676; medium: p = .000, Cohen's d = 1.918; sparse: p = .000, Cohen's d = 1.982), although there were no significant differences between the conditions. As stated in section 4.7.1, these comparisons with the neutral level do not guarantee the embodiment of Swarm Body; instead, they support that the level of embodiment is high in the sparse condition.
4.7.3 Bone-Dynamic Could Realize a Higher Level of Embodiment.
The sense of body ownership was found to be greatest for bone-dynamic, followed by bone-static and silhouette-dynamic. Similarly, the sense of agency tended to be highest for bone-dynamic, followed by bone-static, and a significant difference between bone-dynamic and silhouette-dynamic was found. Thus, the bone-dynamic algorithm appears to yield the highest level of embodiment, while the silhouette-dynamic algorithm yields the lowest.
The higher level of embodiment in bone-based algorithms would come from the robot’s ability to represent and respond to fingertip movements. In the interviews, participants reported that they were more likely to perceive the swarm robot as a hand when the robot responded to their fingertip movements. This suggests that the level of embodiment was improved when visual-motor synchronicity occurs even for local movements of the fingertips in addition to the whole hand movements. As the bone-based algorithm represents and responds to the participant’s fingertip movements, it could enhance the level of embodiment through visual-motor synchronicity of fingertips.
In addition, similar to the dense condition, the lower level of embodiment in the static condition may be because collisions and oscillatory movements between robots are more likely to occur with static assignment. As such, bone-dynamic is considered to lead to the highest level of embodiment, as it allows visual-motor synchronization down to the fingertips while maintaining the representation across various hand gestures and hand movements.
5 Embodiment Experiment with Robots
The VR experiment demonstrated that swarm robots could be embodied, offering insights into their ideal behavior. However, there are inevitable differences between the VR and real environments, and the use of real robots may affect the experimental results. Therefore, we designed a similar embodiment experiment with fewer factors, informed by the VR experiment results, to examine whether similar embodiment characteristics can be observed in real-world settings.
5.1 Robot Implementation
We developed custom-made swarm robots to conduct an embodiment experiment in the real world. The robot design was inspired by Zooids [34], an open-source swarm robot platform, but the hardware and software were newly designed.
5.1.1 Hardware Design.
The VR experiment results suggest that the embodiment of swarm robots occurs for both 20 mm and 30 mm robots and that the two sizes show similar embodiment characteristics. Moreover, owing to the limitations of currently available motor-based actuators and communication modules, the robot size cannot be reduced to 20 mm. Therefore, we determined that 30 mm robots could be used for the embodiment experiment and designed our robots at that size.
The hardware design is illustrated in Figure 10. The robot parts include a microcontroller unit (STM32G071KBU6 from STMicroelectronics), motor drivers (DRV8837DSGR from Texas Instruments), an RF module (RF2401F20 from NiceRF), motors with a 26:1 planetary gearbox (Pololu 2357), photodiodes (PD15-22C/TR8 from Everlight Electronics), and a 40 mAh Li-Po battery.
5.1.2 Communication and Projector-based Tracking System.
The communication between the robots and the host computer is described in Figure 11. The methods are similar to those used for the Zooids. The robots and cradles are each equipped with an RF module and communicate over the 2.4 GHz ISM band. A projection-based localization system used for the Zooids was adopted to track the robots. A high-speed projector (DLP LightCrafter 4500 from Texas Instruments) projects a sequence of gray-coded patterns onto the table; the two photodiodes on each robot receive the projected coded-pattern light, and the robot's microcontroller decodes the pattern into position information. The robot then calculates its orientation from the positions of the two photodiodes and broadcasts its position and orientation to the host computer.
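The core of this localization is decoding a reflected Gray code into a position and then recovering orientation from the two decoded points. A minimal sketch; the per-frame bit accumulation and any diode-axis offset are assumptions of this illustration:

```python
import math

def gray_to_binary(g: int) -> int:
    """Decode a reflected Gray-coded integer into plain binary."""
    mask = g >> 1
    while mask:
        g ^= mask
        mask >>= 1
    return g

def pose_from_diodes(p1, p2):
    """Robot pose from the two decoded photodiode positions."""
    cx, cy = (p1[0] + p2[0]) / 2, (p1[1] + p2[1]) / 2   # robot center
    theta = math.atan2(p2[1] - p1[1], p2[0] - p1[0])    # diode-axis heading
    return cx, cy, theta
```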
5.1.3 Simulation-Based Robot Control.
The robots are controlled based on a simulation using the framework described in section 3. In particular, we employed the same simulation for the real-world experiment as was used in the VR experiment. Subgoal position generation and assignment are conducted based on tracked hand data, and the robot positions are given by an RVO simulation with nonholonomic constraints. The real robots are commanded to move to the current simulated robot positions every 100 ms. In this manner, the robots obtain incremental subgoal positions along their paths. Each robot then controls the rotation of its wheels according to the control law described in section A.1 to reach the subgoal position.
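A minimal sketch of this simulation-in-the-loop commanding, with `sim` and `radio` as hypothetical interfaces standing in for the RVO simulation and the 2.4 GHz link; the on-robot wheel control law itself lives in the firmware (section A.1):

```python
import time

def command_loop(sim, radio, period=0.1):
    """Every 100 ms, command each real robot to its simulated twin's
    current position, yielding incremental subgoals along the RVO paths."""
    while True:
        t0 = time.monotonic()
        sim.step(period)                          # advance the RVO simulation
        for robot_id, pos in enumerate(sim.positions()):
            radio.send_target(robot_id, pos)      # robot firmware tracks pos
        time.sleep(max(0.0, period - (time.monotonic() - t0)))
```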
5.2 Participants
A total of 10 participants (4 males and 6 females; 28.89 ± 13.83 (SD) years old) participated in the experiment. Participants were recruited through a social media post. All the participants were unaware of the purpose of the experiment, had normal or corrected vision, and were right-handed. The participants signed a consent form regarding the experiment and were compensated with approximately $16 on Amazon gift cards. The ethics review board approved this study.
5.3 Apparatus and Setup
Figure 12 displays the experimental setup, which is similar to that of the VR experiment. In this study, a hand tracker (Leap Motion Controller 2 from Ultraleap) was used to track hands instead of the Meta Quest 2 to create a system without an HMD. The hand tracker is located under the table, as shown in Figure 12 (bottom).
5.4 Experiment Design and Conditions
This experiment was similar to the VR experiment. However, the robot size was fixed at 30 mm, resulting in a 3 × 3 factorial design. The independent variables examined were density (sparse, medium, and dense) and subgoal position generation and assignment algorithm (bone-static, bone-dynamic, and silhouette-dynamic). All variables were within-participants.
The task was similar to that in the VR experiment, which involved reaching a target with a specified hand shape. One difference was that in the VR experiment the targets were positioned at the left- and right-front, whereas in the real-world experiment the target position was limited to the front; accordingly, the number of tasks per trial was halved to four (i.e., four hand signs). This is because the swarm robots' batteries would not hold a charge until the end of the experiment with eight tasks per trial.
In addition, instead of the targets automatically (dis)appearing on the desk in the VR, the experimenter manually placed and removed the targets printed on paper on the desk. To avoid the potential slipping of the robots on the target paper, the task was changed from moving the swarm robots to fit in the target to moving the swarm robots to a specified position in front of the target and making a specific hand shape.
5.5 Measurements
The same subjective measurements as those used in the VR experiment (i.e., the modified embodiment questionnaire and NASA TLX) were used to evaluate the sense of body ownership, sense of agency, and cognitive load.
5.6 Procedure
This procedure is similar to that used in the VR experiments. Practice trials were conducted before the experiment to reduce the learning effects. After signing the consent form, participants started the practice trials (one under sparse and bone-dynamic and another under dense and bone-dynamic).
During the main experiment, the participants wore headphones playing white noise and put their hands under the table to control the swarm robots. The participants were instructed to move the swarm robots representing their right hand to the starting area at the beginning of each task. When all the robots returned to the starting area, the experimenter specified a hand sign by placing a printed hand-shape sheet on the desk. The participants were instructed to move their swarm robot hand forward with the specified hand sign. The hand sign sheet was removed after five seconds, and the participants moved their hands back to the starting area. Participants repeated the reaching task and completed the questionnaire nine times.
To control the interference effect of arm fatigue, the participants were asked to ensure that they were not fatigued prior to each task. A five-minute break was provided after the fifth task. After the ninth questionnaire response, the participants answered a demographic questionnaire, and a semi-structured interview was conducted for approximately five to ten minutes. The entire experiment took approximately one hour.
5.7 Results
The body ownership score, agency score, and task load index were calculated for each combination of robot density and subgoal position generation and assignment algorithm. The hand tracker did not work properly and frequently lost tracking during one participant's experiment, owing to the reflection of its own infrared light at the bottom of the table. As this might have strongly affected their data, we excluded this participant from the results and analysis. The results are shown in Figure 13 and Figure 14. Similar to the analysis of the VR experiment results, we performed an ART procedure followed by a two-factor repeated-measures ANOVA with Holm correction for the body ownership score, agency score, and task load index.
5.7.1 Body Ownership.
ANOVA revealed a significant main effect of density (F(2, 64) = 3.31, p = .043, \(\eta _{p}^{2}=.09\)). No significant interaction was found between the density and algorithm factors. Therefore, contrast tests were performed on the density and algorithm factors using the Holm-corrected ART-C. The contrast test on the algorithm factor showed no significant differences in body ownership score between any of the conditions. The test on the density factor showed that the medium condition led to a significantly higher body ownership score than the sparse condition (p = .047, Cohen's d = 0.676).
5.7.2 Agency.
ANOVA revealed a significant main effect of density (F(2, 64) = 3.55, p = .034, \(\eta _{p}^{2}=.10\)) and a marginal main effect of algorithm (F(2, 64) = 2.89, p = .063, \(\eta _{p}^{2}=.08\)). No significant interaction was found between the density and algorithm factors. Therefore, contrast tests were performed on these factors using the Holm-corrected ART-C. The contrast test on the algorithm factor showed no significant differences in agency score between any of the conditions, but there was a trend for the bone-dynamic condition to yield a higher agency score than the silhouette-dynamic condition (p = .068, Cohen's d = 0.635). The contrast test on the density factor also showed no significant differences in agency score between any of the conditions, but there were trends for the medium condition to yield a higher agency score than the sparse condition (p = .052, Cohen's d = 0.665) and for the dense condition to yield a higher agency score than the sparse condition (p = .072, Cohen's d = 0.583).
5.7.3 Cognitive Load.
ANOVA revealed no significant main effects of density or subgoal position generation and assignment algorithm. No significant interactions were observed. Therefore, contrast tests were performed on the density and algorithm factors using the Holm-corrected ART-C. The contrast tests revealed no significant differences in the task load index between the sparse, medium, and dense conditions, or between the bone-static, bone-dynamic, and silhouette-dynamic conditions.
5.7.4 Semi-Structured Interview.
A few common items were identified during the interview at the end of the experiment. Four participants reported that the swarm robots felt like their hands in some trials, while the other five participants, including the participant with the poor hand tracking, reported that the robots were following their hands rather than being their own hands. One of the five noted that it was difficult to distinguish between the hand signs, making the swarm feel like a mass following the hand. Another reported that the robots sometimes felt like their hand, but they were not fully convinced that it was their hand. Three participants mentioned that the control was intuitive when a robot was positioned at each fingertip.
Eight participants mentioned the influence of density on the embodiment. Six stated that the dense robots felt more like the hand or something they could control better; one preferred medium and sparse, and one preferred sparse. One of those who preferred the dense condition stated that the dense swarm presented a sense of oneness and coherence. Another participant reported that the swarm robots were embodied regardless of their density.
5.8 Discussion
To investigate the influence of swarm robot-unique factors on embodiment in real-world settings, we conducted a psychophysical experiment with the real swarm robots we developed, analyzing three aspects: sense of body ownership, sense of agency, and cognitive load. The results were also compared with those of the VR embodiment experiment in section 4 to discuss how the unique characteristics of a real-world system affect swarm robot embodiment.
5.8.1 Embodiment of Swarm Body with Bone-Dynamic Algorithm.
No significant difference was found between any of the algorithm conditions. When compared with the neutral level (a rating of 4 on the 7-point Likert scale for body ownership and agency; a rating of 50 on the 100-point NASA TLX for cognitive load), the bone-dynamic condition resulted in significantly higher body ownership and agency scores and a significantly lower cognitive load score than the neutral levels, with large effect sizes (body ownership: p = .042, Cohen's d = 0.751; agency: p = .000, Cohen's d = 2.354; cognitive load: p = .000, Cohen's d = 1.722, where the neutral levels were the null hypotheses). Thus, the questionnaire responses suggest that the participants felt a higher sense of embodiment for the bone-dynamic algorithm than the neutral state, in which participants neither deny nor affirm that they feel the robots as their body parts or as if they are in control of the robots. This is consistent with the VR finding that the bone-dynamic condition achieved a higher level of embodiment than the neutral levels. It is also consistent with the interview responses from three participants that they could control the swarm robots intuitively when the robots were positioned at each fingertip (i.e., the bone-based algorithms). The interview responses further suggest that some participants felt the swarm robots as their hands in some trials, although we cannot tell which algorithm conditions they referred to. Overall, these questionnaire and interview results indicate that Swarm Body was possibly embodied in some trials, and if so, the embodiment probably occurred with the bone-dynamic algorithm.
5.8.2 Shift in Preference toward Denser Swarm.
When comparing the density conditions, the medium condition resulted in a significantly higher body ownership score than the sparse condition, and the medium and dense conditions tended to result in higher agency scores than the sparse condition. These results are consistent with the interview finding that six of the eight participants who mentioned the influence of density preferred denser conditions. However, they appear inconsistent with the VR experiment's result that the sparse condition led to a higher sense of body ownership than the dense condition. There are three possible causes for this shift in preference toward denser conditions in the real world: the increased importance of visual similarity, a sense of accomplishment, and collisions.
First, visual similarity may be more important than understanding the correspondence between the body parts and robots for embodiment in real-world settings. As discussed in section 4.7.2 and section 4.7.3, sparse swarm robots with bone-based algorithms achieved a higher level of embodiment by offering a better understanding of the finger-robot correspondence through a simpler hand representation. However, as suggested in [3], the visual similarity of an object to a hand affects its level of embodiment. This effect might become more dominant in real-world settings. This hypothesis is supported by participants' comments that the dense swarm had a greater sense of oneness and coherence and that the denser the robots, the more they felt like a hand shape.
Second, the sense of agency may be influenced by the expected amount of effort required to move the object to be embodied. Four participants reported a stronger feeling of controlling the robots and a stronger sense of accomplishment in the dense condition.
Third, real-world systems have more collisions between robots, which might have reduced the level of embodiment in the sparse condition. As discussed in section 4.7.2, the lack of collisions likely contributed to the high level of embodiment in the sparse condition in the VR study. Real-world systems cause more collisions owing to the robot position errors between the simulation and the real world. As a result, our real-world system had collisions even in the sparse condition, which potentially decreased the level of embodiment. It is also important to note that some robots turned off when they were stuck during the real-world study. This is because our control program turns off a robot when the torque applied to its motors exceeds a certain threshold, to protect the motors. When that happened, the experimenter quickly turned them back on, but this could have affected the study results.
Thus, the participants' preferences shifted toward denser swarm robots, and the levels of embodiment in the dense and medium conditions were relatively higher in the real-world study. This shift was possibly caused by the increased importance of visual similarity, a sense of accomplishment, and collisions. The finding is somewhat limited by the fact that the preference shift might stem from increased collisions, which are partly dependent on our implementation. However, as most real-world swarm robots experience more collisions than their simulations, our finding is valuable for designing real-world embodied swarm robots, although the degree of this shift may vary. In addition, several consistent trends were observed throughout the VR and real-world experiments. For example, higher embodiment levels were reported when the robots were located at the fingertips, and a lower cognitive load was measured in the sparse condition.
6 Applications
Swarm Body expands the design space of tangible and embodied interaction, offering unique characteristics such as robustness, flexibility, and scalability to the human body. The components can move without geometrical constraints other than collision with each other. Additionally, although the current implementation requires a projector, the robot itself is not anchored to a specific environment; thus, our system is versatile and can be used in a variety of locations, including ordinary desks.
The main application of Swarm Body is physical telepresence, where embodied swarm robots facilitate physical interaction with remote people and environments, as shown in Figure 1 (right). The operator can control the Swarm Body projected onto their table as if manipulating their own hands. The other person can physically interact with the operator through the Swarm Body.
Our physical telepresence system is inspired by the Physical Telepresence Workspace by Leithinger et al. [37], particularly in the physical representation of the user's hand using hand sensing and spatially aligned visual feedback. Our work extends their interaction capabilities through swarm robot characteristics such as swarm splits, mergers, transfers, and obstacle avoidance. Below, we outline scenarios that showcase new interaction opportunities in physical telepresence enabled by the characteristics of swarm robots.
6.1 Multipliable Body
Swarm Body can split from a single swarm into multiple swarms, each representing a different body part, and then merge back into a single swarm. These splits and mergers allow the user to adjust the number of independent swarms and the number of robots constituting each swarm, as shown in Figure 15. This enables the user to seamlessly switch between one- and two-handed telepresence in a single interface. For example, when organizing a desktop remotely, the user can employ both hands to efficiently collect objects and then merge the robots into the dominant hand for precise organization. The user can duplicate one hand into two, enabling the performance of two similar tasks simultaneously through parallel embodiment, as in [68]. Additionally, unlike existing embodied robots that require one system for each user, Swarm Body supports multiple users manipulating the robots through a single interface. For example, while one remote user engages in an activity such as rolling a ball with a local person, another remote user can allocate half of the robots to form a new hand and participate.
6.2 Form-Giving to Transformable Body
The malleability of Swarm Body enables the transformation of the swarm into various body parts of different sizes and unconventional forms (e.g., a small hand, elongated fingers, or a tentacle-like limb) (Figure 16). This transformation provides enhanced interaction freedom while preserving embodiment features, such as intuitive swarm manipulation. For example, one can experience the affordances of objects from a child's perspective by interacting with them using a Swarm Body that simulates a smaller hand, as in [47]. Swarm Body can extend its fingers or transform them into tentacle shapes to reach and grasp objects at a distance or in narrow gaps. Similar to how pixels on a screen represent a range of embodied avatars, Swarm Body physically embodies diverse avatars through its ability to transform.
6.3 Adaptability to the Environment
Swarm Body adapts to its environment by avoiding obstacles, resizing, or transforming itself as needed. As shown in Figure 17, our control method introduced in section 3 enables the swarm robots not only to mimic body movements but also to avoid obstacles. In a tabletop environment with obstacles, the user can interact with the environment without intentionally steering around the obstacles. For example, when picking up a pen on the other side of a mug, the user can reach for it as if the mug did not exist. We believe that Swarm Body could develop into a system that seamlessly adapts to various settings, easing interactions even in cluttered spaces. This gives it the potential to eliminate physical barriers in interaction, enabling smoother engagement than our own bodies can typically achieve.
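To illustrate how subgoal following and obstacle avoidance can coexist in one velocity command, here is a minimal potential-field sketch; it is a simplified stand-in for, not a reproduction of, the control method in section 3, and all gains, radii, and speed limits are placeholder values.

import numpy as np

def velocity_command(pos, subgoal, obstacles,
                     k_att=1.0, k_rep=0.5, influence=0.15, v_max=0.2):
    """Return a 2-D velocity: attraction toward the subgoal plus
    repulsion from every obstacle closer than `influence` (meters)."""
    v = k_att * (subgoal - pos)                 # attractive term
    for obs in obstacles:
        d_vec = pos - obs
        d = np.linalg.norm(d_vec)
        if 1e-6 < d < influence:
            # Repulsion grows as the robot approaches the obstacle.
            v += k_rep * (1.0 / d - 1.0 / influence) * d_vec / d**2
    speed = np.linalg.norm(v)
    if speed > v_max:                           # respect the speed limit
        v *= v_max / speed
    return v

cmd = velocity_command(np.array([0.0, 0.0]), np.array([1.0, 0.2]),
                       obstacles=[np.array([0.5, 0.1])])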
6.4 Emotional Haptic Notification
Swarm Body allows the user to exert horizontal forces on remote objects or individuals in an embodied manner. This further expands the design space of physical telepresence with vertical actuation previously explored by Leithinger et al. [37]. Its embodiment can also support intuitive and affective interactions, taking advantage of swarm characteristics in swarm user interfaces (SUIs). Swarm Body achieves haptic communication by touching people (Figure 18). Specifically, the user can naturally get a person’s attention or express emotions to an intimate partner through varied forces and touches. For example, by gently tapping the arm of a person engrossed in desk work, the robots can capture that person’s attention and initiate communication. Although previous studies have explored haptic feedback using swarm robots [26], our system provides haptic feedback combined with the embodiment of the user. Thus, Swarm Body has the potential to haptically mediate the user’s emotions and intentions to the notified person.
6.5 Gesture Presentation
Another interaction modality of Swarm Body is vision. When embodied as hands, swarm robots can communicate through gestures with a physical presence in remote environments, as illustrated in Figure 19. A remote individual can use physical gestures during online presentations to boost engagement. In a remote collaboration scenario, a remote user can point to specific objects or convey simple reactions. For example, a remote craft instructor can point to the tools the students need at each step, direct their hand movements, and send a physical thumbs-up upon task completion.
7 Limitations and Future Work
Our study did not comprehensively investigate the embodiment characteristics of swarm robots or their applications; we discuss the remaining limitations and directions for future work below.
7.1 More Extensive Investigation on Design Parameters
Our investigation covered only a subset of the factor levels that should be examined to understand the embodiment of swarm robots on the hand. Therefore, further studies should explore other possible factor levels. For example, future work could investigate the embodiment characteristics of robots smaller than 20 mm, or conditions even denser than the dense condition in the current work. VR studies of these conditions may show how much embodiment is possible in theory, and the level of embodiment obtained under such theoretical conditions would serve as a baseline for evaluating real-world systems.
We did not examine some design parameters and complex conditions in order to keep the experiments feasible. Future research should explore additional parameters, including the robots’ latency, speed, acceleration, color, and shape. Additionally, dynamically changing these parameters seems promising. For example, applying the bone-based algorithm to the fingertips and the silhouette-based algorithm to the other parts of the hand could combine the advantages highlighted in our discussion (see the sketch below). Moreover, robots in a swarm can have different sizes, shapes, densities, and functions and change them dynamically, as Li et al. demonstrated [39]. Therefore, further investigation of the embodiment characteristics and applications of such swarm robots and algorithms is warranted.
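The hybrid idea above can be sketched as follows, with deliberately simplified stand-ins for the bone-based and silhouette-based generators and a hypothetical 21-point hand model; the actual algorithms in our system are more involved.

import numpy as np

FINGERTIPS = [4, 8, 12, 16, 20]  # hypothetical fingertip landmark indices

def fingertip_subgoals(landmarks):
    """Bone-based placement reduced to its fingertip anchors."""
    return landmarks[FINGERTIPS]

def silhouette_subgoals(landmarks, n):
    """Crude silhouette fill: sample points inside the hand's bounding
    box and keep the n closest to the landmark cloud's centroid."""
    lo, hi = landmarks.min(axis=0), landmarks.max(axis=0)
    samples = lo + np.random.rand(10 * n, 2) * (hi - lo)
    d = np.linalg.norm(samples - landmarks.mean(axis=0), axis=1)
    return samples[np.argsort(d)[:n]]

def hybrid_subgoals(landmarks, n_robots):
    """Fingertip subgoals first, silhouette-based fill for the rest."""
    tips = fingertip_subgoals(landmarks)
    body = silhouette_subgoals(landmarks, n_robots - len(tips))
    return np.vstack([tips, body])

subgoals = hybrid_subgoals(np.random.rand(21, 2), n_robots=15)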
7.2 Recognition as a Body Part by an External Observer
While we revealed the embodiment characteristics of swarm robots for the operator, we did not investigate whether an external observer could recognize the robots as someone’s body parts. In our preliminary testing, an observer was able to differentiate hand signs, although they knew in advance that the robots represented a hand. Since visual feedback is the only information source for an observer, the silhouette-based subgoal generation algorithm, which reduced the embodiment level for the operator in our study, might enhance an observer’s recognition of the robots as a hand. Further study of how external observers perceive the swarm robots is needed.
7.3 Beyond Tabletop Robots and Embodiment of the Hand
Swarm robots moving in 3D space, such as swarm drones, should be explored. Although some of our methods and findings will be applicable to them, swarm robots moving in 3D space have unique design parameters (e.g., subgoal positions on the skin surface only vs. those throughout the entire volume, including the interior of the hand). This exploration would also provide insight into how dimensional matching between the user’s hand movements and the robots’ subgoal formation affects the level of embodiment. If 3D formations yield a higher level of embodiment, then restricting the user’s hand movements to the 2D plane (i.e., avoiding supinations and pronations) may also improve the level of embodiment in our 2D system. In addition, 3D formations and movements would open up a further interaction design space. For example, embodied 3D swarm robots could pick up objects or enter spaces inaccessible to wheeled swarm robots. Note that the projection-based robot localization method used in our study could be adapted to estimate the position and orientation of such swarm robots in 3D space [17]. These can be determined by solving the Perspective-n-Point (PnP) problem, treating the sensors mounted on the robots as points.
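As an illustration of this extension, the following sketch estimates a robot’s 6-DoF pose with OpenCV’s solvePnP, treating the projector as a calibrated camera; the sensor layout, pixel measurements, and intrinsics are hypothetical placeholder values, not calibration data from our system.

import numpy as np
import cv2

# Known 3-D layout of the robot's light sensors in the robot's own
# frame, in meters (hypothetical square arrangement).
object_points = np.array([[-0.01, -0.01, 0.0],
                          [ 0.01, -0.01, 0.0],
                          [ 0.01,  0.01, 0.0],
                          [-0.01,  0.01, 0.0]], dtype=np.float64)

# Where each sensor detected the projected pattern, in projector pixels.
image_points = np.array([[612.0, 388.0],
                         [640.0, 389.0],
                         [641.0, 417.0],
                         [613.0, 416.0]], dtype=np.float64)

# Placeholder projector intrinsics, treated like a camera matrix.
K = np.array([[1400.0, 0.0, 640.0],
              [0.0, 1400.0, 400.0],
              [0.0, 0.0, 1.0]])

ok, rvec, tvec = cv2.solvePnP(object_points, image_points, K, None)
# rvec/tvec give the robot's orientation and position relative to the
# projector, i.e., a full 6-DoF pose rather than a 2-D tabletop pose.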
The lack of investigation of body parts other than hands also limits this study. We focused on hands because they are the body parts most commonly used for interacting with the environment. Although we revealed the embodiment characteristics of swarm robots for the hand, it remains unclear whether the same tendencies apply to other body parts. We believe that our findings on the hand can serve as a reference for future studies on other body parts. Moreover, to expand the design space of Swarm Body, future research on the effects of embodying body parts that differ from the user’s actual body in size and shape could be highly beneficial. This would open up new capabilities and applications for Swarm Body, allowing the user to manipulate objects at micro or macro scales with larger or smaller hands, or using different shapes such as tentacle-like limbs.
7.4 Exploration of the Applications
Although we have presented several application scenarios, their effectiveness has yet to be evaluated. In future work, we plan to showcase our applications through interactive demonstrations and videos, thereby collecting direct feedback from participants. We also expect that this feedback will allow us to discover new application possibilities of Swarm Body beyond our initial scope.
7.5 Swarm Control Algorithm Dedicated for Embodied Behavior
Lastly, collisions and misalignments between our robots might have affected the participants’ evaluations of embodiment. During the embodiment experiments in both VR and the real world, the experimenter sometimes observed collisions and misalignments of the robots. These undesirable behaviors likely stem from conditions, such as a large robot size or a high density, that make it difficult for the swarm robots to avoid collisions while following a hand. As mentioned in section 5.8.2, our control program shuts a robot down in the case of excessive torque to protect its motors. Even though the experimenter turned such robots back on immediately, this might have affected the participants’ sense of embodiment, leading to a result different from the VR study. In other words, the participants’ preference for the denser swarm robots might reflect the characteristics of our real-world implementation rather than our approach. However, our real-world experiment and the interview responses provided some consistent and generalizable insights, such as a higher level of embodiment arising from fingertip representations and a lower cognitive load in sparse conditions. To address the collision issues, a specialized swarm control algorithm tailored to embodied behavior could reduce collisions and enhance the embodied experience; one simple direction is sketched below. Further psychophysical embodiment experiments with a control algorithm that produces fewer collisions would help us understand whether the differences between our VR and real-world results are due to the approach or to our implementation. They would also deepen our understanding of the embodiment characteristics we observed in both VR and real-world settings (i.e., a higher level of embodiment from fingertip representations and a lower cognitive load in sparse conditions).
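As a first, deliberately simple illustration of such an algorithm, the sketch below damps each robot’s subgoal-seeking velocity as its nearest neighbor approaches, trading speed for fewer collisions instead of driving at full speed and shutting down on excessive torque; the gains and radii are illustrative, and a dedicated algorithm would likely be more sophisticated.

import numpy as np

def damped_velocities(pos, goals, safety=0.06, v_max=0.2):
    """pos, goals: (N, 2) arrays. Returns (N, 2) velocity commands whose
    speed shrinks linearly to zero as the nearest neighbor approaches."""
    v = goals - pos
    norms = np.linalg.norm(v, axis=1, keepdims=True)
    # Cap each robot's commanded speed at v_max.
    v = np.where(norms > v_max, v * (v_max / np.maximum(norms, 1e-9)), v)
    # Distance from every robot to its nearest neighbor.
    diff = pos[:, None, :] - pos[None, :, :]
    dist = np.linalg.norm(diff, axis=2)
    np.fill_diagonal(dist, np.inf)
    nearest = dist.min(axis=1)
    # Scale speed toward zero inside the safety radius.
    scale = np.clip(nearest / safety, 0.0, 1.0)[:, None]
    return v * scale

cmds = damped_velocities(np.random.rand(8, 2), np.random.rand(8, 2))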