We analyze the task measures and survey responses from our user study to draw conclusions about the effect of using UHTP on each participant's individual assembly strategy and, consequently, on the performance of the human-robot team in our collaborative drill assembly task.
7.2 Results
We now present the results and data analysis from our study. For each metric, we test for statistical significance using a two-sided Wilcoxon signed-rank test (a non-parametric analysis) [8] with a significance level of \(\alpha = 0.05\) and a sample size of 30 participants. We expect \(S_{UHTP}\) to score lower than \(S_{fixed}\) across all metrics, supporting hypotheses H2-A to H2-D.
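For illustration, a paired test of this form can be computed as in the following minimal Python sketch; this is not our analysis code, and the function and variable names are placeholders. The inputs are assumed to be paired per-participant scores, one value per participant per scenario.

```python
# Minimal sketch of the per-metric significance test: a two-sided
# Wilcoxon signed-rank test on paired per-participant scores.
# Variable names and data handling are placeholders, not the study code.
from scipy.stats import wilcoxon

ALPHA = 0.05  # significance level used throughout the analysis

def compare_scenarios(scores_uhtp, scores_fixed, alpha=ALPHA):
    """Paired comparison of a single metric across the two scenarios.

    scores_uhtp, scores_fixed: equal-length sequences with one
    observation per participant (e.g., makespan in seconds).
    Returns the test statistic, p-value, and a significance flag.
    """
    stat, p = wilcoxon(scores_uhtp, scores_fixed, alternative="two-sided")
    return stat, p, p < alpha
```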
H2-A: The mean \(X_{makespan}\) for \(S_{UHTP}\) (M = 392.6, SD = 53.60) is lower than that of \(S_{fixed}\) (M = 435.6, SD = 62.59), as shown in Figure 9. This difference is statistically significant (\(p < 0.001\)), supporting hypothesis H2-A that planning with UHTP leads to quicker task completion times. Furthermore, the range of \(X_{makespan}\) values for \(S_{fixed}\) (Range = 268.4 seconds) is greater than that for \(S_{UHTP}\) (Range = 168.7 seconds).
To explain the difference in \(X_{makespan}\) values, consider the examples shown in Figure 10, in which the user begins assembling either a blue drill (Row \(H_{blue}\)) or a yellow drill (Row \(H_{yellow}\)). \(R_{fixed}\) is programmed to deliver parts assuming the human always begins with a blue drill. In example \(H_{blue}-R_{fixed}\), the participant's choice to build a blue drill first matches the robot's assumption about the human, leading to a low makespan of 390 seconds. The opposite occurs in example \(H_{yellow}-R_{fixed}\): the human has to wait for the robot to bring parts for a yellow drill before they can even begin assembling, which produces undesirable human idle time and an increased assembly makespan of 420 seconds. In contrast, in the second column, robot \(R_{UHTP}\) infers the user's choice of ordering in real time and accordingly executes a sequence of actions that complements the user's decision and reduces human idle time. As a result, in examples \(H_{blue}-R_{UHTP}\) and \(H_{yellow}-R_{UHTP}\), the color of the parts brought by \(R_{UHTP}\) matches the drill color chosen by the user. The makespans of both these examples also match the lowest makespan from example \(H_{blue}-R_{fixed}\), which is 390 seconds. Consequently, the range of makespans for \(S_{fixed}\) is larger than that of \(S_{UHTP}\).
H2-B: Participants ranked \(R_{UHTP}\) higher than \(R_{fixed}\) in terms of how frequently the robot brought parts that the participant needed at the right time. This is evident from the lower median \(X_{fail}\) and \(X_{fill}^*\) scores for \(S_{UHTP}\) (Median \(X_{fail}\) = Never, Median \(X_{fill}^*\) = Never) compared to the median scores for \(S_{fixed}\) (Median \(X_{fail}\) = Sometimes, Median \(X_{fill}^*\) = Most of the time), as shown in Figure 11. Statistical analysis shows that participants responded to \(S_{UHTP}\) with significantly different \(X_{fill}^*\) scores (\(p < 0.001\)) and \(X_{fail}\) scores (\(p < 0.001\)) than to \(S_{fixed}\). These results support hypothesis H2-B that \(R_{UHTP}\) anticipates and fulfills the human partner's requirements better than \(R_{fixed}\).
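Because these survey metrics are ordinal rather than continuous, the verbal frequency responses must be mapped onto an ordered numeric scale before applying the paired test above. A minimal sketch of this encoding follows; the exact scale labels and numeric codes shown are illustrative assumptions, not our survey instrument.

```python
# Sketch of encoding ordinal survey responses so the paired Wilcoxon
# test can be applied to Likert-style metrics such as X_fail.
from scipy.stats import wilcoxon

# Assumed 5-point frequency scale; lower codes are better responses.
LIKERT = {"Never": 0, "Rarely": 1, "Sometimes": 2,
          "Most of the time": 3, "Always": 4}

def encode(responses):
    """Map verbal responses (one per participant) to ordinal codes."""
    return [LIKERT[r] for r in responses]

# Toy paired data for one metric (placeholder values, not study data):
x_fail_uhtp = encode(["Never", "Rarely", "Never", "Sometimes"])
x_fail_fixed = encode(["Sometimes", "Always", "Rarely", "Most of the time"])
stat, p = wilcoxon(x_fail_uhtp, x_fail_fixed)
print(stat, p)
```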
H2-C: Figure 11 shows that participants responded to \(S_{UHTP}\) with a lower median \(X_{alter}\) value (Median = Never) than to \(S_{fixed}\) (Median = Sometimes). While both scenarios received the same median \(X_{delay}\) value (Median = Sometimes), \(S_{fixed}\) has more participant responses to the right of 0 (i.e., toward more frequent delays) than \(S_{UHTP}\). Statistical analysis shows that the \(X_{delay}\) and \(X_{alter}\) values for \(S_{UHTP}\) are significantly different from those of \(S_{fixed}\) (\(p < 0.001\) for both metrics), supporting hypothesis H2-C. These results indicate that participants had to pause and/or change their construction more often during \(S_{fixed}\) than during \(S_{UHTP}\) because the robot provided parts in a sub-optimal order. Thus, by adapting to participant decisions, UHTP enables the robot to avoid undesirable interruptions to the participant's assembly, such as idle time and alterations to the build.
H2-D: The average \(X_{workload}\) score for \(S_{fixed}\) (M = 11.9, SD = 2.54) is larger than the average score for \(S_{UHTP}\) (M = 11.1, SD = 1.90), as shown in Figure 12. This difference is statistically significant (\(p < 0.01\)), supporting hypothesis H2-D that participants in \(S_{fixed}\) experienced a higher workload than in \(S_{UHTP}\). We posit, however, that the overall workload is not substantial in either scenario because the task performed during this study is short (15 minutes per scenario) and fairly straightforward. We anticipate that the difference in workload will be more substantial when measured on more complex assembly tasks that are repeated over an extended period of time.
Post-scenario Responses: Figure 13(a) shows participants' responses to yes/no questions about whether they felt the robot was monitoring their activity. Participants responded more positively to robot \(R_{UHTP}\) than to \(R_{fixed}\) in terms of both the robot's ability to track the color of the drill being built (percentage of Yes responses: \(S_{UHTP}\) = 90.0%, \(S_{fixed}\) = 20.0%) and the robot's adaptability to participant actions (percentage of Yes responses: \(S_{UHTP}\) = 63.3%, \(S_{fixed}\) = 30.0%). McNemar's test shows a significant difference in participant responses between scenarios for these two questions (\(p\)-values from left to right in Figure 13(a): \(p < 0.001\), \(p = 0.006\)).
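For completeness, a minimal sketch of McNemar's test on paired yes/no responses is shown below, using the statsmodels implementation. The 2×2 cell counts are hypothetical placeholders; only the marginal Yes percentages above are reported in our data.

```python
# Sketch of McNemar's test for paired binary (yes/no) responses.
# Cell counts are hypothetical, not the study's contingency table.
from statsmodels.stats.contingency_tables import mcnemar

# Rows: response in S_UHTP (yes, no); columns: response in S_fixed.
table = [[5, 22],   # yes in S_UHTP: [yes in S_fixed, no in S_fixed]
         [1, 2]]    # no in S_UHTP:  [yes in S_fixed, no in S_fixed]

result = mcnemar(table, exact=True)  # exact binomial test on discordant pairs
print(result.statistic, result.pvalue)
```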
Comparative Responses: Figure 13(b) shows the responses to our post-experiment questionnaire, in which we asked participants to compare the two robot behaviors based on specific qualities. For each question, participants responded by selecting one of four options: the UHTP scenario (\(S_{UHTP}\)), the fixed-policy scenario (\(S_{fixed}\)), both scenarios equally, or neither scenario. The majority of participants preferred the interaction in scenario \(S_{UHTP}\) over \(S_{fixed}\) (percentage of responses: \(S_{UHTP}\) = 86.7%, \(S_{fixed}\) = 10.0%). Most participants also chose \(R_{UHTP}\) as better at tracking the color of the drill being constructed (percentage of responses: \(S_{UHTP}\) = 80.0%, \(S_{fixed}\) = 6.7%) and at minimizing participant idle time (percentage of responses: \(S_{UHTP}\) = 63.3%, \(S_{fixed}\) = 10.0%). Additionally, a moderate number of participants responded in favor of both scenarios equally across all three questions. This result matches our earlier observation of a wider range of responses for \(S_{fixed}\), caused by some participants choosing a color order that meets the optimal-ordering assumption of the fixed-policy robot's action sequence. Participants whose choice matched the action sequence of \(R_{fixed}\) would be unable to differentiate between the two scenarios, as both policies would behave optimally.
7.3 Discussion
The above results validate two central claims we make about UHTP under operating conditions with real human collaborators: (1) the ability to infer a human user's intent in real time and complement it during a collaborative task, and (2) the versatility to accommodate a variety of human behaviors without sacrificing task performance. Participant responses about interacting with a UHTP-controlled robot show that, compared to a fixed policy, UHTP also improves the human user's experience of the collaboration in a number of ways, such as reducing human idle time and avoiding execution paths that interrupt the user's construction. Furthermore, participants are able to identify that \(R_{UHTP}\) is adapting to their choices. As a result, participants report a reduced mental workload and distinctly prefer the UHTP-controlled robot as a collaborative partner over a fixed-policy robot.
A common observation is that the data obtained for \(S_{fixed}\) cover a wider range of values than those of \(S_{UHTP}\) across all six metrics. This complements our findings for the makespan of task \(Q_{chair}\) in Section 5, where the standard deviation of the makespan for \(P_{fixed}\) was higher than that of UHTP. We again attribute this trend to a split among participants: those whose choice of color ordering matched the fixed-policy robot's sequence of actions and those whose choice did not. This split does not arise for \(S_{UHTP}\) due to its adaptive properties.
Comparing Participants' Responses for Different Orderings of Scenarios: We further analyze potential ordering effects in participants' self-reported responses by plotting the responses across the two scenario orderings in Figure 14. The left sides of Figures 14(a) and 14(b) show the responses of participants who interacted with \(S_{fixed}\) first (ordering \(O_{fixed}\)), and the right sides show the responses of participants who interacted with \(S_{UHTP}\) first (ordering \(O_{UHTP}\)). We observe from Figure 14(a) that more participants from \(O_{UHTP}\) responded to \(S_{fixed}\) with answers such as 'Always' and 'Most of the time' than participants from \(O_{fixed}\). Since a lower response is better across all four metrics, this implies that \(O_{UHTP}\) participants penalized \(S_{fixed}\) more severely than \(O_{fixed}\) participants did. \(O_{UHTP}\) participants whose choice of color order did not match the fixed policy in \(S_{fixed}\) saw the robot become a less efficient partner, while the opposite occurred for \(O_{fixed}\) participants. We believe that interacting with a robot partner whose performance degraded across conditions elicited a stronger reaction than interacting with one whose performance improved, resulting in stronger responses from \(O_{UHTP}\) participants. This trend is consistent with the concept of loss aversion from human psychology, which states that an individual experiences a loss more severely than an equivalent gain [30]. This analysis shows a measurable effect of scenario ordering on participants' responses, although the ordering does not affect participant workload (see Figure 14(b)). We note that this disparity in participant responses does not negate the significant differences observed between \(S_{fixed}\) and \(S_{UHTP}\); rather, we include this observation to highlight the importance of counterbalancing the order of presented scenarios to reduce order effects in our study.
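As an illustration of this ordering-effect breakdown, the following sketch splits encoded responses by the scenario a participant experienced first and compares the groups; the column names and values are hypothetical placeholders, not our study data.

```python
# Sketch of the ordering-effect breakdown: group paired responses by
# which scenario a participant experienced first, then compare groups.
import pandas as pd

# Hypothetical encoded responses to S_fixed, one row per participant.
df = pd.DataFrame({
    "participant":  [1, 2, 3, 4],
    "ordering":     ["O_fixed", "O_UHTP", "O_fixed", "O_UHTP"],
    "x_fail_fixed": [2, 4, 1, 3],  # ordinal codes (0 = Never ... 4 = Always)
})

# Median response to S_fixed within each ordering group, as in Figure 14(a).
print(df.groupby("ordering")["x_fail_fixed"].median())
```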