1. Introduction
In recent years, human–robot collaboration (HRC) has emerged as a key solution to address the challenges faced by high-wage countries, where traditional manual labor is becoming increasingly difficult due to a shortage of skilled workers and rising labor costs [1,2]. The need for partial automation in assembly processes has grown, driven by the necessity to remain competitive in the global market [3]. However, planning HRC assembly sequences remains a complex, time-consuming task, especially for small lot sizes [4]. The task allocation between humans and robots and the generation of appropriate assembly sequence plans (ASP) have been identified as major challenges in implementing HRC [5,6,7]. Developing ASPs often requires substantial effort, as emphasized by Ranz, Hummel, and Sihn [8] and Fechter et al. [7], because these plans must be created from scratch for each unique scenario, requiring careful analysis and planning.
The development of effective HRC assembly plans poses several challenges, such as determining appropriate task distribution between humans and robots and optimizing task sequencing to ensure smooth interaction. These challenges are further compounded by several factors: (1) Heterogeneity of Data: integrating various sources of information, such as CAD models [9], 2D drawings, and written instructions, into a cohesive assembly plan is a significant difficulty. (2) Complex Task Allocation: task division requires careful consideration of the skills and capabilities of both humans and robots to ensure the best fit for each task [10]. In our previous work [11], we used a non-automated approach to decompose a manual assembly sequence into tasks for both humans and robots, relying on a manual criteria catalog [12] to allocate tasks based on engineering judgment. This process proved challenging due to the need for expertise and deep familiarity with the product, an issue that other researchers have also identified [13,14]. (3) Modes of Interaction: the chosen mode of human–robot interaction—whether synchronization, cooperation, or collaboration—adds complexity to the planning process. (4) Mass Customization: the need to tailor assembly sequences for a wide variety of products further complicates the process, necessitating frequent adjustments to accommodate different configurations and requirements [15].
For manual assembly processes, the automated generation of assembly plans has been addressed in the literature, e.g., [16,17,18]. Initial approaches using extracted CAD information for HRC sequences have also emerged, e.g., [7,19]. However, most of those approaches are tied to specific CAD formats [17] or specific software [18]. We pursue a general approach using CAD information in the meta-format STEP AP242 [20], 2D drawings (DXF), and assembly instructions (PDF/Excel), covering a large amount of the information required for HRC assembly planning [21]. Missing information is detected, and an expert is guided via a dashboard interface to manually enrich the data.
The objective of this work is to partially automate and simplify the HRC planning process, reducing its complexity and time-consuming nature. By streamlining and structuring the planning process for HRC assembly paradigms, this work aims to facilitate the transition from current manual assembly processes to more efficient automated systems, thereby promoting widespread industry adoption. To address this challenge, a framework is presented that generates various HRC assembly plans. In this context, “various” means that the framework generates multiple plans that differ in the way humans and robots work together and in the arrangement of steps within the assembly plan. This allows, for instance, an adaptation of the assembly plan during runtime. The framework uses four heterogeneous data sources: (1) CAD data, (2) 2D drawings, (3) written assembly instructions, and (4) knowledge from a product expert. The contributions of this work are three-fold:
Presentation of a novel framework “Extract–Enrich–Assess–Plan–Review” (E2APR) that generates multiple alternative assembly sequence plans to enable a dynamic human–robot workflow.
Ability to generate assembly sequences for three different human–robot interaction modalities.
Evaluation results with respect to (a) the level of automation of the planning framework and (b) cycle times for three interaction modalities.
This work extends the preliminary presentation of Schirmer et al. [22] by (1) adding two more input sources to the framework, (2) creating a hierarchical data structure to facilitate the augmentation process for the expert, (3) incorporating the additional HRI modality of Synchronization, (4) extending the capabilities of the output of the framework, and (5) evaluating the updated E2APR framework experimentally. In the extended framework, we distinguish three types of human–robot interaction—Synchronization, Cooperation, and Collaboration—all of which involve humans and robots working in close physical proximity. In Synchronization, each agent works sequentially in a shared workspace on separate assembly steps. In Cooperation, both agents work concurrently in the shared area but on different assembly steps. Figure 1 illustrates Collaboration, where both agents simultaneously engage in the same assembly step.
The remainder of this paper is organized as follows. Section 2 presents related work. The toy truck use case is described in Section 3; it is employed to elucidate the framework in detail in Section 4 and to evaluate the results in Section 5. Section 6 discusses the results compared to other works in the field, and Section 7 concludes the paper and identifies future work.
2. Related Work
Our work is centered on holistic Assembly Sequence Planning for Human–Robot Collaboration, with an emphasis on frameworks that utilize CAD data and Methods-Time Measurement (MTM). These frameworks facilitate effective collaboration and enable flexible workflows during the assembly process.
Figure 2 presents a summary of relevant research in these areas.
Although almost all papers performed a capability test and sequencing, work in the field of CAD-based planning (Extract) did not generate various plans [23] and omitted the collaboration interaction modality. Conversely, the work on flexible ASP and HRC often did not use experts to adjust the results (Review), the only exception being the work by Raatz et al. [24].
Petzoldt et al. [25] conducted a thorough review of the latest advances in task allocation for HRC, classifying them systematically. They distinguish between two categories of task allocation methods: static and dynamic. Static methods are subdivided into those based on suitability assessment [8,26] and those supported by static simulation [4,27]. Dynamic methods are further subdivided into ‘reactive and ad hoc’ [10] and ‘proactive’. In addition, they assess the potential for optimization following the application of the methods. They highlight that static methods are only applicable to large lot sizes and standardized processes, whereas dynamic task allocation methods facilitate a more flexible response to changes during the process flow and allow for a more adaptable distribution of tasks. Petzoldt et al. posit that dynamic methods are the most effective means of addressing human uncertainties, since the final task allocation takes place during the HRC assembly.
According to Petzoldt’s classification, our framework is an optimized, dynamic task allocation framework that can dynamically adapt the originally created static ASPs. In contrast to the work cited in their study, we focus on the automatability and quality of input data extraction. In addition, we use MTM times to create meaningful assembly sequences in the early planning phases. In the following sections, we look primarily at static task allocation methods, which serve as the basis for our reasoning, as well as dynamic methods for adapting the initial static allocations. Together, they provide a holistic approach to the assembly planning of HRC sequences. Additionally, we focus on works on automated CAD data extraction for assembly planning and HRC task allocation using MTM times for sequencing.
2.1. Holistic Approach to HRC Assembly Sequence Planning
The static HRC planning systems found in the literature adhered largely to a framework initially proposed by Beumelburg [26]. The process began with the creation of a relationship matrix [28,29], which was sometimes assumed to be given [7,8]. Next, a capability assessment identified which resource—whether a robot, a human, or a combination of both—could perform each assembly step. During scheduling, suitable resources and processing times were assigned to the assembly tasks. Decision-making in this scheduling phase often incorporated weights [7,8,28] that reflected higher-level objectives such as cost, time, and complexity [8], as well as goals like maximizing parallel activities, increasing automated tasks, minimizing mean time [28], or managing monotony and break times [24].
None of the previously mentioned methods considered the generation of flexible assembly sequences. As highlighted by Schmidbauer [23], only a limited number of approaches existed for developing dynamic assembly sequences tailored to HRC scenarios [4,10,30]. Additionally, few strategies focused on generating sequences specifically for HRC [24,30,31]. The detailed system framework proposed in this paper streamlines the process by partially automating the workflow from CAD, DXF, and PDF/Excel data to produce a comprehensive dynamic assembly sequence plan (ASP) for HRC.
2.2. CAD-Based Assembly Sequence Planning
To our knowledge, few approaches offered a holistic method for generating HRC workflows from CAD data (e.g., [7,27,28,32]). Typically, these methods extract low-level features, such as hierarchical structures, component names, shapes, or colors, to create relationship matrices or perform capability assessments for HRC tasks. For instance, Fechter et al. [7] utilized both low-level features (e.g., geometry and weight) and high-level features (e.g., joining operations) to allocate assembly tasks to humans, robots, or a combination of both. Their system relied on a carefully curated database.

In contrast, our framework is designed to handle incomplete data by incorporating an expert-in-the-loop. This expert reviews both the input data and the generated results, similar to the approach taken by [27]. Consequently, our method is better equipped to handle variations in data quality and completeness.
In addition to data sourced from CAD files, information is also derived from 2D drawings. The AUTOFEAT algorithm developed by Prabhu et al. [33] enabled the extraction of both geometric and non-geometric data from these drawings. Expanding on this, Zhang and Li [34] presented a method for establishing associations among data extracted from DXF files. Regarding product variants, several researchers proposed innovative techniques for predicting compliant assembly variations, including the use of geometric covariance and a combination of Principal Component Analysis (PCA) with finite element analysis [35,36]. However, these methods did not address data extraction for HRC in the context of assembly sequences.
In summary, despite numerous innovative methodologies in the literature addressing various aspects of CAD modeling and feature extraction, there remained a need for a generic, user-friendly tool for HRC sequence planning that could accommodate a wide range of product variations. Our proposed methodology tackles these challenges by fusing multiple information sources for effective data acquisition and creating an advanced data model.
2.3. HRC Assembly Sequence Planning Based on MTM
To enable the planning of robot cycle times in an HRC setting, variations of Methods-Time Measurement (MTM) [37] were frequently employed. The MTM method facilitated the estimation of processing times for individual work steps performed by both robots and humans, eliminating the need for complex simulations or real-world measurements.
Schröter [29] built upon Beumelburg’s framework by incorporating specially designed robot process modules to calculate target times based on the MTM-1 system [38]. Weßkamp et al. [39] presented a framework for planning HRC ASPs based on the criteria catalog also used in our previous work [12] and a simulation environment to calculate ergonomic factors and cycle times. For robot time estimation, they used a modified MTM-UAS approach that treated robot actions like human actions but multiplied the results by a factor p, since the robot is usually slower than the human in HRC. Weßkamp presented a simple approach to obtain a first estimation of possible human–robot interactions.
Komenda et al. [40] evaluate five methods for estimating HRC cycle times, including those proposed by [24,29,39], using data from real-world applications. Their findings indicate that Schröter’s MRK-HRC and Weßkamp’s modified MTM-UAS yield comparable results for overall cycle times, both achieving a 5% error margin, despite Weßkamp’s method being simpler. The authors critique MTM-based cycle time estimation methods for HRC, arguing that these are designed for high-volume production scenarios with average-trained workers, which is often not applicable in HRC contexts. They advocate for simulation-based methods instead. However, MTM-UAS offers a faster, more precise, and ergonomically integrated solution for analyzing short- to medium-cycle tasks. In contrast, AI-driven [41] and simulation-based [42] approaches require large datasets, significant computational resources, and complex training processes, and often lack the interpretability and ergonomic focus needed for optimizing manual, context-sensitive assembly tasks. Given our objective of distributing tasks between humans and robots with minimal expert input, we adopt Weßkamp’s modified MTM-UAS as our baseline for estimating robot cycle times.
Figure 2. A review of related work [4,7,8,10,19,23,24,26,27,28,29,30,31,32,39] organized into the five components of our E2APR framework, adapted from Schirmer et al. [22], showing the missing holistic perspective.
4. Extract–Enrich–Assess–Plan–Review Framework
The detailed framework, illustrated in Figure 4, comprises three layers: the Input Layer, the Application Layer, and the Output Layer, with data flowing from left to right. The input data include CAD files in STEP format [43], 2D drawings in DXF format, and assembly instructions for manual assembly in PDF/Excel format. These data are processed in the Application Layer through units known as Extract, Enrich, Assess, Plan, and Review. The Review unit involves an expert who contributes by filling in missing information and evaluating the outputs of each unit. The final output of the E2APR framework consists of dynamic ASPs for three different human–robot interaction modalities, along with an assembly catalog detailing the assembly steps and components. To evaluate the framework, we assess two key aspects: (1) the degree of automation achieved by the Extraction Unit for different CAD formats and (2) the cycle times of ASPs for the three interaction modalities generated by the Planning Unit.
4.1. Extraction Unit
The Extraction Unit processes diverse input data for feature extraction and assembly information. This includes (1) CAD files in STEP formats (detailed information is provided in Table 1), (2) the Drawing Interchange File Format (DXF), and (3) a combination of Portable Document Format (PDF) and tabulated Excel data [21]. The unit’s output is a data model of the product, as shown in Figure 5. The model comprises detailed component information, assembly information, and a hierarchical structure designed to identify the order and components for each sub-assembly.
Our method for extracting CAD data builds upon the research of Ou and Xu [46], utilizing assembly constraints and contact relationships among the components. By employing a disassembly-oriented strategy [47] on the final product, we can dissect the assembly into smaller sub-assemblies and atomic components, revealing their hierarchical positioning within the final product. In addition, both the functional and geometric relationships extracted for each component are incorporated into its respective relationship matrix (Section 4.2). As a foundation for extracting the CAD information, we use the Open CASCADE Technology library (https://dev.opencascade.org/project/pythonocc, accessed on 4 January 2025).
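A minimal sketch of such an extraction entry point with pythonocc follows; the file name is hypothetical, and the exact traversal in our pipeline is more involved:

```python
# Read a STEP assembly and enumerate named components with pythonocc
# (Open CASCADE); the solid count per shape hints at component granularity.
from OCC.Core.TopAbs import TopAbs_SOLID
from OCC.Core.TopExp import TopExp_Explorer
from OCC.Extend.DataExchange import read_step_file_with_names_colors

shapes = read_step_file_with_names_colors("toy_truck.step")  # {shape: (name, color)}
for shape, (name, color) in shapes.items():
    explorer = TopExp_Explorer(shape, TopAbs_SOLID)
    n_solids = 0
    while explorer.More():
        n_solids += 1
        explorer.Next()
    print(f"component '{name}': {n_solids} solid(s)")
```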
We extend this approach by incorporating additional information about product variants from accompanying DXF files. To achieve this, we developed a variant extraction algorithm that automatically extracts relevant data from the DXF files and enriches the component information accordingly.
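The paper's variant extraction is format-driven; as one possible sketch, the ezdxf library (our choice for illustration) can collect the text annotations, e.g., variant dimensions, that enrich the component information:

```python
# Collect text annotations from a 2D drawing; downstream, these strings are
# matched against component names to enrich the data model. File name is
# hypothetical.
import ezdxf

doc = ezdxf.readfile("toy_truck_variant.dxf")
annotations = []
for entity in doc.modelspace().query("TEXT MTEXT"):
    # TEXT stores its content in entity.dxf.text, MTEXT in entity.text
    text = entity.dxf.text if entity.dxftype() == "TEXT" else entity.text
    annotations.append(text)
print(annotations)
```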
One further extension involves integrating supplementary information about the assembly steps from pre-existing manual assembly instructions. These instructions include detailed information about the components needed for an assembly step, action verbs (e.g., “join” or “screw”), and the tools used (e.g., “screwdriver” or “hammer”). We use the model de_core_news_sm (https://spacy.io, accessed on 4 January 2025), which is trained on German newspaper reports, to identify those keywords in assembly instructions provided as Excel or PDF files. Since the standard model was not able to identify tools in the assembly instructions, we retrained the model using transfer learning with synthetic data generated by ChatGPT (https://openai.com, accessed on 4 January 2025). Example sentences containing a tool, annotated with the tool’s exact position in the sentence (its start and end character) and labeled as such, served as input:
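The samples below are illustrative stand-ins in spaCy’s annotation format; each entry pairs a German sentence with the start/end character offsets of the labeled tool:

```python
import spacy
from spacy.training import Example

# Hypothetical synthetic samples; offsets mark the TOOL entity span.
TRAIN_DATA = [
    ("Der Arbeiter benutzt den Schraubendreher.",
     {"entities": [(25, 40, "TOOL")]}),
    ("Befestige die Platte mit dem Hammer.",
     {"entities": [(29, 35, "TOOL")]}),
]

nlp = spacy.load("de_core_news_sm")
examples = [Example.from_dict(nlp.make_doc(text), ann)
            for text, ann in TRAIN_DATA]  # fed into nlp.update() during training
```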
By implementing the updated model, we achieved an accuracy rate of approximately 90% for detecting the keywords. All extracted information from CAD, DXF, and PDF/Excel is combined in an assembly step catalog and stored in a MongoDB database (https://www.mongodb.com/, accessed on 4 January 2025).
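A sketch of one combined catalog entry as it could be stored via pymongo; the schema and field names are illustrative, not the paper’s exact data model:

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
catalog = client["e2apr"]["assembly_step_catalog"]

catalog.insert_one({
    "step_id": "SA2-1",                # hypothetical identifier
    "components": ["C4", "SA1"],       # from CAD/DXF extraction
    "action": "join",                  # verb found in the instructions
    "tool": "Schraubendreher",         # tool detected by the NER model
    "sources": {"cad": "STEP AP242", "drawing": "DXF", "text": "PDF/Excel"},
})
```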
4.2. Enrichment Unit
Given that all variations of CAD input data must be managed and that CAD files are often not well maintained in practice, it is essential to address incomplete data. We have created a dashboard that allows a product expert to incrementally add missing information. The expert can modify all components, actions, and tools for a specific step or add missing information, which is highlighted in the dashboard.
Figure 6 shows the dashboard’s navigation options.
First, the Assembly Overview (1) provides detailed information about each component, including its quantity, names, and images. Second, the Skill Matrix (2) shows adaptable human capabilities (e.g., skills, arm length, height), as well as adaptable robot capabilities (e.g., available grippers, payload capacity, precision). Third, the Relationship Matrix (3) depicts how components interact, indicating whether they are connected and in what way. Each row and column represents a component, and each entry corresponds to a specific relationship—such as a geometric interface (e.g., a contact surface) or a functional dependency (e.g., a screw fastening one component to another). Fourth, the Location (4) section defines the positions of all components, sub-assemblies, and the final product in the workspace for MTM calculations (Section 4.4). A world coordinate system establishes both the absolute distance to an origin and the relative distances of each component. Fifth, the Hierarchy (5) section, which includes an editable version of Figure 5, organizes the final product into its sub-assemblies and associated components. Sixth, the Action/Tool/Resource (6) section lists the actions, tools, and resources allocated to each assembly step. Process Time then provides the duration of each assembly step for both humans and robots, derived from MTM calculations (Section 4.4). Seventh, the Criteria Catalog (7) and the Capability Level support the Assessment Unit (Section 4.3) by offering a detailed catalog for analyzing each assembly step. Eighth, the Assembly Sequences (8) section presents the results from the Planning Unit in various output formats.
Within the Action/Tool/Resource tab of the dashboard, the domain expert obtains a more detailed view of the planning process, as shown in Figure 7, and influences the possible forms of human–robot interaction by handling atomic tasks, which is discussed further in Section 4.4.3.
The expert can view those assembly steps at three increasingly granular levels: (1) Basic Operations, (2) Movement Sequences, and (3) Basic Movements. This subdivision is taken from the MTM framework [37] and is required for precise planning of the assembly sequences, as described in Section 4.4. For human–robot interaction in Synchronization or Cooperation, it is sufficient to consider the level of Basic Operations. In Collaboration mode, the level of Basic Movements is required so that the expert can add additional Basic Movements such as “hold”, which enables the robot to work as a third hand (see Figure 1 on the right).
4.3. Assessment Unit
Each assembly step requires an evaluation to determine its suitability to be executed by either human or robot. This evaluation facilitates a detailed comparison between the capabilities of the human and the robot by referencing the skill matrices that include metrics for dexterity, precision, sensory requirements, and ergonomic considerations. These skill matrices are aligned with a criteria catalog, which provides a structured framework for evaluating task requirements. For each task, the criteria catalog identifies parameters such as the complexity of the assembly, force application, the necessity for fine motor skills, or the ability to adapt to unexpected variations. The suitability of humans and robots for the task is then quantitatively assessed against these parameters. The evaluation process incorporates weighted scoring, where each criterion is assigned a specific importance level depending on its relevance to the task.
The resulting suitability score, expressed as a percentage, reflects the alignment of the task’s demands with the inherent capabilities of both resources. For instance, a task requiring high adaptability and complex decision-making might yield a higher score for the human, while repetitive, high-precision operations would likely favor the robot. This method, as outlined by Beumelburg [26], ensures an objective and consistent allocation of tasks, leveraging the strengths of both humans and robots in the assembly process.
A key contribution of this work is the pre-population of the criteria catalog with task-specific information, utilizing a decision tree classifier trained to streamline the initial setup of the planning process. The model is trained on historical task data collected from prior assembly processes, encompassing parameters such as task complexity, ergonomic factors (e.g., postural strain, repetitive motion), required precision (e.g., micrometer tolerances), force application needs, cognitive demands, environmental constraints (e.g., temperature or lighting conditions), and cycle time requirements. The criteria catalog acts as a structured repository for defining task requirements in HRC. Fields in the catalog are automatically populated by the classifier, which uses the historical data patterns from toy truck variants with different dimensions to assign initial values for each criterion. This process significantly reduces manual input effort by pre-filling likely parameter values based on learned correlations from the dataset. To ensure the suggested values are accurate and contextually relevant, an expert reviews the pre-populated catalog. They identify and correct any missing, ambiguous, or incorrect entries directly within the model’s hierarchical tree structure, which enables logical and task-specific refinements. This structure allows for fine-grained adjustments while preserving the interpretability of the decision tree model.
This hybrid approach—automated pre-population with human validation—balances machine efficiency with expert oversight, making the system adaptable to various applications. Furthermore, it supports iterative learning, as corrections and updates made by the expert are integrated back into the training dataset, continuously enhancing the decision tree’s predictive performance for future tasks.
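A minimal sketch of the pre-population step with a scikit-learn decision tree, assuming historical tasks are encoded as numeric feature vectors; the features, encoding, and labels here are illustrative:

```python
# Train a decision tree on historical task data and suggest an initial
# criteria-catalog entry for a new task; the expert reviews the suggestion.
from sklearn.tree import DecisionTreeClassifier

# feature columns: complexity, ergonomic strain, precision, force, cognition
X_hist = [
    [2, 3, 1, 2, 1],
    [1, 1, 3, 1, 2],
    [3, 2, 2, 3, 3],
]
y_hist = ["human", "robot", "human"]  # prior outcomes from past catalogs

clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_hist, y_hist)

new_task = [[2, 2, 2, 2, 1]]
print(clf.predict(new_task)[0])  # pre-filled suggestion, pending expert review
```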
4.4. Planning Unit
Planning proceeds in three stages: (1) the order of the assembly steps is derived from a relationship matrix and represented as a directed graph, (2) tasks are allocated to human and robot, including the decision about the interaction modality, using the results from the Assessment Unit, and (3) multiple assembly sequence plans are generated considering time, cost, and complexity. All stages are illustrated via the toy truck use case shown in Figure 3.
4.4.1. Task Order of the Assembly Sequence
Using the hierarchical product structure with its different levels and interrelations, as shown in Figure 5, an assembly relationship matrix is automatically generated. This matrix displays the pairwise connection relationships between all components of the assembly in a tabular format, incorporating all relevant relations and constraint data obtained from the CAD file. The expert has the option to remove faulty relationships or add restrictions that could not be derived from the data alone. The constraints from the relationship matrix, along with the hierarchical levels of the data model, determine the sequence of assembly steps. The outcome of the first stage of the Planning Unit is a directed graph, as shown in Figure 8. A start node represents the starting point, while an end node denotes the endpoint of the assembly sequence that culminates in the final product FP. The intermediate nodes correspond to the assembly steps within the sequence. Table 2 outlines the assembly steps for the toy truck example. The initial stage of the directed graph comprises a parallel process with assembly steps 1 and 2. Sub-assemblies SA1 and SA2 illustrate the sequential progression of these assembly steps.
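A minimal sketch of this stage with networkx follows; the precedence edges are illustrative stand-ins for the constraints derived from the relationship matrix:

```python
# Build the assembly precedence graph; edges encode constraints from the
# relationship matrix and hierarchy (edges here are illustrative).
import networkx as nx

g = nx.DiGraph()
g.add_edges_from([
    ("START", "Step1"), ("START", "Step2"),  # steps 1 and 2 run in parallel
    ("Step1", "SA1"), ("Step2", "SA1"),
    ("SA1", "SA2"), ("SA2", "SA3"),          # SA2/SA3 order may be swapped
    ("SA3", "FP"), ("FP", "END"),
])

# Any topological order is a valid assembly sequence under the constraints.
print(list(nx.topological_sort(g)))
```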
4.4.2. Task Allocation for Human and Robot
In the second phase, the results of the Assessment Unit are integrated into the directed graph. First, tasks that can only be performed by humans are assigned to them (blue-colored nodes), and the same applies to tasks that can only be performed by robots (green-colored nodes), as illustrated in Figure 9. The sub-assemblies SA2 and SA3, containing possible tasks for a robot (C4 and C5) and human-only tasks (SA1), allow for multiple types of human–robot interaction. The Planning Unit distinguishes between three interaction modalities: (1) Synchronization, (2) Cooperation, and (3) Collaboration.
In the toy truck use case, the task assignment results in six distinct assembly sequence plans, as illustrated in Figure 9. In Synchronization and Cooperation mode, the robot places components C4/C5 into a mounting bracket within the shared workspace. Following this, the human integrates 2× SA1 with C4/C5 to create SA2/SA3. In Synchronization mode, only one agent is permitted in the shared workspace at a time, while in Cooperation mode, both agents can operate simultaneously within the workspace. In Collaboration mode, the robot retrieves components C4/C5, positions the axle, and holds it steady, allowing the human to concurrently combine 2× SA1 with C4/C5 to produce SA2/SA3.
As shown in Figure 8, the order of SA2 and SA3 can be swapped, resulting in three more assembly sequence plans. With two ASPs per interaction modality, the execution of the assembly plan can be adapted during the actual assembly. This property of our system allows a reaction to unforeseen circumstances during operations, e.g., a short-term bottleneck in material supply.
4.4.3. Determination and Sequencing of the Assembly
Algorithm 1, which determines the task allocation mentioned above, is presented next. Our approach builds upon the research conducted by Johannsmeier et al. [10] and incorporates an additional focus on complexity. In addition, we distinguish three distinct interaction modalities: Synchronization, Cooperation, and Collaboration.
Algorithm 1 Determination of dedicated tasks for humans or robots, adapted from Schirmer et al. [22].

Input: sequence I_x, where x = 1, 2, …, n
Output: assignment of human or robot to each input I_x
for x = 1 to n do
    if S_H(I_x) ≥ S_R(I_x) then
        assign I_x to the human
    else
        assign I_x to the robot
    end if
end for
For Synchronization and Cooperation mode, human or robot tasks are allocated at the level of Basic Operations, as described in Section 4.2. In contrast, Collaboration mode divides the Basic Operations into Basic Movements. We use the Basic Movements reach, grasp, bring, release, and join, based on the work of [39], and extend them by hold, which utilizes the robot as a third hand. The expert is able to rearrange and add new Basic Movements as needed. In Synchronization and Cooperation mode, Algorithm 1 processes an input sequence I_x consisting of a single Basic Operation; in Collaboration mode, it handles input sequences of Basic Movements. The algorithm evaluates the properties of time, cost, and complexity to generate a suitability score expressed as a percentage. This score indicates how well a human or robot can perform a Basic Operation (for Synchronization and Cooperation) or a Basic Movement (for Collaboration).
The algorithm is detailed as follows. First, the process time for each input sequence I_x is calculated for both resources, human and robot, each with its distinct skill metrics. For human resources, the standard times are obtained from MTM-UAS [38], while for robots, the times are calculated using the modified MTM-UAS approach [39]. This approach estimates robot times based on the time it takes a human to perform the same task, adjusted by a speed factor p to account for slower robot speeds. The factor p varies depending on the type of interaction.
The baseline factors grow with the demands of the interaction: p = 3 for Synchronization, corresponding to a robot speed of approximately 250 mm/s, as suggested in [39]; p = 4 for Cooperation; and the highest factor for Collaboration. These values reflect the interaction complexity: Synchronization involves simple, predictable movements, Cooperation requires moderate speed for shared tasks, and Collaboration demands dynamic, real-time responsiveness.
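A minimal sketch of this estimate follows, assuming the factors discussed above; the Collaboration value is only stated to be the highest, so p = 5 here is our placeholder:

```python
# Modified MTM-UAS robot-time estimate: scale the human MTM-UAS time by the
# interaction-dependent factor p. The Collaboration value is an assumption;
# the paper only states it is the highest of the three.
P_FACTOR = {"synchronization": 3.0, "cooperation": 4.0, "collaboration": 5.0}

def robot_time(human_time_s: float, modality: str) -> float:
    """Estimated robot process time in seconds for one Basic Operation."""
    return human_time_s * P_FACTOR[modality]

print(robot_time(4.5, "cooperation"))  # -> 18.0
```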
Next, the cost factor is computed. This factor, derived from the skill matrices and criteria catalog detailed in Section 4.3, takes into account additional costs such as those for auxiliary devices or specialized grippers. A higher accumulation of these costs results in a higher cost factor for the input sequence I_x.
Third, the complexity factor is determined. It considers error probabilities, component handling, and task precision, including criteria such as whether the robot can handle delicate materials without damaging them and whether the human can apply the required torque to a bolt. Finally, the three criteria, process time, cost, and complexity, are weighted by an expert using a 3 × 1 weighting vector w, which emphasizes the relative importance of each criterion. By default, this vector is initialized with equal weights of 1; however, the expert can modify these values to prioritize specific criteria as needed. The resulting scores for the human (S_H) and robot (S_R) are expressed as percentages, and the resource with the highest score is designated for the input sequence I_x.
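A compact sketch of this weighted scoring and the allocation rule of Algorithm 1 follows; it assumes the three criteria are normalized to [0, 1] with lower values being better, and all names are ours:

```python
# Weighted suitability score (in percent) from normalized time, cost and
# complexity criteria (0 = best, 1 = worst), using the expert's 3 x 1
# weight vector w; defaults to equal weights of 1 as in the paper.
def suitability(time_n: float, cost_n: float, complexity_n: float,
                w: tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    weighted = (w[0] * (1.0 - time_n)
                + w[1] * (1.0 - cost_n)
                + w[2] * (1.0 - complexity_n))
    return 100.0 * weighted / sum(w)

# Allocation rule: the resource with the higher score gets input sequence I_x.
s_h = suitability(0.3, 0.5, 0.4)  # human criteria for one Basic Operation
s_r = suitability(0.7, 0.4, 0.6)  # robot criteria for the same operation
resource = "human" if s_h >= s_r else "robot"
print(round(s_h, 1), round(s_r, 1), resource)  # 60.0 43.3 human
```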
The resulting ASP for the toy truck assembly is illustrated in Figure 10 and discussed in detail in Section 5.
4.5. Review Unit
The Review Unit involves an expert in two critical stages of the framework: the Enrichment Unit and the Planning Unit (see Figure 4). This expert, who is familiar with the product and the abilities of both the worker and the robot, plays a key role in enhancing the process. During the enrichment phase, the expert fills in missing data, addressing challenges related to data heterogeneity and gaps that result from different STEP formats. The data model shown in Figure 5 provides a structured representation of the component hierarchy, which the expert is able to modify if needed. Additionally, the assembly step catalog enables an in-depth representation of the interdependencies and task distribution among the components.
In the Planning Unit, the expert reviews the relationships identified from the assembly relationship matrix and inspects the interaction modality for each assembly step. Following the planning phase, the expert performs a plausibility check on the automatically generated assembly sequences. In addition, the expert can express their preference for the relative importance of cost, complexity, and time by adjusting the weighting vector w, resulting in the generation of alternative assembly sequences.
4.6. Output Layer
The E2APR framework yields multiple options for the output format of the assembly sequence depending on the required level of detail downstream. Assembly step catalogs for Basic Movements, Movement Sequences, and Basic Operations are possible.
For example, if assembly instructions for the worker are to be generated from the planned sequences, the information on the individual assembly steps can be output as Basic Operations (e.g., “Join axle holder with two screws”) and enriched with images extracted from the CAD data. If generic robot commands from the catalog are to be derived, it is more suitable to output Basic Movements (e.g., “Reach Load Carrier”, “Grasp Load Carrier”, “Bring Load Carrier”).
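As an illustration of deriving such generic robot commands at the Basic Movements level (the movement list and function are ours):

```python
# Derive generic robot commands for one component at the Basic Movements
# level, matching the "Reach/Grasp/Bring" example above.
BASIC_MOVEMENTS = ("Reach", "Grasp", "Bring", "Release")

def robot_commands(component: str) -> list[str]:
    return [f"{movement} {component}" for movement in BASIC_MOVEMENTS]

print(robot_commands("Load Carrier"))
# ['Reach Load Carrier', 'Grasp Load Carrier', 'Bring Load Carrier',
#  'Release Load Carrier']
```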
5. Experimental Results
To assess the performance of our Extraction Unit, we determine the degree of automation by calculating the ratio of data processed automatically to the total amount of information available, which includes both automatically extracted data and manually enhanced data provided by the expert. We extracted information from the toy truck use case and compared the results to our initial Extract–Enrich–Plan–Review (EEPR) framework, presented in Schirmer et al. [22]. The original EEPR framework exclusively used CAD data in STEP formats AP242, AP214, and AP203 as its information source. The E2APR framework presented here additionally includes 2D drawings (DXF) and assembly instructions (PDF/Excel) as information sources.
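Expressed as a formula (notation ours), the degree of automation (DoA) for a given input format is:

```latex
\mathrm{DoA} = \frac{N_{\text{auto}}}{N_{\text{auto}} + N_{\text{expert}}}
```

where N_auto is the number of automatically extracted data items and N_expert is the number of items added manually by the expert.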
Table 3 shows the overall results. The E2APR framework outperforms the original EEPR by 11% for AP203, 12% for AP214, and 9% for AP242. The highest degree of automation is reached for STEP AP242 (88%). Due to the increase in information richness from AP203 to AP242, the level of automation increases for both frameworks. In situations where only AP203 is accessible, the expert must add more missing data compared to the other formats, but our framework remains functional. This shows the adaptability of our holistic framework in accommodating various types of CAD input data.
Additionally, our results focus on the output generated by our Planning Unit. We evaluate three ASPs with the interaction modalities Synchronization, Cooperation, and Collaboration, as depicted in Figure 9. These ASPs are compared to a manual assembly baseline regarding idle time and cycle time. The comparison results are presented in Figure 10.
Each of the three human–robot assembly plans demonstrates enhanced cycle times relative to manual assembly: (1) Synchronization shows an improvement of 11.24%, (2) Cooperation exhibits a 13.48% improvement, and (3) Collaboration shows a 3.37% improvement.
This time efficiency is primarily because humans and robots can simultaneously execute assembly steps. The parallelizable assembly steps C1 to C3 can be performed either by the robot or the human, whereby the robot requires three times the process time (p = 3 in MTM-UAS) for their execution. For the three human–robot assembly sequences, there are different assignments of the assembly steps resulting from the dependencies of the interaction types: In Synchronization, steps C1 and C2 are assigned to the human and C3 to the robot. In Cooperation, the human performs C1 and the robot C2 and C3. In Collaboration, all three steps are assigned to the human.
Cooperation is faster than Synchronization, although the assembly steps C4 and C5 take longer (p = 4). This is because in Cooperation, humans and robots are allowed to work in the same workspace at the same time, and therefore C5 can be processed simultaneously with SA2, which is not permitted in Synchronization. Collaboration is the slowest human–robot ASP, resulting from the highest factor p and the additional Basic Movements (C4 (Hold) and C5 (Hold)) the robot performs. These Basic Movements allow the robot to act as a third hand, as shown in Figure 1, right side. Although Collaboration is slower compared to the other interaction modalities, the ergonomics for the human improve when a third hand is available.
The incorporation of additional Basic Movements enhances resource utilization, decreasing the robot’s idle time to 11 s, in contrast to 46 s in Synchronization and 24 s in Cooperation. It is important to note that the idle time for the robot in both Synchronization and Cooperation occurs at the beginning and end of the sequence. This allows the robot to engage in other tasks that may not be directly related to truck assembly. For instance, in Synchronization, the robot could potentially operate a second truck assembly station.
6. Discussion
The E2APR framework introduces a novel approach to sequence planning for HRC by integrating diverse data sources and enabling dynamic task allocation. Experimental results confirm its ability to streamline assembly planning while addressing heterogeneous data and task complexities. Compared to traditional methods, the E2APR framework addresses significant limitations. The integrated task allocation builds upon the open research from Bänziger et al. [4] and Müller et al. [48], enhancing the allocation process by incorporating cost and time considerations. Additionally, the E2APR framework offers a fine-grained option to assign capabilities to either humans or robots, addressing a gap identified by Pellegrinelli et al. [49]. Unlike previous works on task allocation [7,27,28,32,50], which either focus narrowly on CAD-based feature extraction or omit expert feedback, E2APR utilizes diverse data sources, including 2D drawings and textual instructions, and combines them within a holistic planning structure. For instance, the inclusion of dynamic assembly sequence generation differentiates this work by allowing multiple plans with varying interaction modalities (Synchronization, Cooperation, and Collaboration) to be created and adapted based on operational constraints. These insights affirm the framework’s potential for enabling dynamic and adaptable assembly processes and contribute to the findings of Faccio et al. [51].
In accordance with the reference model for task allocation in HRC proposed by Petzoldt et al. [25], E2APR is a framework that enables the dynamic allocation of tasks and optimizes the assembly sequence. The E2APR framework offers the potential not only for dynamic adaptation of the sequencing process but also for variation in the human–robot interaction modality. The objective of our research was to automate as many process steps as possible in the planning of HRC sequences. In comparison to other work in this field, our approach not only automated the suitability assessment, task allocation, and optimization but also placed a special focus on the extraction of heterogeneous input data. The expert is nevertheless involved in every step of the process as a reviewer, ensuring that their preferences can be incorporated into the planning at an early stage and that goals for the optimization of the assembly sequence can be defined.
As highlighted by Ferraguti et al. [52], frameworks such as E2APR provide valuable support to workers by helping prevent uncomfortable postures and leveraging the robot as an assistive partner. Similarly, our results demonstrate the ergonomic benefits of such collaboration, where the robot functions as a “third hand” to stabilize the axle while the worker mounts the screws. This approach enhances ergonomic efficiency, as one human hand can securely hold the axle holder while the other hand operates an automatic screwdriver for fixation.
Despite its strengths, the E2APR framework has limitations that must be addressed to improve its overall effectiveness. The complexity of the expert-in-the-loop mechanism, particularly the usability of the expert dashboard, warrants further investigation into its practicality, highlighting the need for a comprehensive usability study to assess its efficiency. Preliminary evaluations revealed challenges when handling large assemblies, which require extensive manual data enrichment. Additionally, extracting information from assemblies with numerous electronic components causes confusion in the hierarchy, primarily due to the detailed representation of printed circuit boards.
Furthermore, while the framework supports multiple assembly plans, its current implementation lacks real-time adaptability. Although the E2APR framework aims for dynamic task allocation, it still lacks the decision-making algorithm discussed in Petzoldt et al. [25]; integrating such an algorithm could enable real-time dynamic behavior, further enhancing the framework’s strengths by facilitating real-time task allocation and adaptation to operational constraints.
As Hentout et al. [53] point out, there is a gap in transferring research into industrial environments. The E2APR framework, however, demonstrates potential for seamless integration into industrial settings such as the automotive [54] and medical [55] industries. Since the domain expert is always involved in the planning process, their knowledge is incorporated into the system, allowing key factors, such as quality and accuracy for medical applications and cycle time for automotive applications, to be appropriately weighted. Beyond assembly, the framework can enhance inspection and maintenance processes as well: an inspection or maintenance plan consists of several steps, and each step is evaluated based on a criteria catalog and aligned with the skill matrices, as described in Section 4.3.
Incorporating AI can address these limitations by automating criteria updates, enabling real-time task adjustments, and improving ergonomic analysis, making the framework more adaptive and efficient.
7. Conclusions and Future Directions
In this work, we introduced a holistic framework designed to streamline the creation of assembly sequence plans for HRC. Traditionally, creating these HRC assembly sequences involves a labor-intensive manual process carried out by experts. Our E2APR framework provides a novel approach by leveraging product and process data such as CAD, DXF, and PDF/Excel files to automatically generate assembly sequences. The framework integrates an expert at several key stages: data enrichment, adjustment of weighting parameters related to time, cost, and complexity, and overall review of the generated sequences. We demonstrated and assessed our framework using a toy truck assembly case study. The experimental results highlight the framework’s capability to automate the process across three different CAD file formats and its effectiveness in generating assembly sequences for various human–robot interaction modalities, including Synchronization, Cooperation, and Collaboration.
In future research, we plan to test the framework in more intricate industrial scenarios presented by our industry partners, incorporating the Safety Analysis from our previous work [56]. We will also compare the cycle times of the generated ASPs with real-time data to evaluate their accuracy. Additionally, we will enhance the pre-population of the criteria catalog, utilizing a prompt engineering approach to compare and evaluate the results. Our ultimate goal is to leverage multiple assembly sequences during the actual assembly process, allowing for dynamic switching between sequences based on real-time information.
Furthermore, expert feedback from practitioners who have used the framework could validate the dashboard’s usability. We will include metrics on end-user satisfaction, focusing on human workers’ perceptions of usability, efficiency, and overall system performance, to gain insight into system improvements and their smooth adoption in an industrial environment. With that, we will statistically evaluate our framework and assess the expected return on investment (ROI) of using E2APR.
Finally, we propose to improve the task allocation algorithm by considering real-time dynamics, such as fluctuations in production demand, worker availability, and unforeseen disruptions. This will ensure that the framework adapts dynamically to changing conditions and optimizes task distribution throughout the assembly process.