[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Next Article in Journal
Contact Interaction of a Rigid Stamp and a Porous Elastic Cylinder of Finite Dimensions
Previous Article in Journal
Computational Fluid Dynamics Simulation and Analysis of Non-Newtonian Drilling Fluid Flow and Cuttings Transport in an Eccentric Annulus
Previous Article in Special Issue
Large Language Model-Assisted Reinforcement Learning for Hybrid Disassembly Line Problem
You seem to have javascript disabled. Please note that many of the page functionalities won't work as expected without javascript enabled.
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

Integrated Scheduling of Multi-Objective Job Shops and Material Handling Robots with Reinforcement Learning Guided Meta-Heuristics

by
Zhangying Xu
1,2,
Qi Jia
1,2,
Kaizhou Gao
1,2,*,
Yaping Fu
3,*,
Li Yin
1,2 and
Qiangqiang Sun
4
1
Macau Institute of Systems Engineering, Macau University of Science and Technology, Macao 999078, China
2
Macau University of Science and Technology Zhuhai MUST Science and Technology Research Institute, Zhuhai 519031, China
3
School of Business, Qingdao University, Qingdao 266071, China
4
School of Information Engineering, Shandong University of Aeronautics, Binzhou 256603, China
*
Authors to whom correspondence should be addressed.
Mathematics 2025, 13(1), 102; https://doi.org/10.3390/math13010102
Submission received: 8 November 2024 / Revised: 15 December 2024 / Accepted: 25 December 2024 / Published: 30 December 2024
Figure 1
<p>An example of a scheduling solution for multi-objective JSP with MHR.</p> ">
Figure 2
<p>Multi-objective JSP with MHR coding method.</p> ">
Figure 3
<p>Neighbourhood structures.</p> ">
Figure 4
<p>Framework of RL.</p> ">
Figure 5
<p>The framework of the proposed algorithms.</p> ">
Figure 6
<p>Parameter level trend of GA.</p> ">
Figure 7
<p>Parameter level trend of DE.</p> ">
Figure 8
<p>Parameter level trend of HS.</p> ">
Figure 9
<p>Parameter level trend of Q-learning.</p> ">
Figure 10
<p>Parameter level trend of SARSA.</p> ">
Figure 11
<p>Nemenyi post hoc analysis of algorithms on benchmark instances. (<b>a</b>) Nemenyi post hoc analysis of algorithms <math display="inline"><semantics> <mi>ρ</mi> </semantics></math> values. (<b>b</b>) Nemenyi post hoc analysis of algorithms IGD values.</p> ">
Figure 12
<p>Distribution of ranks for algorithms across benchmark instances. (<b>a</b>) Ranked distribution of algorithms <math display="inline"><semantics> <mi>ρ</mi> </semantics></math> values. (<b>b</b>) Ranked distribution of algorithms IGD values.</p> ">
Figure 13
<p>Nemenyi post hoc analysis of algorithms on benchmark instances. (<b>a</b>) Nemenyi post hoc analysis of algorithms <math display="inline"><semantics> <mi>ρ</mi> </semantics></math> values. (<b>b</b>) Nemenyi post hoc analysis of algorithms IGD values.</p> ">
Figure 14
<p>Distribution of ranks for algorithms across benchmark instances. (<b>a</b>) Ranked distribution of algorithms <math display="inline"><semantics> <mi>ρ</mi> </semantics></math> values. (<b>b</b>) Ranked distribution of algorithms IGD values.</p> ">
Figure 15
<p>Nemenyi post hoc analysis of algorithms on benchmark instances. (<b>a</b>) Nemenyi post hoc analysis of algorithms <math display="inline"><semantics> <mi>ρ</mi> </semantics></math> values. (<b>b</b>) Nemenyi post hoc analysis of algorithms IGD values.</p> ">
Figure 16
<p>Distribution of ranks for algorithms across benchmark instances. (<b>a</b>) Ranked distribution of algorithms <math display="inline"><semantics> <mi>ρ</mi> </semantics></math> values. (<b>b</b>) Ranked distribution of algorithms IGD values.</p> ">
Figure 17
<p>Nemenyi post hoc analysis of seven algorithms on benchmark instances. (<b>a</b>) Nemenyi post hoc analysis of algorithms <math display="inline"><semantics> <mi>ρ</mi> </semantics></math> values. (<b>b</b>) Nemenyi post hoc analysis of algorithms IGD values.</p> ">
Figure 18
<p>Distribution of ranks for algorithms across benchmark instances. (<b>a</b>) Ranked distribution of algorithms <math display="inline"><semantics> <mi>ρ</mi> </semantics></math> values. (<b>b</b>) Ranked distribution of algorithms IGD values.</p> ">
Versions Notes

Abstract

:
This study investigates the integrated multi-objective scheduling problems of job shops and material handling robots (MHR) with minimising the maximum completion time (makespan), earliness or tardiness, and total energy consumption. The collaborative scheduling of MHR and machines can enhance efficiency and reduce costs. First, a mathematical model is constructed to articulate the concerned problems. Second, three meta-heuristics, i.e., genetic algorithm (GA), differential evolution, and harmony search, are employed, and their variants with seven local search operators are devised to enhance solution quality. Then, reinforcement learning algorithms, i.e., Q-learning and state–action–reward–state–action (SARSA), are utilised to select suitable local search operators during iterations. Three reward setting strategies are designed for reinforcement learning algorithms. Finally, the proposed algorithms are examined by solving 82 benchmark instances. Based on the solutions and their analysis, we conclude that the proposed GA integrating SARSA with the first reward setting strategy is the most competitive one among 27 compared algorithms.

1. Introduction

The World Energy Outlook 2023, published by the International Energy Agency, projects that total global energy consumption will grow at an average annual rate of 0.7% by 2030 under the established policy scenario [1]. Manufacturing enterprises utilise over 50% of the country’s electricity and are responsible for producing at least 26% of the total carbon dioxide emissions [2]. Hence, it is vital for industrial organisations to investigate techniques to conserve energy and reduce consumption [3,4]. While the development of energy-efficient equipment and the adoption of new technologies might alleviate this pressure, these approaches typically entail significant costs for manufacturing firms. Green scheduling has been demonstrated to be a viable and efficient method for conserving energy and reducing emissions [5].
Production scheduling in manufacturing involves the distribution of a series of tasks to a group of production resources to attain designated objectives [6]. The job shop scheduling problem (JSP) is a complex optimisation challenge classified as a combinatorial optimisation problem. It is nondeterministic polynomial-time hard (NP-hard). Among the different production scheduling problems, JSP is widely regarded as one of the most typical and complex scheduling problems [7,8,9,10].
For improving the productivity of companies, efficient automated production in modernised fabrication shops is receiving increasing attention from industry and academia. In practice, more and more industries are using automated guided vehicle (AGV) to transport materials [11], and it has significantly improved productivity in recent years [12]. AGV, as a kind of automation equipment, can complete the designated transportation assignments within the specified route or environmental configuration. However, due to the low autonomy of AGV, its behaviour is usually limited by fixed paths and commands, which requires the support of an external control system. With the rapid development of technology, a new type of material handling robot (MHR) has emerged, combining the features of AGVs and traditional robots[13]. With higher autonomy and intelligence, the ability to autonomously plan paths, avoid obstacles, and adapt to different working environments. MHR is gradually becoming the mainstream choice in the field of automated mobile robots [14]. In many modern enterprises, multi-MHR logistics scheduling systems have been rationally used in manufacturing to achieve efficient production [15], especially in manufacturing workshop scheduling problems, such as JSPs [16]. In existing studies, the transportation time in JSPs is attributed to the processing time [17] and transportation machines are sufficient [18]. Currently, scheduling patterns are shifting from mass to individualised production, which requires a more flexible shop floor to respond to complex demands and improve efficiency. In this context, coordinated scheduling between MHRs and machines is particularly important [19].
In the research field of multi-objective scheduling problems, there are relatively few studies on integrated job shops with MHR scheduling. With the increasing level of industrial automation and the growing demand for high-efficiency production in enterprises and factories, MHRs have been widely used in the actual production process, and have become an important tool for improving production efficiency and flexibility. At the same time, the requirements for high efficiency and environmental protection in modern production are becoming more and more prominent, which poses a higher challenge to the scheduling problems.
Considering energy, earliness or tardiness (E/T) and multi-MHR transportation in real production environments, this work investigates the joint scheduling of JSP and MHR. Since conventional JSP is an NP-hard problem, so is the multi-objective JSP with MHR. In this work, three meta-heuristics, genetic algorithm (GA), differential evolutionary (DE) and harmony search (HS), are used and seven local search operators are designed to jump out of the local optimum. At the same time, two reinforcement learning methods, Q-learning and state–action–reward–state–action (SARSA), are employed for dynamic local search to improve the convergence speed.
In comparison with existing works, this work intends to make the following new contributions to the field of JSP with MHR:
(1)
A multi-objective mathematical model for the concurrent scheduling of multi-objective JSP with MHR is devised for the first time.
(2)
Seven local search operators are designed based on the problem characteristics to improve the convergence speed of the used meta-heuristics.
(3)
Three reward strategies and six variants of reinforcement learning (RL) algorithms are designed to select the appropriate local search operator during iterations.
The rest of this paper is organised as follows. Section 2 gives literature review. Section 3 presents a multi-objective JSP with MHR and formulates a mathematical model. Section 4 gives the proposed algorithm. Section 6 presents the outcomes of the experiments, and Section 6 draws conclusions.

2. Literature Review

The JSP with the MHR problem has been extensively studied, and we review and analyse it from the following aspects. Bilge and Ulusoy [20] present the JSP with material transfer by AGVs and solve 90 cases of different scales. The makespan is minimised for the JSP with MHR problems by Hurink and Knust [21], who model the analytic graph and propose a tabu search (TS) algorithm. In a flexible manufacturing system (FMS), Deroussi et al. [22] schedule both machines and vehicles for minimising the overall completion time by utilising three distinct meta-heuristics: simulated annealing, hybrid iterative local search, and the hybrid thereof. In [23], a hybrid mathematical model for FMS including AGVs is built, and an evolutionary algorithm is proposed to solve the multi-objective scheduling problem. Reddy et al. [24] studied both machine and vehicle scheduling issues inside FMS and explored the integrated problem of minimising the completion time, average flow time, and average delay. For decreasing the energy consumption and completion time in an energy efficient flexible shop scheduling problem with transportation constraints, a multi-objective optimisation model is created [25]. Baruwa and Piera [26] conducted the synchronised scheduling of machines and AGVs in manufacturing systems by employing a timed-coloured Petri net. For solving the parallel scheduling problem of a hybrid job shop with AGVs, a parallel heuristic based on two-step decomposition and a mixed integer linear program (MILP) is proposed [27]. In the joint scheduling of production and transportation in FMS, MILP is proposed to make chained decisions for machines and AGVs, interrelating the time constraints of machine processing and AGVs [28]. Yang et al. [29] proposed a cooperative multi-robot scheduling system in an intelligent warehouse, and designed a maximin-based multi-objective algorithm. For minimising the makespan of a scheduling problem with multiple machines and two identical AGVs, a DE-based method is developed [30]. Han et al. [31] investigated an integrated scheduling problem of the flexible job shop scheduling problem (FJSP) with AGVs and proposed a two-population collaborative GA to solve this MILP. Yao et al. [19] proposed a new MILP model for the JSP with mobile robots to minimise the makespan. A practical multi-objective reliability optimisation model for integrating the scheduling of AGV and JSP is developed, and the problem is solved by a novel non-dominated sorted cuckoo search algorithm [32]. Zhou et al. [33] designed a cluster intelligence algorithm with multi-objective disturbance and repair strategies to minimise the TEC of a mobile robot combined with operational criteria.
In recent years, meta-heuristics have gained significant attention in addressing JSP [34,35,36,37,38]. For scheduling both machines and AGVs in FMS, Abdelmaguid et al. [39] introduced a hybrid GA that employs heuristic rules to assign transportation tasks. May et al. [40] engineered a GA-based method to identify a Pareto frontier solution set when solving a production scheduling problem aimed at minimising the energy consumption and completion time. Zhang et al. [41] designed a hybrid GA to solve the JSPs with the total weighted delay time and TEC. A DE-based modal factorisation algorithm (MODEMA) is proposed for the multi-objective job shop scheduling problems (MO-JSP) [42]. For minimising the makespan, earliness, and tardiness, a mixed integer programming (MIP) model is proposed for the multi-objective JSP in robotic cells, along with the development of a meta-heuristic based on the teaching–learning-based optimisation algorithm [43]. Satish et al. [44] integrated machine tool selection heuristics and trolley assignment heuristics in DE to schedule machines and material transportation systems, aiming to minimise both the completion time and cycle time. In FMS, Gnanavel et al. [30] addressed the problem of concurrently scheduling machine tools and AGVs to minimise the completion time using a DE-based method. Pan et al. [45] proposed a local optimal HS with dynamic sub-harmony memory (HM) to minimise the weighted total advance and delay penalties in batch pipeline scheduling problems with equally large sub-batches. In [46], a hybrid-enhanced global optimal HS algorithm was developed to handle the blocked-arrangement flow shop problems, focusing on a completion time standard. Gao et al. [47] used the discrete HS algorithm to address the weighted amalgamation of two objectives: makespan and the average of the earliness or tardiness (E/T) in FJSP. Li et al. proposed a novel parallel evolutionary algorithm (PEA) for the AGV scheduling problem, which integrates meta-heuristics with parallel computing to enhance computational efficiency [48]. In [49], an enhanced HS method is proposed to address the AGV scheduling problems in industrial systems.
RL algorithms are effective in solving JSP and can outperform traditional scheduling algorithms under certain circumstances [50]. Fu et al. [51] reviewed the hybrid use of meta-heuristics and reinforcement learning in the problem of production scheduling, and analyse the importance of reinforcement learning for enhancing meta-heuristics across various dimensions. In recent years, Q-learning and SARSA have gained significant recognition as the most prominent and frequently utilised algorithms in the field of reinforcement learning [52,53]. A Q-learning-based NSGA-II algorithm was proposed to deal with the dynamic FJSP considering limited transportation resources, minimising the production cycle time and TEC [54]. Cui and Yuan [55] proposed a GA based on RL to solve the delay penalty objective and energy minimisation in a photovoltaic glass energy-efficient production scheduling problem, and the experimental results proved the performance of the algorithms based on reinforcement learning by comparing them with the common heuristics. In [56], Q-learning is applied in the distributed assembly-arrangement flow shop scheduling problem, to select a high-quality local search strategy. For minimising the maximum completion time of jobs and maximum workload of factories, an artificial bee colony algorithm based on Q-learning is designed [57]. Lin et al. [58] combined a meta-heuristic with Q-learning to study the multi-objective urban traffic signal scheduling problems. In [59], a variable neighbourhood iterative search based on Q-learning was designed for minimising the smoothing exponent with a threshold on the number of workstations in the disassembly line scheduling problems. For minimising makespan in FJSP, two Q-learning algorithms and two SARSA algorithms are proposed to solve the makespan optimisation problems [60]. Deng et al. [61] researched the fuzzy distributed mixed-flow shop floor scheduling problems for on-time delivery, aiming to maximise the delivery accuracy while minimising the duration and total energy consumption. A three-dimensional distribution estimation algorithm is developed based on reinforcement learning.
Currently, most studies on MHR scheduling mainly rely on meta-heuristic or mathematical models. However, these methods face the challenge of trade-off between computational efficiency and solution quality when dealing with complex multi-objective scheduling problems, and lack the ability of deep learning for dynamic scheduling decisions. Therefore, improving the intelligence and adaptability of scheduling strategies has become an important direction in current research.
This study proposes improved meta-heuristics based on RL for solving the integrated scheduling problem of job shops with MHR. By introducing RL to do the dynamic local search, we can automatically learn more effective decision-making strategies during the scheduling process and improve the speed of convergence. This leads to the effective solution of multi-objective optimisation problems, specifically including the objectives of minimising makespan, E/T, and TEC. The method can improve the efficiency of scheduling, and enhance the adaptability and robustness of the system, which has greater theoretical significance and application value.

3. Problem Descriptions and Mathematical Model

3.1. Problem Descriptions

The description of multi-objective JSP with MHR is as follows: some jobs need to be processed on several machines, with each job containing a few operations. The operations are processed sequentially by different machines. A finite number of MHRs facilitate the transportation of workpieces between machines. The scheduling objective is to determine how to sequence the operations appropriately and assign the jobs to the appropriate MHRs for transportation to minimise different objectives, makespan, E/T, and TEC.
In contrast to traditional JSP problems, the multi-objective JSP with MHR considers the sequence of jobs on each machine, the task assignment for MHRs, and the order of transportation tasks for each MHR. Specifically, the proposed problem considers not only the job’s processing time on machines, but also its transportation time among machines. The layout of machines varies, resulting in different transportation times for MHRs when moving from one machine to another or to the loading and unloading (L/U) area. The movement of MHR is classified into two categories: loading and unloading. This classification is based on the current location of the MHR relative to its originally assigned place for a transportation job. Thus, the multi-objective optimisation for JSP with MHR can be divided into three sub-problems:
  • Organising the order of processing operations—developing a processing sequence tailored for each machine;
  • Transportation task allocation problem—assigning a transportation MHR for each job between machines (L/U areas);
  • Logistical task sequencing issue: arranging the transportation sequence for each MHR.
While satisfying the constraints of JSP, to solve the multi-objective JSP with MHR, we need the following assumptions:
(1)
The number of MHRs is known, and they are exactly the same in terms of speed and load-bearing characteristics.
(2)
MHR congestion and path conflicts are not considered.
(3)
MHR has sufficient power, and there is no malfunction.
(4)
Each operation is sent just by one MHR. Each MHR is capable of managing only a single task at a time.
(5)
Each MHR performs all transportation tasks in sequence.
(6)
Initially, both MHRs and jobs are in the L/U station. When the processing is completed, all MHRs and jobs return to the L/U station.
In particular, after the last step of the process for a job is completed, an MHR is scheduled to transport the completed product to the finished L/U. The transportation time is not counted as a processing time. It is important to note the loading and unloading status of the MHR. A simple example with two jobs, two machines, and one MHR is considered with the processing sequence <0, 1, 0, 1>, and transported by an MHR. Table 1 presents the processing time and transportation durations of MHR. As illustrated in Figure 1, MHR streamlines the handoff of the initial task O 00 from L/U to the assigned machine M 0 , and then back to L/U to transport the next operation O 10 . MHR completes the previous transport task, and then goes to the corresponding machine or L/U to complete the succeeding one. It is worth noting that the grey portion represents the unload state of MHR, and the tasks with the same colour represent the same job. The time to complete the last transportation task from a machine to L/U is not considered into the makespan, but the energy consumption of MHR needs to be considered. As shown in Figure 1, C m a x is the completion time of O 11 (65), but the energy consumption of MHR from L/U to M 0 and from M 0 to L/U later needs to be considered.

3.2. Mathematical Model

We develop of a mathematical model to describe the multi-objective JSP with MHR using the following notations in Table 2.
The mathematical model can be derived as follows:
m i n c ^
m i n Ω = m i n ( i = 1 n c i d i n )
m i n E = m i n ( E ¯ + E + E )
c ^ a ¯ i j + T ˜ i j , i
l r x i j l = 1 , i , j , l
a ¯ i j a ¯ i j + T ˜ i j V · Y i j i j , i , j , i , j , j < j
a ¯ i j a ¯ i j + T ˜ i j V · ( 1 Y i j i j ) , i , j , i , j , j < j
a ¯ i j a ˜ i j + t ˜ i j , i , j
a ˜ i j a ¯ i ( j 1 ) + T ˜ i ( j 1 ) , i , j , j 1
a ˜ i j a ˜ i j + t ˜ i j + T ^ b i j a i j V · ( 2 X i j l X i j l ) V · Z i j i j l , i , j , i , j , l , j < j
a ˜ i j a ˜ i j + t ˜ i j + T ^ b i j a i j V · ( 2 X i j l X i j l ) V · ( 1 Z i j i j l ) , i , j , i , j , l , j < j
d i = ( 1 + T · n m ) · j = 1 n i p t i j , i , j
i t k = i = 1 n j = 1 o i = 1 n j = 1 o ( s i j k c i j k ) · β i j i j k , i , j , i , j , k
E ¯ = i = 1 n j = 1 o k = 1 m p i j k · α i j k · e ¯ k , i , j , k
E = k = 1 m e k · i t k , k
E ˜ = i = 1 n j = 1 o l = 1 r X i j l t ˜ i j · e ˜ , i , j , l
Formulas (1)–(3) are the three objectives. Constraint (4) ensures that the total processing time never falls below the completion time of the last operation for any given task. Constraint (5) specifies that each operation is limited to selecting only one MHR for transportation. Equations (6) and (7) ensure that there is a unique priority assigned to any two processing tasks that are executed on the same job. Constraint (8) means that the processing task cannot commence until its transport task has been finished. Equation (9) dictates that the start time for transportation of all operations, aside from the initial one, must be greater than or equal to the end time of the previous operation. Equations (10) and (11) establish the sequential connection between any two transportations of the same MHR, guaranteeing an MHR only carrying out one transportation at a time. If an MHR is not at the designated location (machine) for a specific transportation task, it must move from its current position to the designated location (machine) first. Equation (12) calculates the due date for each job. Constraint (23) computes the idle times of machines. Equations (14)–(16) are utilised to calculate E ¯ , E and E ˜ .
In this research, our aim is to optimise the three objective Functions (1)–(3). The concepts regarding multi-objective optimisation are briefly introduced as follows. A multi-objective optimisation problem is defined as:
m i n F ( x ) = ( f 1 ( x ) , f 2 ( x ) , , f n ( x ) )
where x represents a dimensional decision vector. Multi-objective optimisation problems can utilise the Pareto dominance relation to find a set of solutions that do not dominate each other. Specifically, a solution vector A = ( A 1 , A 2 , …, A m ) is a Pareto non-dominated solution if no other solution vector B = ( B 1 , B 2 , …, B m ) satisfies the following conditions:
  • For all objective functions j , A j B j ;
  • There exists at least one objective function k such that A k < B k
In this research, we utilise an archive set rewarded as A to keep track of non-dominated solutions, where each solution is assessed against all other solutions in the A . In multi-objective optimisation problems, the algorithm must seek the non-dominated solutions located on or near the optimal Pareto front [62].

4. Proposed Algorithm

4.1. Encoding and Decoding

A two-layer encoding approach, consisting of operation sequences (OSs) and MHR sequences (MHRSs), is designed. The lengths of OS and MHRS are contingent upon the quantity of operations performed. The OS layer comprises job indexes and is responsible for determining the sequence and arrangement of all operations in terms of processing and transportation. The MHRS layer is responsible for conveying the indexes of MHRs that carry the associated tasks, referring to Figure 2 for specific implementations.
The determination of decision variables can be made based on the two-layer chromosome. Based on the chromosome combinations ( O i j , r), X i j l can be determined. Y i j i j and Z i j i j can be determined by the sequential assignment between combinations. The decoding process for finding makespan is shown in Algorithm 1.
Algorithm 1: Decoding process
Mathematics 13 00102 i001

4.2. Meta-Heuristics

Meta-heuristics are general-purpose optimisation algorithms that can be used to solve various types of complex optimisation problems. They find the near-optimal solutions through iterative search and diversity strategies, and have important practical applications to NP-hard problems. Meta-heuristics include many different types, such as GA, DE, HS, TS, simulated annealing (SA), and particle swarm optimisation (PSO). In this study, three meta-heuristics, namely GA, DE, and HS, are adopted, and each of which is based on different ideas and strategies. GA first generates a series of random solutions as an initial population, and then determines the fitness values of these solutions. Based on these values, a selection process occurs to determine which solutions will undergo crossover and mutation. The crossover and mutation broaden the search space, where the crossover perpetuates the superior genes of the parent individuals, and the mutation operation may produce individuals with more diversified genes than the current dominant ones. The mutation operation facilitates jumping out of local optimal solutions while increasing the probability of random search. The specific steps of GA in this study are shown in Algorithm 2. In Algorithm 3, in contrast to GA, which operates more directly on individuals, DE performs search optimisation through differential variational and probability-based selection strategies. HS accomplishes the optimisation by iteratively adjusting the solution variables in the harmonic memory (HM), so that the function values converge with the number of iterations. The specific implementation is illustrated in Algorithm 4.
Algorithm 2: Genetic Algorithm
Mathematics 13 00102 i002

4.3. Local Search

Seven local search operators are crafted and tailored specifically to align with the features of multi-objective JSP with MHR. In the following contents, the neighbourhood structures of the seven local search operators are presented.
  • Swap: Two tasks are randomly selected to swap in the sequence of operations and in the sequence of MHR transportation tasks, respectively. The process is shown in Figure 3a.
  • Inverse: Randomly select two tasks from the solution, switch their places, and reverse the segment of tasks between these two positions. The procedure is shown in Figure 3b.
  • Insertion: Select two tasks at random in the operation sequence and the MHR transportation sequence, respectively. Their positional relationship is determined, and the latter task is inserted into the location of the former task. The tasks following the insertion point are then moved back by one position in the sequence. The exact technique is outlined in Figure 3c.
  • Binding insertion: Randomly select two tasks from the operation sequence and the MHR transportation sequence, respectively. Next, individually, put them as a whole unit into every conceivable spot inside the sequence. More specific details are described in Figure 3d.
  • Block insertion: Randomly identify two distinct points within the sequence of operations and the MHR transportation sequence, consolidate the tasks between these two positions into a block, and insert it into all feasible positions. This is specifically described in Figure 3e.
  • D and C insertion: Deletes multiple tasks from the operation sequence and the MHR transport sequence. Insert the deleted tasks one by one randomly into all possible positions in the sequence. The specific procedure is shown in Figure 3f.
  • Double swap: Two exchange operations are performed in the operation sequence and the MHR transportation sequence, respectively, as described in Figure 3g.
Algorithm 3: Differential Evolution
Mathematics 13 00102 i003

4.4. Reinforcement Learning

RL is a type of machine learning. RL trains an agent to keep interacting with the environment, and the agent receives a feedback (i.e., reward) from the environment after taking particular action in the current state. The specific process of RL is shown in Figure 4.
Both Q-learning and SARSA are RL algorithms [63]. They have different ways of updating the Q-table. Q-learning is an off-policy algorithm, while SARSA is on-policy. The following equations demonstrate the method of updating the Q-value for Q-learning and SARSA, respectively.
Q ( s , a ) Q ( s , a ) + α ˜ r ˜ + γ m a x a Q ( s , a ) Q ( s , a )
Q ( s , a ) Q ( s , a ) + α ˜ r ˜ + γ Q ( s , a ) Q ( s , a )
where Q ( s , a ) represents the Q-value associated with taking action a in state s. α ˜ signifies the learning rate, which controls the magnitude of each update. r ˜ is the reward obtained immediately after performing action a. The discount factor γ serves to weigh the significance of immediate rewards against those anticipated in the future. The s in Q-learning denotes the next state that the agent moves to after performing an action. The a and a denote potential actions in the subsequent state s . In SARSA, s denotes the new state observed by the agent after executing the action a and a denotes the action chosen in the new state s .
Algorithm 4: Harmony Search
Mathematics 13 00102 i004
In Q-learning, the expected maximum Q-value by all possible actions in the next state is a factor to update the Q-value under the current state. However, the action with the expected maximum value may not necessarily be executed in the next state. In SARSA, one Q-value by an action in the next state is also used to update the current Q-value, and the action will be executed in the next state.
In this work, seven local search operators are set as the states in Q-learning and SARSA algorithms, while the seven local search operators are also identified as the actions. Table 3 illustrates that the rows of the Q-table correspond to the local search operators, while the columns also denote the seven local search operators (A0–A6).

Rewards Setting Strategies

For Q-learning and SARSA, the reward setting can affect their performance. This study designs three reward setting strategies (S1, S2, and S3).
S1:
In each iteration, one non-dominated solution set (Set 1) is obtained from the solutions in the new population. The non-dominated solution set is used to update the AS. The reward is set based on the degree to which the AS is updated.
R = c / l e n ( A )
where the c is the number of solutions in Set 1, which can dominate at least one solution in A . If there is no solution from Set 1 dominating one solution in A , the value of c is 0.
S2:
In each iteration, the solutions in the new population are used to update the non-dominated solutions in A . The coverage of updated A is compared to that of the original A . The reward is set by the coverage difference between them. A higher coverage indicates a more significant improvement and the reward increases accordingly. The reward formula is described as follows.
P ¯ , P = D / R
R = 2 × ( P ¯ P ) P ¯ > P 1 P ¯ = P 0 P ¯ < P
where D is the largest distance between the maximum and minimum values of all targets. The R is the sum of the ranges of all targets. The P ¯ is Pareto coverage of the updated A , while the P is Pareto coverage of the original A .
S3:
As mentioned in strategy 2, A is updated in each iteration. The hypervolume (HV) [64] difference between the non-dominated solution set in the new A and the set in the original A is employed to calculate the reward of RL algorithms. The specific expression is shown in Equation (23).
R = 0 A = A ¯ HV ¯ HV A ! = A ¯ & len ( A ¯ ) > 1 1 A ! = A ¯ & len ( A ¯ ) = 1
where A is original A , A ¯ is new A . HV represents the HV of A , while HV ¯ is the HV of A ¯ .

4.5. The Framework of the Proposed Algorithms

In this study, we design seven local search operators to improve the performance of the classical meta-heuristics. Two RL algorithms are employed to choose the appropriate local search operators during iterations. Reward values are assigned to the local search operators based on their performance. The improved algorithm framework is shown in Figure 5.

5. Experiments and Discussions

5.1. Experimental Setup

We utilise 82 datasets to conduct large-scale validation and assess the effectiveness of all algorithms. The number of machines and processing times in the 82 instances are set according to the JSP benchmark instances in the OR-Library [65], and the transportation times are generated based on the rules established by Weller [66]. The strategy influenced by Gholami [67] is utilised to incorporate due dates into the datasets. The due date for jobs is specified by:
D i = ( 1 + T × n m ) × j = 1 n i P i , j
where T is a fixed parameter, specifically T = 0.3 [62], n denotes the number of jobs, and m represents the number of machines. p i , j signifies the average processing time of operation O i , j over all selected machines. According to the literature [68], the energy consumption is established in the following way: p e k [ 10 , 15 ] kW, i t k [ 1 , 3 ] kW, t e = 2 kW.
We delve into the experimental findings, along with a thorough discussion and analysis of the algorithms employed. The coding is carried out in Python, with all scripts executed on a machine equipped with a 3.40 GHz 13th generation Intel® Core™ i7-13700K CPU and 64 GB of RAM. All codes are running on the Microsoft Windows 11 Pro for the Workstations operating system. Furthermore, the population is set to 20, which can ensure an equitable assessment of all algorithms. Moreover, the termination criterion is determined as 0.3 × n × m × r seconds (s). Each algorithm is run independently 20 times.
It is worth noting that the real Pareto front (PF) is difficult to obtain. The non-dominated solutions by all algorithms are used as approximate PF. The first metric is the ρ measure, which first involves determining the intersection of the set of individuals generated by the algorithm with the approximate PF, followed by computing the ratio of this intersection to the total number of approximate PF. A higher ρ value indicates the superior algorithm performance [69]. The second metric is the inverted generation distance (IGD), a combined measure of convergence and distribution of solutions [70]. A smaller IGD value indicates better algorithm performance. The formulas for the two metrics are shown as follows:
X = Q P
ρ = X P
I G D = v p min d ( v , Q ) P
where Q represents the collection of optimal Pareto solutions derived from one algorithm, and d ( v , Q ) denotes the smallest Euclidean distance between an individual v in Q and the population P. P denotes the set of points evenly distributed along the true Pareto front, with P representing the number of individuals within this set.

5.2. Parameter Setting

The parameter configuration of an algorithm can affect its performance. We utilise the example of y n _ 120 × 20 to evaluate the merits and drawbacks of various parameter combinations across different algorithms. Each parameter is executed independently 20 times, and HV is used as the response variable. The HV value is usually used to assess the quality of the Pareto front, which reflects the volume or area in the target space occupied by the solution set. A larger value means that the solution set occupies a larger area in the target space, which usually indicates a better quality solution set [71]. In GA, two parameters must be considered: crossover probability ( P C ) and mutation probability ( P M ), where P C { 0.6 , 0.7 , 0.8 , 0.9 , 1 } and P M { 0.2 , 0.4 , 0.6 , 0.8 , 1 } . As can be seen in Figure 6, the largest values of P C and P M in HV metric are 0.8 and 0.4. Therefore, the P C = 0.8 and P M = 0.4 is the optimal parameter combination for GA. The P C and P M also need to be considered in DE, and the range of parameter settings is consistent with GA. As illustrated in Figure 7, the algorithm exhibits the best performance with P C = 0.8 , P M = 0.2 . In HS, there are three important parameters, namely harmony memory considering rate ( H m c r ), pitch adjusting rate ( P a r ), and bandwidth ( B W ) [72]. The B W defines the magnitude of allowed note changes when adjusting notes in harmony, and a larger B W is suitable for global search, which helps explore a larger solution space. Helping to explore a larger solution space, the B W is fixed to 1 in this paper. Set the ratio of H m c r and P a r as follows: H m c r { 0.2 , 0.4 , 0.6 , 0.8 , 1 } and P a r { 0.1 , 0.2 , 0.3 , 0.4 , 0.5 , 0.6 , 0.7 , 0.8 , 0.9 , 1 } . According to the trend plot of parameter levels shown in Figure 8, the value of HV is largest under the H m c r = 0.2 and P a r = 0.4 , which is the best choice.
In both Q-learning and SARSA, the learning rate α determines the relative weight given to new experiences versus existing knowledge when updating the Q-values. The discount rate γ governs the importance of future rewards in decision making, influencing the relative value of future rewards compared to immediate ones. To investigate their effects, we designed a series of experimental configurations, testing multiple combinations of α and γ . In Q-learning, we set α { 0.6 , 0.7 , 0.8 , 0.9 , 1.0 } , and γ { 0.1 , 0.2 , 0.3 , 0.4 , 0.5 , 0.6 , 0.7 , 0.8 , 0.9 , 1.0 } . As shown in Figure 9, α = 0.9 and γ = 0.4 have the highest HV value. The setting of the ratio of α and γ in SARSA is as follows: α { 0.6 , 0.7 , 0.8 , 0.9 } and γ { 0.1 , 0.2 , 0.3 , 0.4 } . As can be seen from Figure 10, α = 0.9 and γ = 0.3 are the most competitive under the HV metric.

5.3. Effectiveness of the Proposed Strategies

To validate the effectiveness of the enhancement strategies, we compare the three fundamental algorithms (GA, DE, and HS) with respective variants (GA_LS, DE_LS, and HS_LS) utilising a random selection of the seven local search operators, as well as the variants combining reinforcement learning algorithms (GA_Q1, GA_Q2, GA_Q3, GA_S1, GA_S2, GA_S3, DE_Q1, DE_Q2, DE_Q3, DE_S1, DE_S2, DE_S3, HS_Q1, HS_Q2, HS_Q3, HS_S1, HS_S2, and HS_S3).
The results for two metrics of these algorithms are reported in Tables S1–S6 in the Supplementary File. Here, the Friedman test is performed to verify the performance differences among the fundamental algorithms and their respective variants. When the asymptotic significance (Asympt. Sig.) value is less than 0.05, this means that there is a significant difference among the algorithms. Then, we conduct the Nemenyi post hoc test ranking for GA, DE, HS, and their corresponding variants.
The statistics for GA and its variants are shown in Table 4, Figure 11 and Figure 12. In Table 4, both the Asymp. Sig. values of the two metrics for GA and its variants are less than 0.01, which indicates the substantial variances among GA and its variations. As illustrated in Figure 11a, GA_S1 stands out as the top performer, with the highest average ranking value of 6.43 compared to all other algorithms. The same conclusion can be drawn from Figure 11b as GA_S1 has the smallest average ranking value (3.03) under metric IGD. Figure 12a presents the ranking of ρ , while Figure 12b demonstrates the ranking of IGD on all instances. It is clear to see from Figure 12 that GA_S1 performs the best under both the ρ and IGD metrics, ranking eighth under the ρ metric and the first under the IGD metric for most cases.
The statistics of DE and its variants are presented in Table 5, Figure 13 and Figure 14, respectively. DE and its variants possess obvious variability in performance, as the Asymp. Sig. values for ρ and IGD are much less than 0.05 in Table 5. The average ranking of DE_Q1 in Figure 13 is the highest among all algorithms, while it ranks fourth based on the IGD metric. Similarly, the average ranking of DE_Q3 ranks the first under the IGD metric and the fifth under ρ metric. Therefore, both DE_Q1 and DE_Q3 are selected for further comparisons in the next section. The rank distributions of all algorithms for all instances under two metrics are shown in Figure 14.
Table 6, Figure 15 and Figure 16 show the statistics for HS and its variants. The Asymp. Sig. values for both metrics are less than 0.01, which suggests significant variability among HS and its variants. In Figure 15a, the HS_Q1 is ranked first in the average ranking of the ρ metric, while Figure 15b shows its lowest average ranking value (3.86) under the IGD metric. In Figure 16, the rank distributions of all algorithms for all instances are presented under the two metrics.
In summary, the proposed local search operators can improve the performance of the GA, DE, and HS, while the designed RL strategies can further enhance respective variants with local search operators. The aim of integrating local search and RL algorithms are to accelerate the convergence of algorithms for finding local optimal solutions in local solution space. The meta-heuristics combining RL guided local search operators are highly competitive for solving the multi-objective JSP with MHR.

5.4. Comparisons with Other Algorithms

To delve deeper into the effectiveness of the algorithms utilising the designed RL guided local search technique, we select four algorithms (GA_S1, DE_Q1, DE_Q3, HS_Q1) based on the comparisons in Section 5.3. The four algorithms are compared with the latest PEA [48] and two benchmark multi-objective algorithms: the MOEAD [73] and NSGA-II [74]. The results of the two metrics are presented in Tables S7 and S8 in the Supplementary File. As can be seen from Tables S7 and S8, the GA_S1 obtained the 76 best ρ values and 70 best IGD values out of 82 instances. This means that the GA_S1 have stronger competitiveness than its peers.
To highlight the disparity among the comparison algorithms, the Friedman test is performed on the ρ and IGD values. The disparity among the six algorithms is evident in Table 7, where their Asympt. Sig. values are below 0.01 and significantly lower than 0.05. Figure 17 illustrates that the average ranking of the GA_S1 algorithm is 6.72, which is the highest among all algorithms. Conversely, the average ranking value of the IGD is 1.4, which is the lowest among the algorithms. It indicates that GA_S1 outperforms the other six algorithms. Figure 18 shows the rankings of the metrics ρ and IGD, respectively. As can be seen from Figure 18a, GA_S1 is ranked sixth for most instances under the ρ metric. On the contrary, GA_S1 ranks the first for most instances under the IGD metric. It further validates that GA_S1 is strongly competitive. In the comparison of the seven algorithms, PEA is superior to MOEAD, NAGA-II, and HS_Q1 on two metrics. But the results of PEA are far from being as good as those of GA_S1, DE_Q1, and DE_Q3.

6. Conclusions and Future Work

This study develops a mathematical model for the integrated scheduling problems of multi-objective JSPs and MHRs. Three meta-heuristic algorithms are improved to optimise the makespan, earliness or tardiness, and total energy consumption. In the proposed algorithms, seven local search operators are designed. During iterations, the appropriate local search operators are selected by the variants of two kinds of RL algorithms, Q-learning, and SARSA. We assess the efficiency of the suggested algorithms using 82 cases with different scales. The experimental results and comparisons indicate that the GA, integrating the SARSA with the reward setting strategy S1, is the most competitive.
This study optimises resource allocation and job scheduling in the production process through a collaborative scheduling mechanism between the MHR and machines, significantly enhancing the overall processing efficiency of the factory. By enabling the efficient coordination between the MHR and machines, the system minimises idle equipment time and material transport delays, thereby ensuring the continuity and stability of the production process while maximising the resource utilisation efficiency.
Although this study achieved some positive results in the integrated scheduling problem of job shops with MHR, there are still some limitations. First, the reinforcement learning model used in this study assumes that the environment is static, whereas in real-world applications, the production system may be affected by equipment failures, workpiece delays, and other factors, and these dynamics are not fully considered in this study. Therefore, the results of this study may not be fully reproducible in real environments. Second, although we evaluated the performance of the algorithms in several experimental scenarios, the used test set is more limited and fails to cover all types of production scheduling problems.
Our future research intends to the following topics:
(1)
Conduct a more in-depth study of the problems by taking into additional practical constraints, e.g., setup time and blocking.
(2)
Craft specialised local search operators that address the unique aspects of the problems to improve the performance of meta-heuristics.
(3)
Utilise deep Q-network (DQN) to improve meta-heuristics for addressing JSP and MHR scheduling challenges.
(4)
We will collaborate with industry partners to test the proposed model and algorithm in actual production scenarios, which will provide insights into its performance, scalability, and adaptability.

Supplementary Materials

The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/math13010102/s1, Table S1: Comparison of ρ values for GA and its variants; Table S2: Comparison of IGD values for GA and its variants; Table S3: Comparison of ρ values for DE and its variants; Table S4: Comparison of IGD values for DE and its variants; Table S5: Comparison of ρ values for HS and its variants; Table S6: Comparison of IGD values for HS and its variants; Table S7: Comparison of ρ values for seven heuristics; Table S8: Comparison of IGD values for seven heuristics.

Author Contributions

Conceptualization, Z.X. and K.G.; Methodology, Z.X.; Software, Z.X.; Formal analysis, Z.X. and Q.J.; Investigation, Z.X. and Q.J.; Resources, Z.X.; Data curation, Z.X. and Q.J.; Writing—original draft, Z.X.; Writing—review & editing, K.G., Y.F. and Q.S.; Visualization, Z.X. and Q.J.; Funding acquisition, K.G. and L.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This study is partially supported by the International Science and Technology project of Guangzhou Development District under Grant 2023GH08: Research on Collaborative Technologies of Multimodal, Multi Configuration, and Multi Structural Elements, the Guangdong Basic and Applied Basic Research Foundation (2023A1515011531), the Science and Technology Development Fund (FDCT), Macau SAR, under Grant 0019/2021/A, the National Natural Science Foundation of China under Grant 62173356, Zhuhai Industry-University-Research Project with Hong Kong and Macao under Grant ZH22017002210014PWC, and the Key Technologies for Scheduling and Optimization of Complex Distributed Manufacturing Systems under Grant 22JR10KA007.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. International Energy Agency. World Energy Outlook 2023. Available online: https://www.iea.org/reports/world-energy-outlook-2023 (accessed on 4 October 2024).
  2. Liu, Y.; Dong, H.; Lohse, N.; Petrovic, S.; Gindy, N. An investigation into minimising total energy consumption and total weighted tardiness in job shops. J. Clean. Prod. 2014, 65, 87–96. [Google Scholar] [CrossRef]
  3. Moreira, L.; Li, W.; Lu, X.; Fitzpatrick, M. Energy-Efficient machining process analysis and optimisation based on BS EN24T alloy steel as case studies. Robot. Comput. Integr. Manuf. 2019, 58, 1–12. [Google Scholar] [CrossRef]
  4. Peng, T.; Xu, X. Energy-efficient machining systems: A critical review. Int. J. Adv. Manuf. Technol. 2014, 72, 1389–1406. [Google Scholar] [CrossRef]
  5. Wang, S.; Zhu, Z.; Fang, K.; Chu, F.; Chu, C. Scheduling on a two-machine permutation flow shop under time-of-use electricity tariffs. Int. J. Prod. Res. 2018, 56, 3173–3187. [Google Scholar] [CrossRef]
  6. Sharma, P.; Jain, A. A review on job shop scheduling with setup times. Proc. Inst. Mech. Eng. Part B J. Eng. Manuf. 2016, 230, 517–533. [Google Scholar] [CrossRef]
  7. Garey, M.R.; Johnson, D.S.; Sethi, R. The complexity of flowshop and jobshop scheduling. Math. Oper. Res. 1976, 1, 117–129. [Google Scholar] [CrossRef]
  8. Xiong, J.; Xing, L.-n.; Chen, Y.-w. Robust scheduling for multi-objective flexible job-shop problems with random machine breakdowns. Int. J. Prod. Econ. 2013, 141, 112–126. [Google Scholar] [CrossRef]
  9. Jiang, P.; Fu, Y.; Zhu, Q.; Zheng, M. Event-driven graphical representative schema for job-shop-type material flows and data computing usingautomatic identification of radio frequency identification tags. Proc. Inst. Mech. Eng. Part B J. Eng. Manuf. 2012, 226, 339–352. [Google Scholar] [CrossRef]
  10. Rui, Z.; Shilong, W.; Zheqi, Z.; Lili, Y. An ant colony algorithm for job shop scheduling problem with tool flow. Proc. Inst. Mech. Eng. Part B J. Eng. Manuf. 2014, 228, 959–968. [Google Scholar] [CrossRef]
  11. Zou, W.Q.; Pan, Q.K.; Meng, T.; Gao, L.; Wang, Y.L. An effective discrete artificial bee colony algorithm for multi-AGVs dispatching problem in a matrix manufacturing workshop. Expert Syst. Appl. 2020, 161, 113675. [Google Scholar] [CrossRef]
  12. Goli, A.; Tirkolaee, E.B.; Aydın, N.S. Fuzzy integrated cell formation and production scheduling considering automated guided vehicles and human factors. IEEE Trans. Fuzzy Syst. 2021, 29, 3686–3695. [Google Scholar] [CrossRef]
  13. Hichri, B.; Fauroux, J.C.; Adouane, L.; Doroftei, I.; Mezouar, Y. Design of cooperative mobile robots for co-manipulation and transportation tasks. Robot. Comput. Integr. Manuf. 2019, 57, 412–421. [Google Scholar] [CrossRef]
  14. Quemelli, M.B.; Brandão, A.S. Handling and pushing objects using unmanned guided vehicles. Robot. Comput. Integr. Manuf. 2020, 63, 101913. [Google Scholar] [CrossRef]
  15. Li, W.; Han, D.; Gao, L.; Li, X.; Li, Y. Integrated production and transportation scheduling method in hybrid flow shop. Chin. J. Mech. Eng. 2022, 35, 12. [Google Scholar] [CrossRef]
  16. Yuan, M.H.; Li, Y.D.; Pei, F.Q.; Gu, W.B. Dual-resource integrated scheduling method of AGV and machine in intelligent manufacturing job shop. J. Cent. South Univ. 2021, 28, 2423–2435. [Google Scholar] [CrossRef]
  17. Hernández-Gress, E.S.; Seck-Tuoh-Mora, J.C.; Hernández-Romero, N.; Medina-Marín, J.; Lagos-Eulogio, P.; Ortíz-Perea, J. The solution of the concurrent layout scheduling problem in the job-shop environment through a local neighborhood search algorithm. Expert Syst. Appl. 2020, 144, 113096. [Google Scholar] [CrossRef]
  18. Li, M.; Lei, D. An imperialist competitive algorithm with feedback for energy-efficient flexible job shop scheduling with transportation and sequence-dependent setup times. Eng. Appl. Artif. Intell. 2021, 103, 104307. [Google Scholar] [CrossRef]
  19. Yao, Y.J.; Liu, Q.H.; Li, X.Y.; Gao, L. A novel MILP model for job shop scheduling problem with mobile robots. Robot. Comput. Integr. Manuf. 2023, 81, 102506. [Google Scholar] [CrossRef]
  20. Bilge, Ü.; Ulusoy, G. A time window approach to simultaneous scheduling of machines and material handling system in an FMS. Oper. Res. 1995, 43, 1058–1070. [Google Scholar] [CrossRef]
  21. Hurink, J.; Knust, S. Tabu search algorithms for job-shop problems with a single transport robot. Eur. J. Oper. Res. 2005, 162, 99–111. [Google Scholar] [CrossRef]
  22. Deroussi, L.; Gourgand, M.; Tchernev, N. A simple metaheuristic approach to the simultaneous scheduling of machines and automated guided vehicles. Int. J. Prod. Res. 2008, 46, 2143–2164. [Google Scholar] [CrossRef]
  23. Mousavi, M.; Yap, H.J.; Musa, S.N.; Tahriri, F.; Md Dawal, S.Z. Multi-objective AGV scheduling in an FMS using a hybrid of genetic algorithm and particle swarm optimization. PLoS ONE 2017, 12, e0169817. [Google Scholar] [CrossRef]
  24. Reddy, B.; Rao, C. A hybrid multi-objective GA for simultaneous scheduling of machines and AGVs in FMS. Int. J. Adv. Manuf. Technol. 2006, 31, 602–613. [Google Scholar] [CrossRef]
  25. Dai, M.; Tang, D.; Giret, A.; Salido, M.A. Multi-objective optimization for energy-efficient flexible job shop scheduling problem with transportation constraints. Robot. Comput. Integr. Manuf. 2019, 59, 143–157. [Google Scholar] [CrossRef]
  26. Baruwa, O.T.; Piera, M.A. A Coloured Petri Net-Based Hybrid Heuristic Search Approach to Simultaneous Scheduling of Machines and Automated Guided Vehicles. Int. J. Prod. Res. 2016, 54, 4773–4792. [Google Scholar] [CrossRef]
  27. Amirteimoori, A.; Tirkolaee, E.B.; Simic, V.; Weber, G.W. A parallel heuristic for hybrid job shop scheduling problem considering conflict-free AGV routing. Swarm Evol. Comput. 2023, 79, 101312. [Google Scholar] [CrossRef]
  28. Fontes, D.B.M.; Homayouni, S.M. Joint production and transportation scheduling in flexible manufacturing systems. J. Glob. Optim. 2019, 74, 879–908. [Google Scholar] [CrossRef]
  29. Yang, S.; Zhang, Y.; Ma, L.; Song, Y.; Zhou, P.; Shi, G.; Chen, H. A Novel Maximin-Based Multi-Objective Evolutionary Algorithm Using One-by-One Update Scheme for Multi-Robot Scheduling Optimization. IEEE Access 2021, 9, 121316–121328. [Google Scholar] [CrossRef]
  30. Gnanavel Babu, A.; Jerald, J.; Noorul Haq, A.; Muthu Luxmi, V.; Vigneswaralu, T. Scheduling of machines and automated guided vehicles in FMS using differential evolution. Int. J. Prod. Res. 2010, 48, 4683–4699. [Google Scholar] [CrossRef]
  31. Han, X.; Cheng, W.; Meng, L.; Zhang, B.; Gao, K.; Zhang, C.; Duan, P. A dual population collaborative genetic algorithm for solving flexible job shop scheduling problem with AGV. Swarm Evol. Comput. 2024, 86, 101538. [Google Scholar] [CrossRef]
  32. Karimi, B.; Niaki, S.T.A.; Niknamfar, A.H.; Hassanlu, M.G. Multi-objective optimization of job shops with automated guided vehicles: A non-dominated sorting cuckoo search algorithm. Proc. Inst. Mech. Eng. Part O J. Risk Reliab. 2021, 235, 306–328. [Google Scholar] [CrossRef]
  33. Zhou, B.; Zhu, Z. Multi-objective optimization of greening scheduling problems of part feeding for mixed model assembly lines based on the robotic mobile fulfillment system. Neural Comput. Appl. 2021, 33, 9913–9937. [Google Scholar] [CrossRef]
  34. Spanos, A.C.; Ponis, S.T.; Tatsiopoulos, I.P.; Christou, I.T.; Rokou, E. A new hybrid parallel genetic algorithm for the job-shop scheduling problem. Int. Trans. Oper. Res. 2014, 21, 479–499. [Google Scholar] [CrossRef]
  35. Gonçalves, J.F.; Resende, M.G. An extended Akers graphical method with a biased random-key genetic algorithm for job-shop scheduling. Int. Trans. Oper. Res. 2014, 21, 215–246. [Google Scholar] [CrossRef]
  36. Zhu, Z.; Ng, K.M.; Ong, H. A modified tabu search algorithm for cost-based job shop problem. J. Oper. Res. Soc. 2010, 61, 611–619. [Google Scholar] [CrossRef]
  37. Zobolas, G.; Tarantilis, C.D.; Ioannou, G. A hybrid evolutionary algorithm for the job shop scheduling problem. J. Oper. Res. Soc. 2009, 60, 221–235. [Google Scholar] [CrossRef]
  38. Tavakkoli-Moghaddam, R.; Safaei, N.; Kah, M. Accessing feasible space in a generalized job shop scheduling problem with the fuzzy processing times: A fuzzy-neural approach. J. Oper. Res. Soc. 2008, 59, 431–442. [Google Scholar] [CrossRef]
  39. Abdelmaguid, T.F.; Nassef, A.O.; Kamal, B.A.; Hassan, M.F. A hybrid GA/heuristic approach to the simultaneous scheduling of machines and automated guided vehicles. Int. J. Prod. Res. 2004, 42, 267–281. [Google Scholar] [CrossRef]
  40. May, G.; Stahl, B.; Taisch, M.; Prabhu, V. Multi-objective genetic algorithm for energy-efficient job shop scheduling. Int. J. Prod. Res. 2015, 53, 7071–7089. [Google Scholar] [CrossRef]
  41. Zhang, R.; Chiong, R. Solving the energy-efficient job shop scheduling problem: A multi-objective genetic algorithm with enhanced local search for minimizing the total weighted tardiness and total energy consumption. J. Clean. Prod. 2016, 112, 3361–3375. [Google Scholar] [CrossRef]
  42. Qian, B.; Wang, L.; Huang, D.X.; Wang, X. Scheduling multi-objective job shops using a memetic algorithm based on differential evolution. Int. J. Adv. Manuf. Technol. 2008, 35, 1014–1027. [Google Scholar] [CrossRef]
  43. Li, X.; Yang, X.; Zhao, Y.; Teng, Y.; Dong, Y. Metaheuristic for solving multi-objective job shop scheduling problem in a robotic cell. IEEE Access 2020, 8, 147015–147028. [Google Scholar] [CrossRef]
  44. Kumar, M.S.; Janardhana, R.; Rao, C. Simultaneous scheduling of machines and vehicles in an FMS environment with alternative routing. Int. J. Adv. Manuf. Technol. 2011, 53, 339–351. [Google Scholar] [CrossRef]
  45. Pan, Q.K.; Suganthan, P.N.; Liang, J.J.; Tasgetiren, M.F. A local-best harmony search algorithm with dynamic sub-harmony memories for lot-streaming flow shop scheduling problem. Expert Syst. Appl. 2011, 38, 3252–3259. [Google Scholar] [CrossRef]
  46. Wang, L.; Pan, Q.K.; Tasgetiren, M.F. A hybrid harmony search algorithm for the blocking permutation flow shop scheduling problem. Comput. Ind. Eng. 2011, 61, 76–83. [Google Scholar] [CrossRef]
  47. Gao, K.Z.; Suganthan, P.N.; Pan, Q.K.; Chua, T.J.; Cai, T.X.; Chong, C.S. Discrete harmony search algorithm for flexible job shop scheduling problem with multiple objectives. J. Intell. Manuf. 2016, 27, 363–374. [Google Scholar] [CrossRef]
  48. Li, Z.; Pan, Q.; Miao, Z.; Sang, H.; Li, W. Automated Guided Vehicle Scheduling Problem in Manufacturing Workshops: An Adaptive Parallel Evolutionary Algorithm. IEEE Trans. Autom. Sci. Eng. 2024. [Google Scholar] [CrossRef]
  49. Li, G.; Zeng, B.; Liao, W.; Li, X.; Gao, L. A new AGV scheduling algorithm based on harmony search for material transfer in a real-world manufacturing system. Adv. Mech. Eng. 2018, 10, 1687814018765560. [Google Scholar] [CrossRef]
  50. Zhang, W.; Dietterich, T.G. A reinforcement learning approach to job-shop scheduling. In Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence (IJCAI 1995), San Francisco, CA, USA, 20–25 August 1995; Volume 95, pp. 1114–1120. [Google Scholar]
  51. Fu, Y.P.; Wang, Y.F.; Gao, K.Z.; Huang, M. Review on ensemble meta-heuristics and reinforcement learning for manufacturing scheduling problems. Comput. Electr. Eng. 2024, 120, 109780. [Google Scholar] [CrossRef]
  52. Zeng, Y.; Xiang, K.; Li, D.; Vasilakos, A.V. Directional routing and scheduling for green vehicular delay tolerant networks. Wirel. Netw. 2013, 19, 161–173. [Google Scholar] [CrossRef]
  53. Orhean, A.I.; Pop, F.; Raicu, I. New scheduling approach using reinforcement learning for heterogeneous distributed systems. J. Parallel Distrib. Comput. 2018, 117, 292–302. [Google Scholar] [CrossRef]
  54. Chen, R.; Wu, B.; Wang, H.; Tong, H.; Yan, F. A Q-Learning based NSGA-II for dynamic flexible job shop scheduling with limited transportation resources. Swarm Evol. Comput. 2024, 90, 101658. [Google Scholar] [CrossRef]
  55. Cui, W.; Yuan, B. A hybrid genetic algorithm based on reinforcement learning for the energy-aware production scheduling in the photovoltaic glass industry. Comput. Oper. Res. 2024, 163, 106521. [Google Scholar] [CrossRef]
  56. Yu, H.; Gao, K.Z.; Ma, Z.F.; Pan, Y.X. Improved meta-heuristics with Q-learning for solving distributed assembly permutation flowshop scheduling problems. Swarm Evol. Comput. 2023, 80, 101335. [Google Scholar] [CrossRef]
  57. Zhang, Z.P.; Fu, Y.P.; Gao, K.Z.; Pan, Q.K.; Huang, M. A learning-driven multi-objective cooperative artificial bee colony algorithm for distributed flexible job shop scheduling problems with preventive maintenance and transportation operations. Comput. Ind. Eng. 2024, 196, 110484. [Google Scholar] [CrossRef]
  58. Lin, Z.; Gao, K.; Wu, N.; Suganthan, P.N. Problem-Specific Knowledge Based Multi-Objective Meta-Heuristics Combined Q-Learning for Scheduling Urban Traffic Lights With Carbon Emissions. IEEE Trans. Intell. Transp. Syst. 2024. [Google Scholar] [CrossRef]
  59. Ren, Y.; Gao, K.; Fu, Y.; Sang, H.; Li, D.; Luo, Z. A novel Q-learning based variable neighborhood iterative search algorithm for solving disassembly line scheduling problems. Swarm Evol. Comput. 2023, 80, 101338. [Google Scholar] [CrossRef]
  60. Shao, C.; Yu, Z.; Tang, J.; Li, Z.; Zhou, B.; Wu, D.; Duan, J. Research on flexible job-shop scheduling problem based on variation-reinforcement learning. J. Intell. Fuzzy Syst. 2024, 25, 15053–15064. [Google Scholar] [CrossRef]
  61. Deng, L.; Di, Y.; Wang, L. A Reinforcement-Learning-Based 3-D Estimation of Distribution Algorithm for Fuzzy Distributed Hybrid Flow-Shop Scheduling Considering On-Time-Delivery. IEEE Trans. Cybern. 2023, 54, 1024–1036. [Google Scholar] [CrossRef] [PubMed]
  62. Gao, K.Z.; Suganthan, P.N.; Pan, Q.K.; Chua, T.J.; Cai, T.X.; Chong, C.S. Pareto-based grouping discrete harmony search algorithm for multi-objective flexible job shop scheduling. Inf. Sci. 2014, 289, 76–90. [Google Scholar] [CrossRef]
  63. Watkins, C.J.C.H. Learning from Delayed Rewards; Academic Press: Cambridge, MA, USA, 1989. [Google Scholar]
  64. Van Moffaert, K.; Nowé, A. Multi-objective reinforcement learning using sets of pareto dominating policies. J. Mach. Learn. Res. 2014, 15, 3483–3512. [Google Scholar]
  65. Available online: https://people.brunel.ac.uk/~mastjjb/jeb/info.html (accessed on 10 October 2024).
  66. Weller, T.R.; Weller, D.R.; de Abreu Rodrigues, L.C.; Volpato, N. A framework for tool-path airtime optimization in material extrusion additive manufacturing. Robot. Comput. Integr. Manuf. 2021, 67, 101999. [Google Scholar] [CrossRef]
  67. Gholami, M.; Zandieh, M. Integrating simulation and genetic algorithm to schedule a dynamic flexible job shop. J. Intell. Manuf. 2009, 20, 481–498. [Google Scholar] [CrossRef]
  68. Xu, G.; Bao, Q.; Zhang, H. Multi-objective green scheduling of integrated flexible job shop and automated guided vehicles. Eng. Appl. Artif. Intell. 2023, 126, 106864. [Google Scholar] [CrossRef]
  69. Yu, H.; Gao, K.; Wu, N.; Zhou, M.; Suganthan, P.N.; Wang, S. Scheduling Multiobjective Dynamic Surgery Problems via Q-Learning-Based Meta-Heuristics. IEEE Trans. Syst. Man Cybern. Syst. 2024, 54, 3321–3333. [Google Scholar] [CrossRef]
  70. Pan, Y.; Gao, K.; Li, Z.; Wu, N. Solving biobjective distributed flow-shop scheduling problems with lot-streaming using an improved Jaya algorithm. IEEE Trans. Cybern. 2022, 53, 3818–3828. [Google Scholar] [CrossRef] [PubMed]
  71. Zitzler, E.; Künzli, S. Indicator-based selection in multiobjective search. In Proceedings of the International Conference on Parallel Problem Solving from Nature, Birmingham, UK, 18–22 September 2004; Springer: Berlin/Heidelberg, Germany, 2004. [Google Scholar] [CrossRef]
  72. Singh, R.; Singh, A.K.; Dwivedi, A.K.; Nagabhushan, P. Computational Methodologies for Electrical and Electronics Engineers; IGI Global: Hershey, PA, USA, 2021. [Google Scholar]
  73. Zhang, Q.; Li, H. MOEA/D: A Multi-Objective Evolutionary Algorithm Based on Decomposition. IEEE Trans. Evol. Comput. 2007, 11, 712–731. [Google Scholar] [CrossRef]
  74. Deb, K.; Pratap, A.; Agarwal, S.; Meyarivan, T. A Fast and Elitist Multiobjective Genetic Algorithm: NSGA-II. IEEE Trans. Evol. Comput. 2002, 6, 182–197. [Google Scholar] [CrossRef]
Figure 1. An example of a scheduling solution for multi-objective JSP with MHR.
Figure 1. An example of a scheduling solution for multi-objective JSP with MHR.
Mathematics 13 00102 g001
Figure 2. Multi-objective JSP with MHR coding method.
Figure 2. Multi-objective JSP with MHR coding method.
Mathematics 13 00102 g002
Figure 3. Neighbourhood structures.
Figure 3. Neighbourhood structures.
Mathematics 13 00102 g003
Figure 4. Framework of RL.
Figure 4. Framework of RL.
Mathematics 13 00102 g004
Figure 5. The framework of the proposed algorithms.
Figure 5. The framework of the proposed algorithms.
Mathematics 13 00102 g005
Figure 6. Parameter level trend of GA.
Figure 6. Parameter level trend of GA.
Mathematics 13 00102 g006
Figure 7. Parameter level trend of DE.
Figure 7. Parameter level trend of DE.
Mathematics 13 00102 g007
Figure 8. Parameter level trend of HS.
Figure 8. Parameter level trend of HS.
Mathematics 13 00102 g008
Figure 9. Parameter level trend of Q-learning.
Figure 9. Parameter level trend of Q-learning.
Mathematics 13 00102 g009
Figure 10. Parameter level trend of SARSA.
Figure 10. Parameter level trend of SARSA.
Mathematics 13 00102 g010
Figure 11. Nemenyi post hoc analysis of algorithms on benchmark instances. (a) Nemenyi post hoc analysis of algorithms ρ values. (b) Nemenyi post hoc analysis of algorithms IGD values.
Figure 11. Nemenyi post hoc analysis of algorithms on benchmark instances. (a) Nemenyi post hoc analysis of algorithms ρ values. (b) Nemenyi post hoc analysis of algorithms IGD values.
Mathematics 13 00102 g011
Figure 12. Distribution of ranks for algorithms across benchmark instances. (a) Ranked distribution of algorithms ρ values. (b) Ranked distribution of algorithms IGD values.
Figure 12. Distribution of ranks for algorithms across benchmark instances. (a) Ranked distribution of algorithms ρ values. (b) Ranked distribution of algorithms IGD values.
Mathematics 13 00102 g012
Figure 13. Nemenyi post hoc analysis of algorithms on benchmark instances. (a) Nemenyi post hoc analysis of algorithms ρ values. (b) Nemenyi post hoc analysis of algorithms IGD values.
Figure 13. Nemenyi post hoc analysis of algorithms on benchmark instances. (a) Nemenyi post hoc analysis of algorithms ρ values. (b) Nemenyi post hoc analysis of algorithms IGD values.
Mathematics 13 00102 g013
Figure 14. Distribution of ranks for algorithms across benchmark instances. (a) Ranked distribution of algorithms ρ values. (b) Ranked distribution of algorithms IGD values.
Figure 14. Distribution of ranks for algorithms across benchmark instances. (a) Ranked distribution of algorithms ρ values. (b) Ranked distribution of algorithms IGD values.
Mathematics 13 00102 g014
Figure 15. Nemenyi post hoc analysis of algorithms on benchmark instances. (a) Nemenyi post hoc analysis of algorithms ρ values. (b) Nemenyi post hoc analysis of algorithms IGD values.
Figure 15. Nemenyi post hoc analysis of algorithms on benchmark instances. (a) Nemenyi post hoc analysis of algorithms ρ values. (b) Nemenyi post hoc analysis of algorithms IGD values.
Mathematics 13 00102 g015
Figure 16. Distribution of ranks for algorithms across benchmark instances. (a) Ranked distribution of algorithms ρ values. (b) Ranked distribution of algorithms IGD values.
Figure 16. Distribution of ranks for algorithms across benchmark instances. (a) Ranked distribution of algorithms ρ values. (b) Ranked distribution of algorithms IGD values.
Mathematics 13 00102 g016
Figure 17. Nemenyi post hoc analysis of seven algorithms on benchmark instances. (a) Nemenyi post hoc analysis of algorithms ρ values. (b) Nemenyi post hoc analysis of algorithms IGD values.
Figure 17. Nemenyi post hoc analysis of seven algorithms on benchmark instances. (a) Nemenyi post hoc analysis of algorithms ρ values. (b) Nemenyi post hoc analysis of algorithms IGD values.
Mathematics 13 00102 g017
Figure 18. Distribution of ranks for algorithms across benchmark instances. (a) Ranked distribution of algorithms ρ values. (b) Ranked distribution of algorithms IGD values.
Figure 18. Distribution of ranks for algorithms across benchmark instances. (a) Ranked distribution of algorithms ρ values. (b) Ranked distribution of algorithms IGD values.
Mathematics 13 00102 g018
Table 1. Processing time and transportation time of an instance.
Table 1. Processing time and transportation time of an instance.
Processing TimeTransportation Time
Time O 0 O 1 TimeL/U M 0 M 1
J 0 1310L/U097
J 1 816 M 0 908
M 1 780
Table 2. The notations used in the model.
Table 2. The notations used in the model.
NotationsDefinition
n:The number of jobs to be processed
m:The number of machines
r:The number of MHRs
o:The number of operations of each job
i , i :Index of jobs, i , i ∈ {0,1, …, n}
j , j :Index of operations, j , j ∈ {0,1, …, o}
k , k :Index of machines, k , k ∈ {0,1, …, m}
l:Index of an MHR, l ∈ {0,1, …, r}
J:The set of n jobs, { J 1 , J 2 , …, J n }
M:The set of m machines, { M 1 , M 2 , …, M m }
R:The set of r MHRs, { R 1 , R 2 , …, R r }
V:A large positive number
T:A fixed number
O i j :The j t h operation of job J i
T ˜ i j :The duration required for executing operation O i j
t ˜ i j :The travel time of transportation for operation O i j
a i j :The start position of transportation for operation O i j
b i j :The end position of transportation for operation O i j
T ^ k k :The transit duration from machine k to machine k
a ¯ i j :The start time of processing for operation O i j
a ˜ i j :The start time of loaded transportation for operation O i j
t ¯ i j :The average processing time of operation O i j
p i j k :The execution time of operation O i j on machine k
s i j k :The initiation time of operation O i j on machine k
c i j k :The completion time of operation O i j on machine k
e ¯ k :Energy consumption for unit processing time on machine k
e k :Idle energy consumption per unit on machine k
e ˜ :Energy consumption for unit transportation
t k :Idle time of machine k
c i :The completion time of job J i
d i :The due date of job J i
X i j l :Binary variable that takes value 1 if MHR l is selected to execute the transportation for operation O i j , and 0 otherwise
Y i j i j :Binary variable that takes value 1 if operation O i j is processed precedes operation O i j on the same machine, and 0 otherwise
Z i j i j l :Binary variable that assumes the value 1 if the transportation of operation O i j occurs prior to operation O i j on MHR l, and 0 otherwise
α i j k :Binary variable that assumes the value 1 if operation O i j is executed on machine k, and 0 otherwise
β i j i j k :Binary variable that assumes the value 1 if operation O i j is executed immediately following O i j on machine k, and 0 otherwise
E ¯ :Total processing energy consumption
E :Total idle energy consumption
E ˜ :Total transportation energy consumption
E :Total energy consumption
c ^ :Makespan
Ω :The earliness or tardiness of one job compared to the due date of the job d i
Table 3. Q-table example.
Table 3. Q-table example.
StateAction
A0A1A2A4A5A6
A0 Q ( 0 , 0 ) Q ( 0 , 1 ) Q ( 0 , 2 ) Q ( 0 , 4 ) Q ( 0 , 5 ) Q ( 0 , 6 )
A1 Q ( 1 , 0 ) Q ( 1 , 1 ) Q ( 1 , 2 ) Q ( 1 , 4 ) Q ( 1 , 5 ) Q ( 1 , 6 )
A2 Q ( 2 , 0 ) Q ( 2 , 1 ) Q ( 2 , 2 ) Q ( 2 , 4 ) Q ( 2 , 5 ) Q ( 2 , 6 )
A5 Q ( 5 , 0 ) Q ( 5 , 1 ) Q ( 5 , 2 ) Q ( 5 , 4 ) Q ( 5 , 5 ) Q ( 5 , 6 )
A6 Q ( 6 , 0 ) Q ( 6 , 1 ) Q ( 6 , 2 ) Q ( 6 , 4 ) Q ( 6 , 5 ) Q ( 6 , 6 )
Table 4. The test statistics of GA and its variants.
Table 4. The test statistics of GA and its variants.
ρ IGD
N82N82
Chi-square189.341Chi-square106.373
df7df7
Asymp. Sig.<0.001Asymp. Sig.<0.001
Table 5. The test statistics of DE and its variants.
Table 5. The test statistics of DE and its variants.
ρ IGD
N82N82
Chi-square129.741Chi-square103.324
df7df7
Asymp. Sig.<0.001Asymp. Sig.<0.001
Table 6. The test statistics of HS and its variants.
Table 6. The test statistics of HS and its variants.
ρ IGD
N82N82
Chi-square34.19Chi-square32.251
df7df7
Asymp. Sig.<0.001Asymp. Sig.<0.001
Table 7. The test statistics of seven algorithms.
Table 7. The test statistics of seven algorithms.
ρ IGD
N82N82
Chi-square329.497Chi-square320.138
df6df6
Asymp. Sig.<0.001Asymp. Sig.<0.001
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Xu, Z.; Jia, Q.; Gao, K.; Fu, Y.; Yin, L.; Sun, Q. Integrated Scheduling of Multi-Objective Job Shops and Material Handling Robots with Reinforcement Learning Guided Meta-Heuristics. Mathematics 2025, 13, 102. https://doi.org/10.3390/math13010102

AMA Style

Xu Z, Jia Q, Gao K, Fu Y, Yin L, Sun Q. Integrated Scheduling of Multi-Objective Job Shops and Material Handling Robots with Reinforcement Learning Guided Meta-Heuristics. Mathematics. 2025; 13(1):102. https://doi.org/10.3390/math13010102

Chicago/Turabian Style

Xu, Zhangying, Qi Jia, Kaizhou Gao, Yaping Fu, Li Yin, and Qiangqiang Sun. 2025. "Integrated Scheduling of Multi-Objective Job Shops and Material Handling Robots with Reinforcement Learning Guided Meta-Heuristics" Mathematics 13, no. 1: 102. https://doi.org/10.3390/math13010102

APA Style

Xu, Z., Jia, Q., Gao, K., Fu, Y., Yin, L., & Sun, Q. (2025). Integrated Scheduling of Multi-Objective Job Shops and Material Handling Robots with Reinforcement Learning Guided Meta-Heuristics. Mathematics, 13(1), 102. https://doi.org/10.3390/math13010102

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop