CN117910748A

CN117910748A - Production line scheduling method and system based on deep neural network

Info

Publication number: CN117910748A
Application number: CN202410001969.4A
Authority: CN
Inventors: 韦慧玲; 张雨晨; 罗陆锋; 陈明猷; 王金海; 肖辉
Original assignee: Foshan University
Current assignee: Foshan University
Priority date: 2024-01-02
Filing date: 2024-01-02
Publication date: 2024-04-19

Abstract

The invention discloses a production line scheduling method and system based on a deep neural network, wherein the method comprises the following steps: acquiring a production order and constructing an intelligent scheduling system; defining a state representation vector of a production order and an action space of an intelligent scheduling system; performing iterative interactive training on the deep Q network based on the state expression vector of the production order and the action space of the intelligent scheduling system to obtain a trained deep Q network; and (3) combining the monitoring system with the trained deep Q network, and deploying the deep Q network to a production line to conduct production order scheduling guidance. By using the method and the device, a new efficient scheduling strategy can be timely obtained according to the randomness change of the production order. The method and the system for scheduling the production line based on the deep neural network can be widely applied to the technical field of production line customization.

Description

Production line scheduling method and system based on deep neural network

Technical Field

The invention relates to the technical field of production line customization, in particular to a production line scheduling method and system based on a deep neural network.

Background

In the wave of the current manufacturing industry, personalized custom-made production lines become an important force leading innovation and meeting customer demands. The rise of this concept is indispensible from the increasing demand of the market for unique, customized products. The traditional mass production mode is difficult to meet the urgent requirements of consumers on product individuality, diversity and instantaneity, so that the individuality customizing factory gradually discloses the corner with the flexible and efficient manufacturing concept.

In a personalized custom production line, the scheduling problem is a complex challenge facing the manufacturing industry, and its background covers market demand change, development of digital technology, and high flexibility of production flow. In the existing personalized customized production line dispatching technical scheme, digital technology, artificial intelligence, machine learning and the like are widely applied to improve dispatching efficiency. However, these solutions face a number of challenges including timeliness issues, real-time monitoring challenges. The uncertainty of the order demands makes the scheduling plan change frequently, so that the prior art is difficult to adapt to the change rapidly, and therefore, the timeliness problem is obvious, namely, the prior art is difficult to make corresponding scheduling according to the demand change of the order in time.

Disclosure of Invention

In order to solve the technical problems, the invention aims to provide a production line scheduling method and system based on a deep neural network, which can timely acquire a new efficient scheduling strategy according to the randomness change of a production order.

The first technical scheme adopted by the invention is as follows: a production line scheduling method based on a deep neural network comprises the following steps:

acquiring a production order and constructing an intelligent scheduling system;

defining a state representation vector of a production order and an action space of an intelligent scheduling system;

Performing iterative interactive training on the deep Q network based on the state expression vector of the production order and the action space of the intelligent scheduling system to obtain a trained deep Q network;

and (3) combining the monitoring system with the trained deep Q network, and deploying the deep Q network to a production line to conduct production order scheduling guidance.

Further, the steps of acquiring the production order and constructing the intelligent scheduling system specifically comprise:

Acquiring a production order based on a production line;

and constructing an intelligent scheduling system, wherein the intelligent scheduling system is used for realizing information interaction between the production line and the deep Q network.

Further, the step of defining a state representation vector of the production order and an action space of the intelligent scheduling system specifically includes:

defining a status representation vector of the production order, the status representation vector including a device status, a robotic arm status, an AGV status, and a task status;

and defining an action space of the intelligent scheduling system, wherein the action space comprises a task allocation action, a mechanical arm scheduling action and an AGV scheduling action.

Further, the step of performing iterative interactive training on the deep Q network based on the state expression vector of the production order and the action space of the intelligent scheduling system to obtain a trained deep Q network specifically includes:

carrying out data preprocessing on the state representation vector of the production order to obtain a preprocessed state representation vector;

Inputting the preprocessed state representation vector into a deep Q network, and outputting an action corresponding to the maximum Q value through a greedy strategy;

The production line executes the action corresponding to the maximum Q value to obtain the next time state expression vector of the production order and execute action rewards;

Combining an action space of the intelligent scheduling system, a preprocessed state representing vector, a next time state representing vector of a production order and an execution action reward to construct an experience playback buffer zone;

Extracting the experience playback buffer zone, and training the deep Q network based on an extraction result to obtain a target Q value;

Calculating the mean square error between the maximum Q value and the target Q value and updating the parameters of the depth Q network;

the method comprises the steps of obtaining actions corresponding to the maximum Q value of a cycle, executing actions corresponding to the maximum Q value of a production line, constructing an experience playback buffer zone, extracting the experience playback buffer zone and updating parameters of a depth Q network until the cycle times meet preset times, and obtaining the trained depth Q network.

Further, the calculation expression for executing the action rewards is specifically as follows:

R(s,a)＝λ₁·R_equipment(s,a)+λ₂·R_{robot_am}(s,a)+λ₃·E_AGV(s,a)

In the above formula, lambda ₁、λ₂ and lambda ₃ represent super parameters, R _equipment (s, a) represents the ratio of the operating time of the equipment to the total time by using the rewards calculation equipment, R _{robot_am} (s, a) represents the ratio of the operating time of the mechanical arm to the total time by using the rewards calculation mechanical arm, R _AGV (s, a) represents the ratio of the operating time of the AGV to the total time by using the rewards calculation AGV, R (s, a) represents the current rewards obtained by executing the action rewards function, s represents the state, and a represents the action.

Further, the calculation expression of the target Q value is specifically as follows:

Q_target(s,a)＝R(s,a)+γ·max_a'Q(s',a'；θ)

In the above equation, Q _target (s, a) represents a target Q value, represents an expected jackpot for selecting action a in state s, R (s, a) represents a currently obtained prize, γ represents a discount factor, max _a' Q (s ', a'; θ) represents a maximum Q value for selecting action a 'in the next state s', and θ represents a network parameter.

Further, the parameter updating of the depth Q network is performed by a gradient descent minimization loss function, and the expression of the gradient descent minimization loss function is specifically shown as follows:

In the above formula, θ represents a network parameter, α represents a learning rate, Representing the gradient to the network parameter θ, Q _target (s, a) represents the target Q value, Q (s ', a'; θ) represents the predicted value of the current Q network for state s and action a.

The second technical scheme adopted by the invention is as follows: a deep neural network based production line scheduling system, comprising:

The first module is used for acquiring a production order and constructing an intelligent scheduling system;

A second module for defining a state representation vector of the production order and an action space of the intelligent scheduling system;

The third module is used for performing iterative interactive training on the depth Q network based on the state expression vector of the production order and the action space of the intelligent scheduling system to obtain a trained depth Q network;

and the fourth module is used for combining the monitoring system and the trained deep Q network, and deploying the deep Q network to a production line to conduct production order scheduling guidance.

The method and the system have the beneficial effects that: according to the invention, the production order is acquired, the intelligent scheduling system is constructed, the state expression vector of the production order and the action space of the intelligent scheduling system are defined, the production flow of the production line order and the running condition of each device are considered, the change of the scheduling plan of the order can be timely acquired, the deep Q network is subjected to iterative interactive training, the trained deep Q network is obtained, the task scheduling can be more effectively carried out, finally the monitoring system and the trained deep Q network are combined and deployed to the production line for scheduling guidance of the production order, the monitoring system is constructed, the cooperative scheduling strategy of the deep Q network, other devices and an AGV (automatic guided vehicle) can be adjusted in real time by combining the deep Q network, so that the efficient running of the production line is maintained, and the new efficient scheduling strategy can be timely acquired according to the random change of the production order.

Drawings

FIG. 1 is a flow chart of steps of a method for scheduling a production line based on a deep neural network of the present invention;

FIG. 2 is a block diagram of a deep neural network based production line scheduling system of the present invention;

FIG. 3 is a training flow chart of the deep Q network according to an embodiment of the present invention;

FIG. 4 is a flow chart illustrating steps for implementing deep Q-network based line scheduling in accordance with the present invention.

Detailed Description

The invention will now be described in further detail with reference to the drawings and to specific examples. The step numbers in the following embodiments are set for convenience of illustration only, and the order between the steps is not limited in any way, and the execution order of the steps in the embodiments may be adaptively adjusted according to the understanding of those skilled in the art.

Referring to fig. 1 and 4, the present invention provides a method for scheduling a production line based on a deep neural network, the method comprising the steps of:

S1, acquiring a production order and constructing an intelligent scheduling system;

Specifically, a production order is acquired based on a production line; and constructing an intelligent scheduling system, wherein the intelligent scheduling system is used for realizing information interaction between the production line and the deep Q network.

In this embodiment, the production line generates a custom production order according to customer requirements, including product types, quantity, delivery deadlines, etc., and it should be noted that, the production task is started to be generated according to the order system, and the product production requirements in the order are first classified into two types, namely standard production and special assembly. Standard production tasks typically involve conventional manufacturing flows, while special assembly task systems require additional robotic arms and equipment to work in concert.

S2, defining a state representation vector of a production order and an action space of the intelligent scheduling system;

Specifically, a status representation vector of a production order is defined, the status representation vector including a device status, a robotic arm status, an AGV status, and a task status; and defining an action space of the intelligent scheduling system, wherein the action space comprises a task allocation action, a mechanical arm scheduling action and an AGV scheduling action.

In this embodiment, according to the production order definition state representation, an action space of the intelligent scheduling system is defined, including a task allocation action, a mechanical arm scheduling action and an AGV scheduling action, where the operation status of each device is considered by the device state, the mechanical arm state includes a busy state and a work progress of each mechanical arm, the AGV state includes a position and a loading state of the AGV, and the task state includes a remaining time of a task to be processed and a completion state of a special assembly task.

It should be noted that, in some embodiments, the status representation is defined according to the production order, including a device status, a robot status, an AGV status, and a task status, where the device status represents an operation status of each production device, whether the device is idle, and whether the device is in a fault status, for example, a device status a= [0,1,2, …,0], where each element corresponds to a device status, 0 represents idle, 1 represents in operation, and 2 represents a fault. The arm state considers the busy and idle state of each arm, such as arm state b= [0,1 … ], wherein 1 represents that the arm is busy and 0 represents that the arm is idle; the AGV state comprises information such as the position of the AGV, the cargo carrying state and the like, and if the AGV state c= [2,1,0, … ] indicates that the AGV is located at the position 2 and carries cargoes; the task state represents information of a task to be processed, such as the remaining time of the task, the completion state of a special assembly task, and the like, and the task state d= [5,0 … ] represents that the current task also needs 5 time units to complete.

Defining an action space of a system, wherein the action space comprises a task allocation action, a mechanical arm scheduling action and an AGV scheduling action, the task allocation action allocates a task to be processed to specific production equipment, and the task allocation action e= [0,1 … ] represents that the task is selectively allocated to second production equipment; the mechanical arm dispatching action comprises dispatching of mechanical arms, such as selecting which mechanical arm to execute a task, and the mechanical arm dispatching action f= [1,0 … ] represents selecting the first mechanical arm to execute the task; the AGV scheduling work involves the scheduling of AGVs, including specifying a transport for the AGVs such as AGV scheduling actions g= [0,1,2, … ], indicating that the AGVs are not selected to be assigned at position 1, that the AGVs are selected to be assigned to position 2 but not to have the load, and that the AGVs are selected to be assigned to position 3 and to have the load.

S3, performing iterative interactive training on the depth Q network based on the state expression vector of the production order and the action space of the intelligent scheduling system to obtain a trained depth Q network;

Specifically, as shown in fig. 3, a deep neural network is adopted, a Convolutional Neural Network (CNN) or a fully connected neural network is selected, a state (a production order definition state) is taken as an input, and a Q value of each possible action is output; the selected action is performed, the rewards and next state of the environmental feedback are observed, and the tuples (state, action, rewards, next state) are stored in an experience playback buffer. The network learns historical data and real-time states, wherein the historical data represents the historical data, namely the operation data of the production line in the past period of time is collected, the operation data comprise equipment states, mechanical arm actions, AGV task allocation and related rewarding information, optimal actions are selected for each state, and a rewarding function is designed; and an experience playback mechanism is introduced, so that the deep Q network can learn more robustly and adapt to scheduling problems in different scenes, and the performance of the system is improved.

S31, carrying out data preprocessing on the state representation vector of the production order to obtain a preprocessed state representation vector;

S32, inputting the preprocessed state representation vector into a deep Q network, and outputting an action corresponding to the maximum Q value through a greedy strategy;

specifically, parameters and environments of the depth Q network are initialized, an initial state is obtained, actions are selected by using the depth Q network according to the current state, an epsilon-greedy strategy is used, namely actions are randomly selected with epsilon probability, and actions with the largest Q value are selected with 1-epsilon probability.

S33, the production line executes the action corresponding to the maximum Q value to obtain the next time state expression vector of the production order and execute action rewards;

specifically, a deep neural network is used, the state is taken as an input, the Q value of each possible action is output, for example, the input state is a vector s, the output is the Q value of each action a, the Q value is expressed as Q (s, a), and the network parameter of the deep Q network is expressed by θ. The network selects the optimal actions for each state by learning the historical data and the real-time state and designs a reward function R (s, a), wherein the expression of the reward function is specifically as follows:

R(s,a)＝λ₁·R_equipment(s,a)+λ₂·R_{robot_am}(s,a)+λ₃·R_AGV(s,a)

In the above formula, lambda ₁、λ₂ and lambda ₃ represent super parameters, E _equipment (s, a) represents the ratio of the operating time of the equipment to the total time by using the rewards calculation equipment, R _{robot_am} (s, a) represents the ratio of the operating time of the mechanical arm to the total time by using the rewards calculation mechanical arm, R _AGV (s, a) represents the ratio of the operating time of the AGV to the total time by using the rewards calculation AGV, R (s, a) represents the current rewards obtained by executing the action rewards function, s represents the state, and a represents the action.

It should be noted that, in this embodiment, for the ratio of the working time to the total time of the computing device, specifically, the time t _equipment of the actual working of the computing device after executing the action a is recorded, the computing device uses the reward Lambda _equipment is a weight super parameter of the device using rewards, and the value of the weight super parameter is between 0 and 1.

For calculating the ratio of the working time to the total time of the mechanical arm, specifically, recording the time t _{robot_am} of the actual working of the mechanical arm after executing the action a, calculating the utilization rewards of the mechanical armLambda _{robot_am} is a weight super parameter of the mechanical arm using rewards, and the value of the weight super parameter is between 0 and 1.

For calculating the ratio of the working time to the total time of the AGV, specifically, recording the actual working time t _AGV of the AGV after executing the action a, calculating the utilization rewards of the AGVLambda _AGV is a weight excess parameter of AGV using rewards, and the value is between 0 and 1.

It should be further noted that the rewards utilized by the equipment, the mechanical arm and the AGV are calculated, and the ratio of the working time to the total time of the equipment, the mechanical arm and the AGV is calculated to evaluate the utilization efficiency, and the more fully utilized, the higher the rewards.

S34, combining an action space of the intelligent scheduling system, the preprocessed state representing vector, the next time state representing vector of the production order and execution action rewards to construct an experience playback buffer zone;

specifically, the selected action is performed, the rewards and next state of the environmental feedback are observed, and the tuples of (state, action, rewards, next state) are stored in the experience playback buffer.

S35, extracting the experience playback buffer area, and training the deep Q network based on an extraction result to obtain a target Q value;

Specifically, a batch of experience is randomly sampled from the experience playback buffer zone and is used for training a depth Q network, so that the utilization efficiency of samples and the training stability are improved, and a target Q value is calculated by using the depth Q network, wherein the calculation formula is as follows:

Q_target(s,a)＝R(s,a)+γ·max_a'Q(s',a'；θ)

In the above equation, Q _target (s, a) represents a target Q value, represents an expected jackpot for selecting action a in state s, R (s, α) represents a currently obtained prize, γ represents a discount factor, max _a' Q (s ', a'; θ) represents a maximum Q value for selecting action a 'in the next state s', and θ represents a network parameter.

In this embodiment, the action is selected using the depth Q network according to the current state, an epsilon-greedy strategy is used, one action is randomly selected with epsilon probability by the epsilon-greedy strategy, the currently estimated best action is selected with 1-epsilon probability, the selected action is executed, interaction with the environment is performed, feedback of the obtained rewards and the next state is stored in the experience playback buffer, the experience playback buffer stores the current experience and the previous experience for training in the subsequent experience playback, and a batch of experiences are randomly sampled from the experience playback buffer for training the depth Q network.

S36, calculating the mean square error between the maximum Q value and the target Q value and updating parameters of the depth Q network;

S37, performing action acquisition step corresponding to the maximum Q value of the circulation, performing action step corresponding to the maximum Q value of the production line, constructing an experience playback buffer zone, performing extraction processing step on the experience playback buffer zone and updating parameters of the depth Q network until the circulation times meet the preset times, and obtaining the trained depth Q network.

Specifically, the parameters of the depth Q network are updated using the mean square error of the target Q value and the current Q value, and the loss function is minimized by gradient descent, and the formula of the gradient descent minimization loss function is as follows:

Steps S31 to S36 are repeatedly performed to continuously interact with the environment, learn, and optimize the deep Q network. As training progresses, the value of epsilon is gradually reduced, making the deep Q network more prone to select the best action learned after training.

In this embodiment, the status information is used as a feature of the input layer, and the status may include a device status, a robot status, an AGV status, a task status, and the like, and these status information are input into the neural network through appropriate data preprocessing so that the network can understand and learn the relationship therebetween. The number of nodes of the output layer is equal to the number of possible actions, each node representing the Q value of the corresponding action. Decision making is performed by selecting a specific action, namely selecting a node with the highest Q value; the rewards functions include a production efficiency rewards that obtains a positive rewards by reducing production time, a resource utilization rewards, and a cost rewards. The system encourages the deep Q network to select actions which can complete tasks faster so as to improve the efficiency of the whole production line; the resource utilization rewards encourage full utilization of the equipment, robotic arms, and AGVs. By selecting the action of fully utilizing the resources, the system can more effectively schedule the tasks, and idle waste of the resources is avoided; the cost rewards consider costs in the task execution process, such as energy consumption, equipment maintenance costs, and the like. The experience playback mechanism is to accumulate historical experiences continuously in the running process of the system, store each state, selected actions and obtained rewards to form an experience pool, randomly extract a batch of samples from the experience pool, avoid the model to learn only the latest data, and finally use the sampled data to train the network, so that the network can learn the relation between the states and the actions better, and the generalization capability of the network is improved.

And S4, combining the monitoring system and the trained deep Q network, and deploying the deep Q network to a production line to conduct production order scheduling guidance.

Specifically, a real-time monitoring system is established for periodically evaluating scheduling performance, and monitoring the operation state of production equipment according to monitoring results, including whether normal operation and failure occur; detecting the busy and idle state of the mechanical arm, namely, whether the mechanical arm is executing a task or not and is in a waiting state or not; monitoring the position and the cargo state of the AGV and whether the AGV moves according to a scheduling task or not; task state: tracking the execution condition of a task to be processed, including the information of the residual processing time, whether the task to be processed is completed or not and the like; the overall production efficiency is estimated by monitoring indexes such as actual production speed, task completion time and the like; the system can adjust the cooperative scheduling strategy of the deep Q network, other devices and AGVs in real time so as to keep the efficient operation of the production line.

It should be noted that, the embodiment of the invention adopts the monitoring system capable of collecting the information of the states of each component, the task execution condition, the production efficiency and the like in real time, and the monitoring system comprises the application related to the sensor, the data acquisition system and the visualization tool, and periodically evaluates the current scheduling performance to check whether the potential problem or bottleneck exists, which can be realized by comparing the difference between the actual production condition and the expected target, and finally based on the monitoring result, the system can adjust the cooperative scheduling strategy of the deep Q network, other devices and the AGV in real time. This includes dynamically adjusting task allocation, device coordination rules, or resource sharing policies to accommodate changes in the production environment.

In some embodiments, the key parameters to be monitored, such as the running state of each assembly device, the working progress of the mechanical arm, the position of the AGV, and the design specification of the product, are identified, and the sensors and devices suitable for furniture assembly, such as the mounting position sensor, the load sensor, the visual sensor, and the like, are selected to monitor the states of the devices and the products, and are mounted on the related devices to ensure that the devices and the products can accurately reflect the key information in the production process, such as the visual sensor is mounted on the mechanical arm for visual identification of the parts. SCADA software capable of displaying equipment states, product design drawings and assembly progress in real time is selected, monitoring software is configured, sensor connection information, monitoring parameters and data updating frequency are set, and the software is ensured to receive and process sensor data in real time and display the product design drawings. Through such real-time monitoring system, the manager of product assembly line can track the production progress at any time, inspect furniture specification, in time discover and solve potential problem, ensures that the personalized customization process of furniture is high-efficient and accurate.

Referring to fig. 2, a deep neural network-based production line scheduling system includes:

The content in the method embodiment is applicable to the system embodiment, the functions specifically realized by the system embodiment are the same as those of the method embodiment, and the achieved beneficial effects are the same as those of the method embodiment.

While the preferred embodiment of the present application has been described in detail, the application is not limited to the embodiment, and various equivalent modifications and substitutions can be made by those skilled in the art without departing from the spirit of the application, and these equivalent modifications and substitutions are intended to be included in the scope of the present application as defined in the appended claims.

Claims

1. The production line scheduling method based on the deep neural network is characterized by comprising the following steps of:

acquiring a production order and constructing an intelligent scheduling system;

2. The method for scheduling production lines based on deep neural network according to claim 1, wherein the steps of obtaining production orders and constructing an intelligent scheduling system specifically comprise:

Acquiring a production order based on a production line;

3. The method for scheduling production lines based on deep neural network according to claim 2, wherein the step of defining the state representation vector of the production order and the action space of the intelligent scheduling system specifically comprises the following steps:

4. The method for scheduling a production line based on a deep neural network according to claim 3, wherein the step of performing iterative interactive training on the deep Q network based on the state expression vector of the production order and the action space of the intelligent scheduling system to obtain the trained deep Q network specifically comprises the following steps:

5. The method for scheduling production lines based on deep neural network according to claim 4, wherein the calculation expression for executing action rewards is specifically as follows:

R(s,a)＝λ₁·R_equipment(s,a)+λ₂·R_{robot_am}(s,a)+λ₃·R_AGV(s,a)

6. The method for scheduling a production line based on a deep neural network according to claim 4, wherein the calculation expression of the target Q value is specifically as follows:

Q_target(s,a)＝R(s,a)+γ·max_a'Q(s',a'；θ)

7. The method for scheduling production lines based on deep neural networks according to claim 4, wherein the parameter updating of the deep Q network is performed by a gradient descent minimization loss function, and the expression of the gradient descent minimization loss function is specifically as follows:

8. A deep neural network-based production line scheduling system, comprising the following modules: