
CN117910748A - Production line scheduling method and system based on deep neural network - Google Patents


Info

Publication number
CN117910748A
Authority
CN
China
Prior art keywords
network
action
deep
production
state
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410001969.4A
Other languages
Chinese (zh)
Inventor
韦慧玲
张雨晨
罗陆锋
陈明猷
王金海
肖辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Foshan University
Original Assignee
Foshan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Foshan University
Priority to CN202410001969.4A
Publication of CN117910748A
Status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0631Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06312Adjustment or analysis of established resource schedule, e.g. resource or task levelling, or dynamic rescheduling
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/092Reinforcement learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0631Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06311Scheduling, planning or task assignment for a person or group
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/04Manufacturing

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • General Physics & Mathematics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Health & Medical Sciences (AREA)
  • Marketing (AREA)
  • Health & Medical Sciences (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Quality & Reliability (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Development Economics (AREA)
  • Educational Administration (AREA)
  • Operations Research (AREA)
  • Game Theory and Decision Science (AREA)
  • Manufacturing & Machinery (AREA)
  • Primary Health Care (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a production line scheduling method and system based on a deep neural network. The method comprises the following steps: acquiring a production order and constructing an intelligent scheduling system; defining a state representation vector of the production order and an action space of the intelligent scheduling system; performing iterative interactive training on a deep Q network based on the state representation vector of the production order and the action space of the intelligent scheduling system to obtain a trained deep Q network; and combining a monitoring system with the trained deep Q network and deploying the deep Q network to the production line to provide production order scheduling guidance. With this method and system, a new, efficient scheduling strategy can be obtained in time in response to random changes in production orders. The deep-neural-network-based production line scheduling method and system can be widely applied in the technical field of production line customization.

Description

Production line scheduling method and system based on deep neural network
Technical Field
The invention relates to the technical field of production line customization, in particular to a production line scheduling method and system based on a deep neural network.
Background
In the current wave of manufacturing, personalized customized production lines have become an important force driving innovation and meeting customer demand. The rise of this concept is inseparable from the market's growing demand for unique, customized products. The traditional mass production mode can hardly satisfy consumers' urgent requirements for product individuality, diversity and immediacy, so personalized customization factories, with their flexible and efficient manufacturing concepts, have gradually come to the fore.
In a personalized customized production line, scheduling is a complex challenge facing the manufacturing industry, against a background of shifting market demand, developing digital technology, and highly flexible production flows. In existing scheduling solutions for personalized customized production lines, digital technology, artificial intelligence, machine learning and the like are widely applied to improve scheduling efficiency. However, these solutions face a number of challenges, including timeliness issues and real-time monitoring difficulties. The uncertainty of order demand causes scheduling plans to change frequently, and the prior art struggles to adapt to such changes quickly; the timeliness problem is therefore pronounced, namely that the prior art finds it difficult to reschedule promptly in response to changes in order demand.
Disclosure of Invention
In order to solve the above technical problems, the invention aims to provide a production line scheduling method and system based on a deep neural network, which can obtain a new, efficient scheduling strategy in time in response to random changes in production orders.
The first technical scheme adopted by the invention is as follows: a production line scheduling method based on a deep neural network comprises the following steps:
acquiring a production order and constructing an intelligent scheduling system;
defining a state representation vector of a production order and an action space of an intelligent scheduling system;
performing iterative interactive training on the deep Q network based on the state representation vector of the production order and the action space of the intelligent scheduling system to obtain a trained deep Q network;
and combining a monitoring system with the trained deep Q network, and deploying the deep Q network to the production line to provide production order scheduling guidance.
Further, the steps of acquiring the production order and constructing the intelligent scheduling system specifically comprise:
acquiring a production order based on the production line;
and constructing an intelligent scheduling system, wherein the intelligent scheduling system is used for realizing information interaction between the production line and the deep Q network.
Further, the step of defining a state representation vector of the production order and an action space of the intelligent scheduling system specifically includes:
defining a state representation vector of the production order, the state representation vector including an equipment state, a robotic arm state, an AGV state and a task state;
and defining an action space of the intelligent scheduling system, wherein the action space includes a task allocation action, a robotic arm scheduling action and an AGV scheduling action.
Further, the step of performing iterative interactive training on the deep Q network based on the state representation vector of the production order and the action space of the intelligent scheduling system to obtain a trained deep Q network specifically includes:
carrying out data preprocessing on the state representation vector of the production order to obtain a preprocessed state representation vector;
inputting the preprocessed state representation vector into the deep Q network, and outputting the action corresponding to the maximum Q value through a greedy strategy;
the production line executes the action corresponding to the maximum Q value, yielding the next-time-step state representation vector of the production order and the reward for executing the action;
combining the action space of the intelligent scheduling system, the preprocessed state representation vector, the next-time-step state representation vector of the production order and the action execution reward to construct an experience replay buffer;
sampling the experience replay buffer, and training the deep Q network based on the sampled result to obtain a target Q value;
calculating the mean squared error between the maximum Q value and the target Q value and updating the parameters of the deep Q network;
cyclically repeating the steps of obtaining the action corresponding to the maximum Q value, executing that action on the production line, constructing the experience replay buffer, sampling the experience replay buffer and updating the parameters of the deep Q network until the number of cycles reaches a preset number, obtaining the trained deep Q network.
Further, the calculation expression of the reward for executing an action is specifically as follows:

R(s,a) = \lambda_1 \cdot R_{equipment}(s,a) + \lambda_2 \cdot R_{robot\_arm}(s,a) + \lambda_3 \cdot R_{AGV}(s,a)

In the above formula, λ1, λ2 and λ3 denote weighting hyperparameters; R_{equipment}(s,a) denotes the equipment utilization reward, computed as the ratio of equipment operating time to total time; R_{robot\_arm}(s,a) denotes the robotic arm utilization reward, computed as the ratio of robotic arm operating time to total time; R_{AGV}(s,a) denotes the AGV utilization reward, computed as the ratio of AGV operating time to total time; R(s,a) denotes the current reward obtained from the action reward function; s denotes the state, and a denotes the action.
Further, the calculation expression of the target Q value is specifically as follows:

Q_{target}(s,a) = R(s,a) + \gamma \cdot \max_{a'} Q(s',a';\theta)

In the above equation, Q_{target}(s,a) denotes the target Q value, i.e. the expected cumulative reward for selecting action a in state s; R(s,a) denotes the currently obtained reward; γ denotes the discount factor; \max_{a'} Q(s',a';\theta) denotes the maximum Q value over actions a' in the next state s'; and θ denotes the network parameters.
Further, the parameters of the deep Q network are updated by minimizing a loss function via gradient descent; the gradient descent update minimizing the loss function is specifically as follows:

\theta \leftarrow \theta - \alpha \cdot \nabla_\theta \big( Q_{target}(s,a) - Q(s,a;\theta) \big)^2

In the above formula, θ denotes the network parameters, α denotes the learning rate, \nabla_\theta denotes the gradient with respect to the network parameters θ, Q_{target}(s,a) denotes the target Q value, and Q(s,a;\theta) denotes the prediction of the current Q network for state s and action a.
The second technical scheme adopted by the invention is as follows: a deep neural network based production line scheduling system, comprising:
The first module is used for acquiring a production order and constructing an intelligent scheduling system;
A second module for defining a state representation vector of the production order and an action space of the intelligent scheduling system;
The third module is used for performing iterative interactive training on the deep Q network based on the state representation vector of the production order and the action space of the intelligent scheduling system to obtain a trained deep Q network;
and the fourth module is used for combining the monitoring system with the trained deep Q network, and deploying the deep Q network to the production line to provide production order scheduling guidance.
The method and the system of the invention have the following beneficial effects: by acquiring production orders and constructing an intelligent scheduling system, and by defining the state representation vector of the production order and the action space of the intelligent scheduling system, the invention takes into account the production flow of production line orders and the operating condition of each device, so that changes in an order's scheduling plan can be captured in time; by iteratively and interactively training the deep Q network, a trained deep Q network is obtained that can schedule tasks more effectively; finally, the monitoring system is combined with the trained deep Q network and deployed to the production line to guide production order scheduling, so that the cooperative scheduling strategy of the deep Q network, other equipment and AGVs (automated guided vehicles) can be adjusted in real time to keep the production line running efficiently, and a new, efficient scheduling strategy can be obtained in time in response to random changes in production orders.
Drawings
FIG. 1 is a flow chart of steps of a method for scheduling a production line based on a deep neural network of the present invention;
FIG. 2 is a block diagram of a deep neural network based production line scheduling system of the present invention;
FIG. 3 is a training flow chart of the deep Q network according to an embodiment of the present invention;
FIG. 4 is a flow chart illustrating steps for implementing deep Q-network based line scheduling in accordance with the present invention.
Detailed Description
The invention will now be described in further detail with reference to the drawings and to specific examples. The step numbers in the following embodiments are set for convenience of illustration only, and the order between the steps is not limited in any way, and the execution order of the steps in the embodiments may be adaptively adjusted according to the understanding of those skilled in the art.
Referring to fig. 1 and 4, the present invention provides a method for scheduling a production line based on a deep neural network, the method comprising the steps of:
S1, acquiring a production order and constructing an intelligent scheduling system;
Specifically, a production order is acquired based on the production line, and an intelligent scheduling system is constructed, the intelligent scheduling system being used to realize information interaction between the production line and the deep Q network.
In this embodiment, the production line generates customized production orders according to customer requirements, including product type, quantity, delivery deadline, etc. It should be noted that production tasks are generated from the order system, and the production requirements in an order are first classified into two types: standard production and special assembly. Standard production tasks typically involve conventional manufacturing flows, while special assembly tasks require additional robotic arms and equipment to work in concert.
S2, defining a state representation vector of a production order and an action space of the intelligent scheduling system;
Specifically, a state representation vector of the production order is defined, the state representation vector including an equipment state, a robotic arm state, an AGV state and a task state; and an action space of the intelligent scheduling system is defined, the action space including a task allocation action, a robotic arm scheduling action and an AGV scheduling action.
In this embodiment, the state representation is defined according to the production order, and the action space of the intelligent scheduling system is defined to include task allocation actions, robotic arm scheduling actions and AGV scheduling actions. The equipment state reflects the operating status of each device; the robotic arm state includes the busy/idle status and work progress of each robotic arm; the AGV state includes the position and loading status of each AGV; and the task state includes the remaining time of tasks to be processed and the completion status of special assembly tasks.
It should be noted that, in some embodiments, the state representation defined according to the production order includes the equipment state, the robotic arm state, the AGV state and the task state. The equipment state represents the operating status of each production device, whether the device is idle, and whether the device is faulty; for example, equipment state a = [0, 1, 2, …, 0], where each element corresponds to one device, 0 denotes idle, 1 denotes operating, and 2 denotes faulty. The robotic arm state captures the busy/idle status of each arm, e.g. robotic arm state b = [0, 1, …], where 1 denotes that an arm is busy and 0 denotes that it is idle. The AGV state includes information such as the AGV's position and cargo status; e.g. AGV state c = [2, 1, 0, …] indicates that an AGV is located at position 2 and carrying cargo. The task state represents information about tasks to be processed, such as the remaining time of a task and the completion status of special assembly tasks; e.g. task state d = [5, 0, …] indicates that the current task needs another 5 time units to complete.
The action space of the system is defined to include a task allocation action, a robotic arm scheduling action and an AGV scheduling action. The task allocation action assigns a pending task to a specific production device; e.g. task allocation action e = [0, 1, …] indicates that the task is assigned to the second production device. The robotic arm scheduling action covers the dispatching of robotic arms, such as selecting which arm executes a task; e.g. robotic arm scheduling action f = [1, 0, …] indicates that the first robotic arm is selected to execute the task. The AGV scheduling action involves the dispatching of AGVs, including assigning transport tasks; e.g. AGV scheduling action g = [0, 1, 2, …] indicates that the AGV at position 1 is not dispatched, the AGV at position 2 is dispatched without a load, and the AGV at position 3 is dispatched with a load.
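As a minimal illustration of the encodings described above (a sketch only: the variable names, dimensions and flat action indexing are assumptions for this example, not part of the patent), the state representation vector and a discrete action space could be assembled as follows:

```python
import numpy as np

# Hypothetical encoding of the state representation vector described above.
# Equipment: 0 = idle, 1 = operating, 2 = faulty.
equipment_state = np.array([0, 1, 2, 0])   # one entry per production device
robot_arm_state = np.array([0, 1])         # 1 = busy, 0 = idle
agv_state       = np.array([2, 1])         # [position, carrying cargo?]
task_state      = np.array([5, 0])         # [remaining time units, special assembly done?]

# Concatenate the sub-states into the single state representation vector
# that is fed to the deep Q network.
state = np.concatenate(
    [equipment_state, robot_arm_state, agv_state, task_state]
).astype(np.float32)

# Flat discrete action space: each index encodes one combined choice of
# task allocation, robotic arm assignment and AGV dispatch.
N_DEVICES, N_ARMS, N_AGV_CMDS = 4, 2, 3
n_actions = N_DEVICES * N_ARMS * N_AGV_CMDS

def decode_action(a: int):
    """Map a flat action index back to (device, arm, AGV command)."""
    device, rest = divmod(a, N_ARMS * N_AGV_CMDS)
    arm, agv_cmd = divmod(rest, N_AGV_CMDS)
    return device, arm, agv_cmd
```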
S3, performing iterative interactive training on the deep Q network based on the state representation vector of the production order and the action space of the intelligent scheduling system to obtain a trained deep Q network;
Specifically, as shown in fig. 3, a deep neural network is adopted, either a convolutional neural network (CNN) or a fully connected neural network, which takes the state (defined from the production order) as input and outputs a Q value for each possible action. The selected action is executed, the reward and next state fed back by the environment are observed, and the tuple (state, action, reward, next state) is stored in an experience replay buffer. The network learns from historical data and the real-time state, where the historical data refers to the production line's operating data collected over a past period of time, including equipment states, robotic arm actions, AGV task allocations and the associated reward information; an optimal action is selected for each state, and a reward function is designed. An experience replay mechanism is introduced so that the deep Q network can learn more robustly and adapt to scheduling problems in different scenarios, improving the performance of the system.
S31, carrying out data preprocessing on the state representation vector of the production order to obtain a preprocessed state representation vector;
S32, inputting the preprocessed state representation vector into a deep Q network, and outputting an action corresponding to the maximum Q value through a greedy strategy;
Specifically, the parameters of the deep Q network and the environment are initialized and an initial state is obtained; actions are then selected with the deep Q network according to the current state using an ε-greedy strategy, i.e. a random action is selected with probability ε, and the action with the largest Q value is selected with probability 1−ε.
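A minimal sketch of this ε-greedy selection step (PyTorch is an assumed framework choice; the patent does not name one):

```python
import random
import torch

def select_action(q_network, state, epsilon: float, n_actions: int) -> int:
    """ε-greedy selection: explore with probability ε, otherwise pick
    the action whose predicted Q value is largest."""
    if random.random() < epsilon:
        return random.randrange(n_actions)            # random exploration
    with torch.no_grad():
        q_values = q_network(torch.as_tensor(state).float().unsqueeze(0))
        return int(q_values.argmax(dim=1).item())     # greedy exploitation
```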
S33, the production line executes the action corresponding to the maximum Q value, yielding the next-time-step state representation vector of the production order and the reward for executing the action;
Specifically, a deep neural network is used that takes the state as input and outputs the Q value of each possible action; for example, the input is a state vector s and the output is the Q value Q(s,a) of each action a, with θ denoting the network parameters of the deep Q network. The network selects the optimal action for each state by learning from historical data and the real-time state, and a reward function R(s,a) is designed, whose expression is specifically as follows:
R(s,a) = \lambda_1 \cdot R_{equipment}(s,a) + \lambda_2 \cdot R_{robot\_arm}(s,a) + \lambda_3 \cdot R_{AGV}(s,a)

In the above formula, λ1, λ2 and λ3 denote weighting hyperparameters; R_{equipment}(s,a) denotes the equipment utilization reward, computed as the ratio of equipment operating time to total time; R_{robot\_arm}(s,a) denotes the robotic arm utilization reward, computed as the ratio of robotic arm operating time to total time; R_{AGV}(s,a) denotes the AGV utilization reward, computed as the ratio of AGV operating time to total time; R(s,a) denotes the current reward obtained from the action reward function; s denotes the state, and a denotes the action.
It should be noted that, in this embodiment, the ratio of equipment operating time to total time is computed as follows: the time t_{equipment} that the equipment actually operates after executing action a is recorded, and the equipment utilization reward is computed as R_{equipment}(s,a) = \lambda_{equipment} \cdot t_{equipment} / t_{total}, where λ_{equipment} is the weight hyperparameter of the equipment utilization reward, taking a value between 0 and 1, and t_{total} is the total elapsed time.
For the ratio of robotic arm operating time to total time, the time t_{robot\_arm} that the robotic arm actually operates after executing action a is recorded, and the robotic arm utilization reward is computed as R_{robot\_arm}(s,a) = \lambda_{robot\_arm} \cdot t_{robot\_arm} / t_{total}, where λ_{robot\_arm} is the weight hyperparameter of the robotic arm utilization reward, taking a value between 0 and 1.
For the ratio of AGV operating time to total time, the time t_{AGV} that the AGV actually operates after executing action a is recorded, and the AGV utilization reward is computed as R_{AGV}(s,a) = \lambda_{AGV} \cdot t_{AGV} / t_{total}, where λ_{AGV} is the weight hyperparameter of the AGV utilization reward, taking a value between 0 and 1.
It should be further noted that the utilization rewards of the equipment, the robotic arm and the AGV are calculated as the ratio of operating time to total time in order to evaluate utilization efficiency: the more fully a resource is utilized, the higher the reward.
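A sketch of this composite reward under the assumptions above (the function names and default numeric weights below are placeholders for illustration, not values from the patent):

```python
def utilization_reward(t_work: float, t_total: float, weight: float) -> float:
    """Utilization reward: weighted ratio of operating time to total time,
    with weight a hyperparameter in [0, 1]."""
    return weight * (t_work / t_total)

def composite_reward(t_equipment: float, t_robot_arm: float, t_agv: float,
                     t_total: float,
                     lam1: float = 1.0, lam2: float = 1.0, lam3: float = 1.0,
                     w_equipment: float = 0.5, w_robot_arm: float = 0.5,
                     w_agv: float = 0.5) -> float:
    """R(s, a) = λ1·R_equipment + λ2·R_robot_arm + λ3·R_AGV."""
    return (lam1 * utilization_reward(t_equipment, t_total, w_equipment)
            + lam2 * utilization_reward(t_robot_arm, t_total, w_robot_arm)
            + lam3 * utilization_reward(t_agv, t_total, w_agv))
```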
S34, combining the action space of the intelligent scheduling system, the preprocessed state representation vector, the next-time-step state representation vector of the production order and the action execution reward to construct an experience replay buffer;
Specifically, the selected action is executed, the reward and next state fed back by the environment are observed, and the tuple (state, action, reward, next state) is stored in the experience replay buffer.
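A minimal replay buffer matching the tuple layout described above (the capacity value is an assumption):

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity experience replay buffer of
    (state, action, reward, next_state) tuples."""

    def __init__(self, capacity: int = 10_000):
        self.buffer = deque(maxlen=capacity)   # oldest experiences evicted first

    def push(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size: int):
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```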
S35, sampling the experience replay buffer, and training the deep Q network based on the sampled result to obtain a target Q value;
Specifically, a batch of experiences is randomly sampled from the experience replay buffer and used to train the deep Q network, improving sample utilization efficiency and training stability; the target Q value is computed with the deep Q network using the following formula:
Q_{target}(s,a) = R(s,a) + \gamma \cdot \max_{a'} Q(s',a';\theta)

In the above equation, Q_{target}(s,a) denotes the target Q value, i.e. the expected cumulative reward for selecting action a in state s; R(s,a) denotes the currently obtained reward; γ denotes the discount factor; \max_{a'} Q(s',a';\theta) denotes the maximum Q value over actions a' in the next state s'; and θ denotes the network parameters.
In this embodiment, actions are selected with the deep Q network according to the current state using an ε-greedy strategy: a random action is selected with probability ε, and the currently estimated best action is selected with probability 1−ε. The selected action is executed to interact with the environment, and the resulting reward and next-state feedback are stored in the experience replay buffer, which retains both current and past experiences for subsequent replay training; a batch of experiences is then randomly sampled from the buffer to train the deep Q network.
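A sketch of the target-Q computation (γ = 0.99 is an assumed value; for simplicity this bootstraps from the same network θ, whereas many implementations use a separate target network, a variant the patent does not specify):

```python
import torch

def compute_target_q(reward: float, next_state, q_network,
                     gamma: float = 0.99) -> float:
    """Q_target(s, a) = R(s, a) + γ · max_a' Q(s', a'; θ)."""
    with torch.no_grad():
        next_q = q_network(torch.as_tensor(next_state).float().unsqueeze(0))
        return reward + gamma * float(next_q.max(dim=1).values.item())
```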
S36, calculating the mean squared error between the maximum Q value and the target Q value and updating the parameters of the deep Q network;
S37, cyclically repeating the steps of obtaining the action corresponding to the maximum Q value, executing that action on the production line, constructing the experience replay buffer, sampling the experience replay buffer and updating the parameters of the deep Q network until the number of cycles reaches a preset number, obtaining the trained deep Q network.
Specifically, the parameters of the deep Q network are updated using the mean squared error between the target Q value and the current Q value, and the loss function is minimized by gradient descent; the gradient descent update minimizing the loss function is as follows:

\theta \leftarrow \theta - \alpha \cdot \nabla_\theta \big( Q_{target}(s,a) - Q(s,a;\theta) \big)^2

In the above formula, θ denotes the network parameters, α denotes the learning rate, \nabla_\theta denotes the gradient with respect to the network parameters θ, Q_{target}(s,a) denotes the target Q value, and Q(s,a;\theta) denotes the prediction of the current Q network for state s and action a.
Steps S31 to S36 are repeatedly executed to continuously interact with the environment, learn, and optimize the deep Q network. As training progresses, the value of ε is gradually reduced, making the deep Q network increasingly inclined to select the best actions it has learned.
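A sketch of one such update step, corresponding to S35–S36 (the optimizer interface, γ and the ε-decay schedule in the comment are assumptions):

```python
import numpy as np
import torch
import torch.nn.functional as F

def train_step(q_network, optimizer, batch, gamma: float = 0.99) -> float:
    """One gradient-descent update minimizing the mean squared error
    between Q(s, a; θ) and Q_target(s, a)."""
    states, actions, rewards, next_states = zip(*batch)
    states      = torch.as_tensor(np.stack(states)).float()
    actions     = torch.as_tensor(actions).long()
    rewards     = torch.as_tensor(rewards).float()
    next_states = torch.as_tensor(np.stack(next_states)).float()

    # Q(s, a; θ) for the actions actually taken
    q_pred = q_network(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    # Q_target(s, a) = R(s, a) + γ · max_a' Q(s', a'; θ)
    with torch.no_grad():
        q_target = rewards + gamma * q_network(next_states).max(dim=1).values

    loss = F.mse_loss(q_pred, q_target)   # mean squared error of step S36
    optimizer.zero_grad()
    loss.backward()                       # gradient of (Q_target − Q(s, a; θ))²
    optimizer.step()                      # θ ← θ − α·∇θ loss

    # Between episodes ε would be decayed, e.g.:
    # epsilon = max(0.05, epsilon * 0.995)   # schedule values are assumptions
    return float(loss)
```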
In this embodiment, the state information serves as the features of the input layer; the state may include the equipment state, the robotic arm state, the AGV state, the task state, and so on, and this state information is fed into the neural network after appropriate data preprocessing so that the network can understand and learn the relationships among them. The number of nodes in the output layer equals the number of possible actions, with each node representing the Q value of the corresponding action; a decision is made by selecting a specific action, namely the node with the highest Q value. The reward function comprises a production efficiency reward, a resource utilization reward and a cost reward. The production efficiency reward grants a positive reward for reducing production time, encouraging the deep Q network to select actions that complete tasks faster and thereby improve the efficiency of the whole production line. The resource utilization reward encourages full utilization of the equipment, robotic arms and AGVs; by selecting actions that make full use of resources, the system can schedule tasks more effectively and avoid idle waste of resources. The cost reward accounts for costs incurred during task execution, such as energy consumption and equipment maintenance. The experience replay mechanism continuously accumulates historical experience while the system runs, storing each state, the selected action and the obtained reward to form an experience pool; a batch of samples is randomly drawn from this pool so that the model does not learn only from the most recent data, and the sampled data is finally used to train the network, allowing it to better learn the relationship between states and actions and improving its generalization ability.
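A minimal fully connected variant of the Q network described above (the patent also allows a CNN; the hidden-layer sizes here are assumptions):

```python
import torch.nn as nn

class SchedulingDQN(nn.Module):
    """Fully connected Q network: state representation vector in,
    one Q value per possible action out."""

    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),   # one output node per action
        )

    def forward(self, x):
        return self.net(x)
```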
And S4, combining the monitoring system and the trained deep Q network, and deploying the deep Q network to a production line to conduct production order scheduling guidance.
Specifically, a real-time monitoring system is established to periodically evaluate scheduling performance. Based on the monitoring results, it monitors the operating status of production equipment, including whether it is running normally or has failed; detects the busy/idle status of the robotic arms, i.e. whether an arm is executing a task or waiting; monitors the position and cargo status of the AGVs and whether they are moving according to the scheduled tasks; and tracks the task state, i.e. the execution status of pending tasks, including remaining processing time and whether a task is complete. Overall production efficiency is estimated by monitoring indicators such as the actual production speed and task completion times, and the system can adjust the cooperative scheduling strategy of the deep Q network, other equipment and the AGVs in real time to keep the production line running efficiently.
It should be noted that the embodiment of the invention adopts a monitoring system capable of collecting, in real time, information such as the state of each component, task execution status and production efficiency; it involves sensors, a data acquisition system and visualization tools, and periodically evaluates the current scheduling performance to check for potential problems or bottlenecks, which can be done by comparing the actual production situation against the expected targets. Finally, based on the monitoring results, the system can adjust the cooperative scheduling strategy of the deep Q network, other equipment and the AGVs in real time. This includes dynamically adjusting task allocation, equipment coordination rules or resource sharing policies to accommodate changes in the production environment.
In some embodiments, the key parameters to be monitored are identified, such as the operating status of each assembly device, the work progress of the robotic arms, the positions of the AGVs, and the design specifications of the product. Sensors and devices suitable for furniture assembly, such as position sensors, load sensors and vision sensors, are selected to monitor the states of the equipment and products, and are mounted on the relevant equipment to ensure they accurately reflect key information in the production process; for example, a vision sensor is mounted on a robotic arm for visual identification of parts. SCADA software capable of displaying equipment states, product design drawings and assembly progress in real time is selected; the monitoring software is configured with sensor connection information, monitoring parameters and data update frequency, ensuring that it receives and processes sensor data in real time and displays the product design drawings. Through such a real-time monitoring system, the manager of a product assembly line can track production progress at any time, check furniture specifications, and promptly discover and resolve potential problems, ensuring that the personalized furniture customization process is efficient and accurate.
Referring to fig. 2, a deep neural network-based production line scheduling system includes:
The first module is used for acquiring a production order and constructing an intelligent scheduling system;
A second module for defining a state representation vector of the production order and an action space of the intelligent scheduling system;
The third module is used for performing iterative interactive training on the deep Q network based on the state representation vector of the production order and the action space of the intelligent scheduling system to obtain a trained deep Q network;
and the fourth module is used for combining the monitoring system with the trained deep Q network, and deploying the deep Q network to the production line to provide production order scheduling guidance.
The content of the method embodiment applies to the system embodiment; the functions realized by the system embodiment are the same as those of the method embodiment, as are the beneficial effects achieved.
While the preferred embodiment of the present application has been described in detail, the application is not limited to the embodiment, and various equivalent modifications and substitutions can be made by those skilled in the art without departing from the spirit of the application, and these equivalent modifications and substitutions are intended to be included in the scope of the present application as defined in the appended claims.

Claims (8)

1. The production line scheduling method based on the deep neural network is characterized by comprising the following steps of:
acquiring a production order and constructing an intelligent scheduling system;
defining a state representation vector of a production order and an action space of an intelligent scheduling system;
performing iterative interactive training on the deep Q network based on the state representation vector of the production order and the action space of the intelligent scheduling system to obtain a trained deep Q network;
and combining a monitoring system with the trained deep Q network, and deploying the deep Q network to the production line to provide production order scheduling guidance.
2. The method for scheduling production lines based on deep neural network according to claim 1, wherein the steps of obtaining production orders and constructing an intelligent scheduling system specifically comprise:
acquiring a production order based on the production line;
and constructing an intelligent scheduling system, wherein the intelligent scheduling system is used for realizing information interaction between the production line and the deep Q network.
3. The method for scheduling production lines based on deep neural network according to claim 2, wherein the step of defining the state representation vector of the production order and the action space of the intelligent scheduling system specifically comprises the following steps:
defining a state representation vector of the production order, the state representation vector including an equipment state, a robotic arm state, an AGV state and a task state;
and defining an action space of the intelligent scheduling system, wherein the action space includes a task allocation action, a robotic arm scheduling action and an AGV scheduling action.
4. The method for scheduling a production line based on a deep neural network according to claim 3, wherein the step of performing iterative interactive training on the deep Q network based on the state representation vector of the production order and the action space of the intelligent scheduling system to obtain the trained deep Q network specifically comprises the following steps:
carrying out data preprocessing on the state representation vector of the production order to obtain a preprocessed state representation vector;
inputting the preprocessed state representation vector into the deep Q network, and outputting the action corresponding to the maximum Q value through a greedy strategy;
the production line executes the action corresponding to the maximum Q value, yielding the next-time-step state representation vector of the production order and the reward for executing the action;
combining the action space of the intelligent scheduling system, the preprocessed state representation vector, the next-time-step state representation vector of the production order and the action execution reward to construct an experience replay buffer;
sampling the experience replay buffer, and training the deep Q network based on the sampled result to obtain a target Q value;
calculating the mean squared error between the maximum Q value and the target Q value and updating the parameters of the deep Q network;
cyclically repeating the steps of obtaining the action corresponding to the maximum Q value, executing that action on the production line, constructing the experience replay buffer, sampling the experience replay buffer and updating the parameters of the deep Q network until the number of cycles reaches a preset number, obtaining the trained deep Q network.
5. The method for scheduling a production line based on a deep neural network according to claim 4, wherein the calculation expression of the reward for executing an action is specifically as follows:

R(s,a) = \lambda_1 \cdot R_{equipment}(s,a) + \lambda_2 \cdot R_{robot\_arm}(s,a) + \lambda_3 \cdot R_{AGV}(s,a)

In the above formula, λ1, λ2 and λ3 denote weighting hyperparameters; R_{equipment}(s,a) denotes the equipment utilization reward, computed as the ratio of equipment operating time to total time; R_{robot\_arm}(s,a) denotes the robotic arm utilization reward, computed as the ratio of robotic arm operating time to total time; R_{AGV}(s,a) denotes the AGV utilization reward, computed as the ratio of AGV operating time to total time; R(s,a) denotes the current reward obtained from the action reward function; s denotes the state, and a denotes the action.
6. The method for scheduling a production line based on a deep neural network according to claim 4, wherein the calculation expression of the target Q value is specifically as follows:

Q_{target}(s,a) = R(s,a) + \gamma \cdot \max_{a'} Q(s',a';\theta)

In the above equation, Q_{target}(s,a) denotes the target Q value, i.e. the expected cumulative reward for selecting action a in state s; R(s,a) denotes the currently obtained reward; γ denotes the discount factor; \max_{a'} Q(s',a';\theta) denotes the maximum Q value over actions a' in the next state s'; and θ denotes the network parameters.
7. The method for scheduling a production line based on a deep neural network according to claim 4, wherein the parameters of the deep Q network are updated by minimizing a loss function via gradient descent, the update being specifically as follows:

\theta \leftarrow \theta - \alpha \cdot \nabla_\theta \big( Q_{target}(s,a) - Q(s,a;\theta) \big)^2

In the above formula, θ denotes the network parameters, α denotes the learning rate, \nabla_\theta denotes the gradient with respect to the network parameters θ, Q_{target}(s,a) denotes the target Q value, and Q(s,a;\theta) denotes the prediction of the current Q network for state s and action a.
8. A deep neural network-based production line scheduling system, comprising the following modules:
The first module is used for acquiring a production order and constructing an intelligent scheduling system;
A second module for defining a state representation vector of the production order and an action space of the intelligent scheduling system;
The third module is used for performing iterative interactive training on the deep Q network based on the state representation vector of the production order and the action space of the intelligent scheduling system to obtain a trained deep Q network;
and the fourth module is used for combining the monitoring system with the trained deep Q network, and deploying the deep Q network to the production line to provide production order scheduling guidance.
CN202410001969.4A 2024-01-02 2024-01-02 Production line scheduling method and system based on deep neural network Pending CN117910748A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410001969.4A CN117910748A (en) 2024-01-02 2024-01-02 Production line scheduling method and system based on deep neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410001969.4A CN117910748A (en) 2024-01-02 2024-01-02 Production line scheduling method and system based on deep neural network

Publications (1)

Publication Number Publication Date
CN117910748A true CN117910748A (en) 2024-04-19

Family

ID=90684768

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410001969.4A Pending CN117910748A (en) 2024-01-02 2024-01-02 Production line scheduling method and system based on deep neural network

Country Status (1)

Country Link
CN (1) CN117910748A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118095811A (en) * 2024-04-29 2024-05-28 应急管理部沈阳消防研究所 Forest fire-extinguishing ground-air collaborative command scheduling method based on deep reinforcement learning model
CN118095811B (en) * 2024-04-29 2024-07-19 应急管理部沈阳消防研究所 Forest fire-extinguishing ground-air collaborative command scheduling method based on deep reinforcement learning model

Similar Documents

Title
Zhou et al. Multi-agent reinforcement learning for online scheduling in smart factories
Diez-Olivan et al. Data fusion and machine learning for industrial prognosis: Trends and perspectives towards Industry 4.0
Zhou et al. Reinforcement learning with composite rewards for production scheduling in a smart factory
Huang et al. A proactive task dispatching method based on future bottleneck prediction for the smart factory
Cai et al. Real-time scheduling simulation optimisation of job shop in a production-logistics collaborative environment
CN112581303A (en) Artificial intelligence channel for industrial automation
CN112579653A (en) Progressive contextualization and analysis of industrial data
CN107831685B (en) Group robot control method and system
Onggo et al. Combining symbiotic simulation systems with enterprise data storage systems for real-time decision-making
Ouahabi et al. A distributed digital twin architecture for shop floor monitoring based on edge-cloud collaboration
Goby et al. Deep reinforcement learning with combinatorial actions spaces: An application to prescriptive maintenance
CN117910748A (en) Production line scheduling method and system based on deep neural network
KR20230017556A (en) System and operational methods for manufacturing execution based on artificial intelligence and bigdata
Sandengen et al. High performance manufacturing-an innovative contribution towards industry 4.0
CN111752710A (en) Data center PUE dynamic optimization method, system, equipment and readable storage medium
Borangiu et al. Smart manufacturing control with cloud-embedded digital twins
Müller-Zhang et al. Towards live decision-making for service-based production: Integrated process planning and scheduling with Digital Twins and Deep-Q-Learning
Khadivi et al. Deep reinforcement learning for machine scheduling: Methodology, the state-of-the-art, and future directions
Lamprecht et al. Reinforcement learning based condition-oriented maintenance scheduling for flow line systems
CN113568747B (en) Cloud robot resource scheduling method and system based on task classification and time sequence prediction
Elsayed et al. Deep reinforcement learning based actor-critic framework for decision-making actions in production scheduling
Alamaniotis et al. Fuzzy leaky bucket with application to coordinating smart appliances in smart homes
CN116594358B (en) Multi-layer factory workshop scheduling method based on reinforcement learning
Gu et al. Dynamic scheduling mechanism for intelligent workshop with deep reinforcement learning method based on multi-agent system architecture
Balali et al. Data Intensive Industrial Asset Management

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
CB02 Change of applicant information

Country or region after: China

Address after: No.33, Guangyun Road, Shishan town, Nanhai District, Foshan City, Guangdong Province 528225

Applicant after: Foshan University

Address before: No.33, Guangyun Road, Shishan town, Nanhai District, Foshan City, Guangdong Province 528225

Applicant before: FOSHAN University

Country or region before: China