CN108008627A - Parallel optimization reinforcement learning self-adaptive PID control method - Google Patents
Parallel optimization reinforcement learning self-adaptive PID control method
- Publication number
- CN108008627A (application number CN201711325553.4A)
- Authority
- CN
- China
- Prior art keywords
- pid
- parameter
- output
- control
- input
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B11/00—Automatic controllers
- G05B11/01—Automatic controllers electric
- G05B11/36—Automatic controllers electric with provision for obtaining particular characteristics, e.g. proportional, integral, differential
- G05B11/42—Automatic controllers electric with provision for obtaining particular characteristics, e.g. proportional, integral, differential for obtaining a characteristic which is both proportional and time-dependent, e.g. P. I., P. I. D.
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B13/00—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
- G05B13/02—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
- G05B13/0265—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion
- G05B13/027—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion using neural networks only
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B13/00—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
- G05B13/02—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
- G05B13/04—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
- G05B13/042—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators in which a parameter or coefficient is automatically adjusted to optimise the performance
Landscapes
- Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Automation & Control Theory (AREA)
- Evolutionary Computation (AREA)
- Health & Medical Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Feedback Control In General (AREA)
Abstract
The invention discloses a parallel-optimization reinforcement learning adaptive PID control method, characterised by comprising the following steps. Step S1: using MATLAB software, discretize the transfer function by the zero-order-hold method and initialize the controller parameters and M control threads for parallel learning. Step S2: define an input signal and pass it to the transfer function of S1, compute the output value, and take the difference between the input and output signals as the input vector of the control algorithm. Step S3: pass the input vector to the improved adaptive PID controller for training; a trained model is obtained after N iterations. Step S4: run control tests with the trained model and record the input signal, the output signal and the changes of the PID parameters. Step S5: visualize the test data and compare the control performance. The invention better resolves the problems of conventional adaptive PID control and, by exploiting the multi-threaded parallel learning of A3C, improves the stability and learning efficiency of the algorithm.
Description
Technical field
The present invention relates to an adaptive PID control method and belongs to the technical field of control; specifically, it is an
improved adaptive PID (proportional-integral-derivative) control algorithm based on parallel-optimization actor-critic learning.
Background technology
PID (proportional/integral/derivative) control is a linear control scheme that acts on the deviation between setpoint and output. Because its principle is simple, it is robust, easy to tune and does not require an accurate mathematical model of the plant, it has become the most widely used control scheme in industry. In the engineering practice of tuning PID parameters, especially for linear, time-invariant, weakly time-delayed systems, traditional tuning methods have accumulated rich experience and are widely applied. In real industrial process control, however, many plants exhibit time-varying, nonlinear and pure-delay behaviour, and the control mechanisms are complex; under the influence of factors such as noise and load disturbances, the process parameters and even the model structure can change. The PID parameters must therefore be tuned on-line to meet real-time control requirements. In such cases traditional tuning methods can hardly satisfy the demands of engineering practice and show significant limitations.
Adaptive PID control technology is an effective way to solve these problems. An adaptive PID control model combines the idea of adaptive control with the advantages of a conventional PID controller. First, as an adaptive controller, it can automatically identify the controlled process, automatically tune the controller parameters and adapt to variations of the process parameters; second, it retains the simple structure, good robustness and high reliability of a conventional PID controller. Because of these advantages it has become a preferred industrial process controller in engineering practice. Since adaptive PID control was proposed it has received extensive research, and fuzzy adaptive PID controllers, neural-network adaptive PID controllers and Actor-Critic adaptive PID controllers have been proposed in succession.
For example, Document 1 (Liu Guorong et al. Fuzzy self-adaptive PID control [J]. Control and Decision, 1995(6)) proposes an adaptive PID controller based on fuzzy rules. Its main idea is: when the setpoint changes abruptly or a state or structure disturbance occurs, the transient response can be divided into 9 cases; after the system response is obtained at each sampling instant, the deviation from the setpoint and its trend at that moment can be determined, and, using existing control knowledge and a fuzzy control method, the control action is increased or decreased appropriately so that the response is driven back towards the setpoint as quickly as possible. However, this control method requires expert experience and parameter optimization to control complex systems; if the fuzzy rules are set inaccurately, the control performance is unsatisfactory.
Document 2 (Liao Fangfang, Xiao Jian. Research on PID parameter self-tuning based on BP neural network [J]. Journal of System Simulation, 2005) proposes adaptive PID control based on a BP neural network. Its control idea is: the neural-network identifier propagates the control deviation back to its own neurons to correct its own weights; the setpoint and the actual plant output pass through the identifier and are back-propagated to the neural-network controller, whose weights are corrected with the error signal. After repeated learning the controller can gradually follow the changes of the system. This method usually optimizes the parameters by supervised learning, but the teacher signal is difficult to obtain.
Document 3 (Chen Xuesong, Yang Yimin. Adaptive PID control based on actor-critic learning [J]. Control Theory & Applications, 2011) proposes an adaptive PID control with an Actor-Critic structure. The control idea is: using the model-free on-line learning ability of AC learning, the PID parameters are adjusted adaptively, and a single RBF network simultaneously realizes the policy function of the Actor and the value-function learning of the Critic. This overcomes the difficulty of tuning the parameters of a conventional PID controller on-line in real time and gives fast response and strong adaptive ability. However, the inherent instability of the AC learning structure often makes the algorithm hard to converge.
Patent CN201510492758 discloses an adaptive PID control method for an actuator. The method combines an expert PID controller and a fuzzy controller, each connected to the actuator; the actuator selects the expert PID controller or the fuzzy controller according to the current state information and the desired information. Although this controller can reduce overshoot and achieves high control accuracy, it still requires a large amount of prior knowledge from experts to decide which controller to use.
Content of the invention
The object of the invention: in view of the characteristics of adaptive PID control, an adaptive PID control method based on parallel-optimization actor-critic learning (A3C) is proposed for the control of industrial systems. The invention better solves the problems of conventional adaptive PID control and, by exploiting the multi-threaded parallel learning characteristic of A3C, improves the stability and learning efficiency of the algorithm. The A3C-based adaptive PID controller has the advantages of fast response, strong adaptive ability and strong disturbance rejection.
The adaptive PID control method based on parallel-optimization actor-critic learning comprises the following steps:
Step S1: using MATLAB (a commercial mathematics software produced by MathWorks, USA), define a continuous transfer function of arbitrary order for the controlled system and discretize it by the zero-order-hold method to obtain a discretized transfer function with a user-defined sampling interval; initialize the controller parameters and M control threads for parallel learning, where the parameters mainly include the BP neural network parameters and the PID control environment parameters, and each thread is an independent control agent;
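To make step S1 concrete, the following Python sketch discretizes a continuous transfer function with a zero-order hold and prepares M worker threads; it is an illustrative assumption, not the patent's MATLAB implementation, and the plant polynomial and thread count used here are placeholders.

```python
# Illustrative sketch of step S1 (not the patent's MATLAB code): discretize a
# continuous transfer function with a zero-order hold and set up M worker threads.
# The plant G(s) = 1/(s^3 + 3s^2 + 3s + 1) and M = 4 are placeholder assumptions.
import threading
from scipy.signal import cont2discrete

num, den = [1.0], [1.0, 3.0, 3.0, 1.0]      # continuous transfer function of arbitrary order
dt = 0.001                                   # user-defined sampling interval (s)
num_d, den_d, _ = cont2discrete((num, den), dt, method='zoh')
num_d, den_d = num_d.ravel(), den_d.ravel()  # discretized numerator/denominator coefficients

M = 4                                        # number of parallel control threads (agents)

def worker(thread_id):
    # each thread holds its own controller and environment and learns in parallel
    print(f"agent {thread_id} ready, plant order {len(den_d) - 1}")

threads = [threading.Thread(target=worker, args=(i,)) for i in range(M)]
for t in threads: t.start()
for t in threads: t.join()
```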
Step S2: after step S1, initialize the BP neural network weights and the controlled plant of the PID controller; define a discrete input signal RIN, pass the discretized input signal to the discretized transfer function sample by sample at the defined time interval, compute the output value of the transfer function, and take the difference between the input and output signals as the input vector x(t) of the A3C adaptive PID control algorithm;
Step S3: pass the input vector x(t) obtained in step S2 to the constructed A3C adaptive PID controller for iterative training; a trained model is obtained after N iterations;
Step S31: compute the current error e(t), the first-order error Δe(t) and the second-order error Δ²e(t) as the input vector of the algorithm, x(t) = [e(t), Δe(t), Δ²e(t)]^T, and normalize it with the sigmoid function;
Step S32: pass the input vector to the Actor network of each thread and obtain the new PID parameters. The Actor network does not output the PID parameter values directly; instead it outputs the Gaussian distribution (mean and variance) of each of the three PID parameters, and the three parameter values are estimated from these Gaussian distributions. For output nodes o = 1, 2, 3 the output layer outputs the means of the PID parameters; for o = 4, 5, 6 it outputs the variances. The Actor network is a BP neural network with 3 layers in total: the 1st layer is the input layer; the 2nd layer is the hidden layer, with input hi_k(t) and output ho_k(t) = min(max(hi_k(t), 0), 6), k = 1, 2, 3, ..., 20; the 3rd layer is the output layer, whose input and output are computed from the hidden-layer outputs.
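A minimal numpy sketch of the Actor forward pass of step S32 (a 3-20-6 BP network with the clipped hidden activation) follows; the weight initialization and the softplus used to keep the variance outputs positive are assumptions for illustration, since the patent only specifies the hidden-layer activation.

```python
# Illustrative sketch of the Actor network of step S32 (3-20-6 BP network).
# Weight initialization and the softplus on the variance outputs are assumptions;
# the patent only specifies the clipped hidden activation min(max(hi, 0), 6).
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.1, size=(20, 3))    # input layer -> hidden layer
W2 = rng.normal(scale=0.1, size=(6, 20))    # hidden layer -> output layer

def actor_forward(x):
    hi = W1 @ x                                   # hidden-layer input hi_k(t)
    ho = np.minimum(np.maximum(hi, 0.0), 6.0)     # ho_k(t) = min(max(hi_k(t), 0), 6)
    out = W2 @ ho                                 # output-layer input
    mu = out[:3]                                  # o = 1..3: means of P, I, D
    sigma = np.log1p(np.exp(out[3:])) + 1e-4      # o = 4..6: variances (softplus, assumed)
    return mu, sigma

mu, sigma = actor_forward(np.array([0.73, 0.73, 0.73]))
kp, ki, kd = rng.normal(mu, np.sqrt(sigma))       # sample the three PID parameters
```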
Step S33: assign the new PID parameters to the controller, obtain the control output, compute the control error, compute the reward value according to the environment reward function R(t) = α_1·r_1(t) + α_2·r_2(t), and obtain the vector value x′(t) of the next state;
Step S34: pass the reward R(t), the current state vector x(t) and the next state vector x′(t) to the Critic network. The Critic network has a structure similar to the Actor network, except that it has only one output node. The Critic network mainly outputs the state value and computes the TD error, δ_TD = r(t) + γ·V(S_{t+1}, W_v′) − V(S_t, W_v′);
Step S35: after the TD error is computed, each Actor-Critic network in the A3C structure does not directly update its own network weights; instead it uses its own gradients to update the Actor-Critic network parameters stored in the global network (Global-net). The update rule is W_a = W_a + α_a·dW_a, W_v = W_v + α_c·dW_v, where W_a is the Actor network weight stored in the global network, W_a′ is the weight of the Actor network of each AC structure, W_v is the Critic network weight stored in the global network, W_v′ is the Critic network weight of each AC structure, α_a is the learning rate of the Actor and α_c is the learning rate of the Critic. After the update the global network passes the newest parameters back to each AC structure;
Step S36: the above completes one training pass; iterate the loop N times, then exit training and save the model.
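The sketch below illustrates, in simplified form, the core updates of steps S34-S36: the TD error δ_TD = r(t) + γ·V(S_{t+1}) − V(S_t) and the push of a worker's gradients into the global network by W ← W + α·dW. The gradient terms dW_a and dW_v are placeholders, since the patent's explicit gradient formulas are not reproduced here.

```python
# Simplified sketch of the A3C update of steps S34-S36; dW_a and dW_v stand in
# for the actor/critic gradients, whose explicit formulas are not reproduced.
import numpy as np

gamma, alpha_a, alpha_c = 0.9, 0.001, 0.01   # discount and learning rates (0.001/0.01 from the embodiment)

def td_error(r_t, v_t, v_next):
    # delta_TD = r(t) + gamma * V(S_{t+1}, W_v') - V(S_t, W_v')
    return r_t + gamma * v_next - v_t

class GlobalNet:
    def __init__(self, Wa, Wv):
        self.Wa, self.Wv = Wa, Wv            # Actor/Critic weights stored centrally
    def apply_gradients(self, dWa, dWv):
        # each worker pushes its own gradients; the global weights are updated and
        # then copied back to that worker (asynchronously, in no fixed order)
        self.Wa = self.Wa + alpha_a * dWa    # W_a = W_a + alpha_a * dW_a
        self.Wv = self.Wv + alpha_c * dWv    # W_v = W_v + alpha_c * dW_v
        return self.Wa.copy(), self.Wv.copy()

g = GlobalNet(np.zeros((6, 20)), np.zeros((1, 20)))
delta = td_error(r_t=0.5, v_t=0.2, v_next=0.3)
local_Wa, local_Wv = g.apply_gradients(dWa=delta * np.ones((6, 20)),
                                       dWv=delta * np.ones((1, 20)))
```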
Step S4: run control tests using the trained model and record the input signal, the output signal and the changes of the PID parameters;
Step S41: feed the input signal defined in step S1 to the control model of the thread with the highest reward during training;
Step S42: after S41, compute the current, first-order and second-order errors as the input vector and feed it to the selected control model; unlike in training, only the PID parameter adjustments output by the Actor network are needed, and the adjusted PID parameters are passed to the controller to obtain the controller output;
Step S43: save the input signal, the output signal and the PID parameter changes obtained in step S42.
Step S5: use MATLAB to visualize the experimental data obtained in step S4, including the controller input signal, the output signal and the changes of the PID parameters, and compare the control performance with fuzzy adaptive PID control and AC-PID adaptive PID control.
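A small matplotlib sketch of the kind of visualization meant in step S5 follows; the signal names rin, yourt, u and the P, I, D traces come from the embodiment, while the arrays themselves are dummy data standing in for the logged test records.

```python
# Illustrative visualization of the recorded test data of step S5; the arrays are
# assumed to have been logged during the control test (here they are dummy data).
import numpy as np
import matplotlib.pyplot as plt

t = np.arange(1000) * 0.001                      # 1000 control steps of 0.001 s
rin = np.ones_like(t)                            # step reference
yourt = 1.0 - np.exp(-t / 0.05)                  # dummy plant output for the sketch
kp = np.full_like(t, 0.8); ki = np.full_like(t, 0.2); kd = np.full_like(t, 0.05)

fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True)
ax1.plot(t, rin, label="rin"); ax1.plot(t, yourt, label="yourt"); ax1.legend()
ax2.plot(t, kp, label="P"); ax2.plot(t, ki, label="I"); ax2.plot(t, kd, label="D"); ax2.legend()
ax2.set_xlabel("time (s)")
plt.show()
```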
Brief description of the drawings
Figure 1 is a schematic flow chart of the method of the invention.
Figure 2 is the structure diagram of the improved adaptive PID controller.
Figure 3 shows the output signal of the improved controller with a step signal as input.
Figure 4 shows the control quantity of the improved controller.
Figure 5 shows the control error of the improved adaptive PID controller.
Figure 6 shows the parameter adjustment curves of the A3C adaptive PID controller.
Figure 7 compares the improved controller with the fuzzy and AC-structure adaptive PID controllers.
Figure 8 shows the control experiment comparison and analysis of the different controllers.
Embodiment
The invention is further described below with reference to Figures 1-5 of the accompanying drawings and MATLAB software. The adaptive PID control based on parallel-optimization actor-critic learning comprises the following concrete steps, as shown in Figure 1:
(1) Parameter initialization. A third-order transfer function is selected as the controlled system and the discretization time is set to 0.001 s; the transfer function discretized by the Z-transform is yourt(k) = −den(2)·yourt(k−1) − den(3)·yourt(k−2) − den(4)·yourt(k−3) + num(2)·u(k−1) + num(3)·u(k−2) + num(4)·u(k−3). The input signal is a step signal of value 1.0, a single training episode is 1000 steps (1.0 s), and 4 threads representing 4 independent adaptive PID controllers are initialized and trained.
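The difference equation of embodiment (1) can be written as a short plant-update function; the coefficient values below are placeholders rather than the patent's plant, and the index k−3 on the last output term is taken as intended for a third-order system.

```python
# Illustrative implementation of the discretized plant of embodiment (1):
# yourt(k) = -den(2)*yourt(k-1) - den(3)*yourt(k-2) - den(4)*yourt(k-3)
#            + num(2)*u(k-1) + num(3)*u(k-2) + num(4)*u(k-3)
# The coefficient values below are placeholders, not the patent's plant.
den = [1.0, -2.85, 2.71, -0.86]      # den(1..4), with den(1) normalized to 1
num = [0.0, 1.6e-7, 6.4e-7, 1.6e-7]  # num(1..4)

def plant_step(y_hist, u_hist):
    """y_hist = [y(k-1), y(k-2), y(k-3)], u_hist = [u(k-1), u(k-2), u(k-3)]."""
    return (-den[1] * y_hist[0] - den[2] * y_hist[1] - den[3] * y_hist[2]
            + num[1] * u_hist[0] + num[2] * u_hist[1] + num[3] * u_hist[2])

y_hist, u_hist = [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]
u = 1.0                                         # control quantity at step k-1 (example)
y_next = plant_step(y_hist, [u] + u_hist[:2])   # plant output at step k
```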
(2) Compute the input vector. At t = 0, e(t) = rin(0) − yourt(0) = 1.0, e(t−1) = 0 and e(t−2) = 0. The input vector is x(t) = [e(t), Δe(t), Δ²e(t)]^T, where e(t) = rin − yourt = 1.0, Δe(t) = e(t) − e(t−1) = 1.0 and Δ²e(t) = e(t) − 2·e(t−1) + e(t−2) = 1.0, so the computed x(t) = [1.0, 1.0, 1.0]^T. After normalization by the sigmoid function the final input vector is x(t) = [0.73, 0.73, 0.73]^T.
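The normalization of embodiment (2) can be checked in a few lines: sigmoid(1.0) ≈ 0.731, which matches the value stated above.

```python
# Worked check of embodiment (2): build x(t) from e(t), e(t-1), e(t-2) and
# normalize it with the sigmoid function.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

e_t, e_t1, e_t2 = 1.0, 0.0, 0.0                          # e(t), e(t-1), e(t-2) at t = 0
x = np.array([e_t, e_t - e_t1, e_t - 2 * e_t1 + e_t2])   # [e, Δe, Δ²e] = [1.0, 1.0, 1.0]
x_norm = sigmoid(x)                                      # ≈ [0.731, 0.731, 0.731]
```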
(3) Train the model. The structure of the improved adaptive PID controller is shown in Figure 2. After the state vector is computed it is first passed to the Actor network, which outputs the means μ and variances σ of the three parameters P, I and D. The actual values of P, I and D are drawn by Gaussian sampling and assigned to the incremental PID controller, and the controller computes the control quantity u(t) from the error and the new PID parameters:
u(t) = u(t−1) + Δu(t) = u(t−1) + K_I(t)·e(t) + K_P(t)·Δe(t) + K_D(t)·Δ²e(t)
The control quantity acts on the discretized transfer function, and the output signal value yourt(t+1), the error and the state vector of the next time instant t+1 are computed according to the procedure of (1). In addition, the environment reward function computes the reward of the control agent from the error; the reward function is
R(t) = α_1·r_1(t) + α_2·r_2(t)
where α_1 = 0.6 and α_2 = 0.4, with an error tolerance of 0.001.
The reward function is an important component of reinforcement learning. After the reward is received, the reward and the state vector of the next time instant are passed to the Critic network, which outputs the state values at times t and t+1 and computes the TD error by the formula δ_TD = r(t) + γ·V(S_{t+1}, W_v′) − V(S_t, W_v′), where W_v′ is the Critic network weight. Because the threads do not run synchronously, the controllers update the Actor network and Critic network parameters stored in the Global Net of Figure 2 in no fixed order; the update formulas are W_a = W_a + α_a·dW_a and W_v = W_v + α_c·dW_v, where W_a is the Actor network weight stored in the global network, W_a′ is the weight of the Actor network of each AC structure, W_v is the Critic network weight stored in the global network, W_v′ is the Critic network weight of each AC structure, α_a = 0.001 is the learning rate of the Actor and α_c = 0.01 is the learning rate of the Critic. This completes one training pass; after 3000 iterations the algorithm reaches a steady state.
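The sketch below puts the pieces of embodiment (3) together for a single control step: Gaussian sampling of P, I, D from the Actor outputs and the incremental PID law u(t) = u(t−1) + K_I·e + K_P·Δe + K_D·Δ²e, followed by the weighted reward R(t) = 0.6·r_1 + 0.4·r_2. The definitions of r_1 and r_2 are placeholder assumptions, since the patent's explicit reward terms are not reproduced here.

```python
# Illustrative single control step of embodiment (3); r1 and r2 are placeholder
# reward terms (based on the absolute error and its decrease), not the patent's.
import numpy as np

rng = np.random.default_rng(1)

def control_step(mu, sigma, u_prev, e, de, d2e):
    kp, ki, kd = rng.normal(mu, np.sqrt(sigma))      # Gaussian sampling of P, I, D
    du = ki * e + kp * de + kd * d2e                 # incremental PID adjustment Δu(t)
    u = u_prev + du                                  # u(t) = u(t-1) + Δu(t)
    return u, (kp, ki, kd)

def reward(e, e_prev, a1=0.6, a2=0.4):
    r1 = -abs(e)                                     # placeholder r1: penalize current error
    r2 = abs(e_prev) - abs(e)                        # placeholder r2: reward error decrease
    return a1 * r1 + a2 * r2                         # R(t) = alpha1*r1(t) + alpha2*r2(t)

u, gains = control_step(mu=np.array([0.8, 0.2, 0.05]), sigma=np.array([0.01] * 3),
                        u_prev=0.0, e=1.0, de=1.0, d2e=1.0)
R = reward(e=0.6, e_prev=1.0)
```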
(4) Collect experimental data. Using the trained controller model: because 4 threads were used for training, the thread with the highest cumulative reward is chosen as the test controller. A control test is run with the control parameters set in (1); the control duration is 1 s, i.e. 1000 control steps. The state vector is computed according to the method of (2) and passed to the trained model. During the control test the Critic network no longer works and the Actor outputs the P, I, D parameter values; the values of yourt, rin, u, P, I and D during the test are saved for visual analysis.
(5) Data visualization. The data saved in (4) are visualized and analysed with the MATLAB visualization tools. As shown in Figure 3, which plots the output value yourt, the controller reaches the steady state in less than 0.2 s and has fast regulating ability. Figure 4 shows the control quantity output by the controller, from which it can be seen that the controller reaches the steady state quickly. Figure 5 shows the control error of the controller, where the control error equals the input signal minus the output signal. Figure 6 shows the changes of the P, I and D parameters of the controller; it can be seen that the three parameters are adjusted to different degrees before the system stabilizes, and after stabilization the parameters no longer change. Using the same controlled plant and input signal, experimental comparisons are made with the fuzzy adaptive PID controller and the Actor-Critic adaptive PID controller; the signal output comparison of the three controllers is shown in Figure 7 and the detailed control analysis in Figure 8. As shown in Figure 8, while requiring no extensive expert prior knowledge, the controller of the invention achieves smaller overshoot than the fuzzy controller together with a faster response, and learns faster than the AC-PID controller while keeping a clear advantage in overshoot and response speed.
The invention aims to solve the problems of conventional adaptive PID controllers: fuzzy adaptive PID and expert adaptive PID controllers need a large amount of expert knowledge, and the teacher signal of a neural-network adaptive PID controller is difficult to obtain. Because the A3C learning structure is a reinforcement-learning algorithm with model-free on-line learning ability, it needs neither extensive expert prior knowledge nor a teacher signal, and thereby solves the problems of fuzzy, expert and neural-network adaptive PID controllers. Moreover, because the algorithm learns in parallel on multiple CPU threads, it greatly increases the learning rate of the AC-PID controller and achieves better control performance. The specific control performance is shown in Figure 7, which compares three controllers under identical parameters (the fuzzy adaptive PID controller, the AC-PID controller and the A3C-PID controller of the invention), and the detailed analysis is shown in Figure 8: while requiring no extensive expert prior knowledge, the controller of the invention achieves smaller overshoot than the fuzzy controller with a faster response, and learns faster than the AC-PID controller while keeping a clear advantage in overshoot and response speed.
The invention is not limited to the above embodiment. According to the common technical knowledge and customary means of the art, and without departing from the basic technical idea of the invention described above, equivalent modifications, replacements or changes of other various forms can also be made, and all of them fall within the protection scope of the invention.
Claims (3)
1. A parallel-optimization reinforcement learning adaptive PID control method, characterised by comprising the following steps:
Step S1: using MATLAB software, define a continuous transfer function of arbitrary order for the controlled system and discretize it by the zero-order-hold method to obtain a discretized transfer function with a user-defined sampling interval; initialize the controller parameters and M control threads for parallel learning, where the parameters mainly include the BP neural network parameters and the PID control environment parameters, and each thread is an independent control agent;
Step S2: after initializing the BP neural network weights and the controlled plant of the PID controller, define a discrete input signal RIN, pass the discretized input signal to the discretized transfer function sample by sample at the defined time interval, compute the output value of the transfer function, and take the difference between the input and output signals as the input vector x(t) of the A3C adaptive PID control algorithm;
Step S3: pass the input vector x(t) obtained in step S2 to the constructed A3C adaptive PID controller for iterative training; a trained model is obtained after N iterations;
Step S4: run control tests using the trained model and record the input signal, the output signal and the changes of the PID parameters;
Step S5: use MATLAB to visualize the experimental data obtained in step S4, including the controller input signal, the output signal and the changes of the PID parameters, and compare the control performance with fuzzy adaptive PID control and AC-PID adaptive PID control.
2. The parallel-optimization reinforcement learning adaptive PID control method according to claim 1, characterised in that step S3 comprises the following steps:
Step S31: compute the current error e(t), the first-order error Δe(t) and the second-order error Δ²e(t) as the input vector of the algorithm, x(t) = [e(t), Δe(t), Δ²e(t)]^T, and normalize it with the sigmoid function;
Step S32: pass the input vector to the Actor network of each thread and obtain the new PID parameters; the Actor network does not output the PID parameter values directly but outputs the Gaussian distribution (mean and variance) of each of the three PID parameters, and the three parameter values are estimated from these Gaussian distributions; for output nodes o = 1, 2, 3 the output layer outputs the means of the PID parameters, and for o = 4, 5, 6 it outputs the variances; the Actor network is a BP neural network with 3 layers in total: the 1st layer is the input layer; the 2nd layer is the hidden layer, with input hi_k(t) and output ho_k(t) = min(max(hi_k(t), 0), 6), k = 1, 2, 3, ..., 20; the 3rd layer is the output layer;
Step S33: assign the new PID parameters to the controller, obtain the control output, compute the control error, compute the reward value according to the environment reward function R(t) = α_1·r_1(t) + α_2·r_2(t), and obtain the vector value x′(t) of the next state;
Step S34: pass the reward R(t), the current state vector x(t) and the next state vector x′(t) to the Critic network; the Critic network has a structure similar to the Actor network except that it has only one output node; the Critic network mainly outputs the state value and computes the TD error, δ_TD = r(t) + γ·V(S_{t+1}, W_v′) − V(S_t, W_v′);
Step S35: after the TD error is computed, each Actor-Critic network in the A3C structure does not directly update its own network weights but uses its own gradients to update the Actor-Critic network parameters stored in the global network (Global-net); the update rule is W_a = W_a + α_a·dW_a, W_v = W_v + α_c·dW_v, where W_a is the Actor network weight stored in the global network, W_a′ is the weight of the Actor network of each AC structure, W_v is the Critic network weight stored in the global network, W_v′ is the Critic network weight of each AC structure, α_a is the learning rate of the Actor and α_c is the learning rate of the Critic; after the update the global network passes the newest parameters to each AC structure;
Step S36: the above completes one training pass; iterate the loop N times, then exit training and save the model.
3. The parallel-optimization reinforcement learning adaptive PID control method according to claim 1, characterised in that step S4 comprises the following steps:
Step S41: feed the input signal defined in step S1 to the control model of the thread with the highest reward during training;
Step S42: after S41, compute the current, first-order and second-order errors as the input vector and feed it to the selected control model; unlike in training, only the PID parameter adjustments output by the Actor network are needed, and the adjusted PID parameters are passed to the controller to obtain the controller output;
Step S43: save the input signal, the output signal and the PID parameter changes obtained in step S42.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711325553.4A CN108008627B (en) | 2017-12-13 | 2017-12-13 | Parallel optimization reinforcement learning self-adaptive PID control method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711325553.4A CN108008627B (en) | 2017-12-13 | 2017-12-13 | Parallel optimization reinforcement learning self-adaptive PID control method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108008627A true CN108008627A (en) | 2018-05-08 |
CN108008627B CN108008627B (en) | 2022-10-28 |
Family
ID=62058629
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711325553.4A Active CN108008627B (en) | 2017-12-13 | 2017-12-13 | Parallel optimization reinforcement learning self-adaptive PID control method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108008627B (en) |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107346138A (en) * | 2017-06-16 | 2017-11-14 | 武汉理工大学 | A kind of unmanned boat method for lateral control based on enhancing learning algorithm |
CN108803348A (en) * | 2018-08-03 | 2018-11-13 | 北京深度奇点科技有限公司 | A kind of optimization method of pid parameter and the optimization device of pid parameter |
CN109063823A (en) * | 2018-07-24 | 2018-12-21 | 北京工业大学 | A kind of intelligent body explores batch A3C intensified learning method in the labyrinth 3D |
CN109521669A (en) * | 2018-11-12 | 2019-03-26 | 中国航空工业集团公司北京航空精密机械研究所 | A kind of turning table control methods of self-tuning based on intensified learning |
CN109696830A (en) * | 2019-01-31 | 2019-04-30 | 天津大学 | The reinforcement learning adaptive control method of small-sized depopulated helicopter |
CN110308655A (en) * | 2019-07-02 | 2019-10-08 | 西安交通大学 | Servo system compensation method based on A3C algorithm |
CN110376879A (en) * | 2019-08-16 | 2019-10-25 | 哈尔滨工业大学(深圳) | A kind of PID type iterative learning control method neural network based |
CN111079936A (en) * | 2019-11-06 | 2020-04-28 | 中国科学院自动化研究所 | Wave fin propulsion underwater operation robot tracking control method based on reinforcement learning |
CN111856920A (en) * | 2020-07-24 | 2020-10-30 | 重庆红江机械有限责任公司 | A3C-PID-based self-adaptive rail pressure adjusting method and storage medium |
CN112162861A (en) * | 2020-09-29 | 2021-01-01 | 广州虎牙科技有限公司 | Thread allocation method and device, computer equipment and storage medium |
CN112631120A (en) * | 2019-10-09 | 2021-04-09 | Oppo广东移动通信有限公司 | PID control method, device and video coding and decoding system |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102588129A (en) * | 2012-02-07 | 2012-07-18 | 上海艾铭思汽车控制系统有限公司 | Optimization cooperative control method for discharge of nitrogen oxides and particles of high-pressure common-rail diesel |
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102588129A (en) * | 2012-02-07 | 2012-07-18 | 上海艾铭思汽车控制系统有限公司 | Optimization cooperative control method for discharge of nitrogen oxides and particles of high-pressure common-rail diesel |
Non-Patent Citations (5)
Title |
---|
WANG XUE-SONG等: "A Proposal of Adaptive PID Controller Based on Reinforcement Learning", 《JOURNAL OF CHINA UNIVERSITY OF MINING & TECHNOLOGY》 * |
- ZHANG CHAO ET AL.: "Simulation of welding robot based on AC-PID controller", 《WELDING TECHNOLOGY》 *
- LIN XIAOFENG ET AL.: "Multi-objective action-dependent heuristic dynamic programming excitation control", 《PROCEEDINGS OF THE CSU-EPSA》 *
- CHEN XUESONG: "Research on reinforcement learning and its application in robot systems", 《CHINA DOCTORAL DISSERTATIONS FULL-TEXT DATABASE (INFORMATION SCIENCE AND TECHNOLOGY)》 *
- CHEN XUESONG ET AL.: "Adaptive PID control based on actor-critic learning", 《CONTROL THEORY & APPLICATIONS》 *
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107346138B (en) * | 2017-06-16 | 2020-05-05 | 武汉理工大学 | Unmanned ship lateral control method based on reinforcement learning algorithm |
CN107346138A (en) * | 2017-06-16 | 2017-11-14 | 武汉理工大学 | A kind of unmanned boat method for lateral control based on enhancing learning algorithm |
CN109063823A (en) * | 2018-07-24 | 2018-12-21 | 北京工业大学 | A kind of intelligent body explores batch A3C intensified learning method in the labyrinth 3D |
CN109063823B (en) * | 2018-07-24 | 2022-06-07 | 北京工业大学 | Batch A3C reinforcement learning method for exploring 3D maze by intelligent agent |
CN108803348A (en) * | 2018-08-03 | 2018-11-13 | 北京深度奇点科技有限公司 | A kind of optimization method of pid parameter and the optimization device of pid parameter |
CN108803348B (en) * | 2018-08-03 | 2021-07-13 | 北京深度奇点科技有限公司 | PID parameter optimization method and PID parameter optimization device |
CN109521669A (en) * | 2018-11-12 | 2019-03-26 | 中国航空工业集团公司北京航空精密机械研究所 | A kind of turning table control methods of self-tuning based on intensified learning |
CN109696830A (en) * | 2019-01-31 | 2019-04-30 | 天津大学 | The reinforcement learning adaptive control method of small-sized depopulated helicopter |
CN109696830B (en) * | 2019-01-31 | 2021-12-03 | 天津大学 | Reinforced learning self-adaptive control method of small unmanned helicopter |
CN110308655A (en) * | 2019-07-02 | 2019-10-08 | 西安交通大学 | Servo system compensation method based on A3C algorithm |
CN110376879A (en) * | 2019-08-16 | 2019-10-25 | 哈尔滨工业大学(深圳) | A kind of PID type iterative learning control method neural network based |
CN112631120A (en) * | 2019-10-09 | 2021-04-09 | Oppo广东移动通信有限公司 | PID control method, device and video coding and decoding system |
WO2021068748A1 (en) * | 2019-10-09 | 2021-04-15 | Oppo广东移动通信有限公司 | Pid control method and apparatus, and video encoding and decoding system |
CN112631120B (en) * | 2019-10-09 | 2022-05-17 | Oppo广东移动通信有限公司 | PID control method, device and video coding and decoding system |
CN111079936A (en) * | 2019-11-06 | 2020-04-28 | 中国科学院自动化研究所 | Wave fin propulsion underwater operation robot tracking control method based on reinforcement learning |
CN111079936B (en) * | 2019-11-06 | 2023-03-14 | 中国科学院自动化研究所 | Wave fin propulsion underwater operation robot tracking control method based on reinforcement learning |
CN111856920A (en) * | 2020-07-24 | 2020-10-30 | 重庆红江机械有限责任公司 | A3C-PID-based self-adaptive rail pressure adjusting method and storage medium |
CN112162861A (en) * | 2020-09-29 | 2021-01-01 | 广州虎牙科技有限公司 | Thread allocation method and device, computer equipment and storage medium |
CN112162861B (en) * | 2020-09-29 | 2024-04-19 | 广州虎牙科技有限公司 | Thread allocation method, thread allocation device, computer equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN108008627B (en) | 2022-10-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108008627A (en) | A kind of reinforcement learning adaptive PID control method of parallel optimization | |
Ahamed et al. | A reinforcement learning approach to automatic generation control | |
CN108284442B (en) | Mechanical arm flexible joint control method based on fuzzy neural network | |
Wang | Intelligent critic control with robustness guarantee of disturbed nonlinear plants | |
DE69717987T2 (en) | METHOD AND DEVICE FOR SIMULATING DYNAMIC AND STATIONARY PREDICTION, REGULATION AND OPTIMIZATION METHODS | |
Song et al. | Neural-network-based synchronous iteration learning method for multi-player zero-sum games | |
Koryakovskiy et al. | Model-plant mismatch compensation using reinforcement learning | |
CN110134165B (en) | Reinforced learning method and system for environmental monitoring and control | |
Song et al. | Online optimal event-triggered H∞ control for nonlinear systems with constrained state and input | |
Radac et al. | Three-level hierarchical model-free learning approach to trajectory tracking control | |
EP3704550B1 (en) | Generation of a control system for a target system | |
CN101390024A (en) | Operation control method, operation control device and operation control system | |
CN115167102A (en) | Reinforced learning self-adaptive PID control method based on parallel dominant motion evaluation | |
Li et al. | Training a robust reinforcement learning controller for the uncertain system based on policy gradient method | |
Kumar et al. | Lyapunov stability-based control and identification of nonlinear dynamical systems using adaptive dynamic programming | |
Wang et al. | Asynchronous learning for actor–critic neural networks and synchronous triggering for multiplayer system | |
Bayramoglu et al. | Time-varying sliding-coefficient-based decoupled terminal sliding-mode control for a class of fourth-order systems | |
Hager et al. | Adaptive Neural network control of a helicopter system with optimal observer and actor-critic design | |
Ornelas-Tellez et al. | Neural networks: A methodology for modeling and control design of dynamical systems | |
CN117970782B (en) | Fuzzy PID control method based on fish scale evolution GSOM improvement | |
Eqra et al. | A novel adaptive multi-critic based separated-states neuro-fuzzy controller: Architecture and application to chaos control | |
US11164077B2 (en) | Randomized reinforcement learning for control of complex systems | |
Gupta et al. | Modified grey wolf optimised adaptive super-twisting sliding mode control of rotary inverted pendulum system | |
CN105279978B (en) | Intersection traffic signal control method and equipment | |
JP7327569B1 (en) | Information processing equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||