
CN108008627A - Reinforcement learning adaptive PID control method with parallel optimization - Google Patents


Info

Publication number
CN108008627A
Authority
CN
China
Prior art keywords
pid
parameter
output
control
input
Legal status
Granted
Application number
CN201711325553.4A
Other languages
Chinese (zh)
Other versions
CN108008627B (en)
Inventor
孙歧峰
任辉
段友祥
李洪强
Current Assignee
China University of Petroleum East China
Original Assignee
China University of Petroleum East China
Priority date 2017-12-13
Filing date 2017-12-13
Publication date 2018-05-08
Application filed by China University of Petroleum East China
Priority to CN201711325553.4A
Publication of CN108008627A
Application granted
Publication of CN108008627B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B11/00 Automatic controllers
    • G05B11/01 Automatic controllers electric
    • G05B11/36 Automatic controllers electric with provision for obtaining particular characteristics, e.g. proportional, integral, differential
    • G05B11/42 Automatic controllers electric with provision for obtaining particular characteristics, e.g. proportional, integral, differential for obtaining a characteristic which is both proportional and time-dependent, e.g. P. I., P. I. D.
    • G05B13/00 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
    • G05B13/0265 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion
    • G05B13/027 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion using neural networks only
    • G05B13/04 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
    • G05B13/042 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators in which a parameter or coefficient is automatically adjusted to optimise the performance

Landscapes

  • Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Feedback Control In General (AREA)

Abstract

The invention discloses a reinforcement learning adaptive PID control method with parallel optimization, comprising the following steps. Step S1: using MATLAB, discretize the transfer function with the zero-order hold method, and initialize the controller parameters and M control threads for parallel learning. Step S2: define an input signal, pass it to the transfer function from S1, compute the output value, and use the difference between the input and output signals as the input vector of the control algorithm. Step S3: pass the input vector to the improved adaptive PID controller for training; a trained model is obtained after n iterations. Step S4: run control tests with the trained model, recording the input signal, the output signal, and the changes in the PID parameters. Step S5: visualize the test data and compare control performance. The invention addresses the problems of conventional adaptive PID control and, by exploiting the multi-threaded parallel learning of A3C, improves the stability and learning efficiency of the algorithm.

Description

Reinforcement learning adaptive PID control method with parallel optimization
Technical field
The present invention relates to an adaptive PID control method in the field of control technology, specifically an improved adaptive PID (proportional-integral-derivative) control algorithm based on parallel-optimized actor-critic learning.
Background
A PID (proportional/integral/derivative) controller is a linear controller that acts on the deviation between setpoint and output. Because it is simple in principle, robust, easy to tune, and does not require an accurate mathematical model of the plant, it has become the most common controller in industrial control. In engineering practice, traditional tuning methods have accumulated rich experience and are widely used, especially for linear, time-invariant systems with small time delays. In real industrial process control, however, many plants exhibit time-varying uncertainty and pure time delay, and the process mechanisms are complex; under noise, load disturbances, and other factors, the process parameters and even the model structure can change. The PID parameters must therefore be tunable online to meet real-time control requirements. In such cases traditional tuning methods struggle to meet engineering requirements and show clear limitations.
Adaptive PID control is an effective way to address these problems. It combines the ideas of adaptive control with the advantages of conventional PID controllers. First, as an adaptive controller, it can automatically identify the controlled process, automatically tune its parameters, and adapt to changes in the process parameters. Second, it retains the simple structure, good robustness, and high reliability of conventional PID controllers. Because of these advantages it has become a preferred industrial process controller in engineering practice. Since adaptive PID control was proposed it has attracted extensive research, and fuzzy adaptive PID controllers, neural-network adaptive PID controllers, and Actor-Critic adaptive PID controllers have been proposed in succession.
Document 1 (Liu Guorong, Yang Xianhui. Fuzzy adaptive PID controller [J]. Control and Decision, 1995(6)) proposes a fuzzy-rule-based adaptive PID controller. Its main idea is that when the setpoint changes abruptly or a state or structural disturbance occurs, the transient response can be divided into nine cases. After obtaining the system response at each sampling instant, the controller uses the current deviation of the response from the setpoint and its trend, together with existing control knowledge, to increase or decrease the control action by fuzzy control, steering the response back toward the setpoint so that the output reaches it as quickly as possible. However, this method needs expert experience and parameter optimization to control complex systems, and if the fuzzy rules are set inaccurately the control performance is unsatisfactory.
Document 2 (Liao Fangfang, Xiao Jian. Research on PID parameter self-tuning based on BP neural network [J]. Journal of System Simulation, 2005) proposes adaptive PID control based on a BP neural network. The neural-network identifier feeds the control deviation back into the network to correct its own weights; the difference between the setpoint and the actual plant output is back-propagated through the identifier to the neural-network controller, whose weights are corrected using this error signal. After repeated learning, the controller gradually tracks changes in the system. This method generally optimizes the parameters by supervised learning, but the teacher signal is difficult to obtain.
Document 3 (Chen Xuesong, Yang Yimin. Adaptive PID control based on actor-critic learning [J]. Control Theory & Applications, 2011) proposes adaptive PID control with an Actor-Critic structure. It uses the model-free online learning ability of AC learning to tune the PID parameters adaptively, with a single RBF network learning both the Actor's policy function and the Critic's value function. This overcomes the difficulty of tuning conventional PID controllers online in real time and offers fast response and strong adaptability. However, the inherent instability of the AC learning structure often makes the algorithm difficult to converge.
Patent CN201510492758 discloses an adaptive PID control method for an actuator that combines an expert PID controller and a fuzzy controller, each connected to the actuator; the actuator selects the expert PID controller or the fuzzy controller according to the current state information and the desired information. Although this controller reduces overshoot and achieves high control accuracy, it still requires substantial prior knowledge from experts to decide which controller to use.
Summary of the invention
Object of the invention: in view of the characteristics of adaptive PID control, the invention proposes an adaptive PID control method based on parallel-optimized actor-critic learning (A3C) for controlling industrial systems. The invention addresses the problems of conventional adaptive PID control and uses the multi-threaded parallel learning of A3C to improve the stability and learning efficiency of the algorithm. The A3C-based adaptive PID controller offers fast response, strong adaptability, and strong disturbance rejection.
The adaptive PID control method based on parallel-optimized actor-critic learning comprises the following steps:
Step S1: In MATLAB (commercial mathematics software produced by MathWorks, USA), define the continuous transfer function, of arbitrary order, of the controlled system and discretize it with the zero-order hold method to obtain a discrete transfer function with a user-defined sampling interval. Initialize the controller parameters and M control threads for parallel learning. The parameters mainly comprise the BP neural network parameters and the PID control environment parameters; each thread is an independent control agent.
Step S2: After step S1 has initialized the BP neural network weights and the plant controlled by the PID controller, define a discrete input signal RIN. Feed the discrete input signal, one sample per defined time interval, to the discretized transfer function, compute the output of the transfer function, and use the difference between the input and output signals as the input vector x(t) of the A3C adaptive PID control algorithm.
Step S3: Feed the input vector x(t) obtained in step S2 into the constructed A3C adaptive PID controller for iterative training; a trained model is obtained after n iterations.
Step S31: Compute the current error e(t), the first-order difference Δe(t), and the second-order difference Δ²e(t), form the input vector of the algorithm $x(t) = [e(t), \Delta e(t), \Delta^2 e(t)]^T$, and normalize it with the sigmoid function.
Step S32: Pass the input vector to the Actor network of each thread to obtain new PID parameters. The Actor network does not output the PID parameter values directly; it outputs the mean and variance of a Gaussian distribution for each of the three PID parameters, and the three parameter values are estimated from these distributions. Output nodes o = 1, 2, 3 give the means of the PID parameters, and o = 4, 5, 6 give their variances. The Actor network is a three-layer BP neural network. Layer 1 is the input layer. Layer 2 is the hidden layer, with input $hi_k(t)$ and output
$ho_k(t) = \min(\max(hi_k(t), 0), 6), \quad k = 1, 2, \ldots, 20$
Layer 3 is the output layer, which takes the hidden-layer outputs as its input and produces the six outputs described above.
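The Actor described in step S32 can be sketched as below. This is a minimal NumPy illustration, not the patent's implementation: the class and function names, the random weight initialization, the softplus used to keep the variance outputs positive, and the clipping of sampled gains at zero are all assumptions, because the source does not reproduce the output-layer equations. Only the 3-20-6 layout, the ReLU6-style hidden activation $ho_k(t) = \min(\max(hi_k(t), 0), 6)$, and the mean/variance split of the outputs come from the text.

```python
import numpy as np


class ActorNet:
    """Hypothetical 3-20-6 BP network for the Actor of step S32.

    Outputs o = 1..3 are the means of (Kp, Ki, Kd); outputs o = 4..6 are
    their variances. Initialization and the softplus on the variance
    outputs are assumptions, not taken from the patent.
    """

    def __init__(self, n_in=3, n_hidden=20, n_out=6, seed=0):
        rng = np.random.default_rng(seed)
        self.w_hidden = rng.normal(0.0, 0.1, size=(n_hidden, n_in))
        self.w_out = rng.normal(0.0, 0.1, size=(n_out, n_hidden))

    def forward(self, x):
        hi = self.w_hidden @ x                     # hidden-layer input hi_k(t)
        ho = np.minimum(np.maximum(hi, 0.0), 6.0)  # ho_k(t) = min(max(hi_k(t), 0), 6)
        out = self.w_out @ ho                      # raw output-layer values
        mu = out[:3]                               # means of Kp, Ki, Kd
        var = np.logaddexp(0.0, out[3:]) + 1e-6    # softplus keeps variances positive
        return mu, var, ho


def sample_pid_gains(mu, var, rng):
    """Draw (Kp, Ki, Kd) from N(mu, var); clipping at zero is an assumption."""
    gains = rng.normal(mu, np.sqrt(var))
    return np.maximum(gains, 0.0)


actor = ActorNet()
x = np.array([0.73, 0.73, 0.73])                   # normalized state from the worked example
mu, var, ho = actor.forward(x)
kp, ki, kd = sample_pid_gains(mu, var, np.random.default_rng(1))
```

Sampling the gains instead of emitting them directly is what gives the policy its exploration during training; at test time (step S42) one could use the means alone, though the source does not say which it does.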
Step S33: Assign the new PID parameters to the controller, obtain the control output, compute the control error, compute the reward with the environment reward function $R(t) = \alpha_1 r_1(t) + \alpha_2 r_2(t)$, and obtain the next state vector x'(t).
Step S34: Pass the reward R(t), the current state vector x(t), and the next state vector x'(t) to the Critic network. The Critic network has the same structure as the Actor network except that it has only one output node. The Critic network outputs the state value and computes the TD error, $\delta_{TD} = r(t) + \gamma V(S_{t+1}, W_v') - V(S_t, W_v')$.
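A minimal sketch of the Critic and the TD error of step S34 follows. The hidden layer mirrors the Actor (20 ReLU6-style units) with a single output node, as the text states; the weight initialization, the discount factor γ = 0.9, the reward 0.5, and the next-state values are hypothetical, since the source gives none of them.

```python
import numpy as np


class CriticNet:
    """Hypothetical Critic: same 3-20 hidden layer as the Actor, one output node."""

    def __init__(self, n_in=3, n_hidden=20, seed=0):
        rng = np.random.default_rng(seed)
        self.w_hidden = rng.normal(0.0, 0.1, size=(n_hidden, n_in))
        self.w_out = rng.normal(0.0, 0.1, size=n_hidden)

    def value(self, x):
        ho = np.minimum(np.maximum(self.w_hidden @ x, 0.0), 6.0)
        return float(self.w_out @ ho)              # scalar state value V(S, W_v')


def td_error(reward, v_s, v_next, gamma):
    """delta_TD = r(t) + gamma * V(S_{t+1}) - V(S_t)."""
    return reward + gamma * v_next - v_s


critic = CriticNet()
x_t = np.array([0.73, 0.73, 0.73])
x_next = np.array([0.70, 0.55, 0.52])              # hypothetical next state x'(t)
delta = td_error(0.5, critic.value(x_t), critic.value(x_next), gamma=0.9)
```

A positive δ means the transition went better than the Critic expected, so the sampled PID gains should become more likely; a negative δ pushes the Actor away from them.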
Step S35: After the TD error has been computed, the Actor-Critic networks in the A3C structure do not update their own weights directly. Instead, each worker uses the gradients dW_a and dW_v computed from its own weights to update the Actor-Critic network parameters stored in the central brain (Global-net): W_a is updated with the step $\alpha_a\, dW_a$ and W_v with the step $\alpha_c\, dW_v$. Here W_a is the Actor network weight stored in the central brain, W'_a is the Actor network weight of each AC structure, W_v is the Critic network weight stored in the central brain, W'_v is the Critic network weight of each AC structure, α_a is the Actor learning rate, and α_c is the Critic learning rate. After updating, the central brain passes the latest parameters to each AC structure.
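The asynchronous update of step S35 can be illustrated with a toy example: several threads compute gradients on their own copies W', push them to a shared "central brain" under a lock, and immediately pull the newest weights back. This is a sketch under stated assumptions, not the patent's code: the class and function names are hypothetical, a quadratic toy loss replaces the Actor and Critic objectives, and the descent sign convention is assumed because the source formula does not preserve the operator.

```python
import threading

import numpy as np


class GlobalNet:
    """Hypothetical 'central brain' holding shared weights behind a lock."""

    def __init__(self, dim):
        self.w = np.zeros(dim)
        self._lock = threading.Lock()

    def pull(self):
        with self._lock:
            return self.w.copy()

    def push(self, grad, lr):
        # Apply one worker's gradient, then hand back the newest weights.
        # Descending on a loss is an assumed sign convention.
        with self._lock:
            self.w -= lr * grad
            return self.w.copy()


def worker(global_net, target, lr, n_iters):
    local_w = global_net.pull()                    # worker copy W'
    for _ in range(n_iters):
        grad = 2.0 * (local_w - target)            # gradient of toy loss ||W' - target||^2
        local_w = global_net.push(grad, lr)        # update global, then sync local copy


def run_workers(n_workers=4, lr=0.05, n_iters=200):
    target = np.array([1.0, 0.5, 0.1])
    net = GlobalNet(dim=3)
    threads = [threading.Thread(target=worker, args=(net, target, lr, n_iters))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return net.pull(), target


w, target = run_workers()
```

Because the threads run at different speeds, a worker's gradient may be computed on slightly stale weights; with a small learning rate the shared weights still converge, which is the property the embodiment relies on when it notes that the controllers update the Global Net in no fixed order.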
Step S36: The above completes one training pass. Repeat for n iterations, then exit training and save the model.
Step S4: Run control tests with the trained model and record the input signal, the output signal, and the changes in the PID parameters;
Step S41: Feed the input signal defined in step S1 to the control model of the thread with the highest reward after training;
Step S42: After S41, compute the current error and its first- and second-order differences as the input vector and feed it to the selected control model. Unlike training, only the PID parameter adjustments output by the Actor network are needed; the adjusted PID parameters are passed to the controller to obtain its output;
Step S43: Save the input signal, the output signal, and the PID parameter changes obtained in step S42.
Step S5: Use MATLAB to visualize the experimental data from step S4, including the controller's input signal, output signal, and PID parameter changes, and compare the control performance with fuzzy adaptive PID control and AC-PID adaptive PID control.
Brief description of the drawings
Figure 1 is a schematic flow chart of the invention.
Figure 2 is a structure diagram of the improved adaptive PID controller.
Figure 3 shows the output signal of the improved controller with a step signal as input.
Figure 4 shows the control quantity of the improved controller.
Figure 5 shows the control error of the improved adaptive PID controller.
Figure 6 shows the parameter adjustment curves of the A3C adaptive PID controller.
Figure 7 compares the improved controller with the fuzzy and AC-structure adaptive PID controllers.
Figure 8 shows the experimental comparison and analysis of the different controllers.
Embodiment
The invention is further described below with reference to Figures 1-5, using MATLAB. The specific embodiment of adaptive PID control based on parallel-optimized actor-critic learning comprises the following steps, as shown in Figure 1:
(1) Parameter initialization. The controlled system is chosen as a third-order transfer function. The sampling time is set to 0.001 s, and after Z-transform discretization the transfer function becomes the difference equation
yout(k) = -den(2) yout(k-1) - den(3) yout(k-2) - den(4) yout(k-3) + num(2) u(k-1) + num(3) u(k-2) + num(4) u(k-3).
The input signal is a step signal of amplitude 1.0. A single training episode is 1000 steps (1.0 s). Four threads are initialized, representing four independent adaptive PID controllers, and trained.
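The difference equation in (1) can be run directly once the discrete numerator and denominator are known. The sketch below assumes hypothetical coefficients (poles at 0.9, 0.8, 0.7 and unit DC gain), because the source does not reproduce the plant's transfer function; MATLAB's 1-based den(2) corresponds to Python's den[1], and so on.

```python
def simulate_plant(num, den, u_seq):
    """Third-order difference equation (MATLAB den(2) == Python den[1], etc.)."""
    if len(num) != 4 or len(den) != 4 or den[0] != 1.0:
        raise ValueError("expected 3rd-order num/den with den[0] == 1")
    y_hist = [0.0, 0.0, 0.0]   # yout(k-1), yout(k-2), yout(k-3)
    u_hist = [0.0, 0.0, 0.0]   # u(k-1), u(k-2), u(k-3)
    yout = []
    for u_k in u_seq:
        y_k = (-den[1] * y_hist[0] - den[2] * y_hist[1] - den[3] * y_hist[2]
               + num[1] * u_hist[0] + num[2] * u_hist[1] + num[3] * u_hist[2])
        yout.append(y_k)
        y_hist = [y_k] + y_hist[:2]   # shift the output history
        u_hist = [u_k] + u_hist[:2]   # shift the input history
    return yout


# Hypothetical discrete plant: poles at 0.9, 0.8, 0.7 and unit DC gain.
num = [0.0, 0.002, 0.002, 0.002]
den = [1.0, -2.4, 1.91, -0.504]
yout = simulate_plant(num, den, [1.0] * 1000)   # open-loop unit step, 1000 steps of 0.001 s
```

In the method itself, u(k) comes from the PID controller rather than a fixed step, and num/den would come from the MATLAB zero-order-hold discretization of the actual plant.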
(2) Compute the input vector. At t = 0, e(t) = rin(0) - yout(0) = 1.0, e(t-1) = 0, and e(t-2) = 0. The input vector is $x(t) = [e(t), \Delta e(t), \Delta^2 e(t)]^T$, where e(t) = rin - yout = 1.0, Δe(t) = e(t) - e(t-1) = 1.0, and Δ²e(t) = e(t) - 2e(t-1) + e(t-2) = 1.0. The computed vector is x(t) = [1.0, 1.0, 1.0]^T, and after sigmoid normalization the input vector is x(t) = [0.73, 0.73, 0.73]^T.
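The worked example in (2) can be checked in a few lines. This sketch only reproduces the arithmetic stated in the text; the function names are illustrative.

```python
import math


def state_vector(e_t, e_prev, e_prev2):
    """x(t) = [e(t), Δe(t), Δ²e(t)]."""
    de = e_t - e_prev                      # Δe(t) = e(t) - e(t-1)
    d2e = e_t - 2.0 * e_prev + e_prev2     # Δ²e(t) = e(t) - 2e(t-1) + e(t-2)
    return [e_t, de, d2e]


def sigmoid_normalize(x):
    return [1.0 / (1.0 + math.exp(-v)) for v in x]


rin, yout0 = 1.0, 0.0
x = state_vector(rin - yout0, 0.0, 0.0)    # t = 0: e(t-1) = e(t-2) = 0
print(x)                                   # [1.0, 1.0, 1.0]
print([round(v, 2) for v in sigmoid_normalize(x)])  # [0.73, 0.73, 0.73]
```

The value 0.73 is simply 1 / (1 + e^-1) ≈ 0.731, rounded to two decimals.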
(3) Train the model. The structure of the improved adaptive PID controller is shown in Figure 2. After the state vector has been computed, it is passed to the Actor network, which outputs the mean μ and variance σ² of each of the three parameters P, I, and D. The actual values of P, I, and D are obtained by Gaussian sampling and assigned to the incremental PID controller, which computes the control quantity u(t) from the error and the new PID parameters:
$u(t) = u(t-1) + \Delta u(t) = u(t-1) + K_I(t)\, e(t) + K_P(t)\, \Delta e(t) + K_D(t)\, \Delta^2 e(t)$
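The incremental PID law above can be sketched as follows. The formula is taken from the text; the first-order plant y(k+1) = 0.9 y(k) + 0.1 u(k) and the fixed gains in the usage line are hypothetical stand-ins for the third-order plant and the Actor-sampled gains of the embodiment.

```python
def incremental_pid(u_prev, e, e_prev, e_prev2, kp, ki, kd):
    """u(t) = u(t-1) + K_I e(t) + K_P Δe(t) + K_D Δ²e(t)."""
    de = e - e_prev
    d2e = e - 2.0 * e_prev + e_prev2
    return u_prev + ki * e + kp * de + kd * d2e


def closed_loop_step(kp, ki, kd, steps=1000, rin=1.0):
    # Hypothetical first-order plant with unit DC gain: y(k+1) = 0.9 y(k) + 0.1 u(k)
    y, u = 0.0, 0.0
    e_prev = e_prev2 = 0.0
    ys = []
    for _ in range(steps):
        e = rin - y
        u = incremental_pid(u, e, e_prev, e_prev2, kp, ki, kd)
        y = 0.9 * y + 0.1 * u
        ys.append(y)
        e_prev2, e_prev = e_prev, e
    return ys


ys = closed_loop_step(kp=0.0, ki=0.1, kd=0.0)   # fixed gains; in the method they come from the Actor
```

The incremental (velocity) form only computes the change Δu(t), so swapping in new gains at every step, as the Actor does, does not cause a jump in the accumulated control signal.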
The control quantity is applied to the discretized transfer function, and the output yout(t+1) at the next instant t+1, the error, and the state vector are computed following (1). In addition, the environment reward function computes the control agent's reward from the error, as follows:
$R(t) = \alpha_1 r_1(t) + \alpha_2 r_2(t)$
where α1 = 0.6, α2 = 0.4, and e(t) = 0.001.
The reward function is an important component of reinforcement learning. After the reward is obtained, the reward and the next state vector are passed to the Critic network, which outputs the state values at t and t+1 and computes the TD error: $\delta_{TD} = r(t) + \gamma V(S_{t+1}, W_v') - V(S_t, W_v')$, where W_v' is the Critic network weight. Because the threads run at different speeds, the controllers update the Actor and Critic network parameters stored in the Global Net of Figure 2 in no fixed order, using the update rule of step S35. Here W_a is the Actor network weight stored in the central brain, W'_a is the Actor network weight of each AC structure, W_v is the Critic network weight stored in the central brain, W'_v is the Critic network weight of each AC structure, α_a = 0.001 is the Actor learning rate, and α_c = 0.01 is the Critic learning rate. This completes one training pass; after 3000 iterations the algorithm reaches a steady state.
(4) Collect experimental data. Because four threads were trained, the thread with the highest cumulative reward is chosen as the test controller. Control tests use the parameters set in (1); the control duration is 1 s, i.e., 1000 control steps. The state vector is computed as in (2) and passed to the trained model. During testing the Critic network is no longer used and the Actor outputs the P, I, and D parameter values; the values of yout, rin, u, P, I, and D are saved for visual analysis.
(5) Data visualization. The data saved in (4) are visualized with MATLAB's plotting tools. Figure 3 shows the output yout: the controller reaches steady state in less than 0.2 s and adjusts quickly. Figure 4 shows the control quantity of the controller, which also shows that the controller reaches steady state quickly. Figure 5 shows the control error, equal to the input signal minus the output signal. Figure 6 shows how the controller's P, I, and D parameters change: all three are adjusted to varying degrees before the system stabilizes and no longer change afterwards. Using the same plant and input signal, the controller was compared experimentally with a fuzzy adaptive PID controller and an Actor-Critic adaptive PID controller. The output signals of the three controllers are compared in Figure 7, and the detailed analysis is shown in Figure 8. As Figure 8 shows, the controller of the invention does not require much expert prior knowledge; like the fuzzy controller it has small overshoot but responds faster, and compared with the AC-PID controller it learns faster and has clear advantages in both overshoot and response speed.
The invention aims to solve the problems of conventional adaptive PID controllers: fuzzy adaptive PID and expert adaptive PID controllers require extensive expert knowledge, and the teacher signal of neural-network adaptive PID controllers is difficult to obtain. Because A3C is a reinforcement learning algorithm with model-free online learning ability, it needs neither much expert prior knowledge nor a teacher signal, and therefore overcomes the problems of fuzzy, expert, and neural-network adaptive PID controllers. Furthermore, because the algorithm learns in parallel on multiple CPU threads, it greatly improves the learning speed of the AC-PID controller and achieves better control performance. The specific control performance is compared in Figure 7, which shows three controllers (the fuzzy controller, the AC-PID controller, and the A3C-PID controller of the invention) under identical parameters; the detailed analysis is given in Figure 8.
The invention is not limited to the embodiment above. Based on the above, on ordinary technical knowledge and customary means in the field, and without departing from the basic technical idea of the invention, equivalent modifications, substitutions, or changes in various other forms can be made, all of which fall within the scope of protection of the invention.

Claims (3)

1. A reinforcement learning adaptive PID control method with parallel optimization, characterized by comprising the following steps:
Step S1: using MATLAB, define the continuous transfer function, of arbitrary order, of the controlled system; discretize it with the zero-order hold method to obtain a discrete transfer function with a user-defined time interval; and initialize the controller parameters and M control threads for parallel learning, wherein the parameters mainly comprise the BP neural network parameters and the PID control environment parameters, and each thread is an independent control agent;
Step S2: after initializing the BP neural network weights and the plant controlled by the PID controller, define a discrete input signal RIN; feed the discrete input signal, one sample per defined time interval, to the discretized transfer function; compute the output of the transfer function; and use the difference between the input and output signals as the input vector x(t) of the A3C adaptive PID control algorithm;
Step S3: feed the input vector x(t) obtained in step S2 into the constructed A3C adaptive PID controller for iterative training, obtaining a trained model after n iterations;
Step S4: run control tests with the trained model, recording the input signal, the output signal, and the changes in the PID parameters;
Step S5: use MATLAB to visualize the experimental data from step S4, including the controller's input signal, output signal, and PID parameter changes, and compare the control performance with fuzzy adaptive PID control and AC-PID adaptive PID control.
2. The reinforcement learning adaptive PID control method with parallel optimization according to claim 1, characterized in that step S3 comprises the following steps:
Step S31: compute the current error e(t), the first-order difference Δe(t), and the second-order difference Δ²e(t), form the input vector of the algorithm $x(t) = [e(t), \Delta e(t), \Delta^2 e(t)]^T$, and normalize it with the sigmoid function;
Step S32: pass the input vector to the Actor network of each thread to obtain new PID parameters; the Actor network does not output the PID parameter values directly but outputs the mean and variance of a Gaussian distribution for each of the three PID parameters, from which the three parameter values are estimated; output nodes o = 1, 2, 3 give the means of the PID parameters and o = 4, 5, 6 give their variances; the Actor network is a three-layer BP neural network, in which layer 1 is the input layer and layer 2 is the hidden layer, with input $hi_k(t)$ and output
$ho_k(t) = \min(\max(hi_k(t), 0), 6), \quad k = 1, 2, \ldots, 20$,
and layer 3 is the output layer, which takes the hidden-layer outputs as its input and produces the six outputs described above;
Step S33: assign the new PID parameters to the controller, obtain the control output, compute the control error, compute the reward with the environment reward function $R(t) = \alpha_1 r_1(t) + \alpha_2 r_2(t)$, and obtain the next state vector x'(t);
Step S34: pass the reward R(t), the current state vector x(t), and the next state vector x'(t) to the Critic network, whose structure is the same as that of the Actor network except that it has only one output node; the Critic network outputs the state value and computes the TD error $\delta_{TD} = r(t) + \gamma V(S_{t+1}, W_v') - V(S_t, W_v')$;
Step S35: after the TD error has been computed, the Actor-Critic networks in the A3C structure do not update their own weights directly; instead, each worker uses the gradients dW_a and dW_v computed from its own weights to update the Actor-Critic network parameters stored in the central brain (Global-net), W_a being updated with the step $\alpha_a\, dW_a$ and W_v with the step $\alpha_c\, dW_v$, where W_a is the Actor network weight stored in the central brain, W'_a is the Actor network weight of each AC structure, W_v is the Critic network weight stored in the central brain, W'_v is the Critic network weight of each AC structure, α_a is the Actor learning rate, and α_c is the Critic learning rate; after updating, the central brain passes the latest parameters to each AC structure;
Step S36: the above completes one training pass; repeat for n iterations, then exit training and save the model.
3. The reinforcement learning adaptive PID control method with parallel optimization according to claim 1, characterized in that step S4 comprises the following steps:
Step S41: feed the input signal defined in step S1 to the control model of the thread with the highest reward after training;
Step S42: after S41, compute the current error and its first- and second-order differences as the input vector and feed it to the selected control model; unlike training, only the PID parameter adjustments output by the Actor network are needed, and the adjusted PID parameters are passed to the controller to obtain its output;
Step S43: save the input signal, the output signal, and the PID parameter changes obtained in step S42.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711325553.4A CN108008627B (en) 2017-12-13 2017-12-13 Parallel optimization reinforcement learning self-adaptive PID control method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711325553.4A CN108008627B (en) 2017-12-13 2017-12-13 Parallel optimization reinforcement learning self-adaptive PID control method

Publications (2)

Publication Number Publication Date
CN108008627A true CN108008627A (en) 2018-05-08
CN108008627B CN108008627B (en) 2022-10-28

Family

ID=62058629

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711325553.4A Active CN108008627B (en) 2017-12-13 2017-12-13 Parallel optimization reinforcement learning self-adaptive PID control method

Country Status (1)

Country Link
CN (1) CN108008627B (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107346138A (en) * 2017-06-16 2017-11-14 武汉理工大学 A kind of unmanned boat method for lateral control based on enhancing learning algorithm
CN108803348A (en) * 2018-08-03 2018-11-13 北京深度奇点科技有限公司 A kind of optimization method of pid parameter and the optimization device of pid parameter
CN109063823A (en) * 2018-07-24 2018-12-21 北京工业大学 A kind of intelligent body explores batch A3C intensified learning method in the labyrinth 3D
CN109521669A (en) * 2018-11-12 2019-03-26 中国航空工业集团公司北京航空精密机械研究所 A kind of turning table control methods of self-tuning based on intensified learning
CN109696830A (en) * 2019-01-31 2019-04-30 天津大学 The reinforcement learning adaptive control method of small-sized depopulated helicopter
CN110308655A (en) * 2019-07-02 2019-10-08 西安交通大学 Servo system compensation method based on A3C algorithm
CN110376879A (en) * 2019-08-16 2019-10-25 哈尔滨工业大学(深圳) A kind of PID type iterative learning control method neural network based
CN111079936A (en) * 2019-11-06 2020-04-28 中国科学院自动化研究所 Wave fin propulsion underwater operation robot tracking control method based on reinforcement learning
CN111856920A (en) * 2020-07-24 2020-10-30 重庆红江机械有限责任公司 A3C-PID-based self-adaptive rail pressure adjusting method and storage medium
CN112162861A (en) * 2020-09-29 2021-01-01 广州虎牙科技有限公司 Thread allocation method and device, computer equipment and storage medium
CN112631120A (en) * 2019-10-09 2021-04-09 Oppo广东移动通信有限公司 PID control method, device and video coding and decoding system

Citations (1)

Publication number Priority date Publication date Assignee Title
CN102588129A (en) * 2012-02-07 2012-07-18 上海艾铭思汽车控制系统有限公司 Optimization cooperative control method for discharge of nitrogen oxides and particles of high-pressure common-rail diesel

Non-Patent Citations (5)

Title
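WANG XUE-SONG et al.: "A Proposal of Adaptive PID Controller Based on Reinforcement Learning", 《Journal of China University of Mining & Technology》 *
Zhang Chao et al.: "Simulation of a welding robot based on an AC-PID controller", 《Welding Technology》 *
Lin Xiaofeng et al.: "Multi-objective action-dependent heuristic dynamic programming excitation control", 《Proceedings of the CSU-EPSA》 *
Chen Xuesong: "Research on reinforcement learning and its application in robot systems", 《China Doctoral Dissertations Full-text Database, Information Science and Technology》 *
Chen Xuesong et al.: "Adaptive PID control based on actor-critic learning", 《Control Theory & Applications》 *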

Cited By (19)

Publication number Priority date Publication date Assignee Title
CN107346138B (en) * 2017-06-16 2020-05-05 武汉理工大学 Unmanned ship lateral control method based on reinforcement learning algorithm
CN107346138A (en) * 2017-06-16 2017-11-14 武汉理工大学 A kind of unmanned boat method for lateral control based on enhancing learning algorithm
CN109063823A (en) * 2018-07-24 2018-12-21 北京工业大学 A kind of intelligent body explores batch A3C intensified learning method in the labyrinth 3D
CN109063823B (en) * 2018-07-24 2022-06-07 北京工业大学 Batch A3C reinforcement learning method for exploring 3D maze by intelligent agent
CN108803348A (en) * 2018-08-03 2018-11-13 北京深度奇点科技有限公司 A kind of optimization method of pid parameter and the optimization device of pid parameter
CN108803348B (en) * 2018-08-03 2021-07-13 北京深度奇点科技有限公司 PID parameter optimization method and PID parameter optimization device
CN109521669A (en) * 2018-11-12 2019-03-26 中国航空工业集团公司北京航空精密机械研究所 A kind of turning table control methods of self-tuning based on intensified learning
CN109696830A (en) * 2019-01-31 2019-04-30 天津大学 The reinforcement learning adaptive control method of small-sized depopulated helicopter
CN109696830B (en) * 2019-01-31 2021-12-03 天津大学 Reinforced learning self-adaptive control method of small unmanned helicopter
CN110308655A (en) * 2019-07-02 2019-10-08 西安交通大学 Servo system compensation method based on A3C algorithm
CN110376879A (en) * 2019-08-16 2019-10-25 哈尔滨工业大学(深圳) A kind of PID type iterative learning control method neural network based
CN112631120A (en) * 2019-10-09 2021-04-09 Oppo广东移动通信有限公司 PID control method, device and video coding and decoding system
WO2021068748A1 (en) * 2019-10-09 2021-04-15 Oppo广东移动通信有限公司 Pid control method and apparatus, and video encoding and decoding system
CN112631120B (en) * 2019-10-09 2022-05-17 Oppo广东移动通信有限公司 PID control method, device and video coding and decoding system
CN111079936A (en) * 2019-11-06 2020-04-28 中国科学院自动化研究所 Wave fin propulsion underwater operation robot tracking control method based on reinforcement learning
CN111079936B (en) * 2019-11-06 2023-03-14 中国科学院自动化研究所 Wave fin propulsion underwater operation robot tracking control method based on reinforcement learning
CN111856920A (en) * 2020-07-24 2020-10-30 重庆红江机械有限责任公司 A3C-PID-based self-adaptive rail pressure adjusting method and storage medium
CN112162861A (en) * 2020-09-29 2021-01-01 广州虎牙科技有限公司 Thread allocation method and device, computer equipment and storage medium
CN112162861B (en) * 2020-09-29 2024-04-19 广州虎牙科技有限公司 Thread allocation method, thread allocation device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN108008627B (en) 2022-10-28

Similar Documents

Publication Publication Date Title
CN108008627A (en) A kind of reinforcement learning adaptive PID control method of parallel optimization
Ahamed et al. A reinforcement learning approach to automatic generation control
CN108284442B (en) Mechanical arm flexible joint control method based on fuzzy neural network
Wang Intelligent critic control with robustness guarantee of disturbed nonlinear plants
DE69717987T2 (en) METHOD AND DEVICE FOR SIMULATING DYNAMIC AND STATIONARY PREDICTION, REGULATION AND OPTIMIZATION METHODS
Song et al. Neural-network-based synchronous iteration learning method for multi-player zero-sum games
Koryakovskiy et al. Model-plant mismatch compensation using reinforcement learning
CN110134165B (en) Reinforced learning method and system for environmental monitoring and control
Song et al. Online optimal event-triggered H∞ control for nonlinear systems with constrained state and input
Radac et al. Three-level hierarchical model-free learning approach to trajectory tracking control
EP3704550B1 (en) Generation of a control system for a target system
CN101390024A (en) Operation control method, operation control device and operation control system
CN115167102A (en) Reinforced learning self-adaptive PID control method based on parallel dominant motion evaluation
Li et al. Training a robust reinforcement learning controller for the uncertain system based on policy gradient method
Kumar et al. Lyapunov stability-based control and identification of nonlinear dynamical systems using adaptive dynamic programming
Wang et al. Asynchronous learning for actor–critic neural networks and synchronous triggering for multiplayer system
Bayramoglu et al. Time-varying sliding-coefficient-based decoupled terminal sliding-mode control for a class of fourth-order systems
Hager et al. Adaptive Neural network control of a helicopter system with optimal observer and actor-critic design
Ornelas-Tellez et al. Neural networks: A methodology for modeling and control design of dynamical systems
CN117970782B (en) Fuzzy PID control method based on fish scale evolution GSOM improvement
Eqra et al. A novel adaptive multi-critic based separated-states neuro-fuzzy controller: Architecture and application to chaos control
US11164077B2 (en) Randomized reinforcement learning for control of complex systems
Gupta et al. Modified grey wolf optimised adaptive super-twisting sliding mode control of rotary inverted pendulum system
CN105279978B (en) Intersection traffic signal control method and equipment
JP7327569B1 (en) Information processing equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant