CN113246958A - TD3-based multi-target HEV energy management method and system - Google Patents
- Publication number
- CN113246958A CN113246958A CN202110654498.3A CN202110654498A CN113246958A CN 113246958 A CN113246958 A CN 113246958A CN 202110654498 A CN202110654498 A CN 202110654498A CN 113246958 A CN113246958 A CN 113246958A
- Authority
- CN
- China
- Prior art keywords
- battery
- energy management
- soc
- engine
- vehicle
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60W—CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
- B60W20/00—Control systems specially adapted for hybrid vehicles
- B60W20/10—Controlling the power contribution of each of the prime movers to meet required power demand
- B60W20/13—Controlling the power contribution of each of the prime movers to meet required power demand in order to stay within battery power input or output limits; in order to prevent overcharging or battery depletion
- B60W20/15—Control strategies specially adapted for achieving a particular effect
- B60W10/00—Conjoint control of vehicle sub-units of different type or different function
- B60W10/04—Conjoint control including control of propulsion units
- B60W10/06—Conjoint control including control of combustion engines
- B60W10/24—Conjoint control including control of energy storage means
- B60W10/26—Conjoint control including control of energy storage means for electrical energy, e.g. batteries or capacitors
- B60W50/00—Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
- B60W2050/0001—Details of the control system
- B60W2050/0043—Signal treatments, identification of variables or parameters, parameter estimation or state estimation
- B60W2710/00—Output or target parameters relating to a particular sub-unit
- B60W2710/06—Combustion engines, Gas turbines
- B60W2710/0666—Engine torque
- B60W2710/24—Energy storage means
- B60W2710/242—Energy storage means for electrical energy
- B60W2710/244—Charge state
- B60W2710/246—Temperature
Landscapes
- Engineering & Computer Science (AREA)
- Transportation (AREA)
- Mechanical Engineering (AREA)
- Chemical & Material Sciences (AREA)
- Combustion & Propulsion (AREA)
- Automation & Control Theory (AREA)
- Human Computer Interaction (AREA)
- Electric Propulsion And Braking For Vehicles (AREA)
- Hybrid Electric Vehicles (AREA)
Abstract
A multi-target HEV energy management method and system based on the twin-delayed deep deterministic policy gradient (TD3) are disclosed. The invention innovatively uses TD3 to overcome both the curse of dimensionality that afflicts deep-reinforcement-learning energy management strategies built on discrete action spaces and the over-estimation problem of the deep deterministic policy gradient. Fuel consumption, battery temperature and battery state of health (SOH) are taken as optimization targets, improving the practical value of the energy management strategy.
Description
Technical Field
The invention relates to a deep reinforcement learning algorithm for improving the fuel economy of new energy vehicles and prolonging battery service life, and in particular to a multi-target energy management method for a parallel hybrid electric vehicle (HEV) based on the twin-delayed deep deterministic policy gradient (TD3).
Background
The energy crisis and climate change have attracted extensive attention worldwide, and vehicle fuel consumption and exhaust emissions are contributing factors that cannot be ignored. To alleviate the severe energy crisis and climate change, vehicle electrification is the necessary path for the future development of the automotive industry. Among new energy vehicles, hybrid electric vehicles need less fuel than conventional fuel vehicles and have a longer driving range than pure electric vehicles, making them the most effective solution at present. However, the energy management system of a hybrid electric vehicle is very complex: it must properly distribute power between the engine and the motor while comprehensively guaranteeing both drivability and economy. Because it spans the energy management problems of conventional, pure electric and gasoline-electric hybrid vehicles alike, HEV energy management has become a focus of extensive research in the automotive field at home and abroad.
Energy management strategies can be broadly divided into three categories. a) Rule-based energy management strategies depend on a rule set formulated from expert experience and do not need to predict driving conditions; although highly practical, they cannot achieve optimal control of the vehicle and are tuned to specific, narrow driving conditions. The binary (on-off) control strategy is a typical rule-based strategy: it first drives the vehicle on battery energy and switches to engine drive when the battery SOC reaches a set minimum value. b) Optimization-based energy management strategies, such as dynamic programming (DP), convex optimization and genetic algorithms, compute optimal control from known or predicted driving conditions, and can reach optimal or near-optimal results for a specific drive cycle; however, they require the entire drive cycle to be known in advance and consume too much computation to be used for real-time control. To improve practicality, real-time online optimization strategies have been widely studied, such as model predictive control (MPC), Pontryagin's minimum principle (PMP) and the equivalent consumption minimization strategy (ECMS). However, because these compute the system's equivalent fuel consumption from partial historical information, which does not necessarily represent future driving states, their robustness is poor. A better-performing class of strategies is needed to make up for these shortcomings: c) learning-based energy management strategies.
Machine learning (data-driven optimization), and in particular the deep reinforcement learning (DRL) algorithms developed in recent years, provides a powerful research tool for system modeling, control parameter optimization, and the extraction of road-condition and driving-behavior features. Among reinforcement learning algorithms, discrete-action-space methods such as Q-learning and the Deep Q Network (DQN) are the most widely used, but they are only applicable to discrete, low-dimensional action spaces, whereas the HEV energy management control task has a high-dimensional, continuous action space. These algorithms require discretizing the action space, which inevitably loses important information about the action space and also causes the curse of dimensionality. Continuous-action-space algorithms such as the deep deterministic policy gradient (DDPG) handle continuous action spaces directly, without discretization, but DDPG suffers from an over-estimation problem: the estimated value function is often larger than the true value function, which affects the stability of the energy management strategy and weakens the robustness of the algorithm.
Furthermore, current energy management strategies only marginally improve vehicle fuel economy and ignore the control strategy's impact on battery life. It is well known that the service life of a battery system is closely related to its operating conditions and temperature, and that an excessive internal temperature can cause thermal runaway. An energy management strategy must take these important factors into account; otherwise it has little practical value.
Disclosure of Invention
The invention provides a multi-target HEV energy management method and system based on the twin-delayed deep deterministic policy gradient (TD3). By representing the value function with two critic networks and using a delayed-update technique, the method and system effectively solve the over-estimation problem. Vehicle fuel consumption, battery SOC, battery temperature and battery state of health (SOH) are taken as optimization targets to construct a multi-objective energy management strategy, so that the vehicle operates in a truly optimal state and the practical value of the energy management strategy is improved.
At least one embodiment of the present invention provides an HEV energy management method, comprising:
establishing a dynamics model, a battery thermal model and a battery life model of the parallel hybrid electric vehicle, and taking the engine fuel consumption rate m_f, engine output torque T_eng, battery temperature T_emp, battery SOC and battery SOH calculated by the three models as control targets;
constructing a twin-delayed deep deterministic policy gradient (TD3) network;
taking the engine fuel consumption rate, engine output torque, battery temperature, battery SOH and battery SOC of the control targets as the TD3 state-space signal S, taking the engine output torque as the TD3 action-space signal A, and formulating the reward function r of TD3;
acquiring parameters and observations that influence energy management during driving under standard vehicle operating conditions, wherein these parameters and observations, together with the reward function r, are used to train the TD3 network so that it learns to take the action A that maximizes the reward r for a received state signal S, thereby obtaining a trained deep reinforcement learning agent;
and acquiring parameters and observations that influence energy management during actual driving, including the engine fuel consumption rate, engine output torque, battery temperature, battery SOH and battery SOC taken as control targets, and inputting them into the trained deep reinforcement learning agent for energy management.
At least one embodiment of the present invention provides an HEV energy management system, comprising: a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to perform all or part of the steps of the method.
At least one embodiment of the invention provides a non-transitory computer readable storage medium having stored thereon a computer program that, when executed by a processor, performs all or part of the steps of a method as described herein.
The invention adopts a TD3 energy management strategy to optimize the power distribution between the engine and the motor and the usage of the battery; it both overcomes the curse-of-dimensionality problem of discrete-action-space deep reinforcement learning energy management strategies and solves the over-estimation and training-instability problems of the deep deterministic policy gradient.
The invention not only optimizes fuel consumption during driving and keeps the battery SOC within a reasonable range, but also considers the control strategy's influence on battery temperature and battery life. An innovatively designed reward function constructs a multi-target energy management strategy over fuel economy, battery SOC, battery temperature and battery service life, so that the vehicle can be comprehensively optimized across multiple objectives.
The method collects actual road-condition data to verify the optimality of the deep reinforcement learning TD3 energy management strategy.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings of the embodiments will be briefly described below.
Fig. 1 is a flowchart of a multi-target HEV energy management method based on TD3 according to an embodiment of the present invention.
Fig. 2 is a structural diagram of a parallel hybrid electric vehicle according to an embodiment of the present invention.
Fig. 3 is a basic architecture diagram of an intelligent agent TD3 for deep reinforcement learning according to an embodiment of the present invention.
Fig. 4 is a speed curve of a vehicle under a standard operating condition according to an embodiment of the present invention.
Fig. 5 is a speed curve of a vehicle actually traveling at a certain location according to an embodiment of the present invention.
Detailed Description
For HEV energy management, the invention innovatively uses the twin-delayed deep deterministic policy gradient (TD3) to solve the curse-of-dimensionality problem of deep-reinforcement-learning energy management strategies based on discrete action spaces and the over-estimation problem of the deep deterministic policy gradient. Fuel consumption, battery temperature and battery state of health (SOH) are taken as optimization targets, improving the practical value of the energy management strategy. The TD3-based multi-target HEV energy management method will be described in detail with reference to figs. 1-5.
Step 1: establish the parallel hybrid electric vehicle model. The vehicle dynamics model is built from the vehicle dynamics equation, the battery thermal model from the battery's heat generation and heat dissipation principles, and the battery life model from the battery's capacity fade principle. Combining the battery thermal model with the battery life model allows the dynamic characteristics of the battery system to be predicted. The engine fuel consumption rate m_f, engine output torque T_eng, battery temperature T_emp, battery SOH and battery SOC computed by the three models are taken as control targets;
step 2: respectively constructing a critical network and an Actor network by using a deep neural network, commonly constructing a basic network framework, namely the Actor-critical network, of a double-delay deep deterministic strategy gradient strategy TD3 to construct a multi-target HEV energy management strategy learning network, and initializing and normalizing state data of parameters of the Actor-critical network, wherein the network parameters are shown in a table 2. And taking the engine fuel consumption rate, the engine output torque, the battery temperature, the battery SOH and the battery SOC of the control target as a TD3 state space signal S, taking the engine output torque as a TD3 action space signal A, and establishing a reasonable return function r of TD 3.
Step 3: acquire the parameters and observations that influence energy management during driving under standard operating conditions. Together with the reward function r, these are used to train the basic TD3 network so that the TD3 energy management strategy takes the action A that maximizes the reward r for a received state signal S and controls the vehicle to drive in an energy-saving, efficient manner, yielding a trained deep reinforcement learning agent.
Step 4: acquire the parameters and observations that influence energy management during actual driving, including the engine fuel consumption rate, engine output torque, battery temperature, battery SOH and battery SOC taken as control targets, and input them into the trained deep reinforcement learning agent for energy management.
FIG. 2 shows a schematic diagram of a parallel hybrid vehicle drive system. In step 1, the automobile dynamic model may be calculated by an automobile dynamic equation, which is shown in formula (1):
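The equation image for formula (1) is not reproduced in this text; the following is a hedged reconstruction from the variables defined below, assuming the standard longitudinal vehicle dynamics balance:

```latex
% Hedged reconstruction of formula (1); the exact form in the patent image may differ.
F_t = F_f + F_i + F_\omega + F_j
    = mgf\cos\alpha + mg\sin\alpha + \tfrac{1}{2}\,C_D A \rho v^{2} + \delta m a
```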
wherein F_t is the driving force of the vehicle; F_f the rolling resistance; F_i the grade resistance; F_ω the air resistance; F_j the acceleration resistance; m the vehicle mass; g the gravitational acceleration; f the rolling resistance coefficient; α the road grade; ρ the air density; A the frontal area of the vehicle; C_D the air resistance coefficient; v the vehicle speed; δ the rotating-mass conversion factor; and a the vehicle acceleration.
The thermal model of the battery is shown as formula (2):
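The equation image for formula (2) is not reproduced in this text; the following is a hedged reconstruction from the variables defined below, assuming a lumped-parameter thermal model with ohmic-type heat generation and natural-convection dissipation:

```latex
% Hedged reconstruction of formula (2); the exact form in the patent image may differ.
mC\,\frac{dT_{emp}}{dt} = I\left(OCV - V\right) - h\left(T_{emp} - T_{amb}\right)
```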
wherein T_emp is the battery temperature; T_amb the ambient temperature; m the battery mass; C the specific heat capacity of the battery; I the battery operating current; OCV the battery open-circuit voltage; V the battery operating voltage; and h the natural thermal convection constant.
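The lumped thermal model above can be stepped forward in time with simple Euler integration. The sketch below is illustrative only: it assumes the common form m·C·dT/dt = I·(OCV − V) − h·(T − T_amb), and the parameter values (`mass`, `c_heat`, `h_conv`) are placeholders, not the patent's.

```python
def battery_temp_step(temp, i_bat, ocv, v, t_amb, dt,
                      mass=2.0, c_heat=900.0, h_conv=5.0):
    """Advance the lumped battery temperature by one Euler step (deg C)."""
    q_gen = i_bat * (ocv - v)          # heat generation [W]
    q_dis = h_conv * (temp - t_amb)    # natural-convection dissipation [W]
    return temp + dt * (q_gen - q_dis) / (mass * c_heat)

# A discharging cell (V < OCV) warms up while generation exceeds dissipation.
t = 25.0
for _ in range(600):  # 600 s at 1 s steps
    t = battery_temp_step(t, i_bat=50.0, ocv=3.6, v=3.3, t_amb=25.0, dt=1.0)
```

At equilibrium (no current, cell at ambient) the temperature is unchanged, which is a quick sanity check on the sign conventions.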
The battery life model is shown in equation (3):
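The equation images for formulas (3)-(5) are not reproduced in this text; the following is a hedged reconstruction from the variables defined below, assuming the common semi-empirical power-law capacity-fade model that the listed symbols (B, E_a, R, z, Ah) suggest:

```latex
% Hedged reconstruction of formulas (3)-(5); the exact forms in the patent images may differ.
\Delta C_n = B \exp\!\left(\frac{-E_a}{R\,T_{emp}}\right) Ah^{\,z}, \qquad
N(c_r, T_{emp}) = \left.\frac{3600\,Ah}{C_n}\right|_{\Delta C_n = 20\%}
```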
wherein N(c_r, T_emp) is the number of equivalent cycles before the end of battery life, influenced by the battery discharge rate c_r (C-rate) and the battery temperature T_emp, as shown by equation (4);
percentage of loss of battery capacity is CnB is an exponential factor, the value of which is given in table 1, R-8.314 is a universal gas constant,z0.55 is the power law coefficient, Ah is the battery throughput, EaIs the activation energy; when the capacity of the battery drops to 20%, the battery reaches the end of life. CnAh and EaIs defined by equation (5):
TABLE 1 Relationship between the exponential factor B and the discharge rate
In step 2, the TD3 state-space signal is S = (SOC, m_f, T_eng, T_emp, SOH), where SOC denotes the battery state of charge; m_f the engine fuel consumption rate; T_eng the engine output torque; and T_emp the battery temperature. The action-space signal is A = {T_eng | T_eng ∈ [−250, 841]}. The reward function is defined by equation (6):
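The equation image for formula (6) is not reproduced in this text; the following is a hedged reconstruction from the definitions that follow, assuming the reward is the offset minus the summed loss terms:

```latex
% Hedged reconstruction of formula (6); the exact form in the patent image may differ.
r_i = b - J_i, \qquad
J_i = \dot{m}_{f,i}(s,a) + C_{b,i} + \omega_1 P_s + \omega_2 P_t
```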
wherein b is an offset used to adjust the range of the reward function; J_i is the loss function and i denotes the time step; s and a denote, respectively, the state of the ith time step (the engine fuel consumption rate, engine output torque, battery temperature, battery SOH and battery SOC of the control targets) and the action (the engine output torque); ṁ_f,i denotes the engine fuel consumption rate; C_b denotes the battery degradation cost; P_s and P_t denote, respectively, the penalty factors for the deviation of SOC from the reference value SOC_ref and for excessive temperature; and ω_1 and ω_2 are the weights of the influencing factors P_s and P_t. C_b is calculated from equation (7):
C_b,i = λ·ΔSOH  (7)
where λ is the ratio of the battery replacement cost to the cost of one kilogram of fuel (N. Kittner, F. Lill, and D. M. Kammen, "Energy storage deployment and innovation for the clean energy transition," Nature Energy, vol. 2, 2017, Art. no. 17125).
The deviation of SOC from the reference value SOC_ref and the penalty coefficient for excessive temperature are determined by equation (8) and equation (9):
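The equation images for formulas (8) and (9) are not reproduced in this text; the following is a hedged reconstruction from the definitions around them, assuming quadratic penalty terms:

```latex
% Hedged reconstruction of formulas (8) and (9); the exact forms in the patent images may differ.
P_s = \tau_1 \left( SOC - SOC_{ref} \right)^{2}, \qquad
P_t = \tau_2 \left[ \max\!\left( 0,\; T_{emp} - T_{ref} \right) \right]^{2}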
wherein SOC_ref = 0.6 is the battery SOC reference value and T_ref is the penalty trigger threshold, which may be set at 40 °C. τ_1 and τ_2 are adjustment coefficients that bring the battery SOC deviation and the over-temperature penalty to the same order of magnitude as the engine fuel consumption rate.
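The multi-objective reward described above can be sketched in a few lines. This is an illustrative sketch only: it assumes quadratic penalty terms, and every numeric value (`b`, `lam`, the weights and τ coefficients) is a placeholder, not the patent's tuned parameter.

```python
def reward(fuel_rate, d_soh, soc, temp,
           b=10.0, lam=100.0, w1=1.0, w2=1.0,
           tau1=50.0, tau2=0.01, soc_ref=0.6, t_ref=40.0):
    """Reward r_i = b - J_i, where J_i sums fuel, degradation and penalties."""
    c_b = lam * d_soh                          # battery degradation cost, eq. (7)
    p_s = tau1 * (soc - soc_ref) ** 2          # SOC deviation penalty
    p_t = tau2 * max(0.0, temp - t_ref) ** 2   # over-temperature penalty
    j = fuel_rate + c_b + w1 * p_s + w2 * p_t
    return b - j

# Holding SOC at the reference while staying cool yields a higher reward
# than draining the battery and overheating it.
r_good = reward(fuel_rate=1.0, d_soh=0.0, soc=0.6, temp=30.0)
r_bad = reward(fuel_rate=1.0, d_soh=0.001, soc=0.3, temp=45.0)
```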
In step 2, the basic architecture of the twin-delayed deep deterministic policy gradient algorithm is shown in fig. 3.
wherein J denotes the loss function; M the number of samples per gradient-descent batch; θ^Q and θ^μ the parameters of the Critic network and the Actor network, respectively; r the reward function; ε the noise; τ the soft-update factor; y the temporal-difference (TD) target; and L_k the accumulated error.
The detailed parameters of the deep reinforcement learning TD3 agent are shown in table 2:
TABLE 2 TD3 agent specific parameters
The TD3 energy management policy implementation details are shown in table 3:
TABLE 3 TD3 Algorithm execution steps
wherein θ^Q and θ^μ are the parameters of the Critic network and the Actor network, respectively. The deep reinforcement learning agent feeds the observation signal (the engine fuel consumption rate, engine output torque, battery temperature, battery SOH and battery SOC) to the Actor network, which outputs a control action a = μ(s|θ^μ) + N through the deterministic policy function μ(s) and random noise N. By executing action a, the controlled object obtains a new state s′ and a new reward r; the tuple (s, a, r, s′) is stored in the experience replay buffer, from which M samples are drawn at random. Inputting s′ into the Actor network of the target network yields a′. The Critic network learns the value function Q(s, a) for state s and the Actor's action a using the Bellman equation, while the target Critic network computes the target Q value Q′(s, a) = E[r(s, a) + γQ′(s′, a′)], where Q′(s, a) denotes the target Q value, s the current observation, a the action selected by the agent's Actor network, E the expectation operator, r(s, a) the reward obtained for that state and action, γ the discount factor, and Q′(s′, a′) the target Q value of the next state. The controlled object obtains the new state s′ by executing action a, and a′ is the action for the next time step selected within the agent. The TD error is calculated as follows
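The equation image for the TD error is not reproduced in this text; the following is a hedged reconstruction from the surrounding definitions, assuming the standard TD3 clipped double-Q target and mean-squared critic loss:

```latex
% Hedged reconstruction of the TD target and critic loss; the exact forms in the patent image may differ.
y_j = r_j + \gamma \min_{k=1,2} Q'_k\!\left(s'_j, a'_j\right), \qquad
L_k = \frac{1}{M} \sum_{j=1}^{M} \left( y_j - Q_k\!\left(s_j, a_j\right) \right)^{2}
```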
where y denotes the approximation of the target Q value, L_k is the cumulative error, and Q(s_j, a_j) is the estimated Q value of the current network. The Actor network parameters of the current network, which map states to actions through the action-value function, are updated through gradient back-propagation of the neural network and a soft-update strategy.
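The target computation described above is the core of TD3's defense against over-estimation: the target takes the minimum of the two target critics ("clipped double Q"). The sketch below is a minimal numeric illustration; the two critics are stand-in functions, not trained networks.

```python
def td3_target(r, s_next, a_next, q1_target, q2_target, gamma=0.99, done=False):
    """y = r + gamma * min(Q1'(s', a'), Q2'(s', a')); bootstrap stops at episode end."""
    if done:
        return r
    return r + gamma * min(q1_target(s_next, a_next), q2_target(s_next, a_next))

# Two disagreeing target critics: the pessimistic one sets the target,
# which counteracts the over-estimation of a single critic.
q1 = lambda s, a: 5.0
q2 = lambda s, a: 3.0
y = td3_target(r=1.0, s_next=None, a_next=None, q1_target=q1, q2_target=q2)
```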
In step 3, the deep reinforcement learning agent learns while interacting with its environment (the vehicle and road conditions), selecting the actions that maximize the reward. Because the actions the agent selects in the early stage are far from optimal and can have undesirable consequences, the agent is first trained under standard operating conditions to obtain stable hyper-parameters (learning rate, number of neurons, number of network layers, replay buffer size, minibatch size, etc.) and is then applied to actual road conditions. A suitable standard drive cycle is selected and imported into a driver model; the driver model preprocesses the road-condition information, taking the cycle's speed, acceleration and grade as inputs and outputting the speed, acceleration and total torque demand required for driving. During training, the TD3 agent's hyper-parameters are adjusted according to the vehicle and cycle information so that the agent can quickly and accurately select the optimal control action. The deep reinforcement learning TD3 network can be trained with three typical standard drive cycles, but is not limited thereto. The speed profiles of the three cycles are shown in fig. 4, and the characteristics of each cycle are shown in table 4:
table 4 Standard Condition characteristics
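The driver model's conversion of speed, acceleration and gradient into a total torque demand can be illustrated with a standard longitudinal road-load calculation. All vehicle parameters below (mass, rolling-resistance and drag coefficients, frontal area, wheel radius) are illustrative placeholders, not values from the patent.

```python
import numpy as np

def torque_demand(v, a, grade, m=1500.0, g=9.81, f_r=0.015,
                  rho=1.2, cd=0.3, A=2.2, r_wheel=0.3):
    """Total wheel-torque demand [N*m] from speed v [m/s], acceleration
    a [m/s^2] and road grade [rad], via a longitudinal road-load model.
    Vehicle parameters are placeholders, not the patent's values."""
    F_roll = m * g * f_r * np.cos(grade)      # rolling resistance
    F_grade = m * g * np.sin(grade)           # grade resistance
    F_aero = 0.5 * rho * cd * A * v ** 2      # aerodynamic drag
    F_inertia = m * a                         # force to accelerate the mass
    return (F_roll + F_grade + F_aero + F_inertia) * r_wheel
```

The hybrid powertrain then splits this demand between engine torque (the TD3 action) and motor torque.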
In step 4, actual vehicle operation data are collected and converted into actual road-condition data, which are imported into the driver model; energy management is then performed with the trained deep reinforcement learning agent. At the same time, the trained TD3 energy management strategy can be verified and its optimality tested. The actual road speed profile is shown in fig. 5.
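Applying the trained agent to a recorded real-road cycle, as in step 4, amounts to rolling the deterministic policy forward over the cycle and accumulating the quantities to be verified (e.g. fuel use). The `agent` and `env` interfaces below are hypothetical, shown only to illustrate the deployment loop.

```python
def evaluate_policy(agent, env, cycle):
    """Roll a trained agent over a recorded drive cycle and accumulate
    fuel use. `agent`, `env` and the cycle format are hypothetical
    interfaces, not the patent's implementation."""
    state = env.reset(cycle)
    total_fuel, done = 0.0, False
    while not done:
        action = agent.act(state)          # deterministic policy, no exploration noise
        state, reward, done, info = env.step(action)
        total_fuel += info["fuel_g"]       # grams of fuel consumed this step
    return total_fuel
```

At deployment time the exploration noise used during training is switched off, so the agent's action is simply the actor network's output for the current state.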
In summary, the proposed method not only achieves optimal fuel economy while the vehicle is driving, but also keeps the battery within a suitable temperature range, extends its service life, and optimizes the overall multi-objective performance of the hybrid vehicle.
In an exemplary embodiment, there is also provided a multi-target HEV energy management system based on the twin-delayed deep deterministic policy gradient (TD3), comprising: a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to perform all or part of the steps of the method.
In an exemplary embodiment, a non-transitory computer readable storage medium is also provided, on which a computer program is stored, which when executed by a processor implements all or part of the steps of the method. For example, the non-transitory computer readable storage medium may be a ROM, a RAM, a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
Claims (4)
1. A HEV energy management method comprising:
establishing a dynamic model, a battery thermal model and a battery life model of the parallel hybrid electric vehicle, and taking the engine fuel consumption rate m_f, the engine output torque T_eng, the battery temperature Temp, the battery SOC and the battery SOH calculated from the three models as control targets;
constructing a dual-delay depth deterministic strategy gradient TD3 network;
taking the engine fuel consumption rate, engine output torque, battery temperature, battery SOH and battery SOC of the control targets as the TD3 state space signal S; taking the engine output torque as the TD3 action space signal A; and formulating the reward function r of TD3;
acquiring parameters and observations that influence energy management while the vehicle is driven under standard conditions, wherein these parameters and observations, together with the reward function r, are used to train the TD3 network so that it selects the action maximizing the reward function r according to the received state signal S, thereby obtaining a trained deep reinforcement learning agent;
and acquiring parameters and observations that influence energy management during actual vehicle operation, including the engine fuel consumption rate, engine output torque, battery temperature, battery SOH and battery SOC taken as control targets, and inputting them into the trained deep reinforcement learning agent for energy management.
2. The HEV energy management method of claim 1, wherein the TD3 state space signal is S = (SOC, m_f, T_eng, Temp, SOH), the action space signal is A = {T_eng | T_eng ∈ [-250, 841]}, and the reward function is defined by equation (1):
wherein b is an offset used to adjust the range of the reward function; i denotes the time step; m_f,i denotes the engine fuel consumption rate; C_b denotes the battery degradation cost; P_s and P_t denote, respectively, the penalty for the deviation of the SOC from the reference value SOC_ref and the penalty for excessive temperature; ω_1 and ω_2 denote the weights of the influencing factors P_s and P_t, respectively; C_b is calculated from equation (2):
C_b,i = λ·ΔSOH (2)
where λ is the ratio of battery replacement cost to one kilogram of fuel cost;
The deviation of the SOC from the reference value SOC_ref and the penalty coefficient for excessive temperature are determined by equation (8) and equation (9):
wherein SOC_ref = 0.6 is the battery SOC reference value; T_ref is the penalty trigger threshold, which can be set to 40 °C; τ_1 and τ_2 are adjustment coefficients that bring the battery SOC deviation penalty and the over-temperature penalty to the same order of magnitude as the engine fuel consumption rate.
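Since the body of equation (1) is not reproduced in the text, the sketch below assembles the reward from the quantities defined in this claim under the assumption that r equals the offset b minus the weighted sum of fuel use, SOC-deviation penalty, over-temperature penalty and battery degradation cost; the quadratic shapes of P_s and P_t and all default parameter values are likewise illustrative assumptions.

```python
def td3_reward(m_f, soc, temp, d_soh, b=1.0, w1=1.0, w2=1.0,
               lam=25.0, soc_ref=0.6, t_ref=40.0, tau1=1.0, tau2=1.0):
    """Multi-objective reward assembled from the claim's quantities.
    The combined form of equation (1) and the penalty shapes are
    assumptions; only SOC_ref = 0.6 and T_ref = 40 C come from the text."""
    P_s = tau1 * (soc - soc_ref) ** 2           # SOC deviation penalty, cf. eq. (8)
    P_t = tau2 * max(temp - t_ref, 0.0) ** 2    # over-temperature penalty, cf. eq. (9)
    C_b = lam * d_soh                           # battery degradation cost, eq. (2)
    return b - (m_f + w1 * P_s + w2 * P_t + C_b)
```

With this form the reward is largest when fuel use is low, SOC stays near 0.6, the battery stays below 40 °C, and SOH degrades slowly, which matches the multi-objective trade-off the method optimizes.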
3. A HEV energy management system, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to perform the steps of the method of any one of claims 1-2.
4. A non-transitory computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 2.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110654498.3A CN113246958B (en) | 2021-06-11 | 2021-06-11 | TD3-based multi-target HEV energy management method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113246958A true CN113246958A (en) | 2021-08-13 |
CN113246958B CN113246958B (en) | 2022-06-14 |
Family
ID=77187634
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110654498.3A Active CN113246958B (en) | 2021-06-11 | 2021-06-11 | TD3-based multi-target HEV energy management method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113246958B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
DE102014222513A1 (en) * | 2014-11-04 | 2016-05-04 | Continental Automotive Gmbh | Method of operating a hybrid or electric vehicle |
CN108216201A (en) * | 2016-12-21 | 2018-06-29 | 株式会社电装 | Controller of vehicle, control method for vehicle and the recording medium for storing vehicle control program |
CN110254418A (en) * | 2019-06-28 | 2019-09-20 | 福州大学 | A hybrid electric vehicle reinforcement learning energy management control method |
CN110341690A (en) * | 2019-07-22 | 2019-10-18 | 北京理工大学 | A PHEV Energy Management Method Based on Deterministic Policy Gradient Learning |
CN112249002A (en) * | 2020-09-23 | 2021-01-22 | 南京航空航天大学 | A heuristic series-parallel hybrid energy management method based on TD3 |
CN112440974A (en) * | 2020-11-27 | 2021-03-05 | 武汉理工大学 | HEV energy management method based on distributed depth certainty strategy gradient |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114290959A (en) * | 2021-12-30 | 2022-04-08 | 重庆长安新能源汽车科技有限公司 | Power battery active service life control method and system and computer readable storage medium |
CN114290959B (en) * | 2021-12-30 | 2023-05-23 | 重庆长安新能源汽车科技有限公司 | Active life control method and system for power battery and computer readable storage medium |
CN114852043A (en) * | 2022-03-23 | 2022-08-05 | 武汉理工大学 | A HEV energy management method and system based on tiered reward TD3 |
CN118092150A (en) * | 2023-11-13 | 2024-05-28 | 重庆大学 | Weightless training and testing methods for deep reinforcement learning-based energy management strategies |
CN118092150B (en) * | 2023-11-13 | 2024-10-22 | 重庆大学 | Weight-free training and testing method for deep reinforcement learning type energy management strategy |
CN118842103A (en) * | 2024-09-24 | 2024-10-25 | 湖南理工职业技术学院 | Hybrid energy storage photovoltaic power generation control method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||