
CN114852043B - HEV energy management method and system based on layered return TD3 - Google Patents


Info

Publication number
CN114852043B
CN114852043B (application CN202210298825.0A)
Authority
CN
China
Prior art keywords: energy management, layered, rewards, HEV, return
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210298825.0A
Other languages
Chinese (zh)
Other versions
CN114852043A (en)
Inventor
颜伏伍
王金海
杜常清
彭辅明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University of Technology WUT
Original Assignee
Wuhan University of Technology WUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University of Technology WUT filed Critical Wuhan University of Technology WUT
Priority to CN202210298825.0A priority Critical patent/CN114852043B/en
Publication of CN114852043A publication Critical patent/CN114852043A/en
Application granted granted Critical
Publication of CN114852043B publication Critical patent/CN114852043B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W20/00Control systems specially adapted for hybrid vehicles
    • B60W20/10Controlling the power contribution of each of the prime movers to meet required power demand
    • B60W20/11Controlling the power contribution of each of the prime movers to meet required power demand using model predictive control [MPC] strategies, i.e. control methods based on models predicting performance
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W20/00Control systems specially adapted for hybrid vehicles
    • B60W20/10Controlling the power contribution of each of the prime movers to meet required power demand
    • B60W20/15Control strategies specially adapted for achieving a particular effect
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W50/00Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W50/00Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
    • B60W50/06Improving the dynamic response of the control system, e.g. improving the speed of regulation or avoiding hunting or overshoot
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W50/00Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
    • B60W2050/0062Adapting control system settings
    • B60W2050/0075Automatic parameter input, automatic initialising or calibrating means
    • B60W2050/0095Automatic control mode change
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W50/00Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
    • B60W50/06Improving the dynamic response of the control system, e.g. improving the speed of regulation or avoiding hunting or overshoot
    • B60W2050/065Improving the dynamic response of the control system, e.g. improving the speed of regulation or avoiding hunting or overshoot by reducing the computational load on the digital processor of the control computer
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W2510/00Input parameters relating to a particular sub-units
    • B60W2510/02Clutches
    • B60W2510/0208Clutch engagement state, e.g. engaged or disengaged
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W2510/00Input parameters relating to a particular sub-units
    • B60W2510/06Combustion engines, Gas turbines
    • B60W2510/0657Engine torque
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W2510/00Input parameters relating to a particular sub-units
    • B60W2510/24Energy storage means
    • B60W2510/242Energy storage means for electrical energy
    • B60W2510/244Charge state
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/80Technologies aiming to reduce greenhouse gasses emissions common to all road transportation technologies
    • Y02T10/84Data processing systems or methods, management, administration

Landscapes

  • Engineering & Computer Science (AREA)
  • Automation & Control Theory (AREA)
  • Transportation (AREA)
  • Mechanical Engineering (AREA)
  • Human Computer Interaction (AREA)
  • Electric Propulsion And Braking For Vehicles (AREA)
  • Hybrid Electric Vehicles (AREA)

Abstract

The invention belongs to the technical field of hybrid electric vehicle energy management, and discloses an HEV energy management method and system based on layered return TD3. The invention combines a parallel hybrid electric vehicle model to select the state space signal and action space signal of the HEV energy management strategy, and adopts a layered return structure comprising two return functions and four regulating layers, so that the control strategy can be adjusted in a targeted manner according to the different running states of the vehicle, unnecessary repeated exploration is reduced, and the overall performance of the energy management strategy is improved.

Description

HEV energy management method and system based on layered return TD3
Technical Field
The invention belongs to the technical field of hybrid electric vehicle energy management, and particularly relates to an HEV energy management method and system based on layered return TD3.
Background
Among environmentally friendly vehicles, the hybrid electric vehicle (HEV) offers a longer driving range than a pure electric vehicle and lower fuel consumption than a conventional fuel vehicle, making it more environmentally friendly. However, the energy management system of a hybrid electric vehicle is far more complex than that of a conventional fuel vehicle or an electric vehicle. The energy management strategy (EMS) of hybrid vehicles has therefore become a research hotspot in the automotive field.
Existing hybrid vehicle energy management strategies fall into three main categories: rule-based strategies, optimization-based strategies, and learning-based strategies. Although rule-based energy management strategies are easy to implement, it is difficult to formulate reasonable rules for very complex operating conditions. Optimization-based energy management strategies comprise global optimization strategies and real-time optimization strategies. Typical global optimization algorithms are computationally expensive and are usually run offline, often serving as a benchmark for evaluating the effectiveness of other online EMS. Real-time optimization strategies such as Pontryagin's minimum principle have good optimization efficiency, but the co-state is difficult to obtain and the computational load is relatively large. Real-time optimization strategies such as the equivalent fuel consumption minimization strategy have good real-time characteristics, but the historical road information used to calculate equivalent fuel consumption often cannot represent future driving conditions, which weakens the robustness of the algorithm. The key to the success of real-time optimization strategies such as model predictive control is fast prediction and fast optimization: road conditions must be predicted in advance, which depends to a great extent on a high-performance model. An energy management strategy based on the reinforcement learning algorithm Q-learning can greatly improve the fuel economy of the vehicle compared with a traditional rule-based strategy, but suffers from the curse of dimensionality. The Deep Deterministic Policy Gradient (DDPG) strategy can be trained in environments with a continuous or discrete state space and a continuous action space, but its overestimation of the value function leads to accumulated bias and sub-optimal policies.
Although the Twin Delayed Deep Deterministic Policy Gradient (TD3) strategy can compensate for the overestimation problem of DDPG, existing TD3-based energy management methods cannot adjust the control strategy in a targeted manner according to the different driving states of the vehicle, and the overall performance of the energy management strategy needs further improvement.
Disclosure of Invention
The invention provides an HEV energy management method and system based on layered return TD3, solving the problem that prior-art TD3-based energy management schemes cannot adjust the control strategy in a targeted manner according to the different running states of the vehicle.
The invention provides an HEV energy management method based on a layered return TD3, which comprises the following steps:
establishing a parallel hybrid vehicle model;
selecting a state space signal and an action space signal of an HEV energy management strategy by combining the parallel hybrid vehicle model;
Constructing a layered return function by combining the parallel hybrid vehicle model; the layered return function comprises a first return function and a second return function, the first return function or the second return function is activated according to an activation condition, and each of the two return functions is divided into two different regulating layers according to the range of the battery state of charge;
Constructing an HEV energy management learning network based on the layered return TD3 from the state space signal, the action space signal and the layered return function;
And training the HEV energy management learning network based on the layered return TD3, and executing an energy management strategy through the trained HEV energy management learning network based on the layered return TD3.
Preferably, the operation modes of the vehicle in the parallel hybrid vehicle model include an electric-only mode, a neutral mode and a parallel mode.
Preferably, the state space signal is S = (v, SOC, m_f, cs), and the action space signal is A = (T_eng | T_eng ∈ [-250, 841]); wherein v represents the running speed of the vehicle, SOC represents the battery state of charge, and m_f represents the engine fuel consumption rate; cs represents the state of the vehicle clutch, with cs = 0 indicating the clutch is open and cs = 1 indicating the clutch is closed; T_eng represents the engine output torque.
Preferably, the layered return function is activated as follows: the clutch state is judged first; if the clutch is open, the first return function is activated; if the clutch is closed, the instantaneous vehicle speed is judged: if the instantaneous vehicle speed is zero, the first return function is activated, and if the instantaneous vehicle speed is not zero, the second return function is activated;
The first return function R_soc is expressed as:
the second return function R_com is expressed as:
Wherein the first regulating layer L1 corresponds to SOC(t) > 0.8 or SOC(t) < 0.3; the second regulating layer L2 corresponds to 0.3 ≤ SOC(t) ≤ 0.8; m_f represents the actual value of the engine fuel consumption rate; SOC_ref represents the reference value of the battery state of charge; SOC(t) represents the actual value of the battery state of charge; pen represents a constant penalty factor; δ1 and δ2 are two weighting factors for balancing the impact of the fuel consumption rate and the battery state-of-charge variation on the fuel consumption of the vehicle; ω1, ω2 and ω3 are three constants used to ensure that the values of the return functions of each regulating layer are all on the same order of magnitude.
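The formula images for R_soc and R_com are not reproduced in this text. A form consistent with the variable definitions above — offered purely as an illustrative sketch, not the patent's exact expressions — would be:

```latex
R_{soc}(t) =
\begin{cases}
-\omega_1\bigl(SOC(t) - SOC_{ref}\bigr)^2 - pen, & \text{L1: } SOC(t) > 0.8 \text{ or } SOC(t) < 0.3,\\
-\omega_1\bigl(SOC(t) - SOC_{ref}\bigr)^2, & \text{L2: } 0.3 \le SOC(t) \le 0.8,
\end{cases}
```

```latex
R_{com}(t) =
\begin{cases}
-\bigl(\delta_1\,\omega_2\,m_f + \delta_2\,\omega_3\,(SOC(t) - SOC_{ref})^2\bigr) - pen, & \text{L1},\\
-\bigl(\delta_1\,\omega_2\,m_f + \delta_2\,\omega_3\,(SOC(t) - SOC_{ref})^2\bigr), & \text{L2}.
\end{cases}
```

Here the quadratic SOC deviation term and the pen penalty on layer L1 are assumptions chosen to match the stated roles of δ1, δ2, ω1–ω3 and pen.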
Preferably, parameters and observations affecting energy management are obtained from simulated driving of the vehicle under standard operating conditions, and the HEV energy management learning network based on the layered return TD3 is trained on these parameters and observations.
Preferably, after the trained HEV energy management learning network based on the layered return TD3 is obtained, the method further includes: acquiring parameters and observations affecting energy management during actual driving of the vehicle, verifying the trained network against these parameters and observations, and executing an energy management strategy through the trained and verified HEV energy management learning network based on the layered return TD3.
In another aspect, the present invention provides an HEV energy management system based on the layered return TD3, comprising: a processor and a memory; the memory stores a control program that, when executed by the processor, is configured to implement the above-described HEV energy management method based on the layered return TD3.
One or more technical schemes provided by the invention have at least the following technical effects or advantages:
In the invention, the state space signal and the action space signal of the HEV energy management strategy are selected by combining the parallel hybrid electric vehicle model, and a layered return structure comprising two return functions and four regulating layers in total is adopted, so that the control strategy can be adjusted in a targeted manner according to the different running states of the vehicle, unnecessary repeated exploration is reduced, and the overall performance of the energy management strategy is improved. By using a layered return twin delayed deep deterministic policy gradient algorithm, the invention overcomes both the curse of dimensionality of discrete-action-space deep reinforcement learning energy management strategies and the overestimation problem of the deep deterministic policy gradient. Moreover, because the layered return structure with four regulating layers can adjust the control strategy in a targeted manner according to different working conditions and vehicle running modes, the optimality of the energy management strategy can be improved.
Drawings
Fig. 1 is a schematic diagram of the hybrid electric vehicle corresponding to the HEV energy management method based on layered return TD3 provided in embodiment 1 of the present invention;
Fig. 2 is a schematic diagram of the layered return structure in the HEV energy management method based on layered return TD3 provided in embodiment 1 of the present invention;
Fig. 3 is a basic structure diagram of the deep reinforcement learning TD3 agent in the HEV energy management method based on layered return TD3 provided in embodiment 1 of the present invention;
Fig. 4 is a graph of the standard operating condition speed variation;
Fig. 5 is a graph showing the actual road speed change.
Detailed Description
In order to better understand the above technical solutions, the following detailed description will refer to the accompanying drawings and specific embodiments.
Example 1:
Embodiment 1 provides an HEV energy management method based on layered return TD3, comprising the following steps:
Step 1: and establishing a parallel hybrid vehicle model.
Specifically, a parallel hybrid vehicle model can be built through MATLAB/Simulink.
In the built parallel hybrid vehicle model, the engine and the motor are connected in parallel, and the engine can be coupled to or decoupled from the wheels through a clutch. The vehicle mainly operates in an electric-only mode, a neutral mode and a parallel mode. The three operating modes depend on the clutch state and the gear, and are shown in block-diagram form in Fig. 1.
The vehicle powertrain must provide the traction required for vehicle travel, which can be calculated from the vehicle dynamics equation shown in equation (1):
F_t = F_f + F_i + F_ω + F_j = m·g·f·cos α + m·g·sin α + (1/2)·C_D·A·ρ·v² + δ·m·a (1)
Wherein F_t is the driving force of the vehicle, F_f is the rolling resistance, F_i is the gradient resistance, F_ω is the air resistance, F_j is the acceleration resistance, m is the mass of the vehicle, g is the gravitational acceleration, f is the rolling resistance coefficient, α is the gradient of the road, ρ is the air density, A is the frontal area of the vehicle, C_D is the air resistance coefficient, v is the vehicle speed, δ is the rotating-mass conversion coefficient, and a is the vehicle acceleration.
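Equation (1) can be sketched numerically in Python; all vehicle parameter values below (mass, frontal area, drag coefficient, etc.) are illustrative placeholders, not values taken from the patent:

```python
import math

def traction_force(v_kmh, accel, grade_rad=0.0,
                   m=1500.0, g=9.81, f=0.012,
                   rho=1.2258, A=2.2, C_D=0.3, delta=1.05):
    """Required traction F_t = F_f + F_i + F_w + F_j from equation (1).

    v_kmh is the vehicle speed in km/h, accel the acceleration in m/s^2,
    grade_rad the road gradient in radians. All defaults are assumed,
    illustrative vehicle parameters.
    """
    v = v_kmh / 3.6                          # speed in m/s
    F_f = m * g * f * math.cos(grade_rad)    # rolling resistance
    F_i = m * g * math.sin(grade_rad)        # gradient resistance
    F_w = 0.5 * C_D * A * rho * v ** 2       # aerodynamic drag
    F_j = delta * m * accel                  # acceleration (inertial) resistance
    return F_f + F_i + F_w + F_j
```

At standstill on flat ground only rolling resistance remains, and the required traction grows quadratically with speed, which gives a quick sanity check on the model.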
Step 2: and selecting a state space signal and an action space signal of an HEV energy management strategy by combining a parallel hybrid vehicle model.
The hybrid vehicle energy management strategy aims to reduce fuel consumption and keep the battery SOC within a reasonable interval. According to this control target, the state space signal of the HEV energy management strategy is selected as S = (v, SOC, m_f, cs). Here v denotes the vehicle running speed, SOC denotes the battery state of charge, m_f is the engine fuel consumption rate, and cs is a Boolean value, 0 (clutch open) or 1 (clutch closed), denoting the state of the vehicle clutch. The action space signal is selected as the engine output torque T_eng, A = (T_eng | T_eng ∈ [-250, 841]), and the corresponding state information is acquired through sensors.
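These signal definitions can be sketched in Python; the class and function names below are ours, introduced only for illustration:

```python
from dataclasses import dataclass

# Action bounds for the engine torque, from the action space A (N·m).
T_ENG_MIN, T_ENG_MAX = -250.0, 841.0

@dataclass
class HEVState:
    """State space signal S = (v, SOC, m_f, cs)."""
    v: float      # vehicle running speed
    soc: float    # battery state of charge
    m_f: float    # engine fuel consumption rate
    cs: int       # clutch state: 0 = open, 1 = closed

def clip_action(t_eng: float) -> float:
    """Project a raw actor output onto the admissible torque range."""
    return max(T_ENG_MIN, min(T_ENG_MAX, t_eng))
```

Clipping the actor output to [-250, 841] keeps any exploration noise from commanding a torque the engine model cannot deliver.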
Step 3: and designing a layered reporting structure and formulating a reporting function.
The layered return structure is very important for the TD3 energy management strategy. A carefully designed return structure can not only make full use of the information fed back by the environment, but also reduce unnecessary repeated exploration, so that the agent interacts with the environment faster and more deeply, the learning process is accelerated, and the overall performance of the energy management strategy is improved.
The invention combines the parallel hybrid power vehicle model to construct a layered return function; the layered reporting function comprises a first reporting function and a second reporting function, the first reporting function or the second reporting function is activated according to an activation condition, and the first reporting function and the second reporting function are respectively divided into two different adjusting layers according to the range of the battery state of charge.
Specifically, as described in step 1, the vehicle mainly operates in the electric-only mode, the neutral mode, and the parallel mode. When the vehicle runs in the electric-only mode or the neutral mode, the engine is either off (electric mode) or connected through the clutch with its rotating speed free to change arbitrarily (neutral mode), and the main energy consumer is the battery; we therefore design a return function R_soc based on the battery SOC, meaning that keeping the SOC value within a reasonable interval is the primary objective. Correspondingly, when the vehicle runs in the parallel mode, the engine and the motor simultaneously provide the power required for driving, so we design a comprehensive return function R_com to trade off fuel consumption against keeping the battery SOC within a reasonable range, achieving minimum energy consumption. Each of the two return functions is divided into two different regulating layers according to the range of the SOC. The structure of the layered return function is shown in Fig. 2; the layered return structure activates R_soc or R_com depending on the clutch state and the instantaneous vehicle speed V_spd.
Specifically, referring to Fig. 2, the layered return function is activated as follows: the clutch state is judged first; if the clutch is open, the first return function is activated; if the clutch is closed, the instantaneous vehicle speed is judged: if the instantaneous vehicle speed is zero, the first return function is activated, and if it is not zero, the second return function is activated.
The first return function R_soc is expressed as:
the second return function R_com is expressed as:
Wherein the first regulating layer L1 corresponds to SOC(t) > 0.8 or SOC(t) < 0.3; the second regulating layer L2 corresponds to 0.3 ≤ SOC(t) ≤ 0.8; m_f represents the actual value of the engine fuel consumption rate; SOC_ref represents the reference value of the battery state of charge; SOC(t) represents the actual value of the battery state of charge; pen represents a constant penalty factor; δ1 and δ2 are two weighting factors for balancing the effects of the fuel consumption rate and the battery state-of-charge change on vehicle fuel consumption, with larger values indicating that the energy management strategy places more emphasis on battery protection; ω1, ω2 and ω3 are three constants used to ensure that the values of the return functions of each regulating layer are all on the same order of magnitude.
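The activation logic above can be sketched in Python. The exact return-function expressions appear only as images in the original patent, so the bodies of r_soc and r_com below are plausible stand-ins consistent with the variable definitions; SOC_REF, PEN and the δ and ω constants are all assumed values:

```python
SOC_REF = 0.6                  # reference SOC (assumed)
PEN = 10.0                     # constant penalty factor pen (assumed)
W1, W2, W3 = 50.0, 1.0, 50.0   # order-of-magnitude constants omega1..3 (assumed)
D1, D2 = 1.0, 1.0              # weighting factors delta1, delta2 (assumed)

def in_layer_1(soc):
    """Regulating layer L1: SOC outside [0.3, 0.8]; layer L2 otherwise."""
    return soc > 0.8 or soc < 0.3

def r_soc(soc):
    """SOC-keeping return R_soc (illustrative form, not the patent's exact one)."""
    base = -W1 * (soc - SOC_REF) ** 2
    return base - PEN if in_layer_1(soc) else base

def r_com(soc, m_f):
    """Comprehensive return R_com trading off fuel rate against SOC deviation."""
    base = -(D1 * W2 * m_f + D2 * W3 * (soc - SOC_REF) ** 2)
    return base - PEN if in_layer_1(soc) else base

def hierarchical_reward(cs, v_spd, soc, m_f):
    """Activation logic of Fig. 2: clutch open -> R_soc; clutch closed at
    standstill -> R_soc; clutch closed and moving -> R_com."""
    if cs == 0 or v_spd == 0:
        return r_soc(soc)
    return r_com(soc, m_f)
```

Together the two activation branches and the two SOC layers give the four regulating layers described in the text.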
That is, the invention designs a reasonable and efficient layered return structure: according to the received state signal S, the HEV energy management strategy based on layered return TD3 can take the action A that maximizes the return function R, thereby controlling the vehicle to run in an energy-saving, stable and efficient manner.
Step 4: based on the state space signal, the action space signal and the layered return function, an HEV energy management learning network based on the layered return TD3 is constructed.
Critic networks and an Actor network are constructed using deep neural networks, together forming the Actor-Critic framework that underlies the twin delayed deep deterministic policy gradient strategy. The HEV energy management learning network based on layered return TD3 is built following the basic structure of the deep reinforcement learning TD3 agent shown in Fig. 3, the parameters of the Actor-Critic network are initialized, and the state data are normalized. The details of executing the HEV energy management strategy based on layered return TD3 are shown in Table 1.
Table 1 Execution steps of the layered return TD3 algorithm
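Two TD3 ingredients referenced here — the twin-critic minimum that counters DDPG's overestimation, and target-policy smoothing — can be sketched as follows. This is a generic illustration of the TD3 update targets, not the patent's implementation; the torque bounds reuse the action space defined above:

```python
import random

def td3_target(reward, done, q1_next, q2_next, gamma=0.99):
    """TD3 critic target: y = r + gamma * min(Q1', Q2') for non-terminal
    transitions. Taking the minimum of the twin target critics counters
    the value overestimation observed with a single critic (DDPG)."""
    if done:
        return reward
    return reward + gamma * min(q1_next, q2_next)

def smoothed_target_action(mu, noise_std=0.2, noise_clip=0.5,
                           lo=-250.0, hi=841.0, rng=random):
    """Target-policy smoothing: add clipped Gaussian noise to the target
    actor's torque mu, then clip to the admissible range [lo, hi]."""
    eps = max(-noise_clip, min(noise_clip, rng.gauss(0.0, noise_std)))
    return max(lo, min(hi, mu + eps))
```

In a full agent these targets feed the mean-squared-error loss of both critics, while the actor is updated less frequently (delayed policy updates), the third TD3 ingredient.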
Step 5: parameters and observed values affecting energy management in the standard working condition simulation running of the vehicle are obtained, and the HEV energy management learning network based on the layered return TD3 is trained based on the parameters and observed values under the standard working condition.
The method comprises the steps of obtaining parameters and observation values affecting energy management in the simulated running of the automobile under the standard working condition, and obtaining a trained deep reinforcement learning agent by combining the HEV energy management strategy target training learning network of the layered return TD 3.
The learning network may be trained using, for example, three typical standard operating conditions. The speed profiles of the three conditions are shown in Fig. 4, and the characteristics of each condition are shown in Table 2.
Table 2 standard operating mode characteristics
Step 6: and acquiring parameters and observed values affecting energy management in actual running of the vehicle, verifying the trained HEV energy management learning network based on the layered return TD3 based on the parameters and observed values in actual running, and executing an energy management strategy through the trained and verified HEV energy management learning network based on the layered return TD 3.
For example, real-vehicle driving data collected in the Wuhan urban area are used to construct an actual road operating condition, which is imported into the driver model to verify the trained layered return TD3 energy management strategy and test its optimization performance; the actual road speed profile is shown in Fig. 5.
Example 2:
Embodiment 2 provides an HEV energy management system based on layered return TD3, comprising: a processor and a memory; the memory stores a control program that, when executed by the processor, is configured to implement the HEV energy management method based on layered return TD3 as described in embodiment 1.
The HEV energy management method and system based on layered return TD3 provided by the embodiments of the invention have at least the following technical effects:
(1) The invention designs a layered return structure with two return functions and four regulating layers. It can adjust the control strategy in a targeted manner according to the different running states of the vehicle, reduces unnecessary repeated exploration, ensures comprehensive adjustment of the return functions for the different running modes, and avoids wasting on-board computing resources, so that the agent can interact with the environment faster and more deeply, the learning speed of the deep reinforcement learning agent is accelerated, and the overall performance of the energy management strategy is improved.
(2) The invention collects not only the energy consumption indicators m_f and battery SOC but also the vehicle dynamic indicators speed v and clutch state cs as deep reinforcement learning state space signals. The two return functions are switched according to the vehicle speed and the clutch state, so an accurate and efficient return function can be selected for each running mode of the vehicle. The invention can not only ensure optimal fuel economy while the vehicle is running, but also keep the battery working in a suitable SOC interval, preventing damage from overcharge or overdischarge and prolonging battery life.
(3) The invention adopts a layered return twin delayed deep deterministic policy gradient energy management strategy, which compensates for the curse of dimensionality of discrete-action-space deep reinforcement learning energy management strategies and also solves the overestimation and training instability problems of the deep deterministic policy gradient.
Finally, it should be noted that the above-mentioned embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same, and although the present invention has been described in detail with reference to examples, it should be understood by those skilled in the art that modifications and equivalents may be made to the technical solution of the present invention without departing from the spirit and scope of the technical solution of the present invention, and all such modifications and equivalents are intended to be encompassed in the scope of the claims of the present invention.

Claims (7)

1. An HEV energy management method based on layered return TD3, comprising the following steps:
establishing a parallel hybrid vehicle model;
selecting a state space signal and an action space signal of an HEV energy management strategy by combining the parallel hybrid vehicle model;
Constructing a layered return function by combining the parallel hybrid vehicle model; the layered return function comprises a first return function and a second return function, the first return function or the second return function is activated according to an activation condition, and each of the two return functions is divided into two different regulating layers according to the range of the battery state of charge;
Constructing an HEV energy management learning network based on the layered return TD3 from the state space signal, the action space signal and the layered return function;
And training the HEV energy management learning network based on the layered return TD3, and executing an energy management strategy through the trained HEV energy management learning network based on the layered return TD3.
2. The hierarchical rewards TD3 based HEV energy management method of claim 1 wherein the operating modes of the vehicle in the parallel hybrid vehicle model include electric only mode, neutral mode and parallel mode.
3. The HEV energy management method based on layered reward TD3 of claim 1, wherein the state space signal is s = (v, SOC, m_f, cs) and the action space signal is a = (T_eng | T_eng ∈ [-250, 841]); wherein v represents the running speed of the vehicle, SOC represents the battery state of charge, m_f represents the engine fuel consumption rate, and cs represents the state of the vehicle clutch, with cs = 0 representing the clutch open state and cs = 1 representing the clutch closed state; T_eng represents the engine output torque.
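As an illustration of the state and action spaces of claim 3, the following sketch encodes them in Python. The field names, the torque unit (N·m), and the clutch encoding (0 = open, 1 = closed) are assumptions drawn from the claim text, not definitions from the patent itself.

```python
from dataclasses import dataclass

# Sketch of the state space s = (v, SOC, m_f, cs) and the action space
# a = (T_eng | T_eng in [-250, 841]) of claim 3. The clutch encoding
# (0 = open, 1 = closed) and the torque unit (N*m) are assumptions.

@dataclass
class HEVState:
    v: float      # vehicle running speed
    soc: float    # battery state of charge
    m_f: float    # engine fuel consumption rate
    cs: int       # clutch state: 0 = open, 1 = closed (assumed encoding)

T_ENG_MIN, T_ENG_MAX = -250.0, 841.0   # engine output torque bounds from claim 3

def valid_action(t_eng: float) -> bool:
    """An action is a single engine output torque command within the stated bounds."""
    return T_ENG_MIN <= t_eng <= T_ENG_MAX
```

A continuous, one-dimensional action space like this is what makes TD3 (rather than a discrete-action method such as DQN) a natural fit for the torque-split decision.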
4. The HEV energy management method based on layered reward TD3 of claim 1, wherein the layered reward function is activated as follows: the state of the clutch is judged; if the clutch is open, the first reward function is activated, and if the clutch is closed, the instantaneous vehicle speed is judged; if the instantaneous vehicle speed is zero, the first reward function is activated, and if the instantaneous vehicle speed is not zero, the second reward function is activated;
The first reward function R_soc is expressed as: [formula rendered as an image in the original publication; not reproduced here]
The second reward function R_com is expressed as: [formula rendered as an image in the original publication; not reproduced here]
wherein the first regulating layer L1 corresponds to SOC(t) > 0.8 or SOC(t) < 0.3; the second regulating layer L2 corresponds to 0.3 ≤ SOC(t) ≤ 0.8; m_f represents the actual value of the engine fuel consumption rate; SOC_ref represents the reference value of the battery state of charge; SOC(t) represents the actual value of the battery state of charge; pen represents a constant penalty factor; δ_1 and δ_2 are two weighting factors used to balance the impact of the fuel consumption rate and the battery state-of-charge variation on the fuel consumption of the vehicle; ω_1, ω_2 and ω_3 are three constants used to ensure that the values of the reward function in each regulating layer are of the same order of magnitude.
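The activation logic of claim 4 can be sketched as follows. Since the exact formulas of R_soc and R_com are rendered as images in the original publication and not reproduced in this text, the penalty terms below (a quadratic SOC-deviation term plus a weighted fuel term) are assumptions consistent with the variable definitions above, and all constants are illustrative placeholders.

```python
# Sketch of the layered reward activation of claim 4. The exact R_soc and
# R_com formulas are not reproduced in the source text; the quadratic
# SOC-deviation forms and all constants below are illustrative assumptions.

SOC_REF = 0.6                 # reference battery state of charge (assumed)
PEN = 10.0                    # constant penalty factor for leaving [0.3, 0.8]
DELTA1, DELTA2 = 1.0, 50.0    # weights: fuel rate vs. SOC deviation (assumed)
OMEGA1, OMEGA2, OMEGA3 = 1.0, 1.0, 1.0  # order-of-magnitude scaling (assumed)

def r_soc(soc):
    """First reward function: pure SOC regulation (clutch open or vehicle stopped)."""
    if soc > 0.8 or soc < 0.3:                      # first regulating layer L1
        return -OMEGA1 * (soc - SOC_REF) ** 2 - PEN
    return -OMEGA2 * (soc - SOC_REF) ** 2           # second regulating layer L2

def r_com(soc, m_f):
    """Second reward function: trades off fuel rate against SOC deviation."""
    if soc > 0.8 or soc < 0.3:                      # L1: add the constant penalty
        return -(DELTA1 * m_f + DELTA2 * (soc - SOC_REF) ** 2) - PEN
    return -OMEGA3 * (DELTA1 * m_f + DELTA2 * (soc - SOC_REF) ** 2)

def layered_reward(clutch_open, v, soc, m_f):
    """Activation logic of claim 4: clutch state first, then instantaneous speed."""
    if clutch_open or v == 0.0:
        return r_soc(soc)
    return r_com(soc, m_f)
```

The point of the two layers is that leaving the safe SOC band incurs a flat penalty on top of the shaped term, so the agent learns to stay inside [0.3, 0.8] before optimizing fuel use within it.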
5. The HEV energy management method based on layered reward TD3 of claim 1, wherein parameters affecting energy management are obtained during simulated driving of the vehicle under standard operating conditions, and the HEV energy management learning network based on layered reward TD3 is trained based on the parameters under standard operating conditions.
6. The HEV energy management method based on layered reward TD3 of claim 1, further comprising, after obtaining the trained HEV energy management learning network based on layered reward TD3: obtaining parameters affecting energy management during actual running of the vehicle, verifying the trained HEV energy management learning network based on layered reward TD3 against the parameters from actual running, and executing an energy management strategy through the trained and verified HEV energy management learning network based on layered reward TD3.
7. An HEV energy management system based on layered reward TD3, comprising: a processor and a memory; the memory stores a control program which, when executed by the processor, is configured to implement the HEV energy management method based on layered reward TD3 as claimed in any one of claims 1-6.
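For readers unfamiliar with the TD3 mechanics that claim 1's learning network relies on, the following is a minimal pure-Python sketch of the TD3 target computation: twin critics, clipped noise added to the target action ("target policy smoothing"), and the minimum of the two target Q-values. The linear policy and quadratic critics are toy stand-ins, not the patent's networks.

```python
import random

# Sketch of the TD3 target y = r + gamma * (1 - done) * min(Q1', Q2')(s', a')
# with a' = clip(pi'(s') + clipped_noise). Networks here are toy stand-ins.

random.seed(0)
GAMMA = 0.99                    # discount factor
NOISE_STD, NOISE_CLIP = 0.2, 0.5
A_LOW, A_HIGH = -250.0, 841.0   # engine-torque action bounds from claim 3

def clip(x, lo, hi):
    return max(lo, min(hi, x))

def target_actor(s):
    # placeholder target policy: linear map from state features to a torque
    return clip(100.0 * sum(s), A_LOW, A_HIGH)

def target_critic1(s, a):
    return -0.010 * a * a + sum(s)   # toy target Q-function 1

def target_critic2(s, a):
    return -0.012 * a * a + sum(s)   # toy target Q-function 2

def td3_target(r, s_next, done):
    """One TD3 Bellman target for a transition (r, s_next, done)."""
    eps = clip(random.gauss(0.0, NOISE_STD), -NOISE_CLIP, NOISE_CLIP)
    a_next = clip(target_actor(s_next) + eps, A_LOW, A_HIGH)
    q_min = min(target_critic1(s_next, a_next), target_critic2(s_next, a_next))
    return r + GAMMA * (1.0 - done) * q_min

s_next = (0.5, 0.6, 0.1, 0.0)   # e.g. (v, SOC, m_f, cs) as in claim 3
y = td3_target(-0.3, s_next, 0.0)
```

Taking the minimum of the twin critics counteracts the value overestimation of single-critic actor-critic methods, which is the main reason TD3 is preferred over DDPG in settings like this.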
CN202210298825.0A 2022-03-23 2022-03-23 HEV energy management method and system based on layered return TD3 Active CN114852043B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210298825.0A CN114852043B (en) 2022-03-23 2022-03-23 HEV energy management method and system based on layered return TD3

Publications (2)

Publication Number Publication Date
CN114852043A CN114852043A (en) 2022-08-05
CN114852043B true CN114852043B (en) 2024-06-18

Family

ID=82629986

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210298825.0A Active CN114852043B (en) 2022-03-23 2022-03-23 HEV energy management method and system based on layered return TD3

Country Status (1)

Country Link
CN (1) CN114852043B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112440974A (en) * 2020-11-27 2021-03-05 武汉理工大学 HEV energy management method based on distributed depth certainty strategy gradient
CN112590774A (en) * 2020-12-22 2021-04-02 同济大学 Intelligent electric automobile drifting and warehousing control method based on deep reinforcement learning

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019241612A1 (en) * 2018-06-15 2019-12-19 The Regents Of The University Of California Systems, apparatus and methods to improve plug-in hybrid electric vehicle energy performance by using v2c connectivity
US10733510B2 (en) * 2018-08-24 2020-08-04 Ford Global Technologies, Llc Vehicle adaptive learning
CN113246958B (en) * 2021-06-11 2022-06-14 武汉理工大学 TD 3-based multi-target HEV energy management method and system
CN113501008B (en) * 2021-08-12 2023-05-19 东风悦享科技有限公司 Automatic driving behavior decision method based on reinforcement learning algorithm

Similar Documents

Publication Publication Date Title
CN111731303B (en) HEV energy management method based on deep reinforcement learning A3C algorithm
Lü et al. Hybrid electric vehicles: A review of energy management strategies based on model predictive control
Huang et al. Model predictive control power management strategies for HEVs: A review
Poursamad et al. Design of genetic-fuzzy control strategy for parallel hybrid electric vehicles
Zhang et al. Comparative study of energy management in parallel hybrid electric vehicles considering battery ageing
CN106080579B (en) A kind of hybrid electric vehicle complete vehicle control method based on suspension vibration energy regenerating
CN110717218B (en) Electric drive vehicle distributed power drive system reconstruction control method and vehicle
Ouddah et al. From offline to adaptive online energy management strategy of hybrid vehicle using Pontryagin’s minimum principle
CN103935360A (en) Finished hybrid power automobile torque distribution system and method based on parallel control
CN112590760B (en) Double-motor hybrid electric vehicle energy management system considering mode switching frequency
CN112009456A (en) Energy management method for network-connected hybrid electric vehicle
CN113246958B (en) TD 3-based multi-target HEV energy management method and system
CN113815437A (en) Predictive energy management method for fuel cell hybrid electric vehicle
Chen et al. Driving cycle recognition based adaptive equivalent consumption minimization strategy for hybrid electric vehicles
Huang et al. Real-time long horizon model predictive control of a plug-in hybrid vehicle power-split utilizing trip preview
Ganji et al. A study on look-ahead control and energy management strategies in hybrid electric vehicles
Wang et al. Hierarchical rewarding deep deterministic policy gradient strategy for energy management of hybrid electric vehicles
Zeng et al. Cooperative optimization of speed planning and energy management for hybrid electric vehicles based on Nash equilibrium
Wang et al. Deep reinforcement learning with deep-Q-network based energy management for fuel cell hybrid electric truck
CN112440974B (en) HEV energy management method based on distributed depth certainty strategy gradient
CN117922373A (en) Self-adaptive control method for APU (auxiliary Power Unit) of extended range electric automobile at different altitudes
CN114852043B (en) HEV energy management method and system based on layered return TD3
Hou et al. Speed planning and energy management strategy of hybrid electric vehicles in a car-following scenario
Zhou et al. Energy optimization for intelligent hybrid electric vehicles based on hybrid system approach in a car‐following process
Chen et al. Reinforcement learning-based energy management control strategy of hybrid electric vehicles

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant