CN115503559B - Fuel cell automobile learning type cooperative energy management method considering air conditioning system - Google Patents

Info

Publication number
CN115503559B
Authority
CN
China
Prior art keywords
fuel cell
expressed
conditioning system
vehicle
function
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211385462.0A
Other languages
Chinese (zh)
Other versions
CN115503559A (en
Inventor
唐小林
邓磊
甘炯鹏
朱和龙
胡晓松
李佳承
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University
Original Assignee
Chongqing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University
Priority to CN202211385462.0A
Publication of CN115503559A
Application granted
Publication of CN115503559B
Legal status: Active
Anticipated expiration

Classifications

    • B PERFORMING OPERATIONS; TRANSPORTING
    • B60 VEHICLES IN GENERAL
    • B60L PROPULSION OF ELECTRICALLY-PROPELLED VEHICLES; SUPPLYING ELECTRIC POWER FOR AUXILIARY EQUIPMENT OF ELECTRICALLY-PROPELLED VEHICLES; ELECTRODYNAMIC BRAKE SYSTEMS FOR VEHICLES IN GENERAL; MAGNETIC SUSPENSION OR LEVITATION FOR VEHICLES; MONITORING OPERATING VARIABLES OF ELECTRICALLY-PROPELLED VEHICLES; ELECTRIC SAFETY DEVICES FOR ELECTRICALLY-PROPELLED VEHICLES
    • B60L58/00 Methods or circuit arrangements for monitoring or controlling batteries or fuel cells, specially adapted for electric vehicles
    • B60L58/30 Methods or circuit arrangements for monitoring or controlling batteries or fuel cells, specially adapted for electric vehicles for monitoring or controlling fuel cells
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B60 VEHICLES IN GENERAL
    • B60H ARRANGEMENTS OF HEATING, COOLING, VENTILATING OR OTHER AIR-TREATING DEVICES SPECIALLY ADAPTED FOR PASSENGER OR GOODS SPACES OF VEHICLES
    • B60H1/00 Heating, cooling or ventilating [HVAC] devices
    • B60H1/00357 Air-conditioning arrangements specially adapted for particular vehicles
    • B60H1/00385 Air-conditioning arrangements specially adapted for particular vehicles for vehicles having an electrical drive, e.g. hybrid or fuel cell
    • B60H1/00392 Air-conditioning arrangements specially adapted for particular vehicles for vehicles having an electrical drive, e.g. hybrid or fuel cell for electric vehicles having only electric drive means
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B60 VEHICLES IN GENERAL
    • B60H ARRANGEMENTS OF HEATING, COOLING, VENTILATING OR OTHER AIR-TREATING DEVICES SPECIALLY ADAPTED FOR PASSENGER OR GOODS SPACES OF VEHICLES
    • B60H1/00 Heating, cooling or ventilating [HVAC] devices
    • B60H1/32 Cooling devices
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B60 VEHICLES IN GENERAL
    • B60L PROPULSION OF ELECTRICALLY-PROPELLED VEHICLES; SUPPLYING ELECTRIC POWER FOR AUXILIARY EQUIPMENT OF ELECTRICALLY-PROPELLED VEHICLES; ELECTRODYNAMIC BRAKE SYSTEMS FOR VEHICLES IN GENERAL; MAGNETIC SUSPENSION OR LEVITATION FOR VEHICLES; MONITORING OPERATING VARIABLES OF ELECTRICALLY-PROPELLED VEHICLES; ELECTRIC SAFETY DEVICES FOR ELECTRICALLY-PROPELLED VEHICLES
    • B60L58/00 Methods or circuit arrangements for monitoring or controlling batteries or fuel cells, specially adapted for electric vehicles
    • B60L58/10 Methods or circuit arrangements for monitoring or controlling batteries or fuel cells, specially adapted for electric vehicles for monitoring or controlling batteries
    • B60L58/12 Methods or circuit arrangements for monitoring or controlling batteries or fuel cells, specially adapted for electric vehicles for monitoring or controlling batteries responding to state of charge [SoC]

Landscapes

  • Engineering & Computer Science (AREA)
  • Mechanical Engineering (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Sustainable Development (AREA)
  • Sustainable Energy (AREA)
  • Power Engineering (AREA)
  • Transportation (AREA)
  • Physics & Mathematics (AREA)
  • Thermal Sciences (AREA)
  • Fuel Cell (AREA)

Abstract

The invention relates to a learning-based cooperative energy management method for a fuel cell vehicle that takes the air conditioning system into account, and belongs to the field of new energy vehicles. The method comprises the following steps: S1: acquiring vehicle state parameter information, fuel cell parameter information, power battery parameter information and air conditioning system parameter information of the fuel cell vehicle; S2: establishing a cooperative energy management model of the fuel cell vehicle; S3: establishing a cooperative energy management optimization control strategy that takes the air conditioning system into account, solving a multi-objective optimization problem covering hydrogen fuel economy and cabin temperature comfort with the soft actor-critic (SAC) algorithm, and, while optimizing the energy flow, controlling changes in the cooling/heating capacity of the air conditioner to keep the cabin temperature within a comfortable range. The invention effectively addresses the trade-off between hydrogen consumption and cabin temperature comfort, and optimizes both the hydrogen fuel economy and the cabin temperature comfort of the fuel cell vehicle.

Description

Learning-based collaborative energy management method for fuel cell vehicles considering the air conditioning system

Technical Field

The present invention belongs to the field of new energy vehicles and relates to a learning-based collaborative energy management method for a fuel cell vehicle that takes the air conditioning system into account.

Background Art

Faced with increasingly severe environmental pollution and fossil fuel shortages, major automobile manufacturers are racing to develop new energy vehicles. As fuel cell technology matures, fuel cell vehicles offer zero emissions, low energy consumption and long driving range, and are regarded as one of the important research directions for the sustainable development of future automobiles. The energy management strategy is the core control technology of the multi-power-source system of a fuel cell vehicle, and its quality directly determines the economic performance of the whole vehicle. In current research, energy management methods fall into three types: rule-based, optimization-based and learning-based strategies. However, rule-based and optimization-based methods cannot achieve real-time performance and optimality at the same time. Traditional deep reinforcement learning algorithms can achieve both in energy flow optimization, but they have certain shortcomings with respect to training data and hyperparameter settings. The soft actor-critic (SAC) algorithm offers one way to address these difficulties.

On the other hand, the air conditioning system is an indispensable auxiliary device of a fuel cell vehicle and helps provide a comfortable environment for the occupants. However, using the air conditioning system inevitably increases the energy consumption of the vehicle and thus affects its economic performance. In current research on energy management methods for fuel cell vehicles, the energy consumption of the air conditioning system is usually treated as a constant or ignored. In practice, as the driving environment changes, the heat exchanged between the inside and outside of the cabin changes with it, and so does the power drawn by the air conditioning system.

Therefore, a new energy management method for fuel cell vehicles is urgently needed that coordinates control of the air conditioning system and the power source components, and optimizes the energy flow in the vehicle while accounting for changes in the energy consumption of the air conditioning system.

Summary of the Invention

In view of this, the purpose of the present invention is to provide a learning-based collaborative energy management method for fuel cell vehicles that takes the air conditioning system into account. The soft actor-critic (SAC) algorithm is used to coordinate control of the air conditioning system and the power source components of the fuel cell vehicle, optimizing the energy flow of the whole vehicle while ensuring cabin comfort, so as to reduce the overall energy consumption of the vehicle.

To achieve the above purpose, the present invention provides the following technical solution:

A learning-based collaborative energy management method for a fuel cell vehicle considering the air conditioning system, comprising the following steps:

S1: Obtain vehicle state parameter information, fuel cell parameter information, power battery parameter information and air conditioning system parameter information of the fuel cell vehicle;

S2: Establish a collaborative energy management model of the fuel cell vehicle, including: a vehicle longitudinal dynamics model, a fuel cell model, a power battery model, a motor model, an air conditioning system model and a cabin heat load model;

S3: Establish a collaborative energy management optimization control strategy for the fuel cell vehicle that takes the air conditioning system into account, and use the SAC algorithm to solve the multi-objective optimization problem covering hydrogen fuel economy and cabin temperature comfort; while optimizing the energy flow, control changes in the cooling/heating capacity of the air conditioner to keep the cabin temperature within a comfortable range. The SAC algorithm is the soft actor-critic algorithm.

Further, in step S1, the vehicle state parameter information includes: vehicle speed, cabin heat load parameters, motor operating efficiency and transmission system characteristic parameters; the fuel cell parameter information includes: the power, efficiency and hydrogen consumption of the fuel cell; the power battery parameter information includes: the state of charge, internal resistance and open-circuit voltage of the power battery; the air conditioning system parameter information includes: the cooling/heating capacity of the air conditioning system and the corresponding power.

Further, in step S2, the vehicle longitudinal dynamics model is established as:

$$P_{drive} = (F_{air} + F_f + F_i + m_0 a)\cdot v$$

[equation image: Figure BDA0003929529130000021]

$$P_{dem} = P_b + P_{fc}\cdot\eta_{DC/DC}$$

where $m_0$ is the vehicle mass; $v$ is the vehicle speed; $a$ is the vehicle acceleration; $F_{air}$ is the air resistance; $F_f$ is the rolling resistance; $F_i$ is the acceleration resistance; $\eta_m$, $\eta_{DC/AC}$, $\eta_{DC/DC}$ and $\eta_{motor}$ are the transmission efficiency, DC/AC converter efficiency, DC/DC converter efficiency and motor efficiency, respectively; $P_{drive}$, $P_{dem}$, $P_b$ and $P_{fc}$ are the driving power at the wheels, the demanded power, the battery output power and the fuel cell output power, respectively.
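
As a minimal sketch of the first and third relations above, the wheel power and the battery share of the demanded power could be evaluated as follows. The function names and all numeric values are illustrative assumptions; the resistance forces are passed in as inputs because their expressions are not given in this passage.

```python
def drive_power(v, a, m0, f_air, f_f, f_i):
    """P_drive = (F_air + F_f + F_i + m0 * a) * v, in watts."""
    return (f_air + f_f + f_i + m0 * a) * v


def battery_power(p_dem, p_fc, eta_dcdc):
    """Solve P_dem = P_b + P_fc * eta_DC/DC for the battery power P_b."""
    return p_dem - p_fc * eta_dcdc


# Hypothetical operating point: 20 m/s, 0.5 m/s^2, 1800 kg vehicle.
p_drive = drive_power(v=20.0, a=0.5, m0=1800.0, f_air=250.0, f_f=200.0, f_i=0.0)
# Hypothetical split: 30 kW demanded, fuel cell delivering 20 kW before the DC/DC stage.
p_b = battery_power(p_dem=30000.0, p_fc=20000.0, eta_dcdc=0.95)
```

A negative `p_b` in this sketch would correspond to the fuel cell charging the battery.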

Further, in step S2, the fuel cell model is established as:

$$\eta_{fc} = f_\eta(P_{fc})$$

[equation image: Figure BDA0003929529130000022]

where $f_\eta(\cdot)$ and the function shown in Figure BDA0003929529130000023 are the fitted functions for efficiency and hydrogen consumption, respectively; the efficiency and hydrogen consumption can be computed by interpolation.
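
The interpolation step could look like the sketch below, which performs a piecewise-linear lookup of efficiency and hydrogen consumption rate against fuel cell power. The table values, units and function names are hypothetical placeholders, not data from the patent.

```python
from bisect import bisect_right


def interp(x, xs, ys):
    """Piecewise-linear interpolation, clamped to the table end points."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x)
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Hypothetical map: fuel cell power (kW) -> efficiency and H2 rate (g/s).
P_FC_KW = [0.0, 10.0, 20.0, 40.0, 60.0]
ETA_FC = [0.30, 0.55, 0.58, 0.52, 0.45]
H2_RATE = [0.02, 0.15, 0.29, 0.64, 1.11]


def fuel_cell_efficiency(p_fc_kw):
    """Stand-in for the fitted efficiency function f_eta(P_fc)."""
    return interp(p_fc_kw, P_FC_KW, ETA_FC)


def hydrogen_rate(p_fc_kw):
    """Stand-in for the fitted hydrogen consumption function of P_fc."""
    return interp(p_fc_kw, P_FC_KW, H2_RATE)
```

Clamping at the table ends is a design choice of this sketch; a real map would normally cover the full operating range of the stack.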

Further, in step S2, the power battery model is established as:

[equation image: Figure BDA0003929529130000024]

[equation image: Figure BDA0003929529130000031]

where $I_L$ is the power battery current; $V_{oc}$ is the open-circuit voltage of the power battery; $R_{in}$ is the equivalent internal resistance of the power battery; $SOC_0$ is the initial SOC; $Q_t$ is the maximum capacity of the power battery; $t_0$ is the initial time; $t_f$ is the final time.
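
The two battery equations are only available as images here, so the following sketch assumes the widely used internal-resistance (Rint) equivalent circuit and coulomb counting, which match the symbols listed above. Treat the exact expressions, function names and units as assumptions rather than the patent's formulas.

```python
import math


def battery_current(p_b, v_oc, r_in):
    """Current I_L (A) for terminal power p_b (W) in an Rint model.

    Assumed relation: P_b = V_oc * I_L - I_L**2 * R_in, solved for the
    smaller root of the quadratic.
    """
    disc = v_oc ** 2 - 4.0 * r_in * p_b
    if disc < 0.0:
        raise ValueError("requested power exceeds what the battery model can deliver")
    return (v_oc - math.sqrt(disc)) / (2.0 * r_in)


def soc_step(soc, i_l, q_t_ah, dt_s):
    """Coulomb counting over one step: discharge current (i_l > 0) lowers SOC."""
    return soc - i_l * dt_s / (q_t_ah * 3600.0)
```

For example, with `v_oc=300.0` and `r_in=0.1`, a terminal power of 2990 W corresponds to a current of about 10 A in this model.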

Further, in step S2, the motor model is established as:

$$\eta_m = f_m(\omega_m, T_m)$$

[equation image: Figure BDA0003929529130000032]

where $\omega_m$ and $T_m$ are the motor speed and torque, respectively; $P_m$ is the motor output power; and $f_m(\cdot)$ is the fitted function for motor operating efficiency, from which the operating efficiency of the motor is obtained by interpolation.

Further, in step S2, the air conditioning system model is established as:

[equation image: Figure BDA0003929529130000033]

where $Q_{ac}$ is the cooling or heating capacity of the air conditioning system; $P_{ac}$ is the corresponding power consumption of the air conditioning system; $\eta_{cop}$ is the coefficient of performance of the air conditioning system.

Further, in step S2, the cabin heat load model is established as:

$$Q_c = \sum K F\,(T_{out} - T_{in})$$

[equation image: Figure BDA0003929529130000034]

$$Q_h = 145 + 116\,n$$

$$Q_n = m_e\,\xi\,Cp_{air}\,(T_{out} - T_{in})$$

[equation image: Figure BDA0003929529130000035]

where $Q_c$, $Q_r$, $Q_h$ and $Q_n$ are the conduction heat load, the radiation heat load, the heat generated by the occupants (empirically, the driver generates about 145 W and each passenger about 116 W) and the ventilation system heat load, respectively; $K$ is the heat transfer coefficient; $F$ is the heat transfer area of the corresponding body panel; $T_{out}$ is the ambient temperature; $T_{in}$ is the cabin air temperature; $\eta$ is the transmittance; $I$ is the solar irradiance; $A_i$ is the area of the windshield, the left and right side windows and the rear window; $\theta_i$ is the solar incidence angle; $\beta$ is the shading factor; $n$ is the number of passengers in the vehicle; $m_e$ is the mass of air passing through the evaporator; $\xi$ is the air recirculation coefficient; $Cp_{air}$ is the heat capacity of the cabin air; $\rho_{air}$ and $V_{air}$ are the cabin air density and the cabin volume, respectively.
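
The three heat load terms that are written out above can be evaluated directly, as in the sketch below. The function names and units are illustrative assumptions, and `ac_power` uses the standard definition of the coefficient of performance, which is assumed here because the air conditioning equation itself is only available as an image. The radiation load and the cabin temperature dynamics are omitted for the same reason.

```python
def conduction_load(surfaces, t_out, t_in):
    """Q_c = sum(K * F) * (T_out - T_in); surfaces is a list of (K, F) pairs."""
    return sum(k * f for k, f in surfaces) * (t_out - t_in)


def occupant_load(n_passengers):
    """Q_h = 145 + 116 * n: driver plus n passengers, in watts."""
    return 145.0 + 116.0 * n_passengers


def ventilation_load(m_e, xi, cp_air, t_out, t_in):
    """Q_n = m_e * xi * Cp_air * (T_out - T_in)."""
    return m_e * xi * cp_air * (t_out - t_in)


def ac_power(q_ac, eta_cop):
    """Assumed relation P_ac = Q_ac / eta_cop (standard COP definition)."""
    return q_ac / eta_cop
```

With three passengers, for instance, `occupant_load(3)` gives 145 + 3 × 116 = 493 W.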

Further, in step S3, establishing the collaborative energy management optimization control strategy for the fuel cell vehicle that takes the air conditioning system into account specifically includes the following steps:

S301: Determine the state space. To reflect the key environmental information, the power battery SOC, the fuel cell output power $P_{fc}$, the vehicle speed $v$ and the cooling/heating capacity $Q_{ac}$ of the air conditioning system are set as state variables, forming the state space $S$:

$$S = \{SOC,\ P_{fc},\ v,\ Q_{ac}\}$$

S302: Determine the action space. Collaborative energy management that takes the air conditioning system into account must not only allocate power among the power sources but also maintain the thermal comfort of the cabin by varying the cooling/heating capacity of the air conditioning system. To this end, the change in fuel cell output power (symbol shown in Figure BDA0003929529130000047) and the change in the cooling/heating capacity of the air conditioning system (symbol shown in Figure BDA0003929529130000046) are set as action variables, forming the action space $A$:

[equation image: Figure BDA0003929529130000041]

S303: Establish the reward function. To ensure cabin temperature comfort, the cabin temperature is kept at about 24 °C, so the reward function must also include a term for the cabin temperature deviation. The reward function $R$ is therefore set to a weighted sum of three terms, hydrogen consumption, SOC deviation and cabin temperature deviation:

$$R = -\left(\zeta\cdot fuel(t) + \psi\cdot(SOC(t) - 0.7)^2 + \gamma\cdot(T_{in} - 24)^2\right)$$

where $\zeta$, $\psi$ and $\gamma$ are the weight factors of the optimization terms; adjusting these weights resolves the trade-off between hydrogen consumption and cabin temperature comfort and thereby solves the multi-objective optimization problem; $fuel(t)$ is the hydrogen consumption at the current time; $SOC(t)$ is the state of charge of the power battery at the current time.
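
The reward above translates directly into code. In this sketch the default weights are arbitrary placeholders (the patent does not state values here), and the temperature weight is named `gamma_t` only to avoid confusion with the discount factor used later.

```python
def reward(fuel, soc, t_in, zeta=1.0, psi=100.0, gamma_t=0.1,
           soc_ref=0.7, t_ref=24.0):
    """R = -(zeta*fuel + psi*(SOC - 0.7)**2 + gamma*(T_in - 24)**2)."""
    return -(zeta * fuel
             + psi * (soc - soc_ref) ** 2
             + gamma_t * (t_in - t_ref) ** 2)
```

The reward is zero only when no hydrogen is consumed and both SOC and cabin temperature sit at their reference values; any deviation makes it more negative, with the weights setting how the three terms trade off.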

Further, in step S3, solving the multi-objective optimization problem covering hydrogen fuel economy and cabin temperature comfort with the SAC algorithm specifically includes the following steps:

S311: The SAC algorithm is used to solve the multi-objective optimization problem in energy management. SAC introduces an action entropy term that makes the action output more dispersed, which improves the algorithm's exploration, its ability to learn new tasks and its stability. The entropy is expressed as:

$$H(\pi(\cdot\,|\,s_t)) = -\log \pi(\cdot\,|\,s_t)$$

where $H$ is the entropy of the policy $\pi(\cdot\,|\,s_t)$.

S312: During solving, the actor network in the agent takes the state $s_t$ as input, outputs the mean and variance of a Gaussian action distribution, and generates the action $a_t$ using the reparameterization technique:

[equation image: Figure BDA0003929529130000042]

where $\tau_t$ is a noise signal sampled from the standard normal distribution; the symbol shown in Figure BDA0003929529130000043 denotes the function that outputs the mean and variance; and the symbols shown in Figure BDA0003929529130000044 and Figure BDA0003929529130000045 denote the mean and variance of the Gaussian distribution, respectively.
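
The reparameterization equation is only available as an image, so the sketch below assumes its usual form: the action is the mean plus the standard deviation (the square root of the variance) times standard normal noise. The function name and the numeric values are hypothetical, and any squashing or scaling of the action to physical limits is left out.

```python
import random


def sample_action(mean, std, rng=random):
    """Reparameterized sample: a = mean + std * tau, with tau ~ N(0, 1).

    mean and std are per-dimension lists standing in for the actor outputs.
    """
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mean, std)]


# Two action dimensions: change in fuel cell power and change in A/C capacity.
rng = random.Random(0)
action = sample_action(mean=[0.2, -0.1], std=[0.05, 0.02], rng=rng)
```

Because the randomness enters only through `tau`, the sampled action stays a differentiable function of the mean and standard deviation, which is what lets gradients flow back into the actor network in a full implementation.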

S313: After action $a_t$ is executed, the vehicle environment returns a reward $r_t$ to the agent and transitions to the next state $s_{t+1}$. This produces the environment-agent interaction data $\{s_t, a_t, r_t, s_{t+1}\}$, which is stored in the experience pool (symbol shown in Figure BDA0003929529130000051).
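
An experience pool of this kind is commonly implemented as a fixed-size replay buffer. The sketch below is one minimal way to do it; the class name, capacity handling and uniform sampling are assumptions, not details taken from the patent.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size experience pool of (s_t, a_t, r_t, s_next) transitions."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest transition when full.
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size, rng=random):
        """Draw a uniform random mini-batch without replacement."""
        return rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)
```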

S314: A mini-batch of experience samples is drawn at random from the experience pool. To avoid overestimation when maximizing the action-value function, and further overestimation when a network's own output is used to compute the target, evaluation critic networks with parameters $\theta_1, \theta_2$ and target critic networks with parameters $\theta'_1, \theta'_2$ are introduced, and the smaller of the action values output by the target critic networks is taken as the target value. For a given state $s_t$ and action $a_t$, the soft action-value function $Q_{soft}(s_t, a_t)$ in the SAC algorithm is updated as follows:

[equation image: Figure BDA0003929529130000052]

where $r$ is the reward obtained by the vehicle; $\gamma$ is the discount factor; $\alpha$ is the temperature coefficient.
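
Since the update formula is only available as an image, the following sketch assumes the standard SAC target: the reward plus the discounted minimum of the two target critic values, less the temperature-weighted log-probability of the next action. The function name and arguments are hypothetical.

```python
def soft_q_target(r, gamma, alpha, q1_next, q2_next, log_prob_next):
    """Assumed target: y = r + gamma * (min(Q1', Q2') - alpha * log pi(a'|s')).

    q1_next and q2_next are the two target critic outputs for the next
    state and a next action sampled from the current policy.
    """
    return r + gamma * (min(q1_next, q2_next) - alpha * log_prob_next)
```

Taking the minimum of the two target critics is the step that counters overestimation, as described in S314.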

S315: When updating the policy network, the evaluation critic networks are updated by minimizing the loss function $L(\theta_i)$, defined as the mean square error between the quantities shown in Figure BDA0003929529130000053 and Figure BDA0003929529130000054:

[equation image: Figure BDA0003929529130000055]

[equation image: Figure BDA0003929529130000056]

where the symbol shown in Figure BDA0003929529130000057 denotes the evaluation function of the evaluation critic network with parameters $\theta_i$, and the symbol shown in Figure BDA0003929529130000058 denotes the evaluation function of the target critic network with parameters $\theta'_i$.
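
As a small sketch of the loss described in S315, the mean square error over a mini-batch can be written as below. The inputs are plain lists of critic outputs and target values, and the absence of a 1/2 factor is an assumption, since the exact expression is only available as an image.

```python
def critic_loss(q_values, targets):
    """Mean squared error between critic outputs and their target values."""
    if len(q_values) != len(targets):
        raise ValueError("q_values and targets must have the same length")
    return sum((q - y) ** 2 for q, y in zip(q_values, targets)) / len(q_values)
```

In a full implementation this loss would be computed once per evaluation critic and minimized with respect to that critic's parameters, with the targets held fixed.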

S316: The actor network parameters are updated by minimizing the KL divergence. The smaller the KL value, the smaller the difference between the rewards corresponding to the output actions, and the better the policy converges. The objective function of the actor network (symbol shown in Figure BDA0003929529130000059) is defined as:

[equation image: Figure BDA00039295291300000510]

where $D_{KL}$ denotes the KL divergence; $Z(s_t)$ is the partition function used to normalize the distribution; the symbol shown in Figure BDA00039295291300000511 denotes the mathematical expectation over the vehicle state $s_t$ and the executed action $a_t$ at the current time; the symbol shown in Figure BDA00039295291300000512 denotes the policy function when the current state is $s_t$; and the symbol shown in Figure BDA00039295291300000513 denotes the parameters of the policy function.

S317: The actor network parameters are updated by gradient descent, expressed as:

[equation image: Figure BDA00039295291300000514]

where the symbol shown in Figure BDA00039295291300000515 denotes the gradient with respect to the policy function parameters (symbol shown in Figure BDA00039295291300000516), and the symbol shown in Figure BDA00039295291300000517 denotes the gradient with respect to the action $a_t$ executed at the current time $t$.

S318: In the SAC algorithm, tuning the temperature coefficient $\alpha$ is crucial to training performance, and the optimal value differs across reinforcement learning tasks and training stages. To adjust the temperature coefficient automatically, the objective function of the following optimization problem is minimized, which yields the updated optimal temperature coefficient at each step. The objective function is expressed as:

[equation image: Figure BDA0003929529130000061]

where $H_0$ is the predefined threshold for the minimum policy entropy; the symbol shown in Figure BDA0003929529130000062 denotes the mathematical expectation when action $a_t$ is executed according to the policy function $\pi_t$; $\pi_t(a_t\,|\,s_t)$ is the policy function; $s_t$ is the state of the fuel cell vehicle at the current time $t$; and $a_t$ is the action executed according to the policy function at the current time $t$.
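
The temperature objective is only available as an image; the sketch below assumes the form commonly used for automatic temperature tuning in SAC, $J(\alpha) = \mathbb{E}[-\alpha \log \pi_t(a_t|s_t) - \alpha H_0]$, and takes one plain gradient-descent step on it. The function names, learning rate and the non-negativity clamp are illustrative assumptions.

```python
def temperature_loss(alpha, log_probs, h0):
    """Assumed objective: J(alpha) = mean(-alpha * log_pi - alpha * H0)."""
    return sum(-alpha * lp - alpha * h0 for lp in log_probs) / len(log_probs)


def temperature_step(alpha, log_probs, h0, lr=1e-3):
    """One gradient-descent step on J(alpha); alpha is kept non-negative."""
    # dJ/dalpha = mean(-(log_pi + H0))
    grad = sum(-(lp + h0) for lp in log_probs) / len(log_probs)
    return max(alpha - lr * grad, 0.0)
```

Under this assumed objective, $\alpha$ grows when the estimated policy entropy (the negative mean log-probability) falls below $H_0$, which pushes the policy back toward more exploration, and shrinks when the entropy is above the threshold.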

The beneficial effects of the present invention are:

1) The present invention designs an energy management strategy based on the soft actor-critic algorithm, which effectively removes the dependence of traditional deep reinforcement learning algorithms on training data and hyperparameter settings in fuel cell vehicle energy management applications, and helps improve the stability of control tasks in a continuous action space.

2) Changes in the energy consumption of the air conditioning system are usually ignored when the energy management problem of fuel cell vehicles is formulated. The present invention therefore takes hydrogen consumption, SOC maintenance and cabin temperature comfort as optimization objectives and builds a collaborative energy management optimization control framework that accounts for the air conditioning system, realizing coordinated control of energy management and the air conditioning system.

Other advantages, objectives and features of the present invention will be set forth in part in the description that follows, and in part will be apparent to those skilled in the art upon examination of the following, or may be learned from practice of the present invention. The objectives and other advantages of the present invention can be realized and obtained through the following description.

Brief Description of the Drawings

To make the purpose, technical solutions and advantages of the present invention clearer, the present invention is described in detail below with reference to the accompanying drawings, in which:

FIG. 1 is a flow chart of the fuel cell vehicle collaborative energy management method of the present invention;

FIG. 2 is a schematic diagram of the structure of the multi-power-source system of a fuel cell vehicle;

FIG. 3 is a schematic diagram of the cabin heat load model and the structure of the air conditioning system;

FIG. 4 is a diagram of the collaborative energy management framework, built with the SAC algorithm in the present invention, that accounts for the air conditioning system.

具体实施方式DETAILED DESCRIPTION

以下通过特定的具体实例说明本发明的实施方式,本领域技术人员可由本说明书所揭露的内容轻易地了解本发明的其他优点与功效。本发明还可以通过另外不同的具体实施方式加以实施或应用,本说明书中的各项细节也可以基于不同观点与应用,在没有背离本发明的精神下进行各种修饰或改变。需要说明的是,以下实施例中所提供的图示仅以示意方式说明本发明的基本构想,在不冲突的情况下,以下实施例及实施例中的特征可以相互组合。The following describes the embodiments of the present invention by specific examples, and those skilled in the art can easily understand other advantages and effects of the present invention from the contents disclosed in this specification. The present invention can also be implemented or applied through other different specific embodiments, and the details in this specification can also be modified or changed in various ways based on different viewpoints and applications without departing from the spirit of the present invention. It should be noted that the illustrations provided in the following embodiments only illustrate the basic concept of the present invention in a schematic manner, and the following embodiments and the features in the embodiments can be combined with each other without conflict.

请参阅图1~图4,本发明基于软约束演员评论家算法设计了一种计及空调系统的燃料电池汽车协同能量管理优化方法。考虑到燃料电池汽车能量管理中通常忽略空调系统的能耗变化,因此分析了车辆舱室内温度舒适性的主要影响因素,建立了空调系统模型与车舱热负荷模型,以氢耗、SOC维持和舱室温度为优化目标,通过应用适用于连续动作空间下控制任务的软约束演员评论家算法,搭建了计及空调系统的协同能量管理优化控制框架,实现了能量管理与空调系统的协同控制,优化了燃料电池汽车的燃氢经济性以及舱室温度舒适性。如图1所示,该能量管理协同优化方法具体包括以下步骤:Please refer to Figures 1 to 4. The present invention designs a collaborative energy management optimization method for fuel cell vehicles taking into account the air-conditioning system based on the soft-constrained actor-critic algorithm. Considering that the energy consumption changes of the air-conditioning system are usually ignored in the energy management of fuel cell vehicles, the main influencing factors of the temperature comfort in the vehicle cabin are analyzed, and the air-conditioning system model and the cabin heat load model are established. Taking hydrogen consumption, SOC maintenance and cabin temperature as optimization goals, a collaborative energy management optimization control framework taking into account the air-conditioning system is established by applying the soft-constrained actor-critic algorithm suitable for control tasks in a continuous action space, which realizes the collaborative control of energy management and air-conditioning system, and optimizes the hydrogen combustion economy and cabin temperature comfort of fuel cell vehicles. As shown in Figure 1, the energy management collaborative optimization method specifically includes the following steps:

S1: Obtain the key parameter information of the fuel cell vehicle, including:

Vehicle state parameter information: vehicle speed, cabin heat load parameters, motor operating efficiency and driveline characteristic parameters;

Fuel cell parameter information: power, efficiency and hydrogen consumption of the fuel cell;

Power battery parameter information: state of charge, internal resistance and open-circuit voltage of the power battery;

Air-conditioning system parameter information: cooling/heating capacity of the air-conditioning system and the corresponding power.

S2: Establish the cooperative energy management model of the fuel cell vehicle, as shown in Figures 2 and 3. The specific steps are:

S21: Establish the longitudinal dynamics model of the vehicle:

$P_{drive} = (F_{air} + F_f + F_i + m_0 a) \cdot v$

[Equation image BDA0003929529130000071]

$P_{dem} = P_b + P_{fc} \cdot \eta_{DC/DC}$

where $m_0$ is the vehicle mass; $v$ is the vehicle speed; $a$ is the vehicle acceleration; $F_{air}$ is the air resistance; $F_f$ is the rolling resistance; $F_i$ is the acceleration resistance; $\eta_m$, $\eta_{DC/AC}$, $\eta_{DC/DC}$ and $\eta_{motor}$ are the transmission efficiency, DC/AC converter efficiency, DC/DC converter efficiency and motor efficiency, respectively; and $P_{drive}$, $P_{dem}$, $P_b$ and $P_{fc}$ are the drive power at the wheels, the demanded power, the battery output power and the fuel cell output power, respectively.
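The two equations of S21 that survive as text can be sketched in a few lines of Python. This is a minimal illustration rather than the patented implementation: the function names and all numeric values are hypothetical, and the resistance forces are passed in as already-computed numbers because the source does not give their formulas.

```python
def drive_power(f_air, f_f, f_i, m0, a, v):
    """Wheel drive power P_drive = (F_air + F_f + F_i + m0 * a) * v, in watts."""
    return (f_air + f_f + f_i + m0 * a) * v


def battery_power(p_dem, p_fc, eta_dcdc):
    """Solve P_dem = P_b + P_fc * eta_DC/DC for the battery output power P_b."""
    return p_dem - p_fc * eta_dcdc


# Hypothetical operating point (forces in N, mass in kg, speed in m/s, power in W).
p_drive = drive_power(f_air=200.0, f_f=150.0, f_i=0.0, m0=1800.0, a=0.5, v=20.0)
p_b = battery_power(p_dem=30000.0, p_fc=20000.0, eta_dcdc=0.95)
print(p_drive, p_b)
```

Solving the power balance for $P_b$ reflects how such a model is typically used in energy management: the fuel cell power is the controlled quantity and the battery supplies the remainder.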

S22: Establish the fuel cell model:

$\eta_{fc} = f_\eta(P_{fc})$

[Equation image BDA0003929529130000072]

where $f_\eta(\cdot)$ and the function denoted by [Equation image BDA0003929529130000073] are the fitted functions for efficiency and hydrogen consumption, respectively; the efficiency and the hydrogen consumption can be computed from them by interpolation.
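The interpolation mentioned in S22 can be illustrated as a table lookup. The sketch below assumes piecewise-linear interpolation over fitted sample points; the table values, the units and the names `interp1d` and `fuel_cell_model` are hypothetical and do not come from the source.

```python
from bisect import bisect_right


def interp1d(x, xs, ys):
    """Piecewise-linear interpolation, clamped to the end points of the table."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x) - 1          # xs[i] <= x < xs[i + 1]
    w = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + w * (ys[i + 1] - ys[i])


# Hypothetical fit tables: fuel cell power (kW) -> efficiency and hydrogen rate (g/s).
P_FC_KW = [0.0, 10.0, 20.0, 40.0, 60.0]
ETA_FC = [0.00, 0.55, 0.58, 0.52, 0.45]
H2_RATE = [0.00, 0.15, 0.29, 0.64, 1.11]


def fuel_cell_model(p_fc_kw):
    """Return (efficiency, hydrogen consumption rate) at the given output power."""
    return interp1d(p_fc_kw, P_FC_KW, ETA_FC), interp1d(p_fc_kw, P_FC_KW, H2_RATE)


eta, h2 = fuel_cell_model(30.0)
```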

S23: Establish the power battery model:

[Equation image BDA0003929529130000074]

[Equation image BDA0003929529130000081]

where $I_L$ is the power battery current; $V_{oc}$ is the open-circuit voltage of the power battery; $R_{in}$ is the equivalent internal resistance of the power battery; $SOC_0$ is the initial SOC; $Q_t$ is the maximum capacity of the power battery; $t_0$ is the initial time; and $t_f$ is the final time.
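The two battery equations survive only as images, so the following sketch does not reproduce them. It assumes the widely used internal-resistance (Rint) equivalent circuit, in which the terminal power satisfies $P_b = V_{oc} I_L - R_{in} I_L^2$ and the SOC is obtained by coulomb counting; this choice, the function names and the pack parameters are all assumptions made for illustration.

```python
import math


def battery_current(p_b, v_oc, r_in):
    """Current I_L (A) of an assumed Rint model: P_b = V_oc * I_L - R_in * I_L**2."""
    disc = v_oc ** 2 - 4.0 * r_in * p_b
    if disc < 0.0:
        raise ValueError("requested power exceeds what the battery can deliver")
    # The smaller root is the physically meaningful operating point.
    return (v_oc - math.sqrt(disc)) / (2.0 * r_in)


def soc_step(soc, i_l, q_t_ah, dt_s):
    """Coulomb counting over one time step: SOC falls while discharging (I_L > 0)."""
    return soc - i_l * dt_s / (q_t_ah * 3600.0)


# Hypothetical pack: 350 V open-circuit voltage, 0.1 ohm internal resistance, 20 Ah.
i_l = battery_current(p_b=11000.0, v_oc=350.0, r_in=0.1)
soc = soc_step(soc=0.7, i_l=i_l, q_t_ah=20.0, dt_s=1.0)
```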

S24: Establish the motor model:

$\eta_m = f_m(\omega_m, T_m)$

[Equation image BDA0003929529130000082]

where $\omega_m$ and $T_m$ are the motor speed and torque, respectively; $P_m$ is the motor output power; and $f_m(\cdot)$ is the fitted function of the motor operating efficiency, from which the efficiency is obtained by interpolation.

S25: Establish the air-conditioning system model:

[Equation image BDA0003929529130000083]

where $Q_{ac}$ is the cooling or heating capacity of the air-conditioning system; $P_{ac}$ is the corresponding power consumption of the air-conditioning system; and $\eta_{cop}$ is the coefficient of performance of the air-conditioning system.

S26: Establish the cabin heat load model:

$Q_c = \sum K F (T_{out} - T_{in})$

[Equation image BDA0003929529130000084]

$Q_h = 145 + 116n$

$Q_n = m_e \xi Cp_{air} (T_{out} - T_{in})$

[Equation image BDA0003929529130000085]

where $Q_c$, $Q_r$, $Q_h$ and $Q_n$ are the heat conduction load, the radiation heat load, the heat generated by the occupants (empirically, about 145 W for the driver and about 116 W for each passenger) and the heat load of the ventilation system, respectively; $K$ is the heat transfer coefficient; $F$ is the heat transfer area of the corresponding body panel; $T_{out}$ is the ambient temperature; $T_{in}$ is the cabin air temperature; $\eta$ is the transmittance; $I$ is the solar radiation intensity; $A_i$ is the area of the windshield, the left and right side windows and the rear window; $\theta_i$ is the angle of incidence of the sunlight; $\beta$ is the shading factor; $n$ is the number of passengers in the vehicle; $m_e$ is the mass of air passing through the evaporator; $\xi$ is the air recirculation coefficient; $Cp_{air}$ is the heat capacity of the cabin air; and $\rho_{air}$ and $V_{air}$ are the cabin air density and the cabin volume, respectively.
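The three heat load terms of S26 that are given as text can be evaluated directly. The sketch below covers only $Q_c$, $Q_h$ and $Q_n$; the radiation load and the cabin temperature dynamics are omitted because their equations are available only as images. The function names, panel data and operating values are hypothetical.

```python
def conduction_load(surfaces, t_out, t_in):
    """Q_c = sum(K * F) * (T_out - T_in); surfaces is a list of (K, F) pairs."""
    return sum(k * f for k, f in surfaces) * (t_out - t_in)


def occupant_load(n_passengers):
    """Q_h = 145 + 116 n: 145 W for the driver plus 116 W per passenger."""
    return 145.0 + 116.0 * n_passengers


def ventilation_load(m_e, xi, cp_air, t_out, t_in):
    """Q_n = m_e * xi * Cp_air * (T_out - T_in)."""
    return m_e * xi * cp_air * (t_out - t_in)


# Hypothetical cabin: (K in W/(m^2*K), F in m^2) for roof, doors and floor.
surfaces = [(3.0, 2.0), (2.5, 4.0), (2.0, 3.0)]
q_c = conduction_load(surfaces, t_out=35.0, t_in=24.0)
q_h = occupant_load(n_passengers=2)
q_n = ventilation_load(m_e=0.05, xi=0.3, cp_air=1005.0, t_out=35.0, t_in=24.0)
```

With the sign convention of the source, a positive value means heat flowing into the cabin, so $Q_c$ and $Q_n$ turn negative when the ambient air is colder than the cabin.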

S3: Based on the SAC algorithm, a cooperative energy management optimization control framework for the fuel cell vehicle that accounts for the air-conditioning system is established to solve the multi-objective optimization problem comprising hydrogen fuel economy and cabin temperature comfort. As shown in Figure 3, applying the SAC algorithm achieves cooperative control of energy management and the air-conditioning system and improves the hydrogen fuel economy and cabin temperature comfort of the fuel cell vehicle. Specifically:

S301: To capture the key information about the environment, the power battery SOC, the fuel cell output power $P_{fc}$, the vehicle speed $v$ and the air-conditioning cooling/heating capacity $Q_{ac}$ are set as state variables, forming the state space:

$S = \{SOC, P_{fc}, v, Q_{ac}\}$

S302: Cooperative energy management that accounts for the air-conditioning system must not only allocate power among the power sources but also maintain the thermal comfort of the cabin by varying the cooling/heating capacity of the air-conditioning system. To this end, the change in fuel cell output power $\nabla P_{fc}$ and the change in air-conditioning cooling/heating capacity $\nabla Q_{ac}$ are set as action variables, forming the action space:

$A = \{\nabla P_{fc}, \nabla Q_{ac}\}$

S303: To ensure cabin temperature comfort, the cabin temperature is kept at about 24°C, so the reward function must also include a cabin temperature term. The reward function is therefore set as the weighted sum of three indicators, namely hydrogen consumption, SOC variation and cabin temperature variation:

$R = -\left(\zeta \cdot fuel(t) + \psi \cdot (SOC(t) - 0.7)^2 + \gamma \cdot (T_{in} - 24)^2\right)$

where $\zeta$, $\psi$ and $\gamma$ are the weight factors of the optimization terms; adjusting them resolves the trade-off between hydrogen consumption and cabin temperature comfort and thereby solves the multi-objective optimization problem; $fuel(t)$ is the hydrogen consumption at the current time; and $SOC(t)$ is the state of charge of the power battery at the current time.
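The reward of S303 translates directly into code. In the sketch below the reference values 0.7 and 24 come from the formula in the source, while the default weights and the function signature are hypothetical placeholders; the source gives no numeric weights.

```python
def reward(fuel, soc, t_in, zeta=1.0, psi=100.0, gamma=0.1,
           soc_ref=0.7, t_ref=24.0):
    """R = -(zeta * fuel + psi * (SOC - 0.7)**2 + gamma * (T_in - 24)**2).

    Here gamma is the cabin-temperature weight, not the discount factor.
    """
    return -(zeta * fuel + psi * (soc - soc_ref) ** 2 + gamma * (t_in - t_ref) ** 2)


# No fuel use, SOC and cabin temperature on target: the reward reaches its maximum, 0.
r_best = reward(fuel=0.0, soc=0.7, t_in=24.0)
# Hypothetical step with some hydrogen use, low SOC and a warm cabin.
r_step = reward(fuel=0.5, soc=0.6, t_in=26.0)
```

Because every term is a non-negative penalty, the reward is never positive, and raising one weight relative to the others shifts the learned policy toward that objective.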

S304: The SAC algorithm is used to solve the multi-objective optimization problem in energy management. SAC introduces the entropy of the action distribution so that the action outputs are more dispersed, which improves the algorithm's exploration ability, its ability to learn new tasks and its stability. The entropy is expressed as:

$H(\pi(\cdot|s_t)) = -\log \pi(\cdot|s_t)$

where $H$ is the entropy of the policy $\pi(\cdot|s_t)$.

S305: During the solution process, the actor network in the agent takes the state $s_t$ as input and outputs the mean and variance of a Gaussian distribution over actions; the action $a_t$ is then generated with the reparameterization technique:

[Equation image BDA0003929529130000094]

where $\tau_t$ is a noise signal sampled from the standard normal distribution; [Equation image BDA0003929529130000095] is the function that outputs the mean and variance; and [Equation image BDA0003929529130000096] and [Equation image BDA0003929529130000097] are the mean and variance of the Gaussian distribution, respectively.
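The reparameterization step of S305 can be sketched as follows. The exact expression in the source is an image, so this block assumes the usual form, mean plus standard deviation times standard normal noise. The tanh squashing into actuator limits is a common SAC practice added here as an assumption, and all names and numbers are hypothetical.

```python
import math
import random


def sample_action(mean, std, rng=random):
    """Reparameterized Gaussian sample: a = mean + std * tau, with tau ~ N(0, 1)."""
    taus = [rng.gauss(0.0, 1.0) for _ in mean]
    return [m + s * tau for m, s, tau in zip(mean, std, taus)]


def squash(action, low, high):
    """Map an unbounded sample into [low, high] with tanh (assumed, not from the source)."""
    return [lo + 0.5 * (math.tanh(a) + 1.0) * (hi - lo)
            for a, lo, hi in zip(action, low, high)]


# Hypothetical actor output for the two actions (change in P_fc, change in Q_ac).
rng = random.Random(0)
raw = sample_action(mean=[0.2, -0.1], std=[0.4, 0.6], rng=rng)
act = squash(raw, low=[-5.0, -1.0], high=[5.0, 1.0])
```

Drawing the noise separately from the network outputs keeps the sample a deterministic function of the mean and the standard deviation, which is what allows gradients to pass through the sampling step.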

S306: After action $a_t$ is executed, the vehicle environment returns the reward $r_t$ to the agent and transitions to the next state $s_{t+1}$. This yields the interaction data $\{s_t, a_t, r_t, s_{t+1}\}$ between the environment and the agent, which is stored in the experience pool [Equation image BDA0003929529130000101].
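The experience pool of S306 is commonly implemented as a fixed-size replay buffer. The sketch below is one such implementation under assumed design choices (first-in-first-out eviction, uniform sampling without replacement); the class and method names are hypothetical.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size experience pool of (s_t, a_t, r_t, s_next) tuples."""

    def __init__(self, capacity, seed=None):
        self._data = deque(maxlen=capacity)   # oldest entries are dropped first
        self._rng = random.Random(seed)

    def __len__(self):
        return len(self._data)

    def store(self, state, action, reward, next_state):
        """Append one transition generated by the agent-environment interaction."""
        self._data.append((state, action, reward, next_state))

    def sample(self, batch_size):
        """Draw a mini-batch uniformly at random, without replacement."""
        return self._rng.sample(list(self._data), batch_size)


# Hypothetical usage with scalar stand-ins for the state and action vectors.
buf = ReplayBuffer(capacity=3, seed=0)
for step in range(5):
    buf.store(step, 0.0, -1.0, step + 1)
batch = buf.sample(2)
```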

S307: A mini-batch of experience samples is drawn at random from the experience pool. To avoid the overestimation that arises when maximizing the action-value function, and the further overestimation that arises when a network computes its own target, evaluation critic networks with parameters $\theta_1, \theta_2$ and target critic networks with parameters $\theta'_1, \theta'_2$ are introduced, and the smaller of the action values output by the target critic networks is used as the target value. For a given state $s_t$ and action $a_t$, the soft action-value function $Q_{soft}(s_t, a_t)$ in the SAC algorithm is updated as follows:

[Equation image BDA0003929529130000102]

where $r$ is the reward obtained by the vehicle; $\gamma$ is the discount factor; and $\alpha$ is the temperature coefficient.
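The update formula of S307 is available only as an image. The sketch below assumes the standard SAC target, which matches the description in the text: the smaller of the two target-critic values, an entropy bonus weighted by $\alpha$, and discounting by $\gamma$. The function name, argument names and default values are hypothetical.

```python
def soft_q_target(r, q1_next, q2_next, log_pi_next, gamma=0.99, alpha=0.2):
    """Assumed target: y = r + gamma * (min(Q1', Q2') - alpha * log pi(a'|s')).

    q1_next and q2_next are the two target-critic values at the next state for
    an action a' sampled from the current policy; log_pi_next is its log-probability.
    """
    return r + gamma * (min(q1_next, q2_next) - alpha * log_pi_next)


# Hypothetical numbers: the smaller target value (4.0) is used, not 5.0.
y = soft_q_target(r=-1.0, q1_next=5.0, q2_next=4.0, log_pi_next=-2.0,
                  gamma=0.9, alpha=0.5)
```

Taking the minimum of the two target critics is what counteracts the overestimation described above.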

S308: When the policy network is updated, the evaluation critic networks are updated by minimizing the loss function $L(\theta_i)$, which is defined as the mean squared error between [Equation image BDA0003929529130000103] and [Equation image BDA0003929529130000104]:

[Equation image BDA0003929529130000105]

[Equation image BDA0003929529130000106]

where [Equation image BDA0003929529130000107] is the value function of the evaluation critic network with parameters $\theta_i$, and [Equation image BDA0003929529130000108] is the value function of the target critic network with parameters $\theta'_i$.
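The critic loss of S308 is described as a mean squared error, which can be sketched over a mini-batch as follows. The exact expression in the source is an image, so the absence of a scaling factor such as 1/2 is an assumption, and the function name and sample values are hypothetical.

```python
def critic_loss(q_pred, q_target):
    """Mean squared error between evaluation-critic outputs and target values."""
    if len(q_pred) != len(q_target) or not q_pred:
        raise ValueError("q_pred and q_target must be non-empty and of equal length")
    return sum((p - y) ** 2 for p, y in zip(q_pred, q_target)) / len(q_pred)


# Hypothetical mini-batch of three transitions for one critic network.
loss = critic_loss(q_pred=[1.0, 2.0, 4.0], q_target=[1.0, 3.0, 2.0])
```

In a full implementation this value would be computed once per evaluation critic, with the targets held fixed while the gradients are taken.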

S309: The actor network parameters are updated by minimizing the KL divergence. The smaller the KL divergence, the smaller the difference between the rewards corresponding to the output actions, and the better the policy converges. The objective function of the actor network, [Equation image BDA0003929529130000109], is defined as:

[Equation image BDA00039295291300001010]

where $D_{KL}$ denotes the KL divergence; $Z(s_t)$ is the partition function, used to normalize the distribution; [Equation image BDA00039295291300001011] denotes the mathematical expectation over the vehicle state $s_t$ and the executed action $a_t$ at the current time; [Equation image BDA00039295291300001012] denotes the policy function in the current state $s_t$; and [Equation image BDA00039295291300001013] denotes the parameters of the policy function.
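The actor objective of S309 is given only as an image. In standard SAC, minimizing the KL divergence between the policy and the normalized exponential of the soft Q-function reduces, up to terms that do not depend on the policy parameters, to minimizing the expectation of $\alpha \log \pi(a_t|s_t) - Q(s_t, a_t)$. The sketch below estimates that quantity over a mini-batch under this assumption; the names and values are hypothetical.

```python
def actor_objective(log_pis, q_values, alpha=0.2):
    """Batch estimate of the assumed objective E[alpha * log pi(a|s) - Q(s, a)]."""
    terms = [alpha * lp - q for lp, q in zip(log_pis, q_values)]
    return sum(terms) / len(terms)


# Hypothetical mini-batch: log-probabilities of sampled actions and their Q-values.
j_pi = actor_objective(log_pis=[-1.0, -2.0], q_values=[3.0, 5.0], alpha=0.5)
```

The partition function $Z(s_t)$ does not appear because it is independent of the policy parameters and therefore does not affect the gradient.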

S310: The actor network parameters are updated by gradient descent, expressed as:

[Equation image BDA00039295291300001014]

where [Equation image BDA00039295291300001015] is the gradient with respect to the policy function parameters [Equation image BDA00039295291300001016], and [Equation image BDA00039295291300001017] is the gradient with respect to the action $a_t$ executed at the current time $t$;

S311: In the SAC algorithm, tuning the temperature coefficient $\alpha$ is critical to the training result, and the optimal value differs across reinforcement learning tasks and training stages. To adjust the temperature coefficient automatically, the optimal temperature coefficient at each step is obtained by minimizing the objective function of the following optimization problem:

[Equation image BDA0003929529130000111]

where $H_0$ is the predefined threshold for the minimum policy entropy; [Equation image BDA0003929529130000112] denotes the mathematical expectation when action $a_t$ is executed according to the policy function $\pi_t$; $\pi_t(a_t|s_t)$ is the policy function; $s_t$ is the state of the fuel cell vehicle at the current time $t$; and $a_t$ is the action executed according to the policy function at the current time $t$.
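The automatic temperature adjustment of S311 can be sketched with the objective commonly used in SAC, the expectation of $-\alpha \log \pi_t(a_t|s_t) - \alpha H_0$. The source gives its objective only as an image, so this form, the plain gradient step and the lower bound on $\alpha$ are assumptions; the names and numbers are hypothetical.

```python
def temperature_loss(log_pis, alpha, target_entropy):
    """Batch estimate of the assumed J(alpha) = E[-alpha * log pi - alpha * H0]."""
    return sum(-alpha * lp - alpha * target_entropy for lp in log_pis) / len(log_pis)


def temperature_step(log_pis, alpha, target_entropy, lr=1e-3):
    """One gradient-descent step on J(alpha); alpha is kept strictly positive."""
    grad = sum(-lp - target_entropy for lp in log_pis) / len(log_pis)
    return max(alpha - lr * grad, 1e-8)


# Hypothetical batch whose entropy estimate (2.0) exceeds the threshold H0 = 1.0.
j_alpha = temperature_loss(log_pis=[-1.0, -3.0], alpha=0.5, target_entropy=1.0)
alpha_new = temperature_step(log_pis=[-1.0, -3.0], alpha=0.5,
                             target_entropy=1.0, lr=0.1)
```

When the policy entropy is above $H_0$ the step lowers $\alpha$, which reduces the weight of the entropy bonus; when the entropy falls below $H_0$ the step raises $\alpha$ and encourages exploration again.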

Finally, the above embodiments are intended only to illustrate the technical solution of the present invention, not to limit it. Although the invention has been described in detail with reference to preferred embodiments, those of ordinary skill in the art should understand that the technical solution may be modified or replaced with equivalents without departing from its purpose and scope, and all such modifications and replacements fall within the scope of the claims of the present invention.

Claims (8)

1. A fuel cell vehicle learning type cooperative energy management method considering an air conditioning system, characterized by comprising the following steps:
S1: acquiring vehicle state parameter information, fuel cell parameter information, power battery parameter information and air conditioning system parameter information of a fuel cell vehicle;
S2: establishing a fuel cell vehicle cooperative energy management model, comprising: a vehicle longitudinal dynamics model, a fuel cell model, a power battery model, a motor model, an air conditioning system model and a cabin heat load model;
S3: establishing a fuel cell vehicle cooperative energy management optimization control strategy considering the air conditioning system, solving a multi-objective optimization problem comprising hydrogen fuel economy and cabin temperature comfort with the SAC algorithm, and controlling the change in the cooling/heating capacity of the air conditioner so as to keep the cabin temperature within the comfort zone while performing energy flow optimization control; the SAC algorithm being the soft actor-critic algorithm; wherein establishing the fuel cell vehicle cooperative energy management optimization control strategy considering the air conditioning system specifically comprises the following steps:
S301: determining the state space: the power battery SOC, the fuel cell output power $P_{fc}$, the vehicle speed $v$ and the cooling/heating capacity $Q_{ac}$ of the air conditioning system are set as state variables to construct the state space $S$, expressed as:
$S = \{SOC, P_{fc}, v, Q_{ac}\}$
S302: determining the action space: the change in fuel cell output power $\nabla P_{fc}$ and the change in cooling/heating capacity of the air conditioning system $\nabla Q_{ac}$ are set as action variables to construct the action space $A$, expressed as:
$A = \{\nabla P_{fc}, \nabla Q_{ac}\}$
S303: establishing the reward function: the reward function $R$ is set as the weighted sum of three indicators, namely hydrogen consumption, SOC variation and cabin temperature variation, expressed as:
$R = -\left(\zeta \cdot fuel(t) + \psi \cdot (SOC(t) - 0.7)^2 + \gamma \cdot (T_{in} - 24)^2\right)$
where $\zeta$, $\psi$ and $\gamma$ are the weight factors of the optimization terms, and the trade-off between hydrogen consumption and cabin temperature comfort is resolved by adjusting the weight factors, thereby solving the multi-objective optimization problem; $fuel(t)$ is the hydrogen consumption at the current time; $SOC(t)$ is the state of charge of the power battery at the current time; and $T_{in}$ is the cabin air temperature;
solving the multi-objective optimization problem comprising hydrogen fuel economy and cabin temperature comfort with the SAC algorithm specifically comprises the following steps:
S311: solving the multi-objective optimization problem in energy management with the SAC algorithm, wherein the entropy of the action distribution is introduced into the SAC algorithm so that the action outputs are more dispersed, thereby improving the exploration ability, the ability to learn new tasks and the stability of the algorithm, the entropy being expressed as:
$H(\pi(\cdot|s_t)) = -\log \pi(\cdot|s_t)$
where $H$ is the entropy of the policy $\pi(\cdot|s_t)$;
S312: during the solution process, the actor network in the agent takes the state $s_t$ as input and outputs the mean and variance of a Gaussian distribution over actions, and the action $a_t$ is generated with the reparameterization technique:
[Equation image FDA0004128743520000021]
where $\tau_t$ is a noise signal sampled from the standard normal distribution; [Equation image FDA0004128743520000022] is the function that outputs the mean and variance; and [Equation image FDA0004128743520000023] and [Equation image FDA0004128743520000024] are the mean and variance of the Gaussian distribution, respectively;
S313: after action $a_t$ is executed, the vehicle environment returns the reward $r_t$ to the agent and transitions to the next state $s_{t+1}$, thereby generating the interaction data $\{s_t, a_t, r_t, s_{t+1}\}$ between the environment and the agent, which is stored in the experience pool [Equation image FDA0004128743520000025];
S314: a mini-batch of experience samples is drawn at random from the experience pool, evaluation critic networks with parameters $\theta_1, \theta_2$ and target critic networks with parameters $\theta'_1, \theta'_2$ are introduced, and the smaller of the action values output by the target critic networks is selected as the target value; for a given state $s_t$ and action $a_t$, the soft action-value function $Q_{soft}(s_t, a_t)$ in the SAC algorithm is updated as follows:
[Equation image FDA0004128743520000026]
where $r$ is the reward obtained by the vehicle; $\gamma$ is the discount factor; and $\alpha$ is the temperature coefficient;
S315: when the policy network is updated, the evaluation critic networks are updated by minimizing the loss function $L(\theta_i)$, which is defined as the mean squared error between [Equation image FDA0004128743520000027] and [Equation image FDA0004128743520000028], expressed as:
[Equation image FDA0004128743520000029]
[Equation image FDA00041287435200000210]
where [Equation image FDA00041287435200000211] is the value function of the evaluation critic network with parameters $\theta_i$, and [Equation image FDA00041287435200000212] is the value function of the target critic network with parameters $\theta'_i$;
S316: the actor network parameters are updated by minimizing the KL divergence; the objective function of the actor network, [Equation image FDA00041287435200000213], is defined as:
[Equation image FDA00041287435200000214]
where $D_{KL}$ denotes the KL divergence; $Z(s_t)$ is the partition function, used to normalize the distribution; [Equation image FDA00041287435200000215] denotes the mathematical expectation over the vehicle state $s_t$ and the executed action $a_t$ at the current time; [Equation image FDA00041287435200000216] denotes the policy function in the current state $s_t$; and [Equation image FDA00041287435200000217] denotes the parameters of the policy function;
S317: the actor network parameters are updated by gradient descent, expressed as:
[Equation image FDA0004128743520000031]
where [Equation image FDA0004128743520000032] is the gradient with respect to the policy function parameters [Equation image FDA0004128743520000033], and [Equation image FDA0004128743520000034] is the gradient with respect to the action $a_t$ executed at the current time $t$;
S318: the optimal temperature coefficient at each step is obtained by minimizing the objective function of the optimization problem, the objective function being expressed as:
[Equation image FDA0004128743520000035]
where $H_0$ is the predefined threshold for the minimum policy entropy; [Equation image FDA0004128743520000036] denotes the mathematical expectation when action $a_t$ is executed according to the policy function $\pi_t$; $\pi_t(a_t|s_t)$ is the policy function; $s_t$ is the state of the fuel cell vehicle at the current time $t$; and $a_t$ is the action executed according to the policy function at the current time $t$.
2. The fuel cell vehicle learning collaborative energy management method according to claim 1, wherein in step S1, the vehicle state parameter information includes: vehicle speed, cabin thermal load parameters, motor operating efficiency and transmission system characteristic parameters; the fuel cell parameter information includes: power, efficiency, and hydrogen energy consumption of the fuel cell; the power battery parameter information includes: the state of charge, internal resistance and open circuit voltage of the power battery; the air conditioning system parameter information includes: air conditioning system cooling capacity/heating capacity and corresponding power.
3. The fuel cell vehicle learning collaborative energy management method according to claim 1, wherein in step S2, the vehicle longitudinal dynamics model is established as follows:
$P_{drive} = (F_{air} + F_f + F_i + m_0 a) \cdot v$
[Equation image FDA0004128743520000037]
$P_{dem} = P_b + P_{fc} \cdot \eta_{DC/DC}$
where $m_0$ is the vehicle mass; $v$ is the vehicle speed; $a$ is the vehicle acceleration; $F_{air}$ is the air resistance; $F_f$ is the rolling resistance; $F_i$ is the acceleration resistance; $\eta_m$, $\eta_{DC/AC}$, $\eta_{DC/DC}$ and $\eta_{motor}$ are the transmission efficiency, DC/AC converter efficiency, DC/DC converter efficiency and motor efficiency, respectively; and $P_{drive}$, $P_{dem}$, $P_b$ and $P_{fc}$ are the drive power at the wheels, the demanded power, the battery output power and the fuel cell output power, respectively.
4. The fuel cell vehicle learning collaborative energy management method according to claim 3, wherein in step S2, the fuel cell model is established as follows:
$\eta_{fc} = f_\eta(P_{fc})$
[Equation image FDA0004128743520000038]
where $f_\eta(\cdot)$ and the function denoted by [Equation image FDA0004128743520000039] are the fitted functions for efficiency and hydrogen consumption, respectively, and the efficiency and hydrogen consumption are computed by interpolation.
5. The fuel cell vehicle learning collaborative energy management method according to claim 3, wherein in step S2, the power battery model is established as follows:
[Equation image FDA0004128743520000041]
[Equation image FDA0004128743520000042]
where $I_L$ is the power battery current; $V_{oc}$ is the open-circuit voltage of the power battery; $R_{in}$ is the equivalent internal resistance of the power battery; $SOC_0$ is the initial SOC; $Q_t$ is the maximum capacity of the power battery; $t_0$ is the initial time; and $t_f$ is the final time.
6. The fuel cell vehicle learning collaborative energy management method according to claim 3, wherein in step S2, the motor model is established as follows:
$\eta_m = f_m(\omega_m, T_m)$
[Equation image FDA0004128743520000043]
where $\omega_m$ and $T_m$ are the motor speed and the motor torque, respectively; $P_m$ is the motor output power; and $f_m(\cdot)$ is the fitted function of the motor operating efficiency, from which the efficiency is obtained by interpolation.
7. The fuel cell vehicle learning collaborative energy management method according to claim 1, wherein in step S2, the air conditioning system model is established as follows:
[Equation image FDA0004128743520000044]
where $Q_{ac}$ is the cooling capacity or heating capacity of the air conditioning system; $P_{ac}$ is the corresponding power consumption of the air conditioning system; and $\eta_{cop}$ is the coefficient of performance of the air conditioning system.
8. The fuel cell vehicle learning collaborative energy management method according to claim 1, wherein in step S2, the cabin heat load model is established as follows:
$Q_c = \sum K F (T_{out} - T_{in})$
[Equation image FDA0004128743520000045]
$Q_h = 145 + 116n$
$Q_n = m_e \xi Cp_{air} (T_{out} - T_{in})$
[Equation image FDA0004128743520000051]
where $Q_c$, $Q_r$, $Q_h$ and $Q_n$ are the heat conduction load, the radiation heat load, the heat generated by the occupants and the heat load of the ventilation system, respectively; $K$ is the heat transfer coefficient; $F$ is the heat transfer area of the corresponding body panel; $T_{out}$ is the ambient temperature; $T_{in}$ is the cabin air temperature; $\eta$ is the transmittance; $I$ is the solar radiation intensity; $A_i$ is the area of the windshield, the left and right side windows and the rear window; $\theta_i$ is the angle of incidence of the sunlight; $\beta$ is the shading factor; $n$ is the number of passengers in the vehicle; $m_e$ is the mass of air passing through the evaporator; $\xi$ is the air recirculation coefficient; $Cp_{air}$ is the heat capacity of the cabin air; and $\rho_{air}$ and $V_{air}$ are the cabin air density and the cabin volume, respectively.
CN202211385462.0A 2022-11-07 2022-11-07 Fuel cell automobile learning type cooperative energy management method considering air conditioning system Active CN115503559B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211385462.0A CN115503559B (en) 2022-11-07 2022-11-07 Fuel cell automobile learning type cooperative energy management method considering air conditioning system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211385462.0A CN115503559B (en) 2022-11-07 2022-11-07 Fuel cell automobile learning type cooperative energy management method considering air conditioning system

Publications (2)

Publication Number Publication Date
CN115503559A CN115503559A (en) 2022-12-23
CN115503559B true CN115503559B (en) 2023-05-02

Family

ID=84512880

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211385462.0A Active CN115503559B (en) 2022-11-07 2022-11-07 Fuel cell automobile learning type cooperative energy management method considering air conditioning system

Country Status (1)

Country Link
CN (1) CN115503559B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116639135B (en) * 2023-05-26 2024-07-09 中国第一汽车股份有限公司 Cooperative control method and device for vehicle and vehicle
CN117968208B (en) * 2024-03-29 2024-07-19 中建安装集团有限公司 Environment system control method and control system
CN119247786B (en) * 2024-12-03 2025-03-25 合肥工业大学 Thermal-electric integrated optimization control method for intelligent connected fuel cell vehicles

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111731303A (en) * 2020-07-09 2020-10-02 重庆大学 A HEV energy management method based on deep reinforcement learning A3C algorithm
CN111785045A (en) * 2020-06-17 2020-10-16 南京理工大学 A joint control method for distributed traffic lights based on actor-critic algorithm
CN112287463A (en) * 2020-11-03 2021-01-29 重庆大学 An energy management method for fuel cell vehicles based on deep reinforcement learning algorithm
CN113071506A (en) * 2021-05-20 2021-07-06 吉林大学 Fuel cell automobile energy consumption optimization system considering cabin temperature
CN113085665A (en) * 2021-05-10 2021-07-09 重庆大学 Fuel cell automobile energy management method based on TD3 algorithm
CN113246805A (en) * 2021-07-02 2021-08-13 吉林大学 Fuel cell power management control method considering temperature of automobile cab

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210270622A1 (en) * 2020-02-27 2021-09-02 Cummins Enterprise Llc Technologies for energy source schedule optimization for hybrid architecture vehicles


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
基于延迟策略的最大熵优势演员评论家算法;祁文凯;桑国明;;小型微型计算机系统(第08期);90-98 *

Also Published As

Publication number Publication date
CN115503559A (en) 2022-12-23

Similar Documents

Publication Publication Date Title
CN115503559B (en) Fuel cell automobile learning type cooperative energy management method considering air conditioning system
Xie et al. A Self-learning intelligent passenger vehicle comfort cooling system control strategy
CN111267831B (en) Intelligent time-domain-variable model prediction energy management method for hybrid electric vehicle
CN112287463B (en) An energy management method for fuel cell vehicles based on deep reinforcement learning algorithm
CN111731303A (en) A HEV energy management method based on deep reinforcement learning A3C algorithm
CN112776673A (en) Intelligent network fuel cell automobile real-time energy optimization management system
CN113479186B (en) Energy management strategy optimization method for hybrid electric vehicle
CN113071506B (en) Fuel cell automobile energy consumption optimization system considering cabin temperature
CN109591659B (en) Intelligent learning pure electric vehicle energy management control method
Jia et al. Deep reinforcement learning-based energy management strategy for fuel cell buses integrating future road information and cabin comfort control
CN110717218A (en) An electric drive vehicle distributed power drive system reconfiguration control method and vehicle
CN116734424B (en) Indoor thermal environment control method based on RC model and deep reinforcement learning
CN112255918B (en) Method and system for optimal control of vehicle platoon
CN115793445A (en) Hybrid electric vehicle control method based on multi-agent deep reinforcement learning
CN114017904B (en) Operation control method and device for building HVAC system
CN102865649A (en) Secondary fuzzy control-based multi-objective adjusting method of air quality inside carriage
CN113110052B (en) A Hybrid Energy Management Approach Based on Neural Networks and Reinforcement Learning
Haskara et al. Reinforcement learning based EV energy management for integrated traction and cabin thermal management considering battery aging
Rajan et al. Enhancing the performance and economic efficiency of range-extended electric vehicles: A hybrid dual stream spectrum deconvolution neural network with Beluga Whale Optimization
CN111562741A (en) Method for prolonging service life of battery of electric automobile
CN114925921B (en) Optimization method and system for integrated energy system including distributed photovoltaic and electric vehicles
Ma et al. Real-time predictive control for EVs cabin thermal management considering air quality
CN117254529A (en) Power distribution network real-time scheduling method and system considering carbon emission and uncertainty
CN116468291A (en) A Hybrid Energy Dispatch Method for Commercial Buildings Including Electric Vehicle Charging Stations
CN115958935A (en) Energy-saving control method, controller, and vehicle for heat pump air-conditioning system of networked electric vehicle

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant