CN116612636B

CN116612636B - Signal lamp cooperative control method based on multi-agent reinforcement learning

Info

Publication number: CN116612636B
Application number: CN202310582760.7A
Authority: CN
Inventors: Ouyang Yajie (欧阳雅捷); Yin Li (殷力); Guo Yiwen (郭艺雯); Zhao Kuo (赵阔)
Original assignee: Jinan University
Current assignee: Jinan University
Priority date: 2023-05-22
Filing date: 2023-05-22
Publication date: 2024-01-23
Anticipated expiration: 2043-05-22
Also published as: CN116612636A

Abstract

The present invention proposes a collaborative traffic light control method based on multi-agent reinforcement learning and multi-modal signal perception. The method comprises the following steps:

- Collect data from various sensors, define the modalities, and obtain information in real time through data fusion.
- Perform collaborative control of traffic lights and vehicles with a collaborative vehicle-road multi-agent reinforcement learning algorithm.
- Preprocess the collected sensor data, fuse data from different modalities with a feature fusion method, and build a local state space for each agent.
- Design action spaces for the traffic light agents and the vehicle agents.
- Design a reward function for multi-agent reinforcement learning based on the goals of traffic flow control.
- Design a communication protocol suited to vehicle-road collaborative control scenarios.
- Train the multi-agent reinforcement learning model on historical data or in a simulation environment to find the optimal strategy.

By introducing vehicles as agents, the invention achieves more effective vehicle-road collaboration and further improves traffic control performance.

Description

Traffic light collaborative control method based on multi-agent reinforcement learning

Technical field

The invention belongs to the field of vehicle-road collaboration. In particular, it relates to a signal light collaborative control method based on multi-agent reinforcement learning.

Background

As urban traffic grows increasingly busy, traditional signal light control methods can no longer meet the demand for efficient traffic in modern cities. To address this problem, researchers have begun to use intelligent transportation systems (ITS) to improve road traffic efficiency. Among these, traffic light control systems based on multi-agent reinforcement learning and multi-modal signal perception have attracted considerable attention.

Traditional signal light control methods are usually based on fixed signal cycles or predetermined traffic flow patterns. They therefore lack adaptability to real-time traffic conditions. There is an urgent need for a collaborative method that overcomes the limitations of traditional traffic light control and adapts better to real-time traffic conditions.

Summary of the invention

The purpose of the present invention is to propose a signal light collaborative control method based on multi-agent reinforcement learning. By introducing vehicles as agents, the method achieves more effective vehicle-road collaboration and further improves traffic control performance.

To achieve the above objective, the present invention provides a signal light collaborative control method based on multi-agent reinforcement learning. The method comprises:

S1. Collect data from various sensors, define the modalities, and obtain information in real time through data fusion;

S2. Use a collaborative vehicle-road multi-agent reinforcement learning algorithm to control traffic lights and vehicles collaboratively, finding the optimal strategy through learning to achieve efficient traffic flow control;

S3. Preprocess the data collected from the various sensors, fuse data from different modalities with a feature fusion method, and build a local state space for each agent;

S4. Design an action space for the agents;

S5. Design a reward function for multi-agent reinforcement learning based on the goals of traffic flow control;

S6. Design a communication protocol;

S7. Train the multi-agent reinforcement learning model on historical data or in a simulation environment to find the optimal strategy.

Further, the multi-modal definition in S1 includes a visual modality and a radar modality. The visual modality consists of image data collected by cameras. The radar modality consists of distance and speed information collected by radar. The information obtained includes road condition information, vehicle position and vehicle speed.

Further, the data fusion technique specifically comprises:

S1.1. Model the scene as a graph;

S1.2. Extract features from the data of each modality;

S1.3. Fuse the features with a graph convolutional neural network;

S1.4. Output the reinforcement learning state.

Further, the objective function of the collaborative vehicle-road multi-agent reinforcement learning algorithm is defined as follows.

The reward function is defined as R(s,a), where s denotes the state and a denotes the agent's action. The agent's actions are adjusted to maximize the cumulative reward, expressed as:

$$J(\theta) = \sum_t R(s_t, a_t)$$

where J(θ) denotes the objective function, s_t denotes the state of the agent at time t, and a_t denotes the action taken by the agent at time t.

The loss function L(θ) is then expressed as:

$$L(\theta) = \frac{1}{2}\,\mathbb{E}\left[\left(R(s,a) + \gamma \max_{a'} Q(s',a';\theta') - Q(s,a;\theta)\right)^2\right]$$

The symbols are defined as follows:

- E[·] denotes the expected value.
- θ denotes the parameters of the current agent.
- θ' denotes the parameters of the target agent (i.e. the target network).
- γ is the discount factor.
- a' ranges over the actions available to the agent in the next state s'.
- Q(s,a;θ) is the action-value function. It estimates the cumulative reward of taking action a in state s.
- Q(s',a';θ') is the target network's evaluation of taking action a' in state s'. It measures how good action a' is in state s'.
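The loss above has the form of the standard one-step temporal-difference (DQN-style) loss. The sketch below shows how it can be evaluated on a batch of transitions. It assumes the Q-values have already been produced by the current network (θ) and the target network (θ'). The function and argument names are hypothetical, and terminal states are not handled because the formula as written does not distinguish them.

```python
import numpy as np


def td_loss(q_values, target_q_next, actions, rewards, gamma=0.99):
    """Return 0.5 * mean((R + gamma * max_a' Q(s',a';theta') - Q(s,a;theta))**2).

    q_values:      (batch, n_actions) Q(s, .; theta) from the current network
    target_q_next: (batch, n_actions) Q(s', .; theta') from the target network
    actions:       (batch,) integer actions a taken in states s
    rewards:       (batch,) rewards R(s, a)
    """
    q_values = np.asarray(q_values, dtype=float)
    target_q_next = np.asarray(target_q_next, dtype=float)
    actions = np.asarray(actions, dtype=int)
    rewards = np.asarray(rewards, dtype=float)

    q_sa = q_values[np.arange(len(actions)), actions]        # Q(s, a; theta)
    td_target = rewards + gamma * target_q_next.max(axis=1)  # R + gamma * max_a' Q(s', a'; theta')
    td_error = td_target - q_sa
    return 0.5 * float(np.mean(td_error ** 2))               # batch mean approximates E[.]


# Two transitions with two candidate actions each.
loss = td_loss(
    q_values=[[1.0, 2.0], [0.5, 0.0]],
    target_q_next=[[3.0, 1.0], [0.0, 2.0]],
    actions=[1, 0],
    rewards=[1.0, 0.0],
    gamma=0.5,
)
print(loss)  # 0.125
```

In a deep implementation, θ' would typically be a periodically refreshed copy of θ. The gradient of this loss would be taken with respect to θ only.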

Further, the implementation steps of the objective function of the collaborative vehicle-road multi-agent reinforcement learning algorithm include:

S2.1. Adopt centralized training and distributed execution. In the training phase, training is performed centrally to achieve collaboration between agents. In the execution phase, each agent follows a distributed policy and makes decisions based on its local state;

S2.2. Using the data from the various sensors in step S1, obtain a comprehensive state space containing multi-modal information, so that the agents can perceive traffic conditions more accurately;

S2.3. Treat vehicles and traffic lights as distinct agents. This achieves collaborative control between vehicles and traffic lights and improves traffic flow.

Further, the data in step S3 include vehicle data, road condition data and traffic light data.

Further, the state space is expressed as follows:

s = {vehicle data, road condition data, traffic light data}.

Further, step S4 specifically comprises:

S4.1. Discretize the control strategy of the traffic light into a set of selectable actions;

S4.2. Dynamically adjust the phase settings of the traffic light based on real-time road condition data;

S4.3. Design an adaptive traffic light control strategy.

Further, the communication protocol covers four parts:

- communication between vehicles and roadside facilities;
- communication between signal lights;
- communication between the central controller and the signal lights;
- data fusion and processing.

Further, step S7 specifically comprises the following: train the multi-agent reinforcement learning model on historical data or in a simulation environment, find the optimal strategy, and deploy the optimal strategy to the signal light control system.

The beneficial technical effects of the present invention include at least the following:

(1) Multi-modal signal perception provides the signal light control system with rich real-time traffic information. The system fuses data from multiple sensors such as cameras, radar and vehicle-mounted sensors. It can therefore perceive traffic conditions more accurately and provides a more targeted basis for signal light control decisions.

(2) The multi-modal feature fusion method based on a graph convolutional neural network better captures the topological relationships between vehicles and traffic lights. The same method also fuses the multi-modal perception data into a unified state space. This gives the multi-agent reinforcement learning algorithm richer and more accurate information.

(3) With the optimal strategy of the present invention, the signal light control system gains strong dynamic adjustment capability. It adapts better to changing traffic conditions in practice and improves overall signal light control performance.

Description of the drawings

The present invention is further described with reference to the accompanying drawings. However, the embodiments in the drawings do not limit the present invention in any way. Those of ordinary skill in the art can derive other drawings from the following drawings without creative effort.

Figure 1 is a flow chart of the signal light collaborative control method based on multi-agent reinforcement learning of the present invention.

Detailed description of the embodiments

Embodiments of the present invention are described in detail below, and examples of them are illustrated in the accompanying drawings. Throughout, identical or similar reference numerals denote identical or similar elements, or elements with identical or similar functions. The embodiments described below with reference to the drawings are exemplary. They serve only to explain the present invention and are not to be construed as limiting it.

Embodiment 1

In one or more implementations, as shown in Figure 1, a signal light collaborative control method based on multi-agent reinforcement learning is disclosed. The method comprises the following steps:

S1. Collect data from various sensors, define the modalities, and obtain information in real time through data fusion.

Specifically, step S1 collects data from various sensors, such as cameras, radar and vehicle-mounted sensors. It uses data fusion to obtain road condition information, vehicle positions and vehicle speeds in real time. This information is used to construct the state space for multi-agent reinforcement learning.

The modalities are defined as follows:

a. Visual modality: image data collected by cameras, which can provide information such as vehicle position, shape and speed. Cameras can be mounted at the roadside or on vehicles and transmit image data in real time.

b. Radar modality: distance and speed information collected by radar, which helps detect vehicle position, speed and distance accurately. Radar can be mounted at the roadside or on vehicles and transmit distance data in real time. These data can be sent to the signal light control system in real time through an in-vehicle communication system such as V2X.

The data fusion in step S1 specifically comprises:

S1.1. Model the scene as a graph. In a traffic light control scenario, vehicles and traffic lights have a definite topological relationship. The scene can therefore be modeled as a graph, with vehicles and traffic lights as nodes and their relationships as edges;

S1.2. Extract features from the data of each modality. Feature extraction is performed first for each modality:
- visual modality: a convolutional neural network (CNN) can extract features;
- radar modality: a one-dimensional CNN or a recurrent neural network (RNN) can be used;
- vehicle-mounted sensor modality: a fully connected neural network (FCNN) can be used.

The extracted features are then represented as node features;

S1.3. Fuse the features with a graph convolutional neural network. The extracted multi-modal features are fed into a graph convolutional network (GCN) as node features. A GCN captures the topological relationships between nodes while preserving the node features. Its propagation and aggregation operations yield composite features that contain both the multi-modal information and the topological relationships between nodes;

S1.4. Output the reinforcement learning state. The GCN-fused features serve as the state space for the subsequent multi-agent reinforcement learning algorithm.

This GCN-based multi-modal feature fusion method better captures the topological relationships between vehicles and traffic lights. It also fuses the multi-modal perception data into a unified state space, which gives the multi-agent reinforcement learning algorithm richer and more accurate information.
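Steps S1.1 to S1.4 can be sketched with a single-graph GCN using the common symmetric normalization D^-1/2 (A + I) D^-1/2. This is a minimal NumPy sketch under several assumptions:

- The per-modality extractors (CNN, 1-D CNN/RNN, FCNN) are replaced by precomputed feature arrays.
- Modalities are combined by concatenating node features before the GCN. The source does not say how they are combined.
- The layer weights are random rather than trained.

All names are hypothetical.

```python
import numpy as np


def normalized_adjacency(adj):
    """Symmetric GCN normalization D^-1/2 (A + I) D^-1/2 (self-loops added)."""
    a_hat = np.asarray(adj, dtype=float) + np.eye(len(adj))
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a_hat.sum(axis=1)))
    return d_inv_sqrt @ a_hat @ d_inv_sqrt


def gcn_layer(adj_norm, h, weight):
    """One graph-convolution layer: aggregate neighbours, project, apply ReLU."""
    return np.maximum(adj_norm @ h @ weight, 0.0)


def fuse_modalities(modal_feats, adj, weights):
    """Concatenate per-modality node features, then run stacked GCN layers.

    modal_feats: list of (n_nodes, d_m) arrays, one per modality
    adj:         (n_nodes, n_nodes) 0/1 adjacency between vehicle and light nodes
    weights:     list of layer weight matrices (learned in practice)
    """
    h = np.concatenate(modal_feats, axis=1)
    adj_norm = normalized_adjacency(adj)
    for w in weights:
        h = gcn_layer(adj_norm, h, w)
    return h  # row i is the fused state embedding of node (agent) i


# Toy graph: node 0 is a signal light, nodes 1 and 2 are vehicles approaching it.
adj = np.array([[0, 1, 1],
                [1, 0, 0],
                [1, 0, 0]])
rng = np.random.default_rng(0)
vision = rng.normal(size=(3, 4))  # stand-in for CNN image features
radar = rng.normal(size=(3, 2))   # stand-in for radar features
weights = [rng.normal(size=(6, 8)), rng.normal(size=(8, 5))]
fused = fuse_modalities([vision, radar], adj, weights)
print(fused.shape)  # (3, 5)
```

Each row of `fused` would then serve as the local state of the corresponding agent (S1.4).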

S2. Use a collaborative vehicle-road multi-agent reinforcement learning algorithm to control traffic lights and vehicles collaboratively. The traffic lights and vehicles are treated as multiple agents, and efficient traffic flow control is achieved by finding the optimal strategy through learning.

Specifically, the collaborative vehicle-road multi-agent reinforcement learning algorithm is called CVR-MARL (Collaborative Vehicle-Road Multi-Agent Reinforcement Learning). Its objective is to maximize the cumulative reward of the system. In this model, traffic lights and vehicles are treated as multiple agents that must learn optimal strategies to achieve efficient traffic flow control.

The objective function of CVR-MARL is defined as follows.

The reward function is defined as R(s,a), where s denotes the state (the result of feature fusion based on multi-modal perception) and a denotes the agent's action. The agent's actions are adjusted to maximize the cumulative reward, expressed as:

$$J(\theta) = \sum_t R(s_t, a_t)$$

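The loss function L(θ) is then expressed as:

$$L(\theta) = \frac{1}{2}\,\mathbb{E}\left[\left(R(s,a) + \gamma \max_{a'} Q(s',a';\theta') - Q(s,a;\theta)\right)^2\right]$$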

The objective is achieved as follows:

S2.1. Multi-agent collaboration: centralized training and distributed execution are used. Training is performed centrally to achieve collaboration between agents. In the execution phase, each agent (vehicles and traffic lights alike) follows a distributed policy and makes decisions based on its local state;

S2.2. State space construction: the state space is built from the data of the various sensors in step S1. The result is a comprehensive state space containing multi-modal information, so that the agents can perceive traffic conditions more accurately;

S2.3. Vehicle-road collaboration: vehicles and traffic lights are treated as distinct agents. This achieves collaborative control between them and improves traffic flow.
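The centralized-training, distributed-execution scheme in S2.1 can be illustrated with a tabular value-decomposition learner in the style of VDN. This is a swapped-in technique chosen for illustration; the source does not name a specific CTDE algorithm. In this sketch:

- Each agent keeps a Q-table over its own local observation.
- During centralized training, the sum of the agents' Q-values is regressed onto a shared team reward.
- During distributed execution, each agent acts greedily on its own table.

The class name, table sizes and the assumption that all agents share the same action count are hypothetical simplifications.

```python
import numpy as np


class TabularVDN:
    """Tabular value-decomposition (VDN-style) sketch of centralized training,
    distributed execution.

    Each agent i keeps Q_i(o_i, a_i) over its own local observation o_i.
    Training is centralized: the joint value Q_tot = sum_i Q_i is regressed
    onto the shared team reward, and the single TD error updates every agent.
    Execution is distributed: each agent acts greedily on its own Q_i.
    """

    def __init__(self, n_agents, n_obs, n_actions, alpha=0.1, gamma=0.9):
        self.q = np.zeros((n_agents, n_obs, n_actions))
        self.agents = np.arange(n_agents)
        self.alpha, self.gamma = alpha, gamma

    def act(self, local_obs):
        # Distributed execution: agent i only looks at local_obs[i].
        return [int(np.argmax(self.q[i, o])) for i, o in enumerate(local_obs)]

    def train_step(self, obs, actions, team_reward, next_obs):
        # Centralized training: uses every agent's observation and action at once.
        q_tot = self.q[self.agents, obs, actions].sum()
        next_q_tot = self.q[self.agents, next_obs].max(axis=1).sum()
        td_error = team_reward + self.gamma * next_q_tot - q_tot
        # d/dQ_i of 0.5 * td_error**2 is the same for every agent, so each
        # agent's entry moves by the same step.
        self.q[self.agents, obs, actions] += self.alpha * td_error
        return float(td_error)


# Two agents (e.g. one signal light, one vehicle), 2 local observations, 2 actions.
learner = TabularVDN(n_agents=2, n_obs=2, n_actions=2)
err = learner.train_step(obs=[0, 1], actions=[1, 0], team_reward=1.0, next_obs=[1, 0])
print(err, learner.act([0, 1]))  # 1.0 [1, 0]
```

In a full system, the tables would be replaced by networks that take the GCN-fused local states as input.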

S3. Preprocess the data collected from the various sensors, fuse data from different modalities with a feature fusion method, and build a local state space for each agent.

Specifically, a local state space is constructed for each agent (signal lights and vehicles) from the data collected by the multi-modal signal perception module. CVR-MARL collects multi-modal data from three sources: vehicles, road conditions and signal lights. The data include the following:

Vehicle data:

a) Position: the vehicle's latitude, longitude, heading angle, etc.

b) Speed: the vehicle's real-time speed.

c) Acceleration: the vehicle's real-time acceleration.

d) Vehicle type: e.g. car, truck or bus.

e) Vehicle communication data: information exchanged between vehicles, such as vehicle-to-vehicle (V2V) data.

Road condition data:

a) Road structure: road width, number of lanes, medians, etc.

b) Traffic flow: number and density of vehicles in each lane, etc.

c) Road environment: road surface condition, weather, lighting and similar factors.

Signal light data:

a) Status: the current state of the signal light (red, green or yellow).

b) Remaining time: the time remaining until the signal light changes state.

c) Control strategy: e.g. fixed-time control or actuated control.

d) Vehicle-to-signal-light communication data: information exchanged between vehicles and signal lights, such as vehicle-to-infrastructure (V2I) data.

From the collected multi-modal data, the state space can be constructed as follows:

s = {vehicle data, road condition data, signal light data}

When constructing the state space, the collected multi-modal data must first be preprocessed to eliminate differences in dimension and scale between them. For example, the data can be normalized so that all values fall within the same range. A feature fusion method then fuses the data from the different modalities into a single comprehensive state representation. This representation makes full use of the multi-modal information and gives CVR-MARL richer perception of the environment, which in turn improves vehicle-road collaborative control.
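The preprocessing described above can be illustrated with min-max scaling: each reading is normalized, then s is assembled from the three data groups. This is a minimal sketch. The feature names and value ranges are hypothetical placeholders, because the source names the data groups but gives no concrete features or ranges. In the method described here, this flat vector would feed the feature fusion step rather than replace it.

```python
import numpy as np

# Hypothetical (min, max) ranges for min-max normalization.
FEATURE_RANGES = {
    "speed_mps": (0.0, 30.0),
    "accel_mps2": (-5.0, 3.0),
    "queue_len_veh": (0.0, 50.0),
    "density_veh_per_km": (0.0, 150.0),
    "is_green": (0.0, 1.0),
    "remaining_s": (0.0, 90.0),
}


def normalize(name, value):
    """Min-max scale one reading to [0, 1], clipping out-of-range values."""
    lo, hi = FEATURE_RANGES[name]
    return min(max((value - lo) / (hi - lo), 0.0), 1.0)


def build_state(vehicle, road, light):
    """Flatten s = {vehicle data, road condition data, signal light data}."""
    return np.array([normalize(k, v)
                     for group in (vehicle, road, light)
                     for k, v in group.items()])


s = build_state(
    vehicle={"speed_mps": 15.0, "accel_mps2": -1.0},
    road={"queue_len_veh": 10.0, "density_veh_per_km": 75.0},
    light={"is_green": 1.0, "remaining_s": 45.0},
)
print(s)
```

Clipping keeps occasional sensor outliers from pushing features outside the range the learner was trained on.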

S4. Design feasible action spaces for the traffic light agents and the vehicle agents.

Specifically, feasible action spaces are designed for the traffic light agents and the vehicle agents. The action space must take the traffic light control strategy into account so that it can adapt to different road conditions. The dynamic traffic light action space is realized as follows:

S4.1. Discretize the control strategy of the traffic light into a set of selectable actions. For example, the strategy can be divided into several discrete actions according to parameters such as the phase, duration and rate of change of the traffic light. This simplifies the representation of the action space and makes it easier for the reinforcement learning algorithm to explore and optimize.

S4.2. Dynamically adjust the phase settings of the traffic light based on real-time road condition data. For example, the green time can be lengthened for lanes with heavy traffic to relieve congestion. The phase sequence and durations can also be adjusted according to road structure, vehicle types, weather and other factors to improve traffic efficiency.

S4.3. Design an adaptive traffic light control strategy. For example, actuated control can be used at intersections with light traffic and coordinated control at intersections with heavy traffic. The parameters of the control strategy can also be adjusted dynamically from real-time traffic data to follow changes in road conditions.
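The example in S4.2, longer greens for busier approaches, can be realized with a simple proportional split of one cycle's effective green time. This is a sketch with assumed parameters: the cycle length, lost time and minimum green are hypothetical defaults. It is a conventional heuristic shown for illustration, not the learned CVR-MARL policy, which would instead choose among such settings as actions.

```python
def allocate_green(phase_flows, cycle_s=90.0, lost_time_s=10.0, min_green_s=7.0):
    """Split one cycle's effective green time across phases in proportion to demand.

    phase_flows: observed demand per phase (e.g. vehicles/hour on its critical lane)
    Each phase first gets min_green_s; the spare green time is then shared in
    proportion to demand, so heavier phases receive longer greens.
    """
    n = len(phase_flows)
    effective = cycle_s - lost_time_s
    spare = effective - n * min_green_s
    if spare < 0:
        raise ValueError("cycle too short to give every phase its minimum green")
    total = sum(phase_flows)
    if total == 0:
        return [effective / n] * n  # no demand observed: split evenly
    return [min_green_s + spare * f / total for f in phase_flows]


print(allocate_green([600, 200]))  # [56.5, 23.5]
```

The minimum green guards against starving a lightly used phase entirely. This is one reason fixed floors are common in practice.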

The action space is expressed as:

A = {action 1, action 2, ..., action n}

where each action corresponds to a traffic light control strategy or parameter setting. By dynamically adjusting the action space and control strategy, intelligent control of the traffic lights can be achieved, improving traffic efficiency and safety.
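A discrete action set A of this kind can be built as the Cartesian product of the control parameters. The policy then outputs an index into it. This is a minimal sketch: the phase names and green durations are hypothetical, because the source says actions are discretized by parameters such as phase and duration but gives no concrete values. Vehicle-agent actions are not covered, since the source does not specify them.

```python
from itertools import product

# Hypothetical grids; the source gives no concrete phases or durations.
PHASES = ("NS_through", "NS_left", "EW_through", "EW_left")
GREEN_DURATIONS_S = (10, 20, 30)

# A = {action 1, ..., action n}: every (phase, green duration) combination.
ACTIONS = [{"phase": p, "green_s": d} for p, d in product(PHASES, GREEN_DURATIONS_S)]


def decode_action(index):
    """Map the integer a policy outputs to a concrete signal setting."""
    return ACTIONS[index]


print(len(ACTIONS))      # 12
print(decode_action(5))  # {'phase': 'NS_left', 'green_s': 30}
```

Keeping A as a fixed, indexed list lets a Q-network use one output unit per action.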

S5. Design an appropriate reward function for multi-agent reinforcement learning according to the goals of traffic flow control.

Specifically, an appropriate reward function is designed for multi-agent reinforcement learning according to the goals of traffic flow control, such as reducing congestion and emissions. The reward function must balance several factors to achieve optimal vehicle-road collaborative control. The transition reward function R(s,a,s') takes the following form:

$$R(s,a,s') = w_1 T(s,a,s') + w_2 D(s,a,s') + w_3 S(s,a,s')$$

where:

s: the current state;

a: the action performed by the agent;

s': the new state after the action is performed;

w1, w2, w3: weights that balance the importance of the individual indicators;

T(s,a,s'): a traffic efficiency indicator, such as the average speed or waiting time of vehicles passing through the intersection;

D(s,a,s'): a traffic congestion indicator, such as the queue length at the intersection or the number of waiting vehicles;

S(s,a,s'): a traffic safety indicator, such as the probability of a traffic accident or the safety distance between vehicles and pedestrians.

The reward function must account for traffic efficiency, congestion and safety together, guiding the agents toward decisions that benefit the overall traffic situation. In practice, the weights and indicator functions can be tuned to the specific scenario and requirements to achieve better control.
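A concrete instance of R(s,a,s') helps show how the weights interact. This is a minimal sketch under stated assumptions:

- The three indicators are computed from the post-action state s' only.
- T is the average speed, D is the queue length, and S is a penalty of -1 when the smallest vehicle gap drops below a threshold.
- The weights and threshold are hypothetical.

Because D grows as congestion worsens, its weight must be negative (or D negated) for the reward to penalize congestion. The same applies to T if it is measured as waiting time rather than speed.

```python
def transition_reward(avg_speed_mps, queue_len_veh, min_gap_m,
                      w1=1.0, w2=-0.5, w3=5.0, safe_gap_m=2.0):
    """R(s, a, s') = w1*T + w2*D + w3*S, with all indicators read from s'.

    efficiency (T): average speed through the intersection (higher is better)
    congestion (D): queue length; w2 < 0 so longer queues lower the reward
    safety (S):     -1 if the smallest vehicle gap is below safe_gap_m, else 0
    """
    efficiency = avg_speed_mps
    congestion = queue_len_veh
    safety = -1.0 if min_gap_m < safe_gap_m else 0.0
    return w1 * efficiency + w2 * congestion + w3 * safety


print(transition_reward(10.0, 4, 5.0))  # 8.0
print(transition_reward(10.0, 4, 1.0))  # 3.0 (safety penalty applied)
```

In a multi-agent setting, this value could be shared by all agents as a team reward or computed per intersection. The source does not specify which.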

S6. Design a communication protocol suited to vehicle-road collaborative control scenarios.

Specifically, a communication protocol suited to vehicle-road collaborative control scenarios is designed so that agents can exchange information efficiently and securely. The protocol must account for low latency, high reliability and security. We believe that different kinds of devices should communicate by different methods to achieve the purpose of the present invention.

The communication protocol covers vehicle-to-roadside-facility communication, communication between signal lights, communication between the central controller and the signal lights, and data fusion and processing, specifically:

a) Vehicle-to-roadside-facility communication (vehicle-road communication): dedicated short-range communications (DSRC) or vehicle-to-everything (V2X) technology provides two-way communication between vehicles and roadside facilities such as signal lights and sensors. A vehicle can send its own status (position, speed, direction of travel, etc.) to roadside facilities. It can also receive instructions from them, such as signal changes and speed limits.

b) Communication between signal lights: signal lights can exchange information over a wireless sensor network (WSN) or a cellular network for collaborative control. Through this mechanism, adjacent signal lights can share local traffic information such as traffic volume and waiting times.

c) Communication between the central controller and the signal lights: the central controller communicates with each signal light over a wired or wireless network. It processes information from the signal lights and vehicles and runs the multi-agent reinforcement learning algorithm (CVR-MARL). It then sends the resulting control strategies to the corresponding signal lights.

d) Data fusion and processing: the multi-modal signal perception module sends the collected data to the data processing module. The data processing module performs feature fusion on the multi-modal data, constructs the state space, and passes it to the multi-agent reinforcement learning algorithm.
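To make item a) concrete, the status message a vehicle sends to roadside facilities can be modeled as a small serializable record, with a freshness check that reflects the low-latency requirement. This is a sketch only:

- The class, its field names, the JSON encoding and the 100 ms staleness limit are all hypothetical.
- Real deployments would use a standardized message set, such as SAE J2735 Basic Safety Messages, over DSRC or C-V2X.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class VehicleStatusMsg:
    """Hypothetical vehicle-to-roadside status message (item a).

    Field names are illustrative only, not taken from any V2X standard.
    """
    vehicle_id: str
    timestamp_ms: int
    lat: float
    lon: float
    speed_mps: float
    heading_deg: float

    def encode(self) -> bytes:
        return json.dumps(asdict(self)).encode("utf-8")

    @classmethod
    def decode(cls, payload: bytes) -> "VehicleStatusMsg":
        return cls(**json.loads(payload.decode("utf-8")))

    def is_fresh(self, now_ms: int, max_age_ms: int = 100) -> bool:
        """Drop stale updates so the controller only acts on low-latency data."""
        return now_ms - self.timestamp_ms <= max_age_ms


msg = VehicleStatusMsg("veh-1", 1_000, 23.13, 113.35, 12.5, 90.0)
received = VehicleStatusMsg.decode(msg.encode())
print(received == msg, received.is_fresh(now_ms=1_050), received.is_fresh(now_ms=1_200))
# True True False
```

Discarding stale messages at the receiver is one simple way to keep the fused state consistent with the current traffic situation.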

S7. Train the multi-agent reinforcement learning model on historical data or in a simulation environment to find the optimal strategy.

Specifically, the multi-agent reinforcement learning model is trained on historical data or in a simulation environment to find the optimal strategy. In practice, deploying the optimal strategy to the signal light control system is not enough on its own. Dynamic adjustment capability must also be considered, so that the system adapts better to changing traffic conditions. The following strategies can be adopted:

S7.1. Online learning and updating: with online learning and updating, the reinforcement learning model is updated in real time according to current traffic conditions. The system thus keeps learning and adapting to the actual traffic environment, which improves signal light control.

S7.2. Exploration-exploitation trade-off: during deployment, exploration must be balanced against exploitation. A degree of exploration lets the model keep trying new strategies in practice and discover better solutions. However, too much exploration can reduce system stability, so a suitable balance between the two is needed.

S7.3. Handling abnormal situations: in practice, abnormal situations such as traffic accidents or road closures may occur. An exception handling mechanism is therefore needed so that the system adjusts automatically when such problems arise. For example, when a road closure is detected, the system can automatically adjust the signal strategy to guide vehicles around it.

S7.4. Real-time feedback and adjustment: to further improve dynamic adjustment, a real-time feedback mechanism can be introduced into the signal light control system. The system collects real-time traffic data and compares it with the model's predictions. It can then adjust itself to match actual traffic conditions more closely.

With these strategies, the signal light control system gains strong dynamic adjustment capability. It adapts better to changing traffic conditions in practice and improves overall signal light control performance.
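S7.2 and S7.3 can be combined in the action-selection rule used at deployment. This is a minimal sketch under stated assumptions:

- The policy is epsilon-greedy over the discrete signal actions, with a linearly decaying exploration rate.
- A preset fallback plan is used whenever an anomaly flag is raised.

The schedule values and the fallback index are hypothetical. The source does not specify the exploration scheme or the exception mechanism.

```python
import random


def epsilon_schedule(step, eps_start=0.2, eps_end=0.01, decay_steps=10_000):
    """Linearly decay the exploration rate from eps_start to eps_end, then hold it."""
    frac = min(step / decay_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)


def select_action(q_values, step, anomaly=False, fallback_action=0, rng=random):
    """Epsilon-greedy action choice with a safety fallback.

    anomaly:         True when, e.g., an accident or road closure is detected (S7.3);
                     exploration is then suspended and a preset plan is used
    fallback_action: index of that preset plan (hypothetical)
    rng:             anything with random() and randrange(), e.g. the random module
    """
    if anomaly:
        return fallback_action
    if rng.random() < epsilon_schedule(step):
        return rng.randrange(len(q_values))                      # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit


print(select_action([1.0, 5.0, 3.0], step=0, anomaly=True, fallback_action=2))  # 2
print(select_action([1.0, 5.0, 3.0], step=50_000, rng=random.Random(0)))
```

Suspending exploration during anomalies is one way to honor the stability concern in S7.2 exactly when the system is most fragile.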

Although embodiments of the present invention have been shown and described, those skilled in the art will understand that various changes, modifications, substitutions and variations can be made to these embodiments without departing from the principles and spirit of the invention. The scope of the invention is defined by the claims and their equivalents.

Claims

1. A signal light collaborative control method based on multi-agent reinforcement learning, characterized in that the method includes:

S1. Collect data from multiple sensors, define the modalities, and obtain information in real time through data fusion;

S2. Use the collaborative vehicle-road multi-agent reinforcement learning algorithm to collaboratively control traffic lights and vehicles, and find the optimal strategy through learning to achieve efficient traffic flow control;

S3. Preprocess data collected from various sensors, use feature fusion methods to fuse data from different modalities, and build a local state space for each agent;

S4. Design an action space for the agent;

S5. Design a reward function for multi-agent reinforcement learning based on the goal of traffic flow control;

S6. Design a communication protocol;

S7. Use historical data or simulation environment to train the multi-agent reinforcement learning model and find the optimal strategy;

The multi-modal definition in S1 includes a visual modality and a radar modality. The visual modality comprises image data collected by cameras, and the radar modality comprises distance and speed information collected by radar. The information obtained includes road condition information, vehicle positions and vehicle velocities;

The objective function of the collaborative vehicle-road multi-agent reinforcement learning algorithm is defined as follows.

The reward function is denoted R(s,a), where s is the state and a is the agent's action. The agent's actions are adjusted to maximize the cumulative reward, expressed as:

$$J(\theta) = \sum_t R(s_t, a_t)$$

where J(θ) is the objective function, s_t is the state of the agent at time t, and a_t is the action taken by the agent at time t. The actions, and therefore J(θ), depend on the agent's parameters θ;

Then the loss function L(θ) is expressed as:

$$L(\theta) = \frac{1}{2}\,\mathbb{E}\left[\left(R(s,a) + \gamma \max_{a'} Q(s', a'; \theta') - Q(s, a; \theta)\right)^2\right]$$

where:

- E[·] denotes the expected value.
- θ denotes the parameters of the current agent, and θ' the parameters of the target agent (target network).
- γ is the discount factor.
- s' is the state reached after taking action a in state s, and a' is an action taken by the agent in state s'.
- Q(s, a; θ) is the action-value function, which estimates the cumulative reward of taking action a in state s.
- Q(s', a'; θ') is the target network's evaluation of action a' in state s', which measures the quality of a' in that state;
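The loss L(θ) is the squared temporal-difference error used in DQN-style training. The sketch below evaluates it over a sampled batch, approximating the expectation by the sample mean. The online and target networks are represented as plain callables `q_online` and `q_target` that return one Q-value per action. This is an assumption made to keep the example self-contained; in practice they would be neural networks with parameters θ and θ'.

```python
from typing import Callable, Hashable, Sequence

QFunction = Callable[[Hashable], Sequence[float]]  # state -> one Q-value per action


def td_loss(batch, q_online: QFunction, q_target: QFunction, gamma: float = 0.9) -> float:
    """L(theta) = 0.5 * E[(R + gamma * max_a' Q(s', a'; theta') - Q(s, a; theta))^2].

    batch is a list of (s, a, r, s_next) transitions; the expectation is
    approximated by the mean over the batch.
    """
    total = 0.0
    for s, a, r, s_next in batch:
        y = r + gamma * max(q_target(s_next))  # bootstrapped target using theta'
        td_error = y - q_online(s)[a]          # compared with Q(s, a; theta)
        total += td_error ** 2
    return 0.5 * total / len(batch)


# Tabular stand-ins for the online (theta) and target (theta') networks.
online_table = {"s0": [1.0, 2.0], "s1": [0.5, 0.0]}
target_table = {"s0": [1.0, 2.0], "s1": [1.0, 3.0]}
batch = [("s0", 1, 1.0, "s1")]
print(td_loss(batch, online_table.__getitem__, target_table.__getitem__, gamma=0.5))  # -> 0.125
```

With `gamma=0.5`, the target is 1.0 + 0.5 × 3.0 = 2.5. Since Q(s, a; θ) = 2.0, the TD error is 0.5 and the loss is 0.5 × 0.25 = 0.125.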

The implementation steps of the objective function of the collaborative vehicle-road multi-agent reinforcement learning algorithm include:

S2.1. Adopt centralized training with distributed execution. In the training phase, agents are trained centrally so that they learn to collaborate. In the execution phase, each agent makes decisions from its own local state using a distributed policy;

S2.2. Build a comprehensive state space containing multi-modal information from the sensor data of step S1, so that each agent perceives traffic conditions more accurately;

S2.3. Treat vehicles and traffic lights as distinct agents, achieving collaborative control between vehicles and traffic lights and improving traffic flow.
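Centralized training with distributed execution can be realized in several ways, and the patent does not fix one. The sketch below swaps in value decomposition (in the style of VDN) purely for illustration. The joint value is the sum of per-agent values, Q_tot = Σ_i Q_i(o_i, a_i), and it is trained centrally on a shared team reward. Each agent then acts greedily on its own local observation. Several simplifications are assumptions:

- the tables are tabular rather than learned networks;
- all agents share the same action count;
- there is no separate target network;
- the class name `VDNAgents` is hypothetical.

```python
class VDNAgents:
    """Tabular value-decomposition sketch: Q_tot(o, a) = sum_i Q_i(o_i, a_i).

    Centralized training: one TD error on Q_tot (team reward) updates every agent.
    Distributed execution: each agent acts greedily on its own local observation.
    """

    def __init__(self, n_agents: int, n_actions: int, lr: float = 0.1, gamma: float = 0.9):
        self.tables = [{} for _ in range(n_agents)]  # per agent: local obs -> Q-values
        self.n_actions = n_actions
        self.lr = lr
        self.gamma = gamma

    def _q(self, i: int, obs) -> list[float]:
        return self.tables[i].setdefault(obs, [0.0] * self.n_actions)

    def act(self, local_obs) -> list[int]:
        """Distributed execution: agent i looks only at its own observation."""
        return [max(range(self.n_actions), key=self._q(i, o).__getitem__)
                for i, o in enumerate(local_obs)]

    def train_step(self, obs, actions, team_reward: float, next_obs) -> float:
        """Centralized update from the joint transition; returns the TD error."""
        q_tot = sum(self._q(i, o)[a] for i, (o, a) in enumerate(zip(obs, actions)))
        # max over joint actions of a sum = sum of per-agent maxima
        next_q_tot = sum(max(self._q(i, o)) for i, o in enumerate(next_obs))
        td_error = team_reward + self.gamma * next_q_tot - q_tot
        for i, (o, a) in enumerate(zip(obs, actions)):
            # dQ_tot/dQ_i = 1, so every agent takes the same TD step
            self._q(i, o)[a] += self.lr * td_error
        return td_error


# Agent 0: a traffic light; agent 1: a vehicle (hypothetical local observations).
agents = VDNAgents(n_agents=2, n_actions=2, lr=0.5, gamma=0.0)
agents.train_step(("light_busy", "veh_near"), [1, 0], 1.0, ("light_busy", "veh_near"))
print(agents.act(("light_busy", "veh_near")))  # -> [1, 0]
```

Because Q_tot is a sum, maximizing it jointly is equivalent to each agent maximizing its own Q_i. This is what allows execution to be fully distributed.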

2. The signal light collaborative control method based on multi-agent reinforcement learning according to claim 1, characterized in that the data fusion technology is specifically:

S1.1. Model the scene as a graph structure;

S1.2. Extract features from data of each modality;

S1.3. Feature fusion based on graph convolutional neural network;

S1.4. Output the reinforcement learning state.
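Steps S1.1 to S1.4 can be illustrated with a single graph-convolution layer. The sketch assumes the following:

- each node is an intersection or road segment, and edges encode road adjacency;
- per-node camera and radar feature vectors have already been extracted (S1.2) and are concatenated.

One layer then computes ReLU(ÂHW) with the symmetric normalization Â = D^{-1/2}(A + I)D^{-1/2} used in Kipf and Welling's GCN. The weight matrix would normally be learned. The function names and fixed weights here are hypothetical.

```python
import math


def normalized_adjacency(adj: list[list[float]]) -> list[list[float]]:
    """A_hat = D^{-1/2} (A + I) D^{-1/2}, the GCN normalization with self-loops."""
    n = len(adj)
    a = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(x: list[list[float]], y: list[list[float]]) -> list[list[float]]:
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def gcn_fuse(adj, camera_feats, radar_feats, weight):
    """One graph-convolution layer over concatenated camera + radar node features.

    S1.1: adj encodes the scene graph; S1.2: per-node modality features are given;
    S1.3: fuse with ReLU(A_hat @ H @ W); S1.4: each output row is a node's RL state.
    """
    h = [cam + rad for cam, rad in zip(camera_feats, radar_feats)]  # concatenate modalities
    z = matmul(matmul(normalized_adjacency(adj), h), weight)
    return [[max(0.0, v) for v in row] for row in z]  # ReLU


# Two adjacent intersections, 1 camera feature + 1 radar feature each (made-up values).
adj = [[0.0, 1.0], [1.0, 0.0]]
states = gcn_fuse(adj, [[1.0], [3.0]], [[2.0], [0.0]], weight=[[1.0, 0.0], [0.0, 1.0]])
print(states)  # -> [[2.0, 1.0], [2.0, 1.0]]
```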

3. The collaborative control method of traffic lights based on multi-agent reinforcement learning according to claim 1, characterized in that the data in step S3 includes vehicle data, road condition data and traffic light data.

4. The traffic light collaborative control method based on multi-agent reinforcement learning and multi-modal signal perception according to claim 3, characterized in that the state space is expressed as follows:

$$s = \{\text{vehicle data}, \text{road condition data}, \text{traffic light data}\}$$
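The sketch below shows one way the three groups in s might be assembled into a single fixed-order vector after preprocessing. The field contents, the min-max scaling and the names `AgentState` and `min_max` are assumptions. The claim fixes only the three groups.

```python
from dataclasses import dataclass


def min_max(values: list[float], lo: float, hi: float) -> list[float]:
    """Scale raw readings into [0, 1] given assumed sensor ranges."""
    return [(v - lo) / (hi - lo) for v in values]


@dataclass
class AgentState:
    """Local state s = {vehicle data, road condition data, traffic light data}."""
    vehicle: list[float]         # e.g. vehicle counts or mean speeds per lane
    road_condition: list[float]  # e.g. queue lengths, occupancy
    traffic_light: list[float]   # e.g. one-hot encoding of the current phase

    def to_vector(self) -> list[float]:
        """Concatenate the three groups in a fixed order for the agent's network."""
        return [*self.vehicle, *self.road_condition, *self.traffic_light]


s = AgentState(vehicle=min_max([10, 20], lo=0, hi=40),
               road_condition=[0.3],
               traffic_light=[1.0, 0.0])
print(s.to_vector())  # -> [0.25, 0.5, 0.3, 1.0, 0.0]
```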

5. The signal light collaborative control method based on multi-agent reinforcement learning according to claim 1, characterized in that the step S4 specifically includes:

S4.1. Discretize the control strategy of the traffic light into a series of optional actions;

S4.2. Dynamically adjust the phase setting of the signal light based on real-time traffic data;

S4.3. Design an adaptive traffic light control strategy.
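Steps S4.1 to S4.3 could be realized with a small discrete action set. In this hypothetical sketch, the traffic-light agent chooses at each decision interval between keeping the current phase and switching to the next. A wrapper enforces minimum and maximum green times so that learned actions stay within safe timings. The two-action design, the 5-second step, the green-time bounds and the names `Action` and `apply_action` are assumptions, not values from the patent.

```python
from enum import IntEnum


class Action(IntEnum):
    KEEP = 0    # extend the current phase by one decision interval
    SWITCH = 1  # advance to the next phase in the cycle


def apply_action(phase: int, elapsed: int, action: Action, n_phases: int,
                 step: int = 5, min_green: int = 10, max_green: int = 60) -> tuple[int, int]:
    """Return (phase, seconds elapsed in that phase) after one decision step.

    Min/max green bounds (assumed values) override the learned action so that
    the policy can adapt timings without producing unsafe or starving phases.
    """
    if elapsed < min_green:
        action = Action.KEEP    # too early to switch
    elif elapsed >= max_green:
        action = Action.SWITCH  # force a switch so other approaches get green
    if action == Action.SWITCH:
        return (phase + 1) % n_phases, 0
    return phase, elapsed + step


print(apply_action(0, 5, Action.SWITCH, n_phases=4))   # -> (0, 10): min green not reached
print(apply_action(0, 15, Action.SWITCH, n_phases=4))  # -> (1, 0)
print(apply_action(3, 60, Action.KEEP, n_phases=4))    # -> (0, 0): max green forces a switch
```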

6. The collaborative control method of signal lights based on multi-agent reinforcement learning according to claim 1, characterized in that the communication protocol covers communication between vehicles and roadside facilities, communication between signal lights, communication between the central controller and the signal lights, and data fusion and processing.
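The claim does not specify a message format. As a hedged illustration only, the sketch below defines a hypothetical JSON-serialized message that could carry vehicle-to-roadside, signal-to-signal or central-controller traffic, with a timestamp check to drop stale data before fusion. The class name `V2IMessage`, its fields and the one-second freshness window are invented here. Real deployments would more likely use standardized message sets such as SAE J2735 (for example BSM and SPaT messages).

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class V2IMessage:
    """Hypothetical message for vehicle-roadside, signal-signal or central-signal links."""
    sender_id: str
    msg_type: str     # e.g. "vehicle_status", "signal_phase", "central_command"
    timestamp: float  # seconds since epoch
    payload: dict

    def encode(self) -> bytes:
        """Serialize to compact UTF-8 JSON for transmission."""
        return json.dumps(asdict(self), separators=(",", ":")).encode("utf-8")

    @classmethod
    def decode(cls, raw: bytes) -> "V2IMessage":
        """Parse bytes produced by encode() back into a message."""
        return cls(**json.loads(raw.decode("utf-8")))

    def is_fresh(self, now: float, max_age_s: float = 1.0) -> bool:
        """Data-processing step: only messages newer than max_age_s enter fusion."""
        return now - self.timestamp <= max_age_s


msg = V2IMessage("veh-42", "vehicle_status", 1700000000.0, {"speed_mps": 12.5, "lane": 2})
assert V2IMessage.decode(msg.encode()) == msg  # lossless round trip
```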

7. The signal light collaborative control method based on multi-agent reinforcement learning according to claim 1, characterized in that step S7 specifically comprises using historical data or a simulation environment to train the multi-agent reinforcement learning model to find the optimal strategy, and deploying the optimal strategy to the signal light control system.