WO2020010517A1 - Trajectory prediction method and apparatus - Google Patents
Trajectory prediction method and apparatus
- Publication number: WO2020010517A1 (PCT/CN2018/095144)
- Authority: WO (WIPO, PCT)
Classifications
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
- G05D1/02—Control of position or course in two dimensions
Abstract
A trajectory prediction method and apparatus, relating to the field of local navigation for robots and intelligent vehicles, and applied to a vehicle provided with a vehicle-mounted camera. The method comprises: photographing a surrounding environment by means of a vehicle-mounted camera to acquire a video sequence comprising a surrounding vehicle and a vehicle background (101); locating the surrounding vehicle in the video sequence and extracting historical trajectory information of the surrounding vehicle, and taking, as auxiliary information, scene semantic information obtained by performing image segmentation on the video sequence (102); and inputting the historical trajectory information and the auxiliary information into a neural network model to obtain a predicted trajectory of the surrounding vehicle (103). By means of the trajectory prediction method, the accuracy of predicting a vehicle trajectory can be improved.
Description
The present invention relates to the field of local navigation for robots and intelligent vehicles, and in particular to a trajectory prediction method and apparatus.
While a vehicle is driving, it is important to predict the future trajectories of other traffic participants so that an autonomous vehicle does not collide with other vehicles. Assuming that all traffic participants obey the traffic rules, a human driver can subconsciously predict a target's future trajectory; an autonomous vehicle, by contrast, typically predicts the future trajectories of other traffic participants by building a model.
However, most current work either uses static images to extract visual semantic information or learns a driving network with an end-to-end structure. The former ignores the temporal continuity of driving situations, and the latter lacks interpretability of the trained network, so both lead to low accuracy in predicting vehicle trajectories.
Summary of the invention
The main purpose of the present invention is to provide a trajectory prediction method and apparatus that can improve the accuracy of vehicle trajectory prediction.
The trajectory prediction method provided by the first aspect of the embodiments of the present invention is applied to a vehicle provided with a vehicle-mounted camera. The method includes: photographing the surrounding environment with the vehicle-mounted camera to obtain a video sequence including surrounding vehicles and the vehicle background; locating the surrounding vehicles in the video sequence and extracting their historical trajectory information, and taking scene semantic information obtained by image segmentation of the video sequence as auxiliary information; and inputting the historical trajectory information and the auxiliary information into a neural network model to obtain the predicted trajectories of the surrounding vehicles.
The trajectory prediction apparatus provided by the second aspect of the embodiments of the present invention is applied to a vehicle provided with a vehicle-mounted camera. The apparatus includes: an acquisition module configured to photograph the surrounding environment with the vehicle-mounted camera and obtain a video sequence including surrounding vehicles and the vehicle background; an extraction and segmentation module configured to locate the surrounding vehicles in the video sequence, extract their historical trajectory information, and take scene semantic information obtained by image segmentation of the video sequence as auxiliary information; and an output module configured to input the historical trajectory information and the auxiliary information into a neural network model to obtain the predicted trajectories of the surrounding vehicles.
It can be seen from the above embodiments that a video sequence including surrounding vehicles and the vehicle background is acquired through the on-board camera, the video sequence is image-segmented to obtain scene semantic information, and the scene semantic information and historical trajectory information are then input into a neural network model to obtain a predicted trajectory, instead of extracting scene semantic information from static images for analysis. This preserves the temporal continuity of the neural network model and improves the accuracy of vehicle trajectory prediction.
FIG. 1 is a schematic flowchart of a trajectory prediction method provided by the first embodiment of the present invention;
FIG. 2 is a schematic flowchart of a trajectory prediction method provided by the second embodiment of the present invention;
FIG. 3 is a schematic diagram of the neural network model of the trajectory prediction method provided by the second embodiment of the present invention;
FIG. 4 is a schematic diagram of an application of the trajectory prediction method provided by the second embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a trajectory prediction apparatus provided by the third embodiment of the present invention.
In order to make the objectives, features, and advantages of the present invention more apparent and comprehensible, the technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. The described embodiments are obviously only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Referring to FIG. 1, FIG. 1 is a schematic flowchart of the trajectory prediction method provided by the first embodiment of the present invention. The method is applied to a vehicle provided with a vehicle-mounted camera. As shown in FIG. 1, the trajectory prediction method mainly includes the following steps:
101. Photograph the surrounding environment with the vehicle-mounted camera to obtain a video sequence including surrounding vehicles and the vehicle background.
Specifically, during the autonomous driving of the vehicle, it is assumed that all traffic participants obey the traffic rules, and a model is built to predict the future trajectories of other traffic participants. Building the model requires information about the surrounding environment, so the on-board camera on the vehicle first photographs the surroundings to obtain a video sequence including surrounding vehicles and the vehicle background. The frame rate of the video sequence can be chosen according to the actual situation. A surrounding vehicle refers to a vehicle that is within a certain range of the camera-equipped vehicle and has a potential influence on it; this range may be 30 meters around the camera-equipped vehicle.
102. Locate the surrounding vehicles in the video sequence and extract their historical trajectory information, and take scene semantic information obtained by image segmentation of the video sequence as auxiliary information.
Specifically, motion in a video sequence is an illusion produced by displaying frames in rapid succession; each frame is a still image. The surrounding vehicles are therefore located in each frame, and their trajectories can be observed across consecutive frames. For the current frame, the historical trajectory information of the surrounding vehicles is thus obtained from the video sequences of the past frames.
The scene semantic information obtained by image segmentation of each frame is used as auxiliary information. Image segmentation means segmenting the objects in each frame by semantic category and labeling the scene semantic information, such as pedestrians, surrounding vehicles, buildings, sky, vegetation, road obstacles, lane lines, road signs, and traffic lights, so as to identify the drivable area in the current frame. Using scene semantic information as auxiliary information provides a degree of robustness to changes in the target's appearance.
Optionally, since the regions corresponding to different semantic categories are distinct feature regions whose boundaries are edges, edge detection can be used to segment each frame of the video sequence and extract the required targets. An edge marks where one feature region ends and another begins; the internal features or attributes of a required target, such as grayscale, color, or texture, are consistent within the target and differ from those of other feature regions.
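As an illustration of this optional step, the following is a minimal sketch of edge-based frame segmentation using OpenCV's Canny detector. The patent does not name a specific edge operator, so the choice of Canny, the hysteresis thresholds, and the morphological gap-closing step are assumptions made for the example.

```python
import cv2
import numpy as np

def segment_frame_by_edges(frame_bgr: np.ndarray) -> np.ndarray:
    """Label candidate feature regions in one frame via edge detection.

    Thresholds and the gap-closing kernel are illustrative assumptions,
    not values taken from the patent.
    """
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)  # suppress noise before edge detection
    edges = cv2.Canny(blurred, 50, 150)          # assumed hysteresis thresholds
    kernel = np.ones((3, 3), np.uint8)
    closed = cv2.morphologyEx(edges, cv2.MORPH_CLOSE, kernel)  # close small gaps
    # Regions enclosed by edges become connected components of the non-edge mask.
    num_regions, labels = cv2.connectedComponents(255 - closed)
    return labels  # one integer label per candidate feature region
```

Closed contours found this way delimit feature regions whose interior grayscale, color, or texture is consistent, which is the property the paragraph above relies on.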
103. Input the historical trajectory information and the auxiliary information into the neural network model to obtain the predicted trajectories of the surrounding vehicles.
Specifically, a neural network is a complex network system formed by a large number of simple neurons that are extensively interconnected. It is a highly complex nonlinear dynamic learning system with massively parallel, distributed storage and processing, and self-organizing, adaptive, and self-learning capabilities. A mathematical model is therefore built with a neural network to obtain the neural network model, and the historical trajectory information and auxiliary information obtained above are input into it to obtain the predicted trajectories of the surrounding vehicles.
In this embodiment of the present invention, a video sequence including surrounding vehicles and the vehicle background is acquired through the on-board camera, the video sequence is image-segmented to obtain scene semantic information, and the scene semantic information and historical trajectory information are input into a neural network model to obtain a predicted trajectory, instead of extracting scene semantic information from static images for analysis. This preserves the temporal continuity of the neural network model and improves the accuracy of vehicle trajectory prediction.
Referring to FIG. 2, FIG. 2 is a schematic flowchart of the trajectory prediction method provided by the second embodiment of the present invention. The method is applied to a vehicle provided with a vehicle-mounted camera. As shown in FIG. 2, the trajectory prediction method mainly includes the following steps:
201. Photograph the surrounding environment with the vehicle-mounted camera to obtain a video sequence including surrounding vehicles and the vehicle background.
202. Locate the surrounding vehicles in the video sequence and extract their historical trajectory information, and take scene semantic information obtained by image segmentation of the video sequence as auxiliary information.
203. Input the auxiliary information into the convolutional neural network to obtain spatial feature information.
Specifically, the neural network model includes a convolutional neural network, a first-layer long short-term memory network, a second-layer long short-term memory network, and a fully connected layer.
A convolutional neural network is a feedforward neural network. After the video sequence is segmented and labeled, the resulting scene semantic information is input into the convolutional neural network as auxiliary information to obtain spatial feature information. The auxiliary information is image information and can be one-hot encoded, with the number of channels equal to the number of semantic categories. It is input into a four-layer convolutional neural network whose convolution kernel may be 3*3*4, yielding spatial feature information represented as a 6-dimensional vector.
As shown in FIG. 3, the convolutional neural network includes convolutional layers, rectified linear units, pooling layers, and dropout layers. The convolutional layers extract features from the auxiliary information, the rectified linear units introduce nonlinearity, the pooling layers compress the input auxiliary information and extract its main features, and the dropout layers help mitigate overfitting.
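The following PyTorch sketch shows one way the described spatial-feature extractor could be assembled. The four convolutional layers, 3*3 kernels, rectified linear units, pooling, dropout, and 6-dimensional output come from the text; the channel widths, the 64x64 input resolution, the reading of "3*3*4" as 3x3 kernels over 4 one-hot semantic channels, and the final linear projection are assumptions made for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_CLASSES = 4  # assumed count of one-hot semantic channels (reading of "3*3*4")

class SpatialCNN(nn.Module):
    """Four-layer CNN turning a one-hot segmentation map into the
    6-dimensional spatial feature vector described in the text."""

    def __init__(self, num_classes: int = NUM_CLASSES):
        super().__init__()
        widths = [num_classes, 16, 32, 64, 64]  # channel widths are assumptions
        self.convs = nn.ModuleList(
            nn.Conv2d(widths[i], widths[i + 1], kernel_size=3, padding=1)
            for i in range(4)
        )
        self.pool = nn.MaxPool2d(2)         # compress input, keep dominant features
        self.dropout = nn.Dropout(p=0.5)    # mitigate overfitting
        self.fc = nn.Linear(64 * 4 * 4, 6)  # project to the 6-dim spatial feature

    def forward(self, seg_onehot: torch.Tensor) -> torch.Tensor:
        x = seg_onehot                      # (batch, num_classes, 64, 64) assumed
        for conv in self.convs:
            x = self.pool(F.relu(conv(x)))  # convolution -> rectified linear unit -> pooling
        x = self.dropout(torch.flatten(x, 1))
        return self.fc(x)                   # (batch, 6)
```

A label map `seg` of shape (batch, height, width) holding integer class indices can be one-hot encoded, as the text describes, with `torch.nn.functional.one_hot(seg, NUM_CLASSES).permute(0, 3, 1, 2).float()`, so that the number of channels equals the number of semantic categories.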
204. Input the historical trajectory information into the first-layer long short-term memory network to obtain temporal feature information, and input the spatial feature information and the temporal feature information into the second-layer long short-term memory network to obtain joint feature information.
Specifically, a long short-term memory (LSTM) network is a recurrent network over time. Historical trajectory information is ordered in time, and positions carry contextual relationships: as a sequence input, the historical trajectory requires continual learning of the positional features before and after each step. The LSTM network is therefore used to train on the historical trajectory information, and the trajectory information of the historical frames is linked to infer the trajectory information of the current frame.
As shown in FIG. 3, the historical trajectory information is input into the first-layer LSTM network to obtain temporal feature information, and this temporal feature information, together with the spatial feature information obtained in step 203, is input into the second-layer LSTM network to obtain a joint representation. Since the dimension of the three-dimensional space occupancy grid is 6, the first-layer LSTM network not only learns the temporal feature information but also makes the dimensions of the temporal and spatial feature information consistent. In practice, the first-layer LSTM network may have 100 units, and the second-layer LSTM network may consist of two stacked LSTM layers of 300 units each.
205. Input the joint feature information into the fully connected layer to obtain the predicted trajectory.
Specifically, every node of the fully connected layer is connected to all nodes of the previous layer and integrates all the features that layer extracted. The joint representation is therefore input into the fully connected layer, and a series of matrix multiplications produces the output of the neural network model: the predicted trajectory J over T time steps. In practice, T may correspond to 1.6 s.
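Putting steps 203 to 205 together, a hedged sketch of the SEG-LSTM stack might look as follows. The unit counts (100, then two stacked layers of 300) come from the text; the per-frame concatenation of temporal and spatial features, the use of the last frame's hidden state, and the 16-step horizon (1.6 s at an assumed 10 frames per second) are assumptions made for the example.

```python
import torch
import torch.nn as nn

class SegLSTM(nn.Module):
    """Sketch of the SEG-LSTM fusion of trajectory and segmentation streams."""

    def __init__(self, traj_dim: int = 6, spat_dim: int = 6, horizon: int = 16):
        super().__init__()
        self.horizon = horizon
        self.traj_dim = traj_dim
        # First-layer LSTM: temporal features of the historical trajectory (100 units).
        self.lstm1 = nn.LSTM(traj_dim, 100, batch_first=True)
        # Second-layer LSTM: two stacked layers of 300 units fusing temporal
        # and spatial features into a joint representation.
        self.lstm2 = nn.LSTM(100 + spat_dim, 300, num_layers=2, batch_first=True)
        # Fully connected head: joint representation -> T future grid states.
        self.fc = nn.Linear(300, horizon * traj_dim)

    def forward(self, history: torch.Tensor, spatial: torch.Tensor) -> torch.Tensor:
        # history: (batch, frames, 6) occupancy-grid states of past frames
        # spatial: (batch, frames, 6) per-frame CNN features (assumed pairing)
        temporal, _ = self.lstm1(history)            # (batch, frames, 100)
        joint, _ = self.lstm2(torch.cat([temporal, spatial], dim=-1))
        out = self.fc(joint[:, -1])                  # last frame's joint state
        return out.view(-1, self.horizon, self.traj_dim)
```

Here `spatial` is assumed to hold, for each historical frame, the 6-dimensional output of the CNN sketched above, which is what makes the per-frame concatenation possible.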
The neural network model includes the following formula:
J ← M_p(h, a) : H × A
where J denotes the predicted trajectory, M denotes the mapping among H, A, and J, H denotes the historical trajectory information, A denotes the auxiliary information, p denotes a surrounding vehicle, h denotes the position information of vehicle p in the video sequence at frame t, a denotes the scene semantic information of vehicle p at frame t, j denotes the position information of vehicle p in the t-th frame counted from frame T+1, and t denotes the frame index.
As shown in FIG. 3, this embodiment proposes a segmentation-long short-term memory (SEG-LSTM) network that fuses the multiple streams of the historical frames and predicts the future trajectories of the surrounding vehicles.
The number of LSTM layers, the number of units in each LSTM layer, the number of convolutional layers, and the convolution kernel size are all network hyperparameters determined by cross-validation. The purpose of cross-validation is to find the optimal hyperparameters while avoiding overfitting the model. For example, the data set is first split into a training set and a test set at a ratio of 5:1. The training set is then divided into 5 parts; each part in turn serves as the validation set while the remaining 4 parts are used for training, giving 5 rounds of training and validation. The mean accuracy obtained under different hyperparameters is compared, and the best-performing values are selected.
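The described hyperparameter search can be sketched with scikit-learn's splitters as follows. The `train_and_score` callback stands in for one round of SEG-LSTM training and is a hypothetical helper, as are the array-based data layout and the accuracy-based scoring convention.

```python
import numpy as np
from sklearn.model_selection import KFold, train_test_split

def select_hyperparams(samples, labels, candidate_configs, train_and_score):
    """Pick the hyperparameter config with the best mean 5-fold accuracy.

    `samples` and `labels` are NumPy arrays; `train_and_score` is a
    hypothetical callback that trains the model once with `cfg` and
    returns its validation accuracy.
    """
    # 5:1 split into training and test data, as described in the text.
    tr_x, te_x, tr_y, te_y = train_test_split(samples, labels, test_size=1 / 6)
    kfold = KFold(n_splits=5, shuffle=True)
    best_cfg, best_score = None, -np.inf
    for cfg in candidate_configs:  # e.g. LSTM layers/units, CNN depth, kernel size
        scores = [
            train_and_score(cfg, tr_x[tr], tr_y[tr], tr_x[va], tr_y[va])
            for tr, va in kfold.split(tr_x)  # each fifth serves once as validation
        ]
        if np.mean(scores) > best_score:
            best_cfg, best_score = cfg, float(np.mean(scores))
    return best_cfg, (te_x, te_y)  # test set reserved for the final evaluation
```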
As shown in FIG. 4, the video sequence is divided by frame into video sequences over multiple time steps. Position information is obtained from each frame by detection and tracking, and image segmentation is performed to obtain semantic information. The position information and semantic information of the same frame are then input into the LSTM network for training, and the predicted trajectory is obtained by training on the video sequences of multiple historical frames and the current frame.
206. Obtain, through the depth camera, the minimum relative distance between the vehicle and each surrounding vehicle, and convert the two-dimensional predicted trajectory into a three-dimensional predicted trajectory according to the minimum relative distance.
Specifically, the predicted trajectory is a two-dimensional predicted trajectory, and the vehicle is further provided with a depth camera.
The two-dimensional predicted trajectory is converted into a three-dimensional predicted trajectory according to the minimum relative distance by the following formula (given only as an image in the original), where x, y, w, and h denote the elements of the two-dimensional predicted trajectory in the pixel bounding box of each frame of the video sequence, x_r, y_r, w_r, and h_r denote the corresponding elements of the three-dimensional predicted trajectory in each frame, f denotes the focal length of the depth camera, and d_min denotes the minimum relative distance between the vehicle and each surrounding vehicle.
If the subscript p is omitted, the historical trajectory information and the predicted trajectory can be defined as a three-dimensional space occupancy grid, that is,
H, J ∈ R^6 = {x, y, w, h, d_min, d_max}
where d_max denotes the maximum distance between the vehicle and each surrounding vehicle.
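Since the conversion formula itself appears only as an image in the original, the following sketch should be read as a guess: it assumes a standard pinhole-camera model in which pixel quantities are scaled by the ratio of the minimum relative distance to the focal length. The function name and the example values are hypothetical.

```python
def bbox_2d_to_3d(x, y, w, h, f, d_min):
    """Hypothetical pinhole-model reconstruction of the 2D-to-3D conversion.

    Pixel bounding-box quantities are scaled by depth over focal length;
    this is an assumption, not the formula from the patent image.
    """
    scale = d_min / f
    return x * scale, y * scale, w * scale, h * scale  # x_r, y_r, w_r, h_r

# Per-frame occupancy-grid state as defined in the text:
# H, J in R^6 = {x, y, w, h, d_min, d_max}
x_r, y_r, w_r, h_r = bbox_2d_to_3d(x=320.0, y=240.0, w=80.0, h=60.0,
                                   f=700.0, d_min=12.5)  # illustrative values
```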
In this embodiment of the present invention, a video sequence including surrounding vehicles and the vehicle background is first acquired through the vehicle-mounted camera, the video sequence is image-segmented to obtain scene semantic information, and the scene semantic information and historical trajectory information are then input into a neural network model to obtain the predicted trajectory, instead of extracting scene semantic information from static images for analysis. This preserves the temporal continuity of the neural network model and improves the accuracy of vehicle trajectory prediction. In addition, combining the convolutional neural network with LSTM networks improves the robustness of tracking surrounding vehicles, and obtaining scene semantic information by image segmentation improves the interpretability of the training process.
Referring to FIG. 5, FIG. 5 is a schematic structural diagram of the trajectory prediction apparatus provided by the third embodiment of the present invention, applied to a vehicle provided with a vehicle-mounted camera. As shown in FIG. 5, the trajectory prediction apparatus mainly includes:
An acquisition module 301, configured to photograph the surrounding environment with the vehicle-mounted camera and obtain a video sequence including surrounding vehicles and the vehicle background.
An extraction and segmentation module 302, configured to locate the surrounding vehicles in the video sequence, extract their historical trajectory information, and take scene semantic information obtained by image segmentation of the video sequence as auxiliary information.
An output module 303, configured to input the historical trajectory information and the auxiliary information into the neural network model to obtain the predicted trajectories of the surrounding vehicles.
Further, the neural network model includes a convolutional neural network, a first-layer long short-term memory network, a second-layer long short-term memory network, and a fully connected layer, in which case:
The output module 303 is further configured to input the auxiliary information into the convolutional neural network to obtain spatial feature information.
The output module 303 is further configured to input the historical trajectory information into the first-layer long short-term memory network to obtain temporal feature information.
The output module 303 is further configured to input the spatial feature information and the temporal feature information into the second-layer long short-term memory network to obtain joint feature information.
The output module 303 is further configured to input the joint feature information into the fully connected layer to obtain the predicted trajectory.
Further, the neural network model includes the following formula:
J ← M_p(h, a) : H × A
where J denotes the predicted trajectory, M denotes the mapping among H, A, and J, H denotes the historical trajectory information, A denotes the auxiliary information, p denotes a surrounding vehicle, h denotes the position information of vehicle p in the video sequence at frame t, a denotes the scene semantic information of vehicle p at frame t, j denotes the position information of vehicle p in the t-th frame counted from frame T+1, and t denotes the frame index.
Further, the predicted trajectory is a two-dimensional predicted trajectory, and the vehicle is further provided with a depth camera.
The acquisition module 301 is further configured to obtain, through the depth camera, the minimum relative distance between the vehicle and each surrounding vehicle.
The apparatus then further includes a conversion module 304.
The conversion module 304 is configured to convert the two-dimensional predicted trajectory into a three-dimensional predicted trajectory according to the minimum relative distance.
Further, the conversion module 304 is configured to convert the two-dimensional predicted trajectory into a three-dimensional predicted trajectory according to the minimum relative distance by the formula given above (as an image in the original), where x, y, w, and h denote the elements of the two-dimensional predicted trajectory in the pixel bounding box of each frame of the video sequence, x_r, y_r, w_r, and h_r denote the corresponding elements of the three-dimensional predicted trajectory in each frame, f denotes the focal length of the depth camera, and d_min denotes the minimum relative distance between the vehicle and each surrounding vehicle.
For the processes by which the above modules implement their respective functions, reference may be made to the related content of the embodiments shown in FIG. 1 to FIG. 4; details are not repeated here.
In this embodiment of the present invention, a video sequence including surrounding vehicles and the vehicle background is acquired through the vehicle-mounted camera, the video sequence is image-segmented to obtain scene semantic information, and the scene semantic information and historical trajectory information are then input into a neural network model to obtain the predicted trajectory, instead of extracting scene semantic information from static images for analysis. This preserves the temporal continuity of the neural network model and improves the accuracy of vehicle trajectory prediction.
In the several embodiments provided in this application, it should be understood that the disclosed methods and apparatus may be implemented in other ways. For example, the embodiments described above are merely illustrative; the division into modules is only a division by logical function, and other divisions are possible in actual implementation: multiple modules or components may be combined or integrated into another system, or some features may be ignored or not implemented. Furthermore, the mutual couplings, direct couplings, or communication links shown or discussed may be indirect couplings or communication links through interfaces or modules, and may be electrical, mechanical, or of other forms.
The modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical modules; they may be located in one place or distributed over multiple network modules. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional modules in the embodiments of the present invention may be integrated into one processing module, each module may exist alone physically, or two or more modules may be integrated into one module. The integrated modules may be implemented in the form of hardware or in the form of software functional modules.
It should be noted that, for brevity of description, the foregoing method embodiments are presented as series of action combinations. Those skilled in the art should understand, however, that the present invention is not limited by the described order of actions, because according to the present invention some steps may be performed in other orders or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present invention.
In the above embodiments, each embodiment is described with its own emphasis; for parts not detailed in one embodiment, reference may be made to the related descriptions of other embodiments.
The foregoing describes the trajectory prediction method and apparatus, terminal, and computer-readable storage medium provided by the present invention. Those of ordinary skill in the art may, based on the ideas of the embodiments of the present invention, make changes to the specific implementations and application scope. In summary, the content of this specification should not be construed as limiting the present invention.
Claims (10)
- 一种轨迹预测方法,应用于设置有车载摄像头的车辆,其特征在于,所述方法包括:A trajectory prediction method, which is applied to a vehicle provided with a vehicle camera, is characterized in that the method includes:利用车载摄像头对周围环境进行摄影,获取包括有周围车辆和车辆背景的视频序列;Use the vehicle camera to photograph the surrounding environment and obtain a video sequence including the surrounding vehicles and the vehicle background;从所述视频序列中定位所述周围车辆并提取所述周围车辆的历史轨迹信息,将所述视频序列进行图像分割得到的场景语义信息作为辅助信息;Locate the surrounding vehicle from the video sequence and extract historical track information of the surrounding vehicle, and use scene semantic information obtained by image segmentation of the video sequence as auxiliary information;将所述历史轨迹信息和所述辅助信息输入神经网络模型,得到所述周围车辆的预测轨迹。The historical trajectory information and the auxiliary information are input into a neural network model to obtain a predicted trajectory of the surrounding vehicles.
- 如权利要求1所述的轨迹预测方法,其特征在于,所述神经网络模型包括卷积神经网络、第一层长短期记忆网络、第二层长短期记忆网络和全连接层,则所述将所述历史轨迹信息和所述辅助信息输入神经网络模型,得到所述周围车辆的预测轨迹包括:The trajectory prediction method according to claim 1, wherein the neural network model comprises a convolutional neural network, a first layer of long-term short-term memory network, a second layer of long-term short-term memory network, and a fully connected layer, then the The historical trajectory information and the auxiliary information are input into a neural network model, and obtaining the predicted trajectory of the surrounding vehicles includes:将所述辅助信息输入给所述卷积神经网络,得到空间特征信息;Inputting the auxiliary information to the convolutional neural network to obtain spatial feature information;将所述历史轨迹信息输入所述第一层长短期记忆网络,得到时间特征信息;Inputting the historical trajectory information into the first layer of short-term and short-term memory network to obtain temporal characteristic information;将所述空间特征信息和所述时间特征信息输入所述第二层长短期记忆网络,得到联合特征信息;Inputting the spatial feature information and the temporal feature information into the second-layer long-short-term memory network to obtain joint feature information;将所述联合特征信息输入全连接层,得到所述预测轨迹。The joint feature information is input to a fully connected layer to obtain the predicted trajectory.
- The trajectory prediction method according to claim 1, wherein the neural network model comprises the following formula:
J ← M_p(h, a): H × A;
where J denotes the predicted trajectory; M denotes the mapping relationship among H, A, and J; H denotes the historical trajectory information; A denotes the auxiliary information; p denotes a surrounding vehicle; h denotes the position information of vehicle p in the t-th frame of the video sequence; a denotes the scene semantic information of vehicle p in the t-th frame of the video sequence; j denotes the position information of vehicle p in the t-th frame of the video sequence, counting from frame T+1; and t denotes the frame index.
- The trajectory prediction method according to claim 1, wherein the predicted trajectory is a two-dimensional spatial predicted trajectory and the vehicle is further provided with a depth camera, and the method further comprises:
obtaining, through the depth camera, the minimum relative distance between the vehicle and each of the surrounding vehicles; and
converting the two-dimensional spatial predicted trajectory into a three-dimensional spatial predicted trajectory according to the minimum relative distance.
- The trajectory prediction method according to claim 4, wherein the two-dimensional spatial predicted trajectory is converted into the three-dimensional spatial predicted trajectory according to the minimum relative distance by the following formula:
where x, y, w, and h respectively denote the elements of the pixel bounding box of the two-dimensional spatial predicted trajectory in each frame of the video sequence; x_r, y_r, w_r, and h_r respectively denote the elements of the pixel bounding box of the three-dimensional spatial predicted trajectory in each frame of the video sequence; f denotes the focal length of the depth camera; and d_min denotes the minimum relative distance between the vehicle and each of the surrounding vehicles.
- A trajectory prediction apparatus, applied to a vehicle provided with a vehicle-mounted camera, wherein the apparatus comprises:
an acquisition module, configured to capture the surrounding environment with the vehicle-mounted camera and obtain a video sequence that includes surrounding vehicles and the vehicle background;
an extraction and segmentation module, configured to locate the surrounding vehicles in the video sequence, extract historical trajectory information of the surrounding vehicles, and take scene semantic information, obtained by image segmentation of the video sequence, as auxiliary information; and
an output module, configured to input the historical trajectory information and the auxiliary information into a neural network model to obtain predicted trajectories of the surrounding vehicles.
- The trajectory prediction apparatus according to claim 6, wherein the neural network model comprises a convolutional neural network, a first-layer long short-term memory network, a second-layer long short-term memory network, and a fully connected layer, wherein:
the output module is further configured to input the auxiliary information into the convolutional neural network to obtain spatial feature information;
the output module is further configured to input the historical trajectory information into the first-layer long short-term memory network to obtain temporal feature information;
the output module is further configured to input the spatial feature information and the temporal feature information into the second-layer long short-term memory network to obtain joint feature information; and
the output module is further configured to input the joint feature information into the fully connected layer to obtain the predicted trajectory.
- The trajectory prediction apparatus according to claim 6, wherein the neural network model comprises the following formula:
J ← M_p(h, a): H × A;
where J denotes the predicted trajectory; M denotes the mapping relationship among H, A, and J; H denotes the historical trajectory information; A denotes the auxiliary information; p denotes a surrounding vehicle; h denotes the position information of vehicle p in the t-th frame of the video sequence; a denotes the scene semantic information of vehicle p in the t-th frame of the video sequence; j denotes the position information of vehicle p in the t-th frame of the video sequence, counting from frame T+1; and t denotes the frame index.
- The trajectory prediction apparatus according to claim 6, wherein the predicted trajectory is a two-dimensional spatial predicted trajectory and the vehicle is further provided with a depth camera;
the acquisition module is further configured to obtain, through the depth camera, the minimum relative distance between the vehicle and each of the surrounding vehicles; and
the apparatus further comprises a conversion module, configured to convert the two-dimensional spatial predicted trajectory into a three-dimensional spatial predicted trajectory according to the minimum relative distance.
- The trajectory prediction apparatus according to claim 9, wherein the conversion module is further configured to convert the two-dimensional spatial predicted trajectory into the three-dimensional spatial predicted trajectory according to the minimum relative distance by the following formula:
where x, y, w, and h respectively denote the elements of the pixel bounding box of the two-dimensional spatial predicted trajectory in each frame of the video sequence; x_r, y_r, w_r, and h_r respectively denote the elements of the pixel bounding box of the three-dimensional spatial predicted trajectory in each frame of the video sequence; f denotes the focal length of the depth camera; and d_min denotes the minimum relative distance between the vehicle and each of the surrounding vehicles.
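The overall data flow of claim 1 can be sketched in a few lines of Python. This is an illustration only: the helper names `tracker`, `segmenter`, and `model` are assumptions, as the claims do not name any particular detector, segmentation network, or model implementation.

```python
# Minimal sketch of the claim-1 pipeline; all names here are assumptions.
def predict_surrounding_trajectories(video_frames, tracker, segmenter, model):
    # video_frames: the T camera frames containing the surrounding vehicles
    # and the vehicle background (the captured video sequence).

    # Locate surrounding vehicles and extract historical trajectories H.
    history = tracker(video_frames)      # e.g. {vehicle_id: list of (x, y, w, h)}

    # Image segmentation yields scene semantic maps, the auxiliary information A.
    semantics = segmenter(video_frames)  # e.g. one per-pixel class map per frame

    # The neural network model maps (H, A) to a predicted trajectory J
    # for each surrounding vehicle p.
    return {p: model(h, semantics) for p, h in history.items()}
```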
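Claims 2 and 7 fix only the layer types and their order: a CNN for the auxiliary input, a first LSTM for the track history, a second LSTM for the concatenated features, and a fully connected output. A sketch in PyTorch follows; every dimension (channel counts, hidden size, prediction horizon) is an arbitrary choice for illustration, not taken from the patent.

```python
import torch
import torch.nn as nn

class TrajectoryPredictor(nn.Module):
    """CNN -> LSTM-1 -> LSTM-2 -> FC, as in claims 2 and 7."""
    def __init__(self, hidden=128, pred_len=10):
        super().__init__()
        # Spatial feature information from the segmentation (auxiliary) input.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),          # -> (B*T, 32)
        )
        # Temporal feature information from the (x, y, w, h) track history.
        self.lstm1 = nn.LSTM(4, hidden, batch_first=True)
        # Joint feature information from spatial + temporal features.
        self.lstm2 = nn.LSTM(hidden + 32, hidden, batch_first=True)
        # Fully connected layer emitting pred_len future bounding boxes.
        self.fc = nn.Linear(hidden, pred_len * 4)

    def forward(self, history, semantics):
        # history: (B, T, 4); semantics: (B, T, 1, H, W)
        B, T = history.shape[:2]
        spatial = self.cnn(semantics.flatten(0, 1)).view(B, T, -1)
        temporal, _ = self.lstm1(history)
        joint, _ = self.lstm2(torch.cat([spatial, temporal], dim=-1))
        return self.fc(joint[:, -1]).view(B, -1, 4)         # (B, pred_len, 4)
```

For example, with a batch of 2 tracks over 8 frames and 64x64 semantic maps, `TrajectoryPredictor()(torch.zeros(2, 8, 4), torch.zeros(2, 8, 1, 64, 64))` returns a (2, 10, 4) tensor of future boxes.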
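The formula in claims 3 and 8 reads as a mapping from history and auxiliary information to the predicted trajectory. The rendering below, with observed frames indexed t ≤ T and predicted frames indexed t ≥ T+1, is a reconstruction from the variable definitions in the claims, not a verbatim reproduction:

```latex
M_p : H \times A \to J, \qquad
J = \{\, j_t \,\}_{t \ge T+1}
  = M_p\!\left( \{\, h_t \,\}_{t \le T},\ \{\, a_t \,\}_{t \le T} \right)
```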
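The conversion formula of claims 5 and 10 is reproduced only as an image in the source publication and does not survive in this text. Under the standard pinhole-camera assumption, each pixel-box element would be scaled by the depth-to-focal-length ratio, i.e. x_r = x · d_min / f, and likewise for y, w, and h. The sketch below implements that assumption; it is not the patent's verbatim formula.

```python
def box_2d_to_3d(box_2d, f, d_min):
    """Assumed pinhole-model conversion: scale each 2D pixel-box element
    by d_min / f. Not the patent's verbatim formula."""
    x, y, w, h = box_2d
    scale = d_min / f          # metres per pixel at distance d_min
    return (x * scale, y * scale, w * scale, h * scale)

# Example: a predicted 40x30-pixel box at (320, 180), focal length 700 px,
# minimum relative distance of 14 m to the tracked vehicle.
print(box_2d_to_3d((320, 180, 40, 30), f=700.0, d_min=14.0))
```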
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2018/095144 WO2020010517A1 (en) | 2018-07-10 | 2018-07-10 | Trajectory prediction method and apparatus |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020010517A1 (en) | 2020-01-16 |
Family
ID=69142182
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2018/095144 WO2020010517A1 (en) | Trajectory prediction method and apparatus | 2018-07-10 | 2018-07-10 |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2020010517A1 (en) |
- 2018-07-10: International application PCT/CN2018/095144 filed (WO2020010517A1), active Application Filing
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN106873580A (en) * | 2015-11-05 | 2017-06-20 | Ford Global Technologies | Autonomous driving at intersections based on perception data
US20180023960A1 (en) * | 2016-07-21 | 2018-01-25 | Mobileye Vision Technologies Ltd. | Distributing a crowdsourced sparse map for autonomous vehicle navigation
CN108068819A (en) * | 2016-11-17 | 2018-05-25 | Ford Global Technologies | Detecting and responding to emergency vehicles on the road
CN107144285A (en) * | 2017-05-08 | 2017-09-08 | Shenzhen Horizon Robotics Technology Co., Ltd. | Pose information determination method and apparatus, and movable device
CN107438873A (en) * | 2017-07-07 | 2017-12-05 | UISEE Technologies (Beijing) Co., Ltd. | Method and apparatus for controlling vehicle travel
CN108196535A (en) * | 2017-12-12 | 2018-06-22 | Tsinghua University Suzhou Automotive Research Institute (Wujiang) | Automated driving system based on reinforcement learning and multi-sensor fusion
CN108803617A (en) * | 2018-07-10 | 2018-11-13 | Shenzhen University | Trajectory prediction method and apparatus
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN111895931A (en) * | 2020-07-17 | 2020-11-06 | Jiaxing Boling Technology Co., Ltd. | Coal mine operation area calibration method based on computer vision
Similar Documents
Publication | Title
---|---
CN108803617B (en) | Trajectory prediction method and apparatus
US11315266B2 (en) | Self-supervised depth estimation method and system
Ramos et al. | Detecting unexpected obstacles for self-driving cars: Fusing deep learning and geometric modeling
JP6766844B2 (en) | Object identification device, mobile system, object identification method, object identification model learning method and object identification model learning device
US20210150203A1 (en) | Parametric top-view representation of complex road scenes
CN110837778A (en) | Traffic police command gesture recognition method based on skeleton joint point sequence
EP3822852A2 (en) | Method, apparatus, computer storage medium and program for training a trajectory planning model
CN111563415A (en) | Binocular vision-based three-dimensional target detection system and method
CN116258817B (en) | Automatic driving digital twin scene construction method and system based on multi-view three-dimensional reconstruction
CN113936139A (en) | Scene aerial view reconstruction method and system combining visual depth information and semantic segmentation
CN114022830A (en) | Target determination method and target determination device
JP7276607B2 (en) | Methods and systems for predicting crowd dynamics
CN109919026B (en) | Surface unmanned ship local path planning method
JP2019149142A (en) | Object marking system and object marking method
CN112396000A (en) | Method for constructing multi-mode dense prediction depth information transmission model
WO2021175434A1 (en) | System and method for predicting a map from an image
CN114120283A (en) | Method for distinguishing unknown obstacles in road scene three-dimensional semantic segmentation
CN116194951A (en) | Method and apparatus for stereoscopic based 3D object detection and segmentation
Tran et al. | Enhancement of robustness in object detection module for advanced driver assistance systems
CN114419603A (en) | Automatic driving vehicle control method and system and automatic driving vehicle
Rashed et al. | Bev-modnet: Monocular camera based bird's eye view moving object detection for autonomous driving
CN112233079B (en) | Method and system for fusing images of multiple sensors
WO2020010517A1 (en) | Trajectory prediction method and apparatus
CN106650814B (en) | Outdoor road self-adaptive classifier generation method based on vehicle-mounted monocular vision
CN111611869B (en) | End-to-end monocular vision obstacle avoidance method based on serial deep neural network
Legal Events
Code | Title | Description
---|---|---
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 18926159; Country of ref document: EP; Kind code of ref document: A1
NENP | Non-entry into the national phase | Ref country code: DE
32PN | Ep: public notification in the ep bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 03/05/2021)
122 | Ep: pct application non-entry in european phase | Ref document number: 18926159; Country of ref document: EP; Kind code of ref document: A1