Disclosure of Invention
Embodiments of the present application provide an obstacle detection method and device that improve the effectiveness of obstacle detection.
A first aspect of an embodiment of the present application provides an obstacle detection method, including: acquiring a first image, wherein the first image can be an image directly shot by a camera or a frame of image in a video shot by the camera, and the first image comprises at least one obstacle; the boundary of the at least one obstacle is determined based on a boundary information network model, which may be a deep neural network trained in advance, wherein the boundary of the at least one obstacle includes a boundary formed by the obstacle and the road surface, and the boundary may be used for determining the position of the obstacle.
Compared with attribute information such as the shape, size, color, texture, material, and motion state of an obstacle, the attribute information of the boundary of the obstacle is more stable and uniform, and generalizes better; specifically, the boundaries of different obstacles of the same category are highly similar, and the boundaries of obstacles of different categories also share a certain similarity; therefore, for an obstacle not included in the training sample set, if the training sample set includes other obstacles of the same category as the obstacle, the boundary of the obstacle may be determined based on the boundary information network model, and if the training sample set does not include other obstacles of the same category but includes other obstacles whose boundaries are similar to the boundary of the obstacle, the boundary of the obstacle may also be determined based on the boundary information network model; therefore, detecting obstacles by determining their boundaries helps detect a larger number of obstacles and can improve the effectiveness of obstacle detection.
As one implementation, the boundary information network model is trained based on empirical obstacle boundary information, where the empirical obstacle boundary information may be information related to an empirical obstacle boundary, for example, the empirical obstacle boundary information may include an occupation boundary of an empirical obstacle, and may also include a unique identifier ID of the occupation boundary of the empirical obstacle; classifying the empirical obstacle boundary information according to the source of the empirical obstacle boundary information, wherein the empirical obstacle boundary information may include historical obstacle boundary information and/or sample obstacle boundary information; the sample obstacle boundary information can be understood as boundary information obtained by manually labeling obstacles in the sample image; the historical obstacle boundary information may be understood as a priori obstacle boundary information, i.e. boundary information that can be obtained without manual labeling, for example, the historical obstacle boundary information may be boundary information of an obstacle that already exists in a map.
Because the historical barrier boundary information can be obtained without manual marking, the marking cost can be reduced by training a boundary information network model based on the historical barrier boundary information; the sample obstacle boundary information is acquired through manual marking, and various obstacles can be selected for marking in the manual marking process, so that the sample obstacle boundary information can increase the diversity of the boundary information, and the performance of the boundary information network model can be improved by training the boundary information network model based on the sample obstacle boundary information, so that the obstacle detection effectiveness is improved.
As one implementation, the sample obstacle boundary information is obtained by taking an ordered set of points along a boundary line segment between a lower edge of an obstacle in the image and the travelable road surface, wherein the lower edge is an edge close to the travelable road surface; or the sample obstacle boundary information is obtained through a boundary line segment between the lower edge of a mask of the obstacle in the image and the travelable road surface, the mask can be understood as an image for covering, and the mask of the obstacle can be understood as an image for covering the obstacle; or, the sample obstacle boundary information is generated by a simulation engine, and the scene image simulated by the simulation engine is an image containing the obstacle.
This implementation provides several feasible schemes for obtaining the sample obstacle boundary information, which makes obtaining it more flexible; obtaining the sample obstacle boundary information by taking an ordered set of points along the boundary line segment between the lower edge of the obstacle in the image and the travelable road surface is simple and easy to operate; obtaining it through the boundary line segment between the lower edge of the mask of the obstacle in the image and the travelable road surface requires marking only the starting point and the ending point of the boundary line segment rather than taking points one by one, which can improve labeling efficiency; and generating it through the simulation engine requires no manual labeling, which can reduce labeling cost.
As one implementation, determining the boundary of the at least one obstacle based on the boundary information network model includes: inputting the first image into a boundary information network model, and classifying each pixel point in the first image based on empirical obstacle boundary information as a category, wherein the classification result can be pedestrians, vehicles, lanes, lane lines, sidewalks and the like; and processing the classification result to obtain the boundary of at least one obstacle.
In the implementation mode, based on the empirical obstacle boundary information as a category, each pixel point in the first image is classified, and the classification result is processed to obtain the boundary of at least one obstacle, so that the boundary of the obstacle is obtained through semantic segmentation.
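The post-processing step described above can be sketched as follows. This is a minimal illustration, assuming the network has already produced a per-pixel class map in which one class index marks the occupancy boundary; the class index `BOUNDARY` and the function `extract_boundary` are hypothetical names, and the rule of keeping the lowest boundary pixel in each column is an assumption, not something the application specifies.

```python
from typing import Dict, List

BOUNDARY = 1  # hypothetical class index for "occupancy boundary" pixels


def extract_boundary(class_map: List[List[int]]) -> Dict[int, int]:
    """Map each image column to the row of its lowest boundary-class pixel."""
    boundary: Dict[int, int] = {}
    for row_idx, row in enumerate(class_map):
        for col_idx, label in enumerate(row):
            if label == BOUNDARY:
                # Row indices grow downward, so a later row replaces an earlier one.
                boundary[col_idx] = row_idx
    return boundary


# Toy 4x5 class map: 0 = any other class, 1 = occupancy boundary.
class_map = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
print(extract_boundary(class_map))
```

Keeping one row per column gives a boundary that is easy to compare against the road surface column by column; other post-processing rules are equally possible.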
As an implementation manner, pixel points occupied by the boundary of at least one obstacle are continuous in a first direction, where the first direction may be a pixel width direction of an image, and the pixel width direction corresponds to a horizontal direction of the image.
If the pixel points occupied by the boundary of the obstacle are discontinuous in the first direction, the boundary cannot properly reflect the size of the obstacle in the first direction, and the user may mistake the discontinuous part for a drivable area; in contrast, in this implementation the pixel points occupied by the boundary of the obstacle are continuous in the first direction, so the boundary properly reflects the size of the obstacle in the first direction and the user can accurately identify the drivable area.
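A simple way to check this continuity property is sketched below, assuming the boundary is represented by the image columns it occupies. The helper names `find_gaps` and `is_continuous` are illustrative and do not come from the application.

```python
from typing import Iterable, List, Tuple


def find_gaps(boundary_columns: Iterable[int]) -> List[Tuple[int, int]]:
    """Return (first_missing, last_missing) column ranges inside the boundary."""
    cols = sorted(set(boundary_columns))
    gaps = []
    for left, right in zip(cols, cols[1:]):
        if right - left > 1:
            # Columns strictly between left and right carry no boundary pixel.
            gaps.append((left + 1, right - 1))
    return gaps


def is_continuous(boundary_columns: Iterable[int]) -> bool:
    """True when the boundary occupies every column between its two ends."""
    return not find_gaps(boundary_columns)


print(find_gaps([3, 4, 5, 8, 9]))
print(is_continuous([10, 11, 12]))
```

Any reported gap is a span of columns that a downstream consumer could misread as free road, which is the failure case described above.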
As one way of implementation, the at least one obstacle includes a first obstacle and a second obstacle; determining the boundary of the at least one obstacle based on the boundary information network model comprises: and determining the boundary of the first obstacle and the boundary of the second obstacle, wherein the intersection of the pixel point occupied by the boundary of the first obstacle and the pixel point occupied by the boundary of the second obstacle is an empty set.
This implementation provides a feasible scheme for determining the boundaries of obstacles in a scene with multiple obstacles; specifically, if the intersection of the pixel points occupied by the boundaries of two obstacles is an empty set, the boundary of the first obstacle and the boundary of the second obstacle are determined separately.
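The empty-intersection condition can be expressed directly on sets of pixel coordinates. The sketch below assumes each boundary is stored as a set of `(column, row)` pairs; the type alias `Pixel` and the function `boundaries_are_disjoint` are hypothetical names.

```python
from typing import Set, Tuple

Pixel = Tuple[int, int]  # (column, row) in image coordinates


def boundaries_are_disjoint(first: Set[Pixel], second: Set[Pixel]) -> bool:
    """True when the two boundaries share no pixel point."""
    return first.isdisjoint(second)


first_boundary = {(10, 40), (11, 40), (12, 41)}
second_boundary = {(30, 38), (31, 38)}
print(boundaries_are_disjoint(first_boundary, second_boundary))
```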
As an implementation manner, the method further includes: the size of the area occupied by the at least one obstacle in the first image is determined according to the boundary of the at least one obstacle and the preset pixel height of the obstacle in the image, wherein the pixel height can be understood as the size in the vertical direction of the first image, but the pixel height is preset and has no direct relation with the actual height of the obstacle, so the pixel height can be larger than the height of the obstacle in the first image and can also be smaller than the height of the obstacle in the first image.
The determination of the boundary of the obstacle is equivalent to the determination of the position of the obstacle, and since the actual obstacles have a certain volume, the obstacle is represented only by the position of the obstacle, which is not intuitive and three-dimensional enough, so in this implementation, the size of the area occupied by the at least one obstacle in the first image is determined according to the boundary of the at least one obstacle and the pixel height of the obstacle in the preset image, and the obstacle can be represented more intuitively and three-dimensionally.
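One possible reading of this step is sketched below: each boundary pixel is extended upward by the preset pixel height, and the union of those columns is the occupied area. The function name `occupied_region`, the `(column, row)` pixel layout, and the clipping at the top image edge are assumptions made for illustration.

```python
from typing import Iterable, Set, Tuple

Pixel = Tuple[int, int]  # (column, row); row 0 is the top of the image


def occupied_region(boundary: Iterable[Pixel], pixel_height: int) -> Set[Pixel]:
    """Extend every boundary pixel upward by a preset pixel height."""
    region: Set[Pixel] = set()
    for col, row in boundary:
        top = max(0, row - pixel_height + 1)  # clip at the top image edge
        for r in range(top, row + 1):
            region.add((col, r))
    return region


boundary = [(2, 5), (3, 5), (4, 6)]
region = occupied_region(boundary, pixel_height=3)
print(len(region))  # area of the occupied region in pixels
```

Because the pixel height is preset, the resulting region may be taller or shorter than the obstacle as it actually appears in the image, as noted above.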
A second aspect of embodiments of the present application provides an obstacle detection device, including: an acquisition unit, configured to acquire a first image, wherein the first image includes at least one obstacle; and a determining unit, configured to determine a boundary of the at least one obstacle based on a boundary information network model; wherein the boundary of the at least one obstacle includes a boundary formed by the obstacle and the road surface.
As an implementation manner, the boundary information network model is obtained by training based on empirical obstacle boundary information, and the empirical obstacle boundary information includes historical obstacle boundary information and/or sample obstacle boundary information.
As an implementation manner, the sample obstacle boundary information is obtained by taking an ordered set of points along a boundary line segment between the lower edge of the obstacle in the image and the drivable road surface; or the boundary information of the sample obstacle is obtained by taking a boundary line segment between the lower edge of the mask of the obstacle in the image and the drivable road surface; or, the sample obstacle boundary information is generated by a simulation engine, and the scene image simulated by the simulation engine is an image containing the obstacle.
As an implementation manner, the determining unit is specifically configured to: inputting the first image into a boundary information network model, and classifying each pixel point in the first image based on empirical obstacle boundary information as a category; and processing the classification result to obtain the boundary of at least one obstacle.
As an implementation manner, the pixel points occupied by the boundary of at least one obstacle are continuous in the first direction.
As one way of implementation, the at least one obstacle includes a first obstacle and a second obstacle; the determination unit is specifically configured to: and determining the boundary of the first obstacle and the boundary of the second obstacle, wherein the intersection of the pixel point occupied by the boundary of the first obstacle and the pixel point occupied by the boundary of the second obstacle is an empty set.
As an implementation manner, the determining unit is further configured to: and determining the size of the area occupied by the at least one obstacle in the first image according to the boundary of the at least one obstacle and the preset pixel height of the obstacle in the image.
For specific implementation, related descriptions, and technical effects of the above units, please refer to the description of the first aspect of the embodiments of the present application.
A third aspect of the embodiments of the present application provides an obstacle detection device, including: one or more processors and memory; wherein the memory has stored therein computer readable instructions; the one or more processors read the computer readable instructions in the memory to cause the obstacle detection apparatus to implement the method of any one of the first aspect and the various possible implementations described above.
A fourth aspect of embodiments of the present application provides a computer program product containing instructions, which when run on a computer, causes the computer to perform the method according to the first aspect and any one of the various possible implementations.
A fifth aspect of embodiments of the present application provides a computer-readable storage medium, which includes instructions that, when executed on a computer, cause the computer to perform the method according to the first aspect and any one of the various possible implementations.
A sixth aspect of embodiments of the present application provides a chip, which includes one or more processors. Some or all of the one or more processors are configured to read and execute a computer program stored in a memory to perform the method in any possible implementation manner of the first aspect.
Optionally, the chip may include the memory, and the processor may be connected to the memory through a circuit or a wire. Further optionally, the chip further includes a communication interface, and the processor is connected to the communication interface. The communication interface is used for receiving data and/or information to be processed; the processor acquires the data and/or information from the communication interface, processes it, and outputs a processing result through the communication interface. The communication interface may be an input/output interface.
In some implementations, some of the one or more processors may also implement some of the steps of the above method by means of dedicated hardware, for example, a process involving a neural network model may be implemented by a dedicated neural network processor or a graphics processor.
The method provided by the embodiment of the application can be realized by one chip or by cooperation of a plurality of chips.
A seventh aspect of embodiments of the present application provides a vehicle including the apparatus in any possible implementation manner of the second aspect.
According to the technical scheme, the embodiment of the application has the following advantages:
Compared with attribute information such as the shape, size, color, texture, material, and motion state of an obstacle, the attribute information of the boundary of the obstacle is more stable and uniform, and generalizes better; specifically, the boundaries of different obstacles of the same category are highly similar, and the boundaries of obstacles of different categories also share a certain similarity; therefore, determining the boundary of at least one obstacle in the first image based on the boundary information network model makes it possible to detect not only obstacles contained in the training sample set but also obstacles not contained in it; specifically, for an obstacle that is not included in the training sample set, if the training sample set includes other obstacles of the same category as the obstacle, the embodiment of the present application may detect the obstacle based on the similarity of the boundaries; if the training sample set instead includes other obstacles whose boundaries are similar to the boundary of the obstacle, the obstacle may also be detected based on the similarity of the boundaries; therefore, a larger number of obstacles can be detected by determining the boundaries of the obstacles, and the effectiveness of obstacle detection can be improved.
Detailed Description
The technical solutions in the embodiments of the present application will be described in detail below with reference to the drawings in the embodiments of the present application.
The embodiment of the application can be applied to the detection system shown in fig. 1, and the detection system comprises a sensor, a perception algorithm module and a planning and control module.
There may be one or more sensors, which may include a monocular camera, a binocular camera, a multi-view camera, and a panoramic camera, and which are used for capturing images or videos of the surrounding environment; the perception algorithm module is used for detecting obstacles in the images or videos captured by the sensors, and when there are multiple sensors, the perception algorithm module is also used for fusing the obstacle detection results corresponding to the sensors; the planning and control module is used for receiving the obstacle detection result from the perception algorithm module and for planning and controlling the behavior of the movable platform according to the obstacle detection result, for example, its next moving path and mode.
The perception algorithm module can be a separate device, can be arranged inside the sensor, and can also be arranged in a device together with the planning and control module.
The embodiment of the application can be applied to fields such as traffic safety, advanced driver assistance systems (ADAS), and autonomous driving (AD); in this case, the detection system shown in fig. 1 can be deployed in a movable platform, such as an automobile or a robot; when the movable platform is an automobile, the detection system shown in fig. 1 may also be referred to as an on-board system.
The embodiment of the application can also be applied to the fields of smart intersections, smart cities and the like, at this time, the detection system shown in fig. 1 can be deployed in a distributed sensor network or a non-movable platform, wherein the non-movable platform can be a street lamp or a traffic light and is used for detecting obstacles in a key traffic area.
At present, obstacles are mainly detected through a deep neural network. Specifically, a deep neural network is trained by utilizing the attribute information of the obstacle, and then the deep neural network is deployed on corresponding equipment as a part of a detection system; when the obstacle needs to be detected, the attribute information of the obstacle to be detected is firstly acquired, and then the attribute information of the obstacle to be detected is input into the deep neural network, so that the detection result of the obstacle to be detected can be output.
However, the attribute information of the obstacle used at present mainly includes the shape, size, color, texture, material, motion state, and so on of the obstacle, and this attribute information is highly varied and follows no uniform rule. The attribute information of obstacles of different types differs greatly, and the attribute information of different obstacles of the same type also differs to some extent.
Therefore, the embodiment of the application provides an obstacle detection method that detects an obstacle by using attribute information of the boundary formed by the obstacle and the road surface; because any obstacle forms a boundary with the road surface, the method is suitable for detecting any obstacle; in addition, compared with attribute information such as the shape, size, color, texture, material, and motion state of the obstacle, the attribute information of the boundary formed by the obstacle and the road surface is more stable and uniform and generalizes better, so detecting obstacles with the obstacle detection method provided by the embodiment of the application can improve the effectiveness of obstacle detection.
For the sake of understanding, the terms used in the embodiments of the present application will be described below.
Obstacle: an object that occupies a travelable road surface and affects the forward movement of the ego vehicle; since an object of any kind (rather than only certain specific or common kinds of objects) can be an obstacle, the obstacle can also be called a general obstacle; the method provided by the embodiment of the application is described below using the term obstacle.
Referring to fig. 2, fig. 2 shows various examples of obstacles, including not only various conventional traffic participants such as pedestrians (101), automobiles (102), motorcycles (103), bicycles (104), but also traffic scene markers such as traffic cones (105), triangle boards (106), and objects such as animals (107), boxes (108), lying tires (109), stones (110) which are not frequently present in a traffic scene.
Semantic segmentation (semantic segmentation): a computer vision task for performing pixel-level classification on an input image is to classify each pixel point in the image and determine semantic categories (such as pedestrians, vehicles, lanes, lane lines, sidewalks and the like) of each point, so that the purpose of performing semantic-level division on the input image is achieved.
Instance segmentation (instance segmentation): on the basis of semantic segmentation, the purpose of distinguishing single individuals in each semantic category is additionally achieved.
True value (ground truth): the standard answer, that is, the expected result or correct output for each given input signal in a particular evaluation or measurement task. For example, the true value of semantic segmentation is the category to which each pixel in an image belongs, commonly expressed as a category label mask of the same size as the image. The true value can be used for training a model in supervised learning and for verifying and evaluating the performance of the model.
Heat map (heat map): a visualization method for displaying data in shades of color changes. Given an input image, the semantic segmentation network outputs a corresponding heat map for each class. Where the shade of the color represents the likelihood that the category appears in the corresponding image area, generally, the warmer the color (or the higher the brightness) the greater the likelihood.
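The relation between per-class heat maps and the per-pixel classification described under semantic segmentation can be sketched as follows, assuming each heat map is a grid of scores of the same size as the image and that each pixel is assigned the class with the highest score. The function name `heatmaps_to_labels` and the class names in the example are hypothetical.

```python
from typing import Dict, List


def heatmaps_to_labels(heatmaps: Dict[str, List[List[float]]]) -> List[List[str]]:
    """Assign every pixel the class whose heat map has the highest score there."""
    names = list(heatmaps)
    height = len(heatmaps[names[0]])
    width = len(heatmaps[names[0]][0])
    return [
        [max(names, key=lambda name: heatmaps[name][r][c]) for c in range(width)]
        for r in range(height)
    ]


# Toy 2x2 heat maps for two hypothetical classes.
heatmaps = {
    "road": [[0.9, 0.2], [0.6, 0.1]],
    "boundary": [[0.1, 0.8], [0.4, 0.9]],
}
print(heatmaps_to_labels(heatmaps))
```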
Occupation boundary: the method is characterized in that after a travelable road surface is occupied by an object, a boundary is formed between the object and the road surface; referring to fig. 3 and 4, fig. 3 and 4 illustrate various examples of occupancy boundaries, and in particular, fig. 3 illustrates an occupancy boundary formed between a carton and a roadway, fig. 3 also illustrates an occupancy boundary formed between a barrier and a roadway, and fig. 4 illustrates an occupancy boundary formed between various types of automobiles and a roadway.
As described above, obstacles are mainly detected by means of a deep neural network; accordingly, before the method provided by the embodiment of the present application is used to detect an obstacle, a boundary information network model needs to be trained.
The following describes the training process of the boundary information network model with reference to fig. 5.
As shown in fig. 5, the training process for the boundary information network model may include:
in operation 201, a training data set is obtained.
The training data set may include a plurality of images and boundary information of obstacles in the plurality of images, and the plurality of images including the obstacles may be directly captured by the camera or extracted from a video captured by the camera.
The boundary information of the obstacle may also be referred to as empirical obstacle boundary information, which may be any information related to the empirical obstacle boundary; for example, the empirical obstacle boundary information may include an occupation boundary of the empirical obstacle, where the occupation boundary refers to a boundary line segment formed between the object and the road surface after the travelable road surface is occupied by the object; in addition, the empirical obstacle boundary information may also include information of the occupied boundary instances of the empirical obstacle.
Instances may be understood as individuals, each of which may be referred to as an instance; based on this, each occupancy boundary may be referred to as an occupancy boundary instance.
The information occupying the boundary instance may be various, which is not specifically limited in this embodiment of the application, for example, the information occupying the boundary instance may be a unique identification ID occupying the boundary.
The above description of the empirical obstacle boundary information is made from the viewpoint of information content, and the following description of the empirical obstacle boundary information is made from the source of the empirical obstacle boundary information.
Classifying the empirical obstacle boundary information according to the source of the empirical obstacle boundary information, wherein the empirical obstacle boundary information may include historical obstacle boundary information and/or sample obstacle boundary information; the sample obstacle boundary information can be understood as boundary information obtained by manually labeling obstacles in the sample image; the historical obstacle boundary information can be understood as prior obstacle boundary information, namely boundary information which can be obtained without manual marking.
For example, the historical obstacle boundary information may be boundary information of an obstacle already existing in the map, specifically, when a certain road segment is repaired, the road block set in the repaired road segment and the boundary information of the road block are updated in the map, and the boundary information of the road block may be used as the historical obstacle boundary information.
For the sample obstacle boundary information, it needs to be obtained through manual labeling, and the process of labeling the sample obstacle boundary information is described below by taking the occupied boundary as an example.
It should be noted that, multiple manual labeling methods may be adopted to obtain the boundary information of the sample obstacle, which is not specifically limited in the embodiment of the present application, and three labeling methods for obtaining the boundary information of the sample obstacle are described below by taking the occupied boundary as an example.
As one way of implementation, the occupancy boundary is obtained by taking a set of ordered points along a boundary line segment between the lower edge of the obstacle and the travelable road surface in the image.
The ordered point set may be formed by points from left to right along the image, or may be formed by points from right to left along the image.
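As a small illustration of the two orderings, the sketch below sorts labeled points by their horizontal image coordinate. It assumes each point is a `(column, row)` pair; `order_points` is a hypothetical helper name.

```python
from typing import List, Tuple

Point = Tuple[int, int]  # (column, row) in image coordinates


def order_points(points: List[Point], left_to_right: bool = True) -> List[Point]:
    """Sort labeled boundary points along the image width direction."""
    return sorted(points, key=lambda p: (p[0], p[1]), reverse=not left_to_right)


clicked = [(30, 52), (10, 50), (20, 51)]
print(order_points(clicked))
print(order_points(clicked, left_to_right=False))
```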
For example, as shown in FIG. 6, an ordered set of points is taken along a line segment of the intersection between the lower edge of the bicycle and the ground, which constitutes an occupancy boundary that is also the true value of the occupancy boundary when the bicycle is acting as an obstacle.
For another example, as shown in fig. 7, an occupancy boundary, which is also the true value of the occupancy boundary when the barrier is an obstacle, is obtained by taking an ordered set of points along a line segment that intersects the ground and the lower edge of the barrier.
As one way of implementation, the occupancy boundary is obtained by a boundary line segment between the lower edge of the mask of the obstacle in the image and the travelable road surface.
A mask can be understood as an image used for covering, and the mask of the obstacle can be understood as an image used to cover the obstacle.
For example, fig. 8 shows a mask 1501 of an automobile and a mask 1500 of a travelable road surface, and on a boundary line between the mask 1501 of the automobile and the mask 1500 of the travelable road surface, a starting point and an ending point of a boundary line segment between a lower edge of the automobile and the ground surface are marked, for example, a point 1502 is marked as a starting point, and a point 1503 is marked as an ending point; thus, on the boundary line between the mask 1501 of the vehicle and the mask 1500 of the travelable road surface, all points (including the points 1502 and 1503) between the points 1502 and 1503 constitute an ordered set of points that constitute an occupancy boundary that is the true value of the occupancy boundary when the vehicle is an obstacle.
Therefore, the occupied boundary is obtained through the boundary line segment between the lower edge of the mask of the obstacle and the drivable road surface in the image, the starting point and the ending point of the boundary line segment between the lower edge of the mask of the obstacle and the drivable road surface are marked, the occupied boundary can be obtained without point selection one by one, and the marking efficiency can be improved.
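The mask-based labeling can be sketched as follows. This is an illustration under stated assumptions: the obstacle mask is a binary grid, the annotator supplies only a start point and an end point, and the lower edge of the mask in each column between those points is taken as the occupancy boundary. The function name `boundary_from_mask` and the `(column, row)` point layout are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[int, int]  # (column, row); row indices grow downward


def boundary_from_mask(
    obstacle_mask: Sequence[Sequence[int]], start: Point, end: Point
) -> List[Point]:
    """Collect the lowest mask pixel of every column from start to end."""
    x0, x1 = start[0], end[0]
    step = 1 if x1 >= x0 else -1
    points: List[Point] = []
    for col in range(x0, x1 + step, step):
        rows = [r for r, line in enumerate(obstacle_mask) if line[col]]
        if rows:
            points.append((col, max(rows)))  # lower edge of the mask in this column
    return points


mask = [
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
print(boundary_from_mask(mask, start=(1, 1), end=(3, 2)))
```

Only two points are marked by hand; every intermediate point is read from the mask, which is where the labeling-efficiency gain comes from.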
As one implementation, the occupation boundary is generated by a simulation engine, and the scene image simulated by the simulation engine is an image including an obstacle.
Specifically, the image including the obstacle is used as a scene image simulated by the simulation engine, and the simulation engine simulates a traffic scene, so that the virtual data and the corresponding occupation boundary can be generated. For example, as shown in fig. 9, the occupation boundary of the automobile generated by the simulation engine is indicated by a white line segment, and the occupation boundary is a true value of the occupation boundary when the automobile is an obstacle.
The occupation boundary can be automatically generated through the simulation engine, the occupation boundary containing the obstacles in the image does not need to be manually marked one by one, the efficiency of obtaining the occupation boundary of the obstacles can be greatly improved, and the marking cost can be reduced.
It should be noted that, no matter which labeling method is adopted, if a plurality of overlapped obstacles exist in the image, the plurality of obstacles can be regarded as one obstacle or one cluster of obstacles for labeling, and accordingly, the plurality of overlapped obstacles can correspond to one occupied boundary; the plurality of overlapped obstacles means that, in the plurality of overlapped obstacles, for any one obstacle, another obstacle overlapped with the obstacle exists.
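Grouping overlapping obstacles into clusters can be sketched with a union-find structure. The sketch assumes each obstacle is approximated by an axis-aligned bounding box `(x_min, y_min, x_max, y_max)`, which is a simplification not taken from the application; the names `boxes_overlap` and `cluster_overlapping` are hypothetical.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max)


def boxes_overlap(a: Box, b: Box) -> bool:
    """True when two axis-aligned boxes share at least one pixel."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def cluster_overlapping(boxes: List[Box]) -> List[List[int]]:
    """Group box indices so that overlapping boxes end up in the same cluster."""
    parent = list(range(len(boxes)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if boxes_overlap(boxes[i], boxes[j]):
                parent[find(i)] = find(j)

    clusters: Dict[int, List[int]] = {}
    for i in range(len(boxes)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


boxes = [(0, 0, 10, 10), (8, 0, 18, 10), (16, 0, 26, 10), (40, 0, 50, 10)]
print(cluster_overlapping(boxes))
```

In the example, the first and third boxes do not overlap each other directly but are joined through the second, so all three would be labeled with a single occupancy boundary.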
For example, as shown in fig. 6, the image includes two bicycles with overlapping portions, and the two bicycles with overlapping portions are labeled to obtain an occupation boundary (shown by a white line segment in fig. 6) as shown in fig. 6.
At operation 202, the boundary information network model is trained based on the training data set to obtain a trained boundary information network model.
The types of the boundary information network models may be various, which is not specifically limited in the embodiment of the present application, for example, an ENet network may be used as the boundary information network model, and a process of processing an image by the ENet network is as shown in fig. 10, where numbers in fig. 10 represent channel numbers of the image.
The process of training the boundary information network model generally comprises: selecting a boundary information network model, configuring initial weight for the boundary information network model, inputting training data in a training data set into the boundary information network model, calculating a loss function based on the output of the boundary information network model and labeled information, and finally performing back propagation according to the loss function to update the weight in the boundary information network model.
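The general training steps above (initial weights, forward pass, loss calculation, back propagation, weight update) may be sketched with a deliberately tiny stand-in model. The single-feature logistic classifier below is an assumption made only to keep the sketch self-contained; it is not the ENet network or any model named in the embodiment, and the function names and hyperparameters are hypothetical.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train(samples: List[Tuple[float, int]], epochs: int = 200,
          lr: float = 0.5) -> Tuple[float, float, List[float]]:
    """Train a one-weight logistic classifier with gradient descent.

    samples: (feature, label) pairs, label 1 meaning "boundary pixel".
    Returns the weight, the bias and the mean loss of every epoch.
    """
    w, b = 0.0, 0.0                      # configure initial weights
    losses: List[float] = []
    n = len(samples)
    for _ in range(epochs):
        grad_w = grad_b = loss = 0.0
        for x, y in samples:
            p = sigmoid(w * x + b)       # forward pass on the training data
            p = min(max(p, 1e-12), 1.0 - 1e-12)
            # Binary cross-entropy between the output and the labeled value.
            loss += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
            # Gradients of the loss with respect to w and b (back propagation).
            grad_w += (p - y) * x
            grad_b += p - y
        w -= lr * grad_w / n             # update the weights
        b -= lr * grad_b / n
        losses.append(loss / n)
    return w, b, losses


toy_samples = [(0.0, 0), (0.1, 0), (0.9, 1), (1.0, 1)]
weight, bias, history = train(toy_samples)
```

In practice a deep network and an automatic differentiation framework would replace the hand-written gradients, but the order of the steps is the same.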
It can be understood that, because the occupation boundaries of the obstacles in the images in the training data set are labeled, the trained boundary information network model can output the occupation boundaries of the obstacles in the images by taking one image as an input; in addition, if information of an occupation boundary instance of an obstacle in the image is also labeled, the trained boundary information network model may further output the occupation boundary instance of the obstacle in the image, for example, the trained boundary information network model may output a unique ID of the occupation boundary of the obstacle; based on the occupation boundary instance of the obstacle, the boundary information network may further output an obstacle instance corresponding to the occupation boundary instance, where each obstacle in the image may be referred to as an obstacle instance.
The above describes the training process of the boundary information network model, and the following describes the process of detecting an obstacle in an image based on the boundary information network model.
Referring to fig. 11, an embodiment of the present application provides an obstacle detection method, including:
in operation 301, a first image is acquired, where the first image includes at least one obstacle.
There are various ways to acquire the first image, and this is not specifically limited in this embodiment of the application. For example, the first image may be directly captured by a camera, or a video may be captured by a camera, and then one frame of image including the obstacle is extracted from the video as the first image.
The types of cameras include, but are not limited to, monocular cameras, binocular cameras, multi-view cameras, and all-round cameras.
In particular, in a traffic scene, a first image may be acquired by a vehicle-mounted forward-looking camera.
The number of obstacles in the first image may be one or more; when the first image includes a plurality of obstacles, the plurality of obstacles may include two obstacles that are independent of each other (i.e., non-overlapping), or may include two obstacles that have an overlapping portion.
For example, the first image is an image shown in fig. 6, and includes two obstacles, i.e., a car and a bicycle, which are independent of each other, and two bicycles having an overlapping portion.
The type of the obstacle in the first image may be one, or may be multiple, and the embodiment of the present application does not specifically limit the type of the obstacle in the first image, for example, the type of the obstacle in the first image may be any one of the obstacles in fig. 2.
At operation 302, a boundary of at least one obstacle is determined based on the boundary information network model.
The boundary of at least one obstacle includes a boundary formed by the obstacle and the road surface, and the boundary formed by the obstacle and the road surface can also be called an occupation boundary.
As can be seen from the foregoing description, before operation 302 is performed, the boundary information network model needs to be trained based on a training data set, where the training data set may include a plurality of training images and boundary information of the obstacles in the plurality of training images; the boundary information of the obstacles in the training data set may also be referred to as empirical obstacle boundary information.
Therefore, as an implementation manner, the boundary information network model is obtained by training based on empirical obstacle boundary information, and the empirical obstacle boundary information includes historical obstacle boundary information and/or sample obstacle boundary information.
Since the empirical obstacle boundary information has been described above, the empirical obstacle boundary information can be understood with reference to the description of operation 201 above.
Because the historical barrier boundary information can be obtained without manual marking, the marking cost can be reduced by training a boundary information network model based on the historical barrier boundary information; the sample obstacle boundary information is acquired through manual marking, and various obstacles can be selected for marking in the manual marking process, so that the sample obstacle boundary information can increase the diversity of the boundary information, and the performance of the boundary information network model can be improved by training the boundary information network model based on the sample obstacle boundary information, so that the obstacle detection effectiveness is improved.
As described above, the sample obstacle boundary information is obtained through labeling; three labeling methods for obtaining the sample obstacle boundary information are described below.
As an implementation manner, the sample obstacle boundary information is obtained by taking an ordered set of points along a boundary line segment between the lower edge of the obstacle in the image and the drivable road surface; or the boundary information of the sample obstacle is obtained by taking a boundary line segment between the lower edge of the mask of the obstacle in the image and the drivable road surface; or, the sample obstacle boundary information is generated by a simulation engine, and the scene image simulated by the simulation engine is an image containing the obstacle.
It can be understood that the sample obstacle boundary information may be an occupation boundary of the sample obstacle, so the process of acquiring the sample obstacle boundary information in this embodiment can be understood with reference to the related descriptions (three manual labeling methods for acquiring the occupation boundary of the sample obstacle) in fig. 6 to fig. 9.
This implementation provides several feasible schemes for obtaining the sample obstacle boundary information, so that the sample obstacle boundary information can be obtained more flexibly. Obtaining the sample obstacle boundary information by taking an ordered point set along the boundary line segment between the lower edge of the obstacle in the image and the drivable road surface is simple and easy to operate; obtaining it through the boundary line segment between the lower edge of the mask of the obstacle in the image and the drivable road surface requires marking only the starting point and the ending point of the boundary line segment, without taking points one by one, so that the labeling efficiency can be improved; and generating it through the simulation engine requires no manual labeling, so that the labeling cost can be reduced.
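The first two labeling schemes may be illustrated with the following sketch, which is an assumption-laden example rather than the labeling tool of the embodiment. It assumes pixel coordinates `(x, y)` with `y` increasing downward, a binary mask stored as nested lists, and an ordered point set whose `x` values do not decrease from left to right; the function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[int, int]  # (x, y) pixel coordinates, y grows downward


def mask_lower_edge(mask: List[List[int]]) -> List[Point]:
    """Return, left to right, the lowest mask pixel of each occupied column."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    edge: List[Point] = []
    for x in range(width):
        for y in range(height - 1, -1, -1):   # scan from the bottom row upward
            if mask[y][x]:
                edge.append((x, y))
                break
    return edge


def rasterize_ordered_points(points: List[Point]) -> List[Point]:
    """Interpolate an ordered point set so each column gets one boundary pixel."""
    if not points:
        return []
    out: List[Point] = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        for x in range(x0, x1):
            t = (x - x0) / (x1 - x0)
            out.append((x, round(y0 + t * (y1 - y0))))
    out.append(points[-1])
    return out


toy_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
edge_from_mask = mask_lower_edge(toy_mask)
edge_from_points = rasterize_ordered_points([(0, 4), (2, 4), (4, 2)])
```

With the mask-based scheme, an annotator would only need to mark where the returned edge starts and ends on the drivable road surface.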
The features of the boundary of the obstacle will be explained below.
As an implementable way, the pixel points occupied by the boundary of the at least one obstacle are continuous in the first direction.
Wherein the first direction may be a pixel width direction of the image, the pixel width direction corresponding to a horizontal direction of the image; for example, the first direction may be a horizontal direction from point 1502 to point 1503 in fig. 8.
It will be appreciated that a number of problems may result if the pixels occupied by the boundaries of an obstacle are discontinuous in the first direction.
For example, a broken multi-segment boundary may cause a user (e.g., a driver) to mistake the segments for the boundaries of multiple obstacles and therefore to mistake the area between two segments for a travelable area, whereas that area is in fact also occupied by the obstacle, i.e., it is a non-travelable area.
For another example, an obstacle usually has a certain volume, and a discontinuous multi-segment boundary makes it difficult for the user to judge the size of the obstacle in the first direction.
Because a discontinuous obstacle boundary causes the above problems, in the embodiment of the present application, the pixel points occupied by the boundary of at least one obstacle are continuous in the first direction, which not only reflects the size of the obstacle in the first direction well, but also helps the user to accurately identify the drivable area.
Taking fig. 8 as an example, the actual contact position of the vehicle with the road surface in fig. 8 is at four wheels, which are obviously dispersed; if the four wheel positions are the boundaries of the automobile as an obstacle, the user may misunderstand that the area between the wheels is a drivable area, and cannot judge the size of the obstacle in the horizontal direction.
In the embodiment of the present application, the continuous boundary line from point 1502 to point 1503 is used as the boundary when the automobile is used as an obstacle; in this way, the user can judge the size of the obstacle in the horizontal direction, so as to estimate the size of the obstacle, and the area where the whole boundary is located is taken as a non-drivable area.
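The continuity property may be checked with a short sketch such as the one below. The representation of a boundary as a list of `(x, y)` pixels and the helper `bridge_gaps` are assumptions of this sketch; in the embodiment, continuity results from how the boundary is labeled and predicted, not from a separate bridging step.

```python
from typing import Dict, List, Tuple

Point = Tuple[int, int]  # (x, y) pixel coordinates, y grows downward


def is_continuous_in_x(boundary: List[Point]) -> bool:
    """True if the occupied columns form one unbroken range."""
    xs = sorted({x for x, _ in boundary})
    return bool(xs) and xs[-1] - xs[0] + 1 == len(xs)


def bridge_gaps(boundary: List[Point]) -> List[Point]:
    """Fill missing columns by linear interpolation between known columns."""
    lowest: Dict[int, int] = {}
    for x, y in boundary:
        lowest[x] = max(y, lowest.get(x, y))   # keep the lowest pixel per column
    xs = sorted(lowest)
    out: List[Point] = []
    for x0, x1 in zip(xs, xs[1:]):
        y0, y1 = lowest[x0], lowest[x1]
        for x in range(x0, x1):
            out.append((x, round(y0 + (y1 - y0) * (x - x0) / (x1 - x0))))
    if xs:
        out.append((xs[-1], lowest[xs[-1]]))
    return out


# Two separated wheel contact patches, loosely following the car of fig. 8.
wheel_contacts = [(2, 10), (3, 10), (8, 10), (9, 10)]
bridged = bridge_gaps(wheel_contacts)
```

Here the raw wheel contacts are discontinuous in the horizontal direction, while the bridged boundary covers every column between the leftmost and rightmost contact.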
As can be seen from the description of operation 301, the number of obstacles may be one or more; when the number of the obstacles is one, the number of the boundaries of the determined obstacles is one; when the number of obstacles is plural, the number of boundaries of the determined obstacles may be divided into two cases.
In the first case, overlapping portions exist among the plurality of obstacles. In this case, as can be seen from the description of the training process, the plurality of overlapping obstacles are labeled as one obstacle or as one cluster of obstacles; accordingly, the number of boundaries determined for these obstacles based on the boundary information network model may be regarded as one, and this one boundary may be regarded as being formed by connecting the boundaries of the plurality of obstacles.
In the second case: there is no overlapping portion between the plurality of obstacles.
The second case will be described below by taking two obstacles as an example.
As one implementation, the at least one obstacle includes a first obstacle and a second obstacle, and accordingly, operation 302 includes: and determining the boundary of the first obstacle and the boundary of the second obstacle, wherein the intersection of the pixel point occupied by the boundary of the first obstacle and the pixel point occupied by the boundary of the second obstacle is an empty set.
The first obstacle and the second obstacle may be of the same type or different types, which is not specifically limited in this embodiment of the application.
In the embodiment of the application, when there is no overlapping part between two obstacles, the determined boundaries of the two obstacles are independent from each other, so that the intersection of pixel points occupied by the boundaries of the two obstacles is an empty set.
For example, fig. 7 includes three road blocks, two of the road blocks are used as a first obstacle and a second obstacle, and the intersection of pixel points occupied by the boundary of the two determined obstacles is an empty set.
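The empty-intersection condition can be expressed directly on sets of pixel coordinates. The sketch below assumes each boundary is given as a list of `(x, y)` pixels; the function name and the example coordinates are hypothetical.

```python
from itertools import combinations
from typing import List, Tuple

Point = Tuple[int, int]


def boundaries_pairwise_disjoint(boundaries: List[List[Point]]) -> bool:
    """True if no pixel is shared by the boundaries of two different obstacles."""
    pixel_sets = [set(b) for b in boundaries]
    return all(a.isdisjoint(b) for a, b in combinations(pixel_sets, 2))


first_boundary = [(1, 20), (2, 20), (3, 20)]
second_boundary = [(7, 22), (8, 22)]
disjoint = boundaries_pairwise_disjoint([first_boundary, second_boundary])
```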
It can be understood that the specific process of operation 302 differs depending on the boundary information network model used.
As one implementation, the boundary information network model is used to determine the boundary of the obstacle through semantic segmentation, and accordingly, operation 302 includes:
inputting the first image into the boundary information network model, and classifying each pixel point in the first image, with the empirical obstacle boundary information serving as one category;
and processing the classification result to obtain the boundary of at least one obstacle.
Wherein, the classification result can be pedestrians, vehicles, lanes, lane lines, sidewalks, etc.
In the embodiment of the application, each pixel point in the first image is classified with the empirical obstacle boundary information serving as one category, and the classification result is processed to obtain the boundary of at least one obstacle, so that the boundary of the obstacle is obtained through semantic segmentation.
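The two steps of this implementation (per-pixel classification, then processing of the classification result) may be sketched as follows. The class list, the per-pixel score layout `scores[y][x][class]`, and the function names are assumptions of this sketch; the scores would in practice come from the boundary information network model.

```python
from typing import List, Tuple

# Hypothetical class list; "boundary" stands for the occupation boundary category.
CLASSES = ["road", "vehicle", "boundary"]


def classify_pixels(scores: List[List[List[float]]]) -> List[List[int]]:
    """Assign each pixel the index of its highest-scoring class."""
    return [[max(range(len(px)), key=px.__getitem__) for px in row]
            for row in scores]


def extract_boundary(labels: List[List[int]], boundary_class: int) -> List[Tuple[int, int]]:
    """Collect the (x, y) coordinates of pixels labeled as boundary."""
    return [(x, y)
            for y, row in enumerate(labels)
            for x, c in enumerate(row)
            if c == boundary_class]


toy_scores = [
    [[0.1, 0.8, 0.1], [0.2, 0.7, 0.1], [0.9, 0.05, 0.05]],
    [[0.2, 0.1, 0.7], [0.1, 0.2, 0.7], [0.8, 0.1, 0.1]],
]
toy_labels = classify_pixels(toy_scores)
toy_boundary = extract_boundary(toy_labels, CLASSES.index("boundary"))
```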
It should be noted that different boundary information network models produce different types of output; in general, the boundary information network model outputs a heat map containing the boundary of the obstacle, and the boundary of the obstacle can be determined based on the heat map.
A specific process of determining the boundary of an obstacle based on the heat map is explained below.
Specifically, the image shown in fig. 12 is input into the boundary information network model, and the boundary information network model outputs a heat map as shown in fig. 13, wherein white line segments in fig. 13 represent the boundaries of the obstacles; the boundaries of the obstacle can be determined based on the heat map shown in fig. 13.
In addition, the heat map shown in fig. 13 may be post-processed to obtain the boundaries (i.e., occupancy boundaries) corresponding to each obstacle instance.
Specifically, in each column of the heat map shown in fig. 13, the lowest pixel whose value is greater than a preset threshold is retained and the pixels at the remaining positions are set to zero, so that the processed heat map can be regarded as a one-dimensional signal; then, each dip (notch) of the one-dimensional signal, i.e., the boundary corresponding to each obstacle instance, is obtained through inflection point detection. Referring to fig. 14, each dip in fig. 14 represents the boundary corresponding to one obstacle instance.
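A simplified sketch of this post-processing is given below. It reduces the heat map to a one-dimensional signal by keeping, per column, the row of the lowest pixel above the threshold, and then separates instances. Note two assumptions: the sketch marks empty columns with `-1` instead of zero, and it replaces the inflection point detection of the embodiment with a simpler split into runs of adjacent occupied columns; the function names and the threshold value are hypothetical.

```python
from typing import List, Tuple


def column_signal(heat: List[List[float]], threshold: float) -> List[int]:
    """Per column, the row index of the lowest pixel above threshold, or -1."""
    height = len(heat)
    width = len(heat[0]) if height else 0
    signal: List[int] = []
    for x in range(width):
        row = -1
        for y in range(height - 1, -1, -1):   # search from the bottom row upward
            if heat[y][x] > threshold:
                row = y
                break
        signal.append(row)
    return signal


def split_instances(signal: List[int]) -> List[Tuple[int, int]]:
    """Return (first_column, last_column) for each run of occupied columns."""
    runs: List[Tuple[int, int]] = []
    start = None
    for x, value in enumerate(signal + [-1]):  # sentinel closes a trailing run
        if value >= 0 and start is None:
            start = x
        elif value < 0 and start is not None:
            runs.append((start, x - 1))
            start = None
    return runs


toy_heat = [
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.9, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.8, 0.9, 0.0, 0.7, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.95],
]
toy_signal = column_signal(toy_heat, threshold=0.5)
toy_instances = split_instances(toy_signal)
```

Run splitting cannot separate two instances whose boundaries touch in adjacent columns, which is one reason a shape-based criterion such as inflection point detection may be preferred.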
In the embodiment of the application, the boundary of the obstacle can be used for determining the position of the obstacle, so that the detection of the obstacle can be realized.
Compared with attribute information such as the shape, size, color, texture, material, and motion state of an obstacle, the attribute information of the boundary of the obstacle is more stable and uniform, and generalizes better; specifically, for different obstacles of the same category, the boundaries of the obstacles are highly similar, and for obstacles of different categories, the boundaries of the obstacles still have a certain similarity.
For example, fig. 4 includes a plurality of automobiles, such as trucks, vans, and SUVs, all of which belong to the same category; although these automobiles differ in shape, size, color, material, and the like, as long as an object belongs to the automobile category, the boundary formed between the object and the road surface is one of three types: a straight line, a fold line bent to the left, or a fold line bent to the right. Most automobiles acting as obstacles can therefore be detected using these three types of boundaries.
Therefore, the attribute information of the boundary of the obstacle is relatively stable and single, and the similarity of the boundary of the obstacle is higher for different obstacles of the same category, so that the obstacle detection by determining the boundary of the obstacle is beneficial to detecting more obstacles.
For another example, as shown in fig. 3 and fig. 4, fig. 3 contains a carton and fig. 4 contains a car; although cartons and cars belong to different categories, the boundary between a carton and the road surface, like the boundary between a car and the road surface, is one of three types: a straight line, a fold line bent to the left, or a fold line bent to the right. Therefore, not only cars but also cartons acting as obstacles can be detected using these three types of boundaries.
Therefore, the attribute information of the boundary of the obstacle is relatively stable and single, and the boundaries of the obstacles have certain similarity for different types of obstacles, so that the obstacle detection by determining the boundaries of the obstacles is favorable for detecting a larger number of obstacles.
In summary, in the embodiment of the present application, the boundary of the obstacle is determined to detect the obstacle, which is beneficial to detecting a larger number of obstacles, and the effectiveness of obstacle detection can be improved.
In operation 303, a size of an area occupied by the at least one obstacle in the first image is determined according to a boundary of the at least one obstacle and a preset pixel height of the obstacle in the image.
It can be understood that the position of the obstacle can be determined based on the boundary of the obstacle, so that the detection of the obstacle can be realized; however, all the actual obstacles have a certain volume, so in order to more intuitively and stereoscopically represent the detected obstacles, the embodiment of the present application determines the size of the area occupied by the obstacle in the first image according to the boundary of the obstacle and the pixel height of the obstacle in the image, and accordingly, operation 303 is optional.
The pixel height may be understood as a dimension in the vertical direction of the first image, but the pixel height is preset and has no direct relation with the actual height of the obstacle, and the pixel height may be larger than the height of the obstacle in the first image or smaller than the height of the obstacle in the first image.
Taking the image of fig. 12 as an example, after determining the boundary of an obstacle according to the heat map of fig. 13, the obstacle can be represented as a columnar pixel (stixel) with the boundary of the obstacle as a base, and the representation effect is as shown in fig. 15; as can be seen from fig. 15, the carton, the barricade and other obstacles are all represented by the columnar pixels, and the actual height of the carton, the barricade and other obstacles is independent of the height of the columnar pixels, specifically, the height of the carton and other obstacles is less than that of the columnar pixels, and the height of part of the barricade is greater than that of the columnar pixels.
In the embodiment of the application, the size of the area occupied by the at least one obstacle in the first image is determined according to the boundary of the at least one obstacle and the preset pixel height of the obstacle in the image, so that the obstacle can be represented more intuitively and stereoscopically.
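Operation 303 may be illustrated with the sketch below, which erects a column of a preset pixel height on every boundary pixel. The tuple layout `(x, y_top, y_bottom)`, the clamping at the top image edge, and the function names are assumptions of this sketch and are not specified in the embodiment.

```python
from typing import List, Tuple

Point = Tuple[int, int]             # (x, y) boundary pixel, y grows downward
Stixel = Tuple[int, int, int]       # (x, y_top, y_bottom)


def build_stixels(boundary: List[Point], pixel_height: int) -> List[Stixel]:
    """Erect a column of the preset pixel height on each boundary pixel."""
    # The top is clamped at row 0 so that a column never leaves the image.
    return [(x, max(0, y - pixel_height), y) for x, y in boundary]


def occupied_area(stixels: List[Stixel]) -> int:
    """Total number of pixel rows covered by all columns."""
    return sum(y_bottom - y_top for _, y_top, y_bottom in stixels)


toy_boundary = [(0, 10), (1, 10), (2, 3)]
toy_stixels = build_stixels(toy_boundary, pixel_height=5)
toy_area = occupied_area(toy_stixels)
```

Because the pixel height is preset, the resulting columns may be taller or shorter than the obstacle as it appears in the image.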
With reference to fig. 16, a schematic diagram of an embodiment of an obstacle detection apparatus according to the embodiment of the present application is shown.
One or more of the various unit modules in fig. 16 may be implemented by software, hardware, firmware, or a combination thereof. The software or firmware includes, but is not limited to, computer program instructions or code and may be executed by a hardware processor. The hardware includes, but is not limited to, various integrated circuits such as a Central Processing Unit (CPU), a Digital Signal Processor (DSP), a Field Programmable Gate Array (FPGA), or an Application Specific Integrated Circuit (ASIC).
The obstacle detection device includes:
an acquiring unit 1201, configured to acquire a first image, where the first image includes at least one obstacle;
a determining unit 1202 for determining a boundary of at least one obstacle based on the boundary information network model; wherein the boundary of at least one obstacle comprises a boundary formed by the obstacle and the road surface.
Further, the boundary information network model is obtained by training based on empirical obstacle boundary information, and the empirical obstacle boundary information comprises historical obstacle boundary information and/or sample obstacle boundary information.
Further, the sample obstacle boundary information is obtained by taking an ordered point set along a boundary line segment between the lower edge of the obstacle in the image and the drivable road surface; or the boundary information of the sample obstacle is obtained by taking a boundary line segment between the lower edge of the mask of the obstacle in the image and the drivable road surface; or the sample obstacle boundary information is generated by a simulation engine, and the image is a scene image simulated by the simulation engine.
Further, the determining unit 1202 is specifically configured to: input the first image into the boundary information network model, and classify each pixel point in the first image, with the empirical obstacle boundary information serving as one category; and process the classification result to obtain the boundary of at least one obstacle.
Further, the pixel points occupied by the boundary of at least one obstacle are continuous in the first direction.
Further, the at least one obstacle includes a first obstacle and a second obstacle; the determining unit 1202 is specifically configured to: and determining the boundary of the first obstacle and the boundary of the second obstacle, wherein the intersection of the pixel point occupied by the boundary of the first obstacle and the pixel point occupied by the boundary of the second obstacle is an empty set.
Further, the determining unit 1202 is further configured to: and determining the size of the area occupied by the at least one obstacle in the first image according to the boundary of the at least one obstacle and the preset pixel height of the obstacle in the image.
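When the units are implemented in software, their division of work may be sketched as follows. The class name, the callable-based interfaces, and the default pixel height are assumptions of this sketch; the `model` callable is a stand-in for a trained boundary information network model and is not an actual network.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Image = List[List[float]]
Boundary = List[Tuple[int, int]]          # (x, y) boundary pixels
Region = List[Tuple[int, int, int]]       # (x, y_top, y_bottom) columns


@dataclass
class ObstacleDetectionDevice:
    acquire: Callable[[], Image]               # role of the acquiring unit 1201
    model: Callable[[Image], List[Boundary]]   # used by the determining unit 1202
    pixel_height: int = 5                      # preset pixel height (hypothetical value)

    def detect(self) -> Tuple[List[Boundary], List[Region]]:
        """Acquire a first image, determine boundaries, then occupied regions."""
        image = self.acquire()
        boundaries = self.model(image)
        regions = [
            [(x, max(0, y - self.pixel_height), y) for x, y in boundary]
            for boundary in boundaries
        ]
        return boundaries, regions


device = ObstacleDetectionDevice(
    acquire=lambda: [[0.0]],
    model=lambda image: [[(0, 10), (1, 10)]],
    pixel_height=4,
)
found_boundaries, found_regions = device.detect()
```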
Please refer to fig. 17, which is a schematic diagram of an embodiment of an obstacle detection apparatus according to an embodiment of the present application.
The obstacle detection device 1300 may be configured on a movable platform (e.g., a car, a robot, etc.), and may include one or more processors 1301 and a memory 1302, where the memory 1302 stores programs or data.
The memory 1302 may be a volatile memory or a non-volatile memory. Optionally, the processor 1301 is one or more central processing units (CPUs), each of which may be a single-core CPU or a multi-core CPU. The processor 1301 may communicate with the memory 1302 and execute, on the obstacle detection apparatus 1300, a series of instructions stored in the memory 1302.
The obstacle detection device 1300 also includes one or more wired or wireless network interfaces 1303, such as an ethernet interface.
Optionally, although not shown in fig. 17, the obstacle detection apparatus 1300 may further include one or more power sources and an input/output interface; the input/output interface may be used to connect a camera, a display, a mouse, a keyboard, a touch screen device, a sensing device, or the like. The input/output interface is an optional component that may or may not be present, which is not limited herein.
The process executed by the processor 1301 of the obstacle detection apparatus 1300 in this embodiment may refer to the method process described in the foregoing method embodiment, which is not described herein again.
The obstacle detecting device may be a vehicle having an obstacle detecting function, or another component having an obstacle detecting function. The obstacle detection device includes but is not limited to a vehicle-mounted terminal, a vehicle-mounted controller, a vehicle-mounted module, a vehicle-mounted component, a vehicle-mounted chip, a vehicle-mounted unit, a vehicle-mounted radar, or a camera, through which the vehicle can implement the method provided by the application.
The obstacle detecting device may also be an intelligent terminal, other than a vehicle, that has an obstacle detecting function, or may be provided in such an intelligent terminal or in a component of the intelligent terminal. The intelligent terminal may be terminal equipment such as intelligent transportation equipment, intelligent home equipment, or a robot. The obstacle detection device includes, but is not limited to, an intelligent terminal, or a controller, a chip, a sensor such as a radar or a camera, or another component in the intelligent terminal.
The obstacle detecting device may also be a general purpose device or a dedicated device. In a specific implementation, the apparatus may also be a desktop, a laptop, a network server, a Personal Digital Assistant (PDA), a mobile phone, a tablet computer, a wireless terminal device, an embedded device, or other devices with processing functions. The embodiment of the present application does not limit the type of the obstacle detecting device.
The obstacle detecting device may also be a chip or a processor with a processing function, and the obstacle detecting device may include a plurality of processors. The processor may be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor. The chip or processor having the processing function may be provided in the sensor, or may be provided not in the sensor but on a receiving end of the sensor output signal.
The embodiment of the present application further provides a system applied to unmanned driving or intelligent driving, where the system includes at least one of the obstacle detection device mentioned in the above embodiments of the present application, a camera, a radar sensor, and other sensors. At least one device in the system may be integrated into a complete machine or piece of equipment, or at least one device in the system may be independently provided as an element or device.
Further, any of the above systems may interact with a central controller of the vehicle to provide detection and/or fusion information for decision making or control of the driving of the vehicle.
Embodiments of the present application further provide a vehicle including at least one obstacle detection device according to the above-mentioned embodiments of the present application or any of the above-mentioned systems.
Embodiments of the present application also provide a chip including one or more processors. Part or all of the processor is used for reading and executing the computer program stored in the memory so as to execute the method of the foregoing embodiments.
Optionally, the chip may include a memory, and the processor may be connected to the memory through a circuit or a wire. Further optionally, the chip further includes a communication interface, and the processor is connected to the communication interface. The communication interface is used for receiving data and/or information to be processed; the processor acquires the data and/or information from the communication interface, processes the data and/or information, and outputs a processing result through the communication interface. The communication interface may be an input/output interface.
In some implementations, some of the one or more processors may also implement some of the steps of the above method by means of dedicated hardware, for example, a process involving a neural network model may be implemented by a dedicated neural network processor or a graphics processor.
The method provided by the embodiment of the application can be realized by one chip or by cooperation of a plurality of chips.
Embodiments of the present application also provide a computer storage medium for storing computer software instructions used by the computer device, including a program designed to be executed by the computer device.
The computer device may be an obstacle detection apparatus as described in the foregoing fig. 16.
The embodiment of the present application further provides a computer program product, which includes computer software instructions that can be loaded by a processor to implement the flow in the method shown in the foregoing embodiments.
The embodiment of the application also provides a vehicle, which comprises the obstacle detection device as described in the foregoing fig. 16.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.