CN118247765A - Panoramic object detection method, device, vehicle and storage medium - Google Patents
Panoramic object detection method, device, vehicle and storage medium
- Publication number
- CN118247765A (application number CN202410066056.0A)
- Authority
- CN
- China
- Prior art keywords
- target
- identified
- information
- image
- ranging
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/56—Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
- G06V20/58—Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/10—Image acquisition
- G06V10/16—Image acquisition using multiple overlapping images; Image stitching
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/803—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of input or preprocessed data
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The present application discloses a panoramic object detection method, a device, a vehicle and a storage medium, which belong to the technical field of assisted driving.
Description
Technical Field
The present application relates to the field of assisted driving technologies, and in particular, to a panoramic target detection method, a device, a vehicle, and a storage medium.
Background
Panoramic image systems aim to improve driving experience, safety and convenience, and play a particularly important role in low-speed driving scenarios.
However, in the related art, panoramic image system detection has blind spots or dead zones. For example, in the stitching area between the viewing angles of multiple cameras, the panoramic image system may suffer from image misalignment, which can cause a target object to disappear from the display screen or make its position relative to the vehicle body difficult to judge. The driver may then misjudge the existence and relative distance of the target object, affecting the timely detection and avoidance of obstacles.
Disclosure of Invention
The main purpose of the present application is to provide a panoramic target detection method, a device, a vehicle and a storage medium, aiming to solve the problem in the related art that, due to processing defects in the image stitching area, blind spots and dead zones may exist in panoramic image system detection, so that a target object disappears from the display screen or its position relative to the vehicle body is difficult to judge.
In order to achieve the above object, in a first aspect, the present application provides a panoramic object detection method, including:
acquiring scene images collected by cameras around a vehicle;
generating a panoramic scene image of a bird's eye view based on all scene images;
for every two adjacent scene images, determining an identified target in the image overlapping area between the two scene images, and determining the ranging information between the identified target and the vehicle obtained respectively by the two current cameras; the two current cameras are the cameras respectively corresponding to the two adjacent scene images, and each of the two current cameras has a ranging fusion weight in a preset fusion weight table;
for every two adjacent scene images, fusing the two pieces of ranging information based on the respective ranging fusion weights of the two current cameras to obtain unique ranging information between the identified target and the vehicle;
generating marker information of the identified target in the panoramic scene image based on the unique ranging information.
Optionally, generating a panoramic scene image of a bird's eye view based on all scene images includes:
performing viewing-angle conversion on each scene image to obtain a bird's-eye-view image;
for the image overlapping area between every two adjacent bird's-eye-view images, determining the respective pixel fusion weights of the two adjacent bird's-eye-view images from a preset fusion weight table;
for every two adjacent bird's-eye-view images, performing fusion processing on their image overlapping area based on the two pixel fusion weights to obtain fused adjacent bird's-eye-view images;
and generating the panoramic scene image based on all the fused adjacent bird's-eye-view images.
Optionally, determining, for every two adjacent scene images, an identified target in the image overlapping area between the two scene images includes:
inputting each scene image into a target detection model to obtain identified targets; each identified target has a classification confidence and a positioning confidence;
obtaining a comprehensive confidence from the classification confidence and the positioning confidence of the target image area of the identified target;
and outputting the identified target whose comprehensive confidence is greater than a preset confidence as the final identified target.
Optionally, outputting the identified target whose comprehensive confidence is greater than the preset confidence as the final identified target includes:
taking the identified target whose comprehensive confidence is greater than the preset confidence as the final identified target;
updating the current target detection set based on the final identified target;
and outputting all the identified targets in the current target detection set.
Optionally, for each two adjacent scene images, based on respective ranging fusion weights of the two current cameras, fusing two ranging information to obtain unique ranging information between the identified target and the vehicle, including:
if an identified target appears in the image overlapping area, respectively identifying the image features corresponding to the identified target in the two scene images;
determining binocular range information of the identified target according to the pixel position information of the image features in a binocular imaging system; the binocular imaging system is composed of the two cameras corresponding to the two scene images;
and fusing the binocular range information and the two pieces of ranging information based on the two ranging fusion weights and the binocular fusion weight of the binocular range information, to obtain the unique ranging information of the identified target.
Optionally, for each two adjacent scene images, based on respective ranging fusion weights of the two current cameras, fusing two ranging information to obtain unique ranging information between the identified target and the vehicle, including:
acquiring perception information of the identified target collected by a perception sensor;
determining the perception fusion weight corresponding to the perception sensor from the preset fusion weight table;
and fusing the two pieces of ranging information and the perception information of the identified target based on the perception fusion weight and the two ranging fusion weights, to obtain the unique ranging information of the identified target.
Optionally, after fusing, for every two adjacent scene images, the two pieces of ranging information based on the respective ranging fusion weights of the two current cameras to obtain the unique ranging information between the identified target and the vehicle, the method further includes:
determining the alarm level of each identified target according to the unique ranging information of each identified target;
Determining an alarm target corresponding to the highest alarm level from all the identified targets;
determining a target camera to which an alarm target belongs;
And displaying the scene image of the target camera.
In a second aspect, in order to achieve the above object, the present application further provides a panoramic object detection device, including:
an acquisition module, configured to acquire scene images collected by the cameras around the vehicle;
the panoramic image generation module is used for generating panoramic scene images based on all scene images;
The distance measurement module is used for determining an identified target in an image superposition area between two scene images aiming at each two adjacent scene images, and determining distance measurement information between the identified target and a vehicle, which is respectively obtained by two current cameras; the two current cameras are cameras respectively corresponding to two adjacent scene images, and in a preset fusion weight table, the two current cameras respectively have a ranging fusion weight;
the fusion module is used for fusing two ranging information based on the respective ranging fusion weights of the two current cameras aiming at each two adjacent scene images to obtain unique ranging information between the identified target and the vehicle;
And the identification module is used for generating mark information of the identified target in the panoramic scene image based on the unique ranging information.
In a third aspect, to achieve the above object, the present application further provides a vehicle, comprising: a processor, a memory, and a panoramic object detection program stored in the memory, wherein the panoramic object detection program, when executed by the processor, implements the panoramic object detection method described above.
In a fourth aspect, to achieve the above object, the present application further provides a computer readable storage medium, where a panoramic object detection program is stored, and when executed by a processor, the panoramic object detection program implements the panoramic object detection method described above.
It is easy to see that the present application processes the scene images obtained by the vehicle-body cameras to generate a 360-degree bird's-eye-view panoramic image around the vehicle. For an identified target in the image overlapping area of two adjacent scene images, the ranging information between the identified target and the vehicle is determined separately for each of the two cameras. The two pieces of ranging information are then fused according to the ranging fusion weights preset in the fusion weight table to obtain more accurate unique ranging information between the target and the vehicle. The position of the target relative to the vehicle body can then be calculated from the unique ranging information, and marking information of the identified target can be generated accordingly at the corresponding position in the generated panoramic scene image.
The method and the device thereby acquire 360-degree bird's-eye-view scene information around the vehicle, measure the information of objects around the vehicle accurately through the ranging fusion weights, and mark the identified targets in the panoramic scene image. It can be understood that this avoids the problem in a traditional panoramic image system where, due to image misalignment during image stitching, a target object disappears from the display screen or its position relative to the vehicle body is difficult to determine. The driver can accurately know the situation of objects around the vehicle from the marking information in the panoramic scene image, and thus avoid them in time.
Drawings
FIG. 1 is a schematic view of a vehicle according to the present application;
FIG. 2 is a flowchart illustrating a panoramic object detection method according to a first embodiment of the present application;
FIG. 3 is a detailed flowchart of step S200 of the panoramic object detection method according to the first embodiment of the present application;
FIG. 4 is a flowchart illustrating a panoramic object detection method according to a second embodiment of the present application;
FIG. 5 is a flowchart illustrating a step S330 of a panoramic object detection method according to a second embodiment of the present application;
FIG. 6 is a flowchart illustrating a panoramic object detection method according to a third embodiment of the present application;
FIG. 7 is a schematic diagram of a binocular imaging system of the present application;
FIG. 8 is a flowchart illustrating a panoramic object detection method according to a fourth embodiment of the present application;
FIG. 9 is a flowchart of a panoramic object detection method according to a fifth embodiment of the present application;
FIG. 10 is a first schematic diagram illustrating an example of the present application;
FIG. 11 is a second schematic diagram of an example of the present application;
FIG. 12 is a third structural schematic of an example of the present application.
The achievement of the objects, functional features and advantages of the present application will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
Panoramic image systems aim to improve driving experience, safety and convenience, and play a particularly important role in low-speed driving scenarios.
However, in the related art, there may be blind spots or dead zones in panoramic image system detection. For example, in the stitching area of multiple camera views, the panoramic image system may suffer from image misalignment, which can cause the target object to disappear from the display screen or make its position relative to the vehicle body difficult to judge. The driver may then misjudge the existence and relative distance of the target object, affecting the timely detection and avoidance of obstacles.
The panoramic object detection method, device, vehicle and storage medium applied in implementing the technique of the present application are described below:
Referring to fig. 1, fig. 1 is a schematic structural diagram of a vehicle in a hardware operating environment according to an embodiment of the present application.
As shown in fig. 1, the vehicle may include: a processor 1001, such as a CPU, a user interface 1003, a memory 1005, and a communication bus 1002. Wherein the communication bus 1002 is used to enable connected communication between these components. The user interface 1003 may include a voice pick-up module, such as a microphone array, etc., and the optional user interface 1003 may also be a Display (Display), an input unit such as a Keyboard (Keyboard), etc. The memory 1005 may be a high-speed RAM memory or a stable memory (non-volatile memory), such as a disk memory. The memory 1005 may also optionally be a storage device separate from the processor 1001 described above.
It is to be appreciated that the vehicle can also include a network interface 1004, and that the network interface 1004 can optionally include a standard wired interface or a wireless interface (e.g., a WI-FI interface). Optionally, the vehicle may also include RF (Radio Frequency) circuitry, sensors, audio circuitry, WiFi modules, and the like.
Those skilled in the art will appreciate that the vehicle structure shown in FIG. 1 is not limiting of the vehicle and may include more or fewer components than shown, or may combine certain components, or a different arrangement of components.
Based on, but not limited to, the above vehicle hardware structure, the present application provides a first embodiment of a panoramic object detection method. Referring to fig. 2, fig. 2 is a flowchart illustrating a panoramic object detection method according to a first embodiment of the present application.
It should be noted that although a logical order is depicted in the flowchart, in some cases the steps depicted or described may be performed in a different order than presented herein.
In this embodiment, the panoramic object detection method includes:
step S100, acquiring scene images acquired by cameras around the vehicle.
Step S200, generating a panoramic scene image from a bird' S eye view based on all scene images.
In this embodiment, cameras are arranged at surrounding positions of the vehicle, and each camera may acquire scene information within a field of view of the camera to generate a scene image.
It can be appreciated that a fisheye camera may obtain scene information over a larger range of viewing angles than other cameras. Specifically, in the present embodiment, taking a fisheye camera as an example, four fisheye cameras are arranged in four directions of the front, rear, left and right of the vehicle to ensure that a scene image of 360 ° scene information around the vehicle can be acquired.
It can be appreciated that the view angle of the scene image collected by the fisheye camera is generally a horizontal view angle, and the bird's-eye view angle image can provide a more stereoscopic scene observation angle compared with the horizontal view angle, so as to help the driver and the vehicle to more comprehensively understand the information around the vehicle.
Further, as shown in fig. 3, as a specific embodiment, step S200 includes:
Step S210, performing view angle conversion on the scene image to obtain a bird' S-eye view angle image.
Specifically, in the present embodiment, the scene image from the fisheye camera is first subjected to de-distortion processing, and the de-distorted horizontal-view scene image is then converted into a bird's-eye-view image by a planar perspective conversion using the homography matrix between the horizontal view and the bird's-eye view.
In a specific example, a horizontal scene image containing calibration cloths may be acquired through the fisheye cameras mounted on the vehicle; the calibration cloths need to be placed at the four corners of the vehicle, and adjacent calibration cloths need to be kept on the same horizontal line. After the horizontal scene image is obtained, a de-distortion operation is performed on it, and feature points of the calibration cloth, such as its corner points or other points with obvious features, are selected on the de-distorted horizontal scene image. The same calibration-cloth feature points are located in the undistorted bird's-eye-view scene image corresponding to the horizontal view, and the homography matrix for converting between the horizontal-view image and the bird's-eye view is calculated from the spatial relationship between the corresponding feature points in the de-distorted horizontal-view image and the undistorted bird's-eye-view image. It can be understood that each fisheye camera on the vehicle corresponds to one homography matrix, and the calculated homography matrix is stored for the subsequent bird's-eye-view conversion of the scene pictures captured by that fisheye camera. The specific application is shown in formula one:
p_bev = H · p_img; (formula one)
wherein p_img represents the homogeneous coordinates of a corresponding point in the horizontal-view image captured by a fisheye camera on the vehicle, H represents the homography matrix corresponding to that camera, and p_bev represents the homogeneous coordinates of the corresponding point in the bird's-eye-view image obtained by the top-down conversion of the horizontal-view image.
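As an illustrative aid only (not part of the disclosed embodiment), the sketch below shows one way formula one could be realized with OpenCV in Python, assuming the corresponding calibration-cloth feature points have already been located in the de-distorted horizontal-view image and in the reference bird's-eye-view image; all function and variable names are hypothetical.

```python
import numpy as np
import cv2

def estimate_homography(pts_horizontal, pts_birdseye):
    """Estimate H from at least 4 corresponding calibration-cloth feature points.

    pts_horizontal / pts_birdseye: (N, 2) pixel coordinates of the same points
    in the de-distorted horizontal-view image and in the bird's-eye-view image.
    """
    H, _ = cv2.findHomography(np.asarray(pts_horizontal, dtype=np.float32),
                              np.asarray(pts_birdseye, dtype=np.float32))
    return H  # stored once per fisheye camera and reused for every frame

def to_birdseye(frame, H, out_size):
    """Apply formula one to every pixel: warp a de-distorted horizontal-view
    frame into the bird's-eye view of size out_size = (width, height)."""
    return cv2.warpPerspective(frame, H, out_size)
```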
It should be noted that, in one example, the horizontal-view image may be de-distorted by fitting at pixel level. It is understood that the known horizontal-view image is a distorted fisheye image, and de-distorting it amounts to finding the mapping between each point of the horizontal-view image and the corresponding point of the de-distorted horizontal-view image. Specifically, a fisheye camera on the vehicle may be used to capture, from various angles, calibration objects with known three-dimensional coordinates, and their corresponding two-dimensional coordinates in the image are recorded. The fisheye camera is calibrated using the known three-dimensional coordinates and the corresponding two-dimensional coordinates in the captured images; during calibration the internal parameters of the camera (such as the focal length and principal point position) are estimated, and the distortion fitting coefficients are estimated at the same time. Because the radial distortion of the camera is usually nonlinear, an optimization algorithm (such as the least squares method) is used during calibration to optimize the distortion fitting coefficients nonlinearly, so that the optimal distortion fitting coefficients are found, recorded and saved. In practical application, a specific de-distortion form is shown in formula two:
x′ = x(1 + k1·r² + k2·r⁴ + k3·r⁶ + k4·r⁸);
y′ = y(1 + k1·r² + k2·r⁴ + k3·r⁶ + k4·r⁸);
In formula two, (x′, y′) represents the normalized coordinates of the corresponding point in the horizontal-view image before de-distortion, (x, y) represents the normalized image coordinates of the corresponding point in the undistorted horizontal-view image, k1, k2, k3 and k4 represent the distortion fitting coefficients, and r represents the distance from the corresponding point in the undistorted horizontal-view image to the image center point.
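The following minimal numpy sketch (an illustration under assumed conventions, not the applicant's implementation) evaluates the radial model of formula two on normalized coordinates centred at the principal point; in practice the mapping would be baked into a remap table once per camera and applied to every frame.

```python
import numpy as np

def distort_normalized(x, y, k1, k2, k3, k4):
    """Formula two: map undistorted normalized coordinates (x, y) to the
    distorted coordinates (x', y') actually observed by the fisheye camera.
    x and y may be scalars or numpy arrays of the same shape."""
    r2 = x * x + y * y                       # r squared
    radial = 1.0 + k1 * r2 + k2 * r2**2 + k3 * r2**3 + k4 * r2**4
    return x * radial, y * radial

# Building an undistortion remap table amounts to evaluating this forward
# mapping for every pixel of the undistorted output image and sampling the
# distorted input image at the resulting positions (e.g. with cv2.remap).
```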
It can be understood that the three-dimensional scene observation from the horizontal view angle to the bird's eye view angle is realized through the view angle conversion of the scene image acquired by the fisheye camera, and more comprehensive surrounding information is provided for a driver and a vehicle. Through de-distortion processing and overlook conversion, the distorted scene image around the vehicle is successfully converted into an undistorted aerial view image, and the geometric accuracy and the information restoration degree of the image are improved.
Step S220, determining the respective pixel fusion weights of any two adjacent aerial view images from a preset fusion weight table aiming at the image superposition area between every two adjacent aerial view images.
In the present embodiment, the image overlapping region between the bird's-eye view images is an overlapping portion of the bird's-eye view images corresponding to the view angle range between two adjacent fish-eye cameras on the vehicle. The pixel fusion weight represents the weight ratio of each pixel point in the overlapping area when the bird's-eye view angle image is spliced.
Specifically, after the horizontal-view images captured by two adjacent fisheye cameras have been transformed top-down into the corresponding bird's-eye-view images, the overlapping portion of the bird's-eye-view images is identified, and the weight ratio of the corresponding pixel points in the overlapping portion is determined from the fusion weight table. It can be understood that the fusion weight table records the fusion weight ratio of each pixel point in the overlapping area of the bird's-eye-view images corresponding to each fisheye camera on the vehicle. In this embodiment, the pixel fusion weights of corresponding pixels in the overlapping area of two adjacent bird's-eye-view images sum to 1, and the weight ratio of each bird's-eye-view image is determined according to the fusion effect. In a specific example, for the overlapping area of two adjacent bird's-eye-view images, the pixel fusion weight of a corresponding pixel in one bird's-eye-view image is increased step by step from 0 to 1 while the fusion weight of the corresponding pixel in the other bird's-eye-view image is correspondingly decreased from 1 to 0, and the weights with the statistically best fusion effect are recorded in the fusion weight table for subsequent image fusion processing.
By constructing and recording the pixel fusion weight table, optimal fusion of the bird's-eye-view images captured by adjacent fisheye cameras in the overlapping area is achieved. The optimal fusion effect is determined by gradually adjusting the fusion weights of pixels in the overlapping areas of adjacent bird's-eye-view images, so that the images transition naturally and seamlessly in the overlapping area.
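A minimal sketch of the weighted blending described above, assuming the per-pixel weights for one of the two tiles have already been read from the fusion weight table (names and shapes are assumptions for illustration):

```python
import numpy as np

def blend_overlap(bev_a, bev_b, weights_a):
    """Weighted fusion of the overlap region of two adjacent bird's-eye tiles.

    bev_a, bev_b: overlap crops of the two adjacent bird's-eye-view images,
                  both of shape (H, W, 3).
    weights_a:    (H, W) pixel fusion weights of bev_a from the fusion weight
                  table, ramping from 0 to 1 across the seam; bev_b implicitly
                  receives 1 - weights_a so the two weights sum to 1.
    """
    w = weights_a[..., None].astype(np.float32)
    fused = w * bev_a.astype(np.float32) + (1.0 - w) * bev_b.astype(np.float32)
    return fused.astype(bev_a.dtype)
```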
Step S230, for each two adjacent aerial view images, fusion processing is carried out on the image superposition areas of the two adjacent aerial view images based on the two pixel fusion weights, and the fused adjacent aerial view images are obtained.
In the present embodiment, for each of two adjacent bird's-eye view images, the fusion processing is performed on the image overlapping region between the two images by using the pixel fusion weights corresponding to the respective pixel points in the two images.
Specifically, according to the pixel fusion weights in the predetermined fusion weight table, the pixel points in the overlapping areas of the adjacent aerial view images are subjected to weighted fusion.
Step S240, generating a panoramic scene image with a bird 'S-eye view angle based on all the fused adjacent bird' S-eye view angle images.
In the present embodiment, one entire image composed of all the fused adjacent bird's-eye view angle images is a panoramic scene image.
It can be appreciated that the final panoramic scene image reflects the overall situation of the surrounding environment of the vehicle, and can provide a more comprehensive information basis for subsequent object detection. The method provides more detailed and accurate environmental awareness for driving, and ensures safe driving of the vehicle under complex road conditions.
Step S300, for each two adjacent scene images, determining an identified target in an image overlapping area between the two scene images, and determining ranging information between the identified target and the vehicle, which is obtained by the two current cameras respectively.
The two current cameras are cameras respectively corresponding to two adjacent scene images, and in a preset fusion weight table, the two current cameras respectively have a ranging fusion weight.
Step S400, for each two adjacent scene images, based on the respective ranging fusion weights of the two current cameras, fusing the two ranging information to obtain the unique ranging information between the identified target and the vehicle.
In this embodiment, the identified object is an object identified in the scene image, for example, a person, a dog, or the like in the scene image may affect the running of the vehicle, and it is understood that the scene images respectively captured by the two cameras at the same time may have the same identified object in the image overlapping area.
The ranging information represents the measured distance between the identified object and the vehicle in world coordinates. It will be appreciated that, due to camera errors, the range information calculated for the same identified object may differ between the scene images captured by different cameras.
The ranging fusion weight represents a weight ratio corresponding to ranging information between an identified object and a vehicle in a scene image captured by each camera. According to the embodiment, the distance detection is respectively carried out on the identified targets in each scene image, the distance measurement information corresponding to each target is determined, and the weight fusion is carried out on the distance measurement information, so that more accurate unique distance measurement information is obtained.
Specifically, after the ratio of the world coordinates to the pixel coordinates is obtained, the distance measurement information of the identified object from the camera under the world coordinates can be calculated by calculating the pixel distance of the identified object in the scene image, and the specific formula is as follows:
Δdistance_world = Δdistance_pixel · (l_world / l_pixel); (formula three)
In formula three, l_world is the world coordinate of the identified object, l_pixel is the pixel coordinate of the identified object in the scene image (their ratio being the world-to-pixel scale), Δdistance_pixel is the pixel distance of the identified object in the scene image, and Δdistance_world is the ranging information of the identified object in world coordinates.
After the distance measurement information between the recognized targets and the vehicles, which are respectively obtained by the two current cameras, is obtained through calculation, the two distance information are weighted and fused according to the distance measurement fusion weights of the two cameras.
In this embodiment, the ranging fusion weight of each camera is determined by the ranging error of the camera in the image fusion area. It can be understood that, due to algorithm errors, camera performance and other reasons, the calculated ranging information between the identified target and the cameras may deviate from the actual distance. The distance information between each of two adjacent cameras and the same target object at different positions can therefore be calculated separately and compared with the actual distance to obtain the error distribution of each of the two adjacent cameras. The respective ranging fusion weights of the two adjacent cameras are then determined according to these error distributions and recorded in the fusion weight table, so that the unique ranging information can be determined in subsequent actual target recognition.
In a specific example, the manner in which the ranging fusion weights are determined is shown in equation four:
w_i = (1/δ_i) / (1/δ_1 + 1/δ_2 + … + 1/δ_n); (formula four)
wherein w_i is the ranging fusion weight of the i-th camera, and δ_i represents the variance of the ranging errors of the i-th camera for targets at different positions in the image fusion area; it can be understood that the variance of the ranging errors characterizes the error distribution of the camera.
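A short sketch of inverse-error-variance weighting of this kind; the exact weight formula and the numbers below are assumptions used only to illustrate the fusion step:

```python
import numpy as np

def ranging_fusion_weights(error_variances):
    """Weight each ranging source inversely to the variance of its ranging
    error in the overlap area (computed offline against known distances and
    stored in the fusion weight table)."""
    inv = 1.0 / np.asarray(error_variances, dtype=np.float64)
    return inv / inv.sum()          # weights sum to 1

def fuse_distances(distances, weights):
    """Weighted fusion of per-source distances into one unique distance."""
    return float(np.dot(weights, distances))

# e.g. two adjacent cameras with error variances of 0.04 and 0.09 (m^2):
# w = ranging_fusion_weights([0.04, 0.09])   # heavier weight on camera 1
# d = fuse_distances([2.10, 2.25], w)        # unique ranging information
```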
It can be appreciated that, in this embodiment, for the view angle overlapping area of two adjacent cameras, according to the ranging fusion weight of the ranging information between each camera and the identified target, the ranging information corresponding to each camera is weighted and fused, so that more accurate unique distance information between the vehicle and the target object can be obtained, which is helpful for reducing uncertainty caused by the ranging error of a single camera, and improving the overall ranging accuracy.
Step S500, generating marker information of the identified object in the panoramic scene image based on the unique ranging information.
It can be appreciated that, after the unique ranging information between the identified object and the vehicle has been determined, the position coordinates of the identified object relative to the vehicle can be determined from it. For example, in a world coordinate system centred on the vehicle, the position coordinates of the identified object can be determined from its unique ranging information and the angle between the identified object and the vehicle. The corresponding position of the identified object in the panoramic scene image can then be determined from the conversion relationship between the world coordinate system and the image coordinate system of the panoramic scene image, and the mark information of the identified object is generated at that position.
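A minimal sketch of that coordinate conversion, assuming the ego vehicle sits at a known pixel centre of the bird's-eye panorama with a fixed metres-per-pixel scale; the axis conventions, names and the drawing calls are assumptions for illustration only.

```python
import math

def world_to_panorama(distance_m, bearing_rad, center_px, px_per_m):
    """Convert a target's unique ranging information plus its bearing relative
    to the vehicle into pixel coordinates of the bird's-eye panorama
    (vehicle at center_px, bearing 0 pointing straight ahead)."""
    dx = distance_m * math.sin(bearing_rad) * px_per_m
    dy = -distance_m * math.cos(bearing_rad) * px_per_m
    return int(round(center_px[0] + dx)), int(round(center_px[1] + dy))

# Marker drawing could then be, for example:
#   cv2.circle(panorama, pt, 6, (0, 0, 255), -1)
#   cv2.putText(panorama, f"{distance_m:.1f} m", pt,
#               cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255), 1)
```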
It can be seen that the present application acquires 360-degree bird's-eye-view scene information around the vehicle, measures the information of objects around the vehicle accurately through the ranging fusion weights, and labels the identified objects in the panoramic scene image. It can be understood that labelling the identified objects in the panoramic scene image avoids the problem in a traditional panoramic image system where, due to image misalignment during image stitching, an object disappears from the display screen or its position relative to the vehicle body is difficult to determine. The driver can accurately know the situation of the obstacles around the vehicle from the marking information in the panoramic scene image, and thus avoid them in time.
Further, referring to fig. 4, fig. 4 is a flowchart illustrating a second embodiment of the panoramic object detection method according to the present application, in which step S300 includes:
Step S310, respectively inputting each scene image into a target detection model to obtain identified targets; each identified target has a classification confidence and a positioning confidence.
Step S320, obtaining a comprehensive confidence according to the classification confidence and the positioning confidence of the target image area of the identified target.
Step S330, outputting the identified target whose comprehensive confidence is greater than the preset confidence as the final identified target.
In this embodiment, each camera around the vehicle corresponds to a target detection model, and the target detection model may be used to detect an identified target in the scene image acquired by the corresponding vehicle camera. The identified targets are people, other vehicles, dogs, etc. that need to avoid during driving. In this embodiment, the classification confidence represents the confidence of the target detection model in the class to which the identified target belongs. Specifically, for each target identified by the target detection model, the classification confidence measures the probability or confidence level that the target belongs to a particular class. For example, if the model detects that a certain object is a vehicle, the classification confidence indicates how likely it is that the region is indeed a vehicle.
The location confidence represents the confidence of the object detection model in the accuracy of the location of the identified object in the image. Specifically, for each target detected by the target detection model, the positioning confidence measures how confident the model is in the accuracy of the position of the target in the image.
The integrated confidence represents an integrated measure of the confidence of the object detection model in both classifying and locating the identified object. Specifically, the integrated confidence is obtained by multiplying the classification confidence and the positioning confidence of the target image region. The overall confidence level of the model for each identified object can be more comprehensively reflected by the overall confidence, and the higher the overall confidence, the more confident the model is in terms of the category and location information of the identified object. Specifically, the identified targets with integrated confidence greater than the preset confidence may be used as the final output.
In a specific example, data of the target objects to be identified in the actual application scene are collected offline, including images at different angles, under different illumination, in different environments and against different backgrounds. The images are labelled and input into a model capable of performing image recognition and analysis tasks, such as a machine learning model or a neural network model, for training. The convergence state of model training is confirmed by comparing the prediction information output by model inference with a manually labelled verification data set, and the required target detection model is finally obtained; the target detection model outputs the positioning confidence, classification confidence, 2D detection frame and key points of each identified target.
It will be appreciated that the object detection model may output multiple overlapping detection frames for the same identified object, which need to be screened in order to avoid repetition of the count. In this embodiment, WEIGHTED NMS (weighted non-maximum suppression) algorithm is adopted to perform weighted fusion on each detection frame, and the detection frame with the highest confidence is selected as the final detection result.
Specifically, the positioning confidence and the classification confidence of each detection frame of the identified target are first obtained; detection frames whose positioning confidence is below a specified positioning-confidence threshold are filtered out in a preliminary screening, then for the remaining detection frames the classification result with the maximum classification confidence is selected as the class of each detection frame, and the detection frames are grouped by class.
Within each class, the detection frame with the highest positioning confidence is taken as the reference frame, the intersection-over-union (IoU) between each of the other detection frames and the reference frame is calculated, and a weight factor is assigned to each detection frame according to the magnitude of its IoU: the larger the IoU, the higher the weight factor of that detection frame. The grouped detection frames are then weighted and fused according to these weight factors to obtain the single detection frame to be output for the same identified target. It can be appreciated that, in this way, detection marks can be produced for all the identified targets in one scene image, and each distinct identified target corresponds to exactly one detection frame to be output.
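The sketch below illustrates a weighted non-maximum suppression of this general shape for one class of detections; it is a simplified illustration rather than the applicant's exact WEIGHTED NMS variant, and the IoU floor of 0.5 is an assumed parameter.

```python
import numpy as np

def iou_with(ref, boxes):
    """IoU between a reference box and an array of boxes, all [x1, y1, x2, y2]."""
    x1 = np.maximum(ref[0], boxes[:, 0]); y1 = np.maximum(ref[1], boxes[:, 1])
    x2 = np.minimum(ref[2], boxes[:, 2]); y2 = np.minimum(ref[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_ref = (ref[2] - ref[0]) * (ref[3] - ref[1])
    area_box = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_ref + area_box - inter + 1e-9)

def weighted_nms(boxes, loc_conf, iou_floor=0.5):
    """Fuse the overlapping detection frames of one target (already grouped by
    class) into a single output frame: the frame with the highest positioning
    confidence is the reference, and frames overlapping it by at least
    iou_floor contribute with IoU-proportional weights."""
    boxes = np.asarray(boxes, dtype=np.float64)
    ref = int(np.argmax(loc_conf))
    overlaps = iou_with(boxes[ref], boxes)
    keep = overlaps >= iou_floor
    w = overlaps[keep] / overlaps[keep].sum()
    return (w[:, None] * boxes[keep]).sum(axis=0)
```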
The identified targets are then screened further. Different confidence thresholds are set for different target object types, and the positioning confidence and the classification confidence of the detection frame to be output for each identified target are multiplied to obtain the comprehensive confidence, which is used as the criterion for judging whether the target is a real target; the identified targets whose comprehensive confidence is higher than the preset confidence are output.
It can be appreciated that the present embodiment is a process of identifying the target object in the scene image acquired by each vehicle camera, and by introducing the comprehensive confidence as a criterion for determining whether the identified target is correct, a more comprehensive and reliable target sensing capability is provided, and the accuracy and robustness of target object identification are improved.
In order to verify the detection precision and recall rate of the target detection in this example, a public data set and a custom scene data set were used for comparative testing, and various performance indicators were analysed and compared. A picture data set of 10137 complex scenes with a total of 39418 labels was used as the test set. The data set was first tested with an original target detection model, and the statistics in Table 1 were obtained as the performance indicators of the original algorithm model, serving as a reference.
Next, the same batch of data was tested with the target detection model of this example, and the statistics in Table 2 were obtained as the performance indicators of the in-house algorithm, serving as a reference.
Comparing the performance indicators of the original algorithm model with those of the algorithm model provided by this example gives Table 3 as a reference. It can be seen from Table 3 that, compared with the original algorithm model, the detection precision and recall rate of the target detection model in this example are improved to varying degrees, which further indicates that this embodiment provides a more comprehensive and reliable target perception capability and improves the accuracy of target object identification.
Further, referring to fig. 5 as a specific embodiment, in this embodiment, step S330 includes:
In step S331, the identified target with the integrated confidence level greater than the preset confidence level is used as the final identified target.
Step S332, updating the current target detection set based on the final identified target.
Step S333, outputting all the identified targets in the current target detection set.
In this embodiment, the object detection set is a set of identified objects in the acquired previous frame of scene image.
This embodiment is a process of dynamically identifying objects in the scenes around the vehicle. Specifically, after the objects in the scene image of the current frame have been identified, the identified objects whose comprehensive confidence is greater than the preset confidence are taken as the final identified objects; the final identified objects of the current frame are searched for and matched against the identified objects of the previous frame in the target detection set, the target detection set is updated, and the objects in the updated target detection set are then synchronized with the corresponding labels in the panoramic scene image.
In a specific example, the current-frame scene image acquired by each camera of the vehicle is input into the target detection model for prediction to obtain the identified targets. If the current frame is the first frame, the relevant data of the identified targets are stored directly in the target detection set; these data mainly comprise the abscissa and ordinate of the centre point of the lower edge of each identified target's detection frame, together with the width and length of the detection frame. If the current frame is not the first frame, the identified targets of the current frame are matched in turn against the identified targets of the previous frame in the target detection set.
Specifically, if an identified target of the current frame is successfully matched with an identified target of a previous frame in the target detection set, the two are considered to be the same target. The state of the previous-frame identified target is updated using a Kalman filtering algorithm, and its relevant data are weighted and fused with the relevant data of the current-frame identified target, so that more stable and smooth updated data are output.
If no previous-frame identified target matching the current-frame identified target can be found in the target detection set, the current-frame identified target is considered to be a newly appearing target object and is stored in the target detection set.
If an identified target in the target detection set has had no matching identified target in the current frame for three consecutive frames, it is considered to have moved out of the vehicle detection range and is removed from the tracking algorithm list.
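A compact sketch of this target-detection-set update is given below; the association function is left as a parameter, and the smoothing step is a simple blend standing in for the Kalman update, so all names, the blend gain and the data layout are assumptions for illustration.

```python
from itertools import count

_track_ids = count()

def smooth(prev, det, gain=0.5):
    """Placeholder for the Kalman update: blend previous and current data
    (centre x/y of the lower edge, width, length of the detection frame)."""
    return {k: (1 - gain) * prev[k] + gain * det[k] for k in det}

def update_detection_set(tracks, detections, match_fn, max_missed=3):
    """One frame of the update described above.

    tracks:     dict track_id -> {"state": {...}, "missed": int}
    detections: final identified targets of the current frame
    match_fn:   association function returning the matching track_id or None
    """
    matched = set()
    for det in detections:
        tid = match_fn(tracks, det)
        if tid is not None:                      # same target as previous frame
            tracks[tid] = {"state": smooth(tracks[tid]["state"], det), "missed": 0}
            matched.add(tid)
        else:                                    # newly appearing target
            new_id = next(_track_ids)
            tracks[new_id] = {"state": det, "missed": 0}
            matched.add(new_id)
    for tid in list(tracks):                     # targets leaving the range
        if tid not in matched:
            tracks[tid]["missed"] += 1
            if tracks[tid]["missed"] >= max_missed:
                del tracks[tid]
    return tracks
```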
It can be understood that by determining the target with the comprehensive confidence higher than the preset threshold as the final identified target, the system can identify the target timely and accurately in the current frame scene image, and then dynamically update the target detection set through steps of searching for matching, kalman filtering, fusion processing and the like. The method effectively reduces data redundancy, realizes stable tracking of the motion state of the target, automatically processes the newly-appearing target and the target moving out of view, improves the perceptibility of the system to the dynamic target in the surrounding scene of the vehicle, and provides more accurate and reliable input information for subsequent processing.
Further, referring to fig. 6, fig. 6 is a flowchart illustrating a third embodiment of the panoramic object detection method according to the present application, in which step S400 includes:
In step S410, if the identified object appears in the image overlapping area, the image features corresponding to the identified object are identified in the two scene images respectively.
Step S420, determining binocular distance information of the identified object according to the pixel position information of the image features in the binocular imaging system.
Step S430, fusing the binocular range information and the two pieces of ranging information based on the two ranging fusion weights and the binocular fusion weight of the binocular range information, to obtain the unique ranging information of the identified target.
The binocular imaging system consists of two cameras corresponding to two scene images. The binocular fusion weight represents the weight ratio of binocular range information when unique range information is confirmed.
In this embodiment, the binocular range information refers to the distance information, acquired by a binocular imaging system, of an identified object in the image fusion area. It will be appreciated that two cameras on the vehicle with overlapping viewing-angle regions can be regarded as a binocular imaging system, and the binocular range information of an identified object relative to the binocular imaging system can be determined based on the binocular imaging principle shown in fig. 7. In fig. 7, Ol and Or are the optical centres of the two adjacent cameras, xl and xr are the pixel positions of the ranging reference point P of the identified target on the s1 and s2 imaging planes, and Z is the binocular range information between the ranging reference point P and the binocular imaging system.
Specifically, a ranging reference point of the identifiable object in the image overlapping area of one scene image is obtained, the corresponding ranging reference point of the identified object is searched for and matched in the other scene image by a feature matching algorithm, and the binocular range information of the identified object relative to the binocular imaging system is calculated from the pixel position information of the two ranging reference points in their corresponding images. Specifically, formula five is as follows:
Z=f*B/D;
In the fifth formula, D is the parallax between two cameras in the binocular imaging system, which can be determined by the pixel position information of the ranging reference points corresponding to the two cameras, B is the baseline distance between the two cameras, f is the focal length corrected by the two cameras in the binocular imaging system, and Z is the binocular range information.
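As a small illustration of formula five (assuming a rectified pair and a focal length in pixel units; variable names are assumptions):

```python
def binocular_range(xl, xr, focal_px, baseline_m):
    """Formula five, Z = f * B / D: depth of the matched ranging reference
    point P from the stereo pair formed by two adjacent cameras.

    xl, xr:     pixel columns of P on the s1 and s2 imaging planes
    focal_px:   rectified focal length f of the pair, in pixels
    baseline_m: baseline distance B between the optical centres Ol and Or
    """
    disparity = xl - xr                   # D in formula five
    if disparity <= 0:
        return float("inf")               # point effectively at infinity
    return focal_px * baseline_m / disparity
```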
After the binocular range information is confirmed, weighting and fusing are carried out on the binocular range information and the ranging information of the two adjacent cameras respectively according to the ranging fusion weights corresponding to the two adjacent cameras and the binocular fusion weights corresponding to the binocular imaging system, and unique ranging information is calculated.
It can be appreciated that the binocular fusion weights may also be obtained by looking up in a fusion weight table. Binocular fusion weights in the fusion weight table can be determined by calculating the error distribution condition of binocular ranging information and actual ranging information between the binocular imaging system and targets at different positions.
It can be appreciated that by combining the binocular ranging information with the ranging information of two neighboring cameras, the accuracy and robustness of overall ranging can be further improved.
Further, referring to fig. 8, fig. 8 is a flowchart illustrating a fourth embodiment of the panoramic object detection method according to the present application, in which step S400 includes:
step S440, the identified target perception information acquired by the perception sensor is acquired.
Step S450, determining the sensing fusion weight corresponding to the sensing sensor from a preset fusion weight table.
Step S460, based on the two ranging fusion weights and the perception fusion weight, fusing the two ranging information and the perceived information of the identified target to obtain the unique ranging information of the identified target.
In the present embodiment, the sensing sensor may be an environment sensing device such as an ultrasonic radar, a millimeter wave radar, or the like. The recognized object sensing information may be distance information of the recognized object to the vehicle body detected by the sensing sensor in the viewing angle overlapping region. The sensing fusion weight can be the weight occupied by the target sensing information measured by each sensing sensor when fusion is carried out.
Specifically, a sensing fusion weight corresponding to the sensor can be obtained from the fusion weight table, and the distance measurement information and the target sensing information are subjected to weighted fusion according to the sensing fusion weight and the distance measurement fusion weight, so that the unique distance measurement information is obtained.
In one specific example, in an object recognition system consisting of n perception sensors, the sensors s1, s2, …, sn detect the same object, and the local estimation errors of any two sensors are uncorrelated. In the fusion weight table, the perception fusion weights of the sensors are w1, w2, …, wn (with w1 + w2 + … + wn = 1), respectively, and the perception fusion weights are determined by the ranging-error distribution of each sensor in the viewing-angle overlapping area. Specifically, formula six is as follows:
w_i = (1/δ_i) / (1/δ_1 + 1/δ_2 + … + 1/δ_n); (formula six)
where w_i is the perception fusion weight of the i-th perception sensor, and δ_i represents the variance of the ranging errors of the i-th perception sensor for targets at different positions in the viewing-angle overlapping area. It is worth mentioning that, in this formula, δ_i also covers the variance of the cameras' ranging errors for targets at different positions in the viewing-angle overlapping area, that is, w_i also covers the ranging fusion weights.
After confirming the sensing fusion weights corresponding to the sensors, the target sensing information (ranging information) measured by n sensing sensors (including cameras) can be weighted and fused to obtain final unique ranging information.
It can be appreciated that in this embodiment, by combining the ranging information of the target object acquired by the camera and the sensing sensor, the fusion of the multi-source ranging information of the same target is achieved by using the ranging fusion weight and the sensing fusion weight respectively corresponding to the camera and the sensing sensor in the fusion weight table, so that the accuracy and the reliability of the ranging information of the target are improved, and the error between the target ranging information and the actual distance is reduced.
Further, referring to fig. 9, fig. 9 is a flowchart illustrating a fifth embodiment of the panoramic object detection method according to the present application. In this embodiment, after step S400 the method further includes:
Step S600, determining the alarm level of each identified target according to the unique ranging information of each identified target.
Step S700, determining an alarm target corresponding to the highest alarm level from all the identified targets.
Step S800, determining a target camera to which the alarm target belongs.
Step S900, displaying a scene image of the target camera.
In this embodiment, the alarm levels are divided according to the distance between the identified target and the vehicle body; for example, from low to high, they may be divided into safe, normal alarm and danger alarm as the target gets closer to the vehicle body.
Specifically, in this embodiment the unique ranging information of an identified target is used as the criterion for judging its alarm level. The unique ranging information of all identified targets in the panoramic scene image is examined, and the identified target whose unique ranging information corresponds to the highest alarm level is determined as the alarm target. The target camera that detected the alarm target is then determined, the scene image acquired by that camera at the current moment is displayed, and the distance of the identified target from the vehicle body is marked at the corresponding position in the image to provide a reference for the driver.
It can be appreciated that, in this embodiment, the alarm levels are divided based on the unique ranging information of the identified targets, the alarm target is then determined, and the corresponding scene image is displayed. This realizes accurate detection of and alarm prompts for panoramic targets, improves the driver's perception of and ability to respond to the surrounding environment, and thereby enhances driving safety.
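The alarm logic of steps S600 to S900 can be sketched in Python as follows. The distance thresholds, level names and data structure are assumptions made for illustration and are not specified by the patent.

```python
from dataclasses import dataclass

# Alarm levels from low to high; the thresholds (in metres) are illustrative
# and would normally be calibrated per vehicle.
SAFE, NORMAL_ALARM, DANGER_ALARM = 0, 1, 2


def alarm_level(distance_m, normal_threshold=6.0, danger_threshold=2.5):
    """Map a unique ranging distance to an alarm level."""
    if distance_m <= danger_threshold:
        return DANGER_ALARM
    if distance_m <= normal_threshold:
        return NORMAL_ALARM
    return SAFE


@dataclass
class IdentifiedTarget:
    camera_id: str            # camera whose scene image contains the target
    unique_distance_m: float  # fused (unique) ranging information


def select_alarm_target(targets):
    """Return the target with the highest alarm level (ties broken by the
    smaller distance) together with the camera whose image should be shown."""
    if not targets:
        return None, None
    worst = max(targets, key=lambda t: (alarm_level(t.unique_distance_m),
                                        -t.unique_distance_m))
    return worst, worst.camera_id


targets = [IdentifiedTarget("front", 7.2),
           IdentifiedTarget("rear", 2.1),
           IdentifiedTarget("left", 5.0)]
alarm_target, camera = select_alarm_target(targets)  # -> rear camera, 2.1 m
```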
To enable those skilled in the art to better understand this embodiment, the following description refers to an implementation example in a concrete application scenario. It should be understood that the following example is only used to explain the present application and is not intended to limit the scope of its claims.
As shown in fig. 10, fig. 10 is a schematic structural diagram of the panoramic object detection system of this example. The system includes cameras mounted on a target vehicle in the manner shown in fig. 11, and the video from each camera is transmitted to an image capturing module as an AHD (analog high-definition) signal.
The image capturing module distributes the image frames transmitted by the cameras to the subsequent surround-view (stitching) module, the calibration module and the AI module for their respective algorithm processing.
The parameters of all modules in the system can be configured through the calibration module and are synchronized to each module. Specifically, the calibration module is configured with the intrinsic parameters of each camera, the distortion-fitting coefficients required for image undistortion, the fusion weight table required for weighted fusion, and the homography matrix information required for the top-view transformation.
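As an illustration of how the configured intrinsics and distortion-fitting coefficients might be applied, the sketch below undistorts a frame with OpenCV under a standard pinhole distortion model. The actual camera model used by the calibration module may differ (for example, a fisheye model), and the numbers shown are placeholders.

```python
import cv2
import numpy as np

# Illustrative calibration data as it might be synchronised from the
# calibration module: intrinsic matrix K and distortion coefficients.
K = np.array([[700.0,   0.0, 640.0],
              [  0.0, 700.0, 360.0],
              [  0.0,   0.0,   1.0]])
dist_coeffs = np.array([-0.32, 0.12, 0.001, 0.0005, -0.02])


def undistort_frame(frame):
    """Remove lens distortion from one camera frame using the configured
    intrinsics and distortion-fitting coefficients."""
    return cv2.undistort(frame, K, dist_coeffs)
```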
It should be noted that, during parameter configuration, the calibration module only configures the parameters related to the effective area within each camera's viewing-angle range. For example, configuration information for scene regions far away from the vehicle body captured by a camera is not written into the calibration module, so that the other modules only need to search the effective area when performing operations such as table lookup, which reduces the running time of the algorithms.
The stitching module performs perspective conversion and stitching on the same frame of image from each camera, obtains a complete 360-degree surround-view stitched picture, and outputs it to the display screen.
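A minimal sketch of the perspective conversion and overlap blending performed by the stitching module is shown below. It assumes each camera has a calibrated homography to the top-view plane and that a per-pixel fusion weight map for the overlap area is available from the calibration module; the function names are illustrative.

```python
import cv2
import numpy as np


def to_birds_eye(frame, homography, out_size=(800, 800)):
    """Warp one camera frame to the bird's-eye (top-view) plane using the
    homography calibrated for that camera."""
    return cv2.warpPerspective(frame, homography, out_size)


def blend_overlap(view_a, view_b, weight_a):
    """Blend two 3-channel bird's-eye views in their overlapping region;
    weight_a is the per-pixel fusion weight map (values in [0, 1]) of
    view_a taken from the fusion weight table."""
    w = weight_a.astype(np.float32)[..., None]  # (H, W, 1) for broadcasting
    blended = view_a.astype(np.float32) * w + view_b.astype(np.float32) * (1.0 - w)
    return blended.astype(np.uint8)
```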
The AI module analyzes the image from each camera, draws bounding boxes around the specified targets, and simultaneously calculates the distance information of each specified target.
The fusion module associates and fuses the same target appearing in the image stitching area of different camera images using the weighted fusion algorithm, so as to obtain more accurate target distance information. At the same time, the position of the target relative to the vehicle body is determined from the target distance information, the target is marked at the corresponding position in the panoramic scene image on the display screen, and the corresponding marking information is generated.
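The patent does not spell out how the fusion module decides that detections from two adjacent cameras refer to the same physical target; the Python sketch below shows one plausible nearest-neighbour association in vehicle coordinates, purely as an illustration.

```python
import math


def associate_targets(detections_a, detections_b, max_gap_m=0.5):
    """Greedily pair detections from two adjacent cameras whose estimated
    positions relative to the vehicle body lie within max_gap_m of each
    other; each pair is then treated as the same physical target.

    Each detection is a dict with 'x' and 'y' coordinates (metres) in the
    vehicle frame and a 'distance' estimate to the vehicle body.
    """
    pairs, used_b = [], set()
    for det_a in detections_a:
        best_idx, best_gap = None, max_gap_m
        for j, det_b in enumerate(detections_b):
            if j in used_b:
                continue
            gap = math.hypot(det_a["x"] - det_b["x"], det_a["y"] - det_b["y"])
            if gap < best_gap:
                best_idx, best_gap = j, gap
        if best_idx is not None:
            used_b.add(best_idx)
            pairs.append((det_a, detections_b[best_idx]))
    return pairs
```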
As shown in fig. 12, this example divides the panoramic scene image into different alarm levels. The alarm ranges can be flexibly configured between 0 and 10 meters and are divided into a normal alarm area and a danger alarm area, and each detected target is displayed on the screen with a visual tracking identifier. Based on the target distance information output by the fusion module, alarm sounds of different levels and color changes of the tracking ring alert the driver in the different danger areas, which greatly reduces the driver's burden of processing irrelevant information.
It can be understood that the error between the target distance information output by the fusion module and the actual distance is small. Therefore, when a target is located at the edge of an alarm area, missed or false alarms caused by loss of precision are avoided and alarm accuracy is improved; and when a target is located in the viewing-angle overlapping area of two cameras, it does not switch back and forth between the adjacent cameras because of unstable data, which improves efficiency and safety.
Those skilled in the art will appreciate that all or part of the above-described methods may be implemented by a computer program, which may be stored on a computer-readable storage medium and, when executed, may include the steps of the method embodiments described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
Based on the same inventive concept, the application also provides a panoramic object detection device, which comprises:
The acquisition module is used for acquiring scene images acquired by all cameras around the vehicle;
the panoramic image generation module is used for generating panoramic scene images with bird's eye view based on all scene images;
The distance measurement module is used for determining an identified target in an image superposition area between two scene images aiming at each two adjacent scene images and determining distance measurement information between the identified target and a vehicle, which is respectively obtained by the two current cameras; the two current cameras are cameras respectively corresponding to two adjacent scene images, and in a preset fusion weight table, the two current cameras respectively have a ranging fusion weight;
The fusion module is used for fusing two ranging information based on the respective ranging fusion weights of the two current cameras aiming at each two adjacent scene images to obtain unique ranging information between the identified target and the vehicle;
and the identification module is used for generating mark information of the identified target in the panoramic scene image based on the unique ranging information.
It should be noted that, in this embodiment, the technical effects achieved by the embodiments of the panoramic object detection apparatus may refer to various implementations of the panoramic object detection method in the foregoing embodiments, and are not repeated herein.
It should be further noted that the above-described apparatus embodiments are merely illustrative. Components described as separate parts may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. In addition, in the drawings of the apparatus embodiments provided by the present application, the connection relationship between modules indicates that they have communication connections, which may be specifically implemented as one or more communication buses or signal lines. Those of ordinary skill in the art can understand and implement this without undue burden.
Claims (10)
1. A panoramic object detection method, the method comprising:
acquiring scene images acquired by cameras around a vehicle;
Generating panoramic scene images of a bird's eye view based on all the scene images;
For each two adjacent scene images, determining an identified target in an image superposition area between the two scene images, and determining ranging information between the identified target and the vehicle, which is respectively obtained by two current cameras; the two current cameras are cameras respectively corresponding to the two adjacent scene images, and in a preset fusion weight table, the two current cameras respectively have a ranging fusion weight;
For each two adjacent scene images, fusing the two ranging information based on the respective ranging fusion weights of the two current cameras to obtain unique ranging information between the identified target and the vehicle;
Based on the unique ranging information, marker information for the identified object is generated in the panoramic scene image.
2. The panoramic object detection method of claim 1, wherein said generating a panoramic scene image from a bird's eye view based on all of said scene images comprises:
Performing view angle conversion on the scene image to obtain a bird's eye view angle image;
determining, for the image superposition area between every two adjacent bird's-eye view angle images, the respective pixel fusion weights of the two adjacent bird's-eye view angle images from the preset fusion weight table;
for each two adjacent bird's-eye view angle images, fusing the image superposition areas of the two images based on the two pixel fusion weights to obtain fused adjacent bird's-eye view angle images;
and generating panoramic scene images of the bird's-eye view angles based on all the fused adjacent bird's-eye view angle images.
3. The panoramic object detection method of claim 1, wherein said determining, for each two adjacent scene images, an identified target in the image superposition area between the two scene images comprises:
Respectively inputting each scene image into a target detection model to obtain an identified target; the identified targets have a classification confidence and a positioning confidence;
obtaining a comprehensive confidence coefficient according to the classification confidence and the positioning confidence of the target image area of the identified target;
And outputting the identified target with the comprehensive confidence coefficient larger than the preset confidence coefficient as a final identified target.
4. The panoramic object detection method according to claim 3, wherein said outputting the identified target with the comprehensive confidence coefficient greater than a preset confidence coefficient as a final identified target comprises:
taking the identified target with the comprehensive confidence coefficient larger than the preset confidence coefficient as the identified target;
Updating a current target detection set based on the identified targets;
Outputting all the identified targets in the current target detection set.
5. The panoramic object detection method of claim 1, wherein for each two adjacent scene images, based on respective ranging fusion weights of two current cameras, fusing two ranging information to obtain unique ranging information between the identified object and the vehicle, comprising:
If the identified target appears in the image overlapping area, respectively identifying image features corresponding to the identified target in the two scene images;
Determining binocular ranging information of the identified target according to pixel position information of the image features in a binocular imaging system; the binocular imaging system is composed of the two cameras corresponding to the two scene images;
and fusing the two ranging information with the binocular ranging information based on the two ranging fusion weights to obtain the unique ranging information of the identified target.
6. The panoramic object detection method of claim 1, wherein for each two adjacent scene images, based on respective ranging fusion weights of two current cameras, fusing two ranging information to obtain unique ranging information between the identified object and the vehicle, comprising:
acquiring the identified target perception information acquired by a perception sensor;
determining a perception fusion weight corresponding to the perception sensor from the preset fusion weight table;
And fusing the two ranging information and the identified target perception information based on the perception fusion weight and the two ranging fusion weights to obtain the unique ranging information of the identified target.
7. The panoramic object detection method of claim 1, wherein after said fusing two ranging information based on the respective ranging fusion weights of the two current cameras for each two adjacent scene images to obtain the unique ranging information between the identified target and the vehicle, the method further comprises:
Determining the alarm level of each identified target according to the unique ranging information of each identified target;
Determining an alarm target corresponding to the highest alarm level from all the identified targets;
Determining a target camera to which the alarm target belongs;
And displaying the scene image of the target camera.
8. A panoramic object detection apparatus, the panoramic object detection apparatus comprising:
The acquisition module is used for acquiring scene images acquired by all cameras around the vehicle;
The panoramic image generation module is used for generating panoramic scene images with bird's eye view based on all the scene images;
The distance measurement module is used for determining an identified target in an image superposition area between two adjacent scene images and determining distance measurement information between the identified target and the vehicle, which is respectively obtained by two current cameras; the two current cameras are cameras respectively corresponding to the two adjacent scene images, and in a preset fusion weight table, the two current cameras respectively have a ranging fusion weight;
The fusion module is used for fusing the two ranging information based on the respective ranging fusion weights of the two current cameras aiming at each two adjacent scene images to obtain the unique ranging information between the identified target and the vehicle;
And the identification module is used for generating the marking information of the identified target in the panoramic scene image based on the unique ranging information.
9. A vehicle, characterized by comprising: a processor, a memory and a panoramic object detection program stored in the memory, which when executed by the processor, performs the steps of the panoramic object detection method according to any one of claims 1-7.
10. A computer readable storage medium, wherein a panoramic object detection program is stored on the computer readable storage medium, which when executed by a processor implements the panoramic object detection method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410066056.0A CN118247765B (en) | 2024-01-16 | 2024-01-16 | Panoramic object detection method, device, vehicle and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN118247765A true CN118247765A (en) | 2024-06-25 |
CN118247765B CN118247765B (en) | 2024-10-29 |
Family
ID=91563225
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410066056.0A Active CN118247765B (en) | 2024-01-16 | 2024-01-16 | Panoramic object detection method, device, vehicle and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118247765B (en) |
Citations (29)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102774325A (en) * | 2012-07-31 | 2012-11-14 | 西安交通大学 | Rearview reversing auxiliary system and method for forming rearview obstacle images |
CN103997624A (en) * | 2014-05-21 | 2014-08-20 | 江苏大学 | Overlapped domain dual-camera target tracking system and method |
US20150145963A1 (en) * | 2012-06-28 | 2015-05-28 | Hitachi Automotive Systems, Ltd. | Stereo Camera |
CN104851076A (en) * | 2015-05-27 | 2015-08-19 | 武汉理工大学 | Panoramic 360-degree-view parking auxiliary system for commercial vehicle and pick-up head installation method |
JP2015206798A (en) * | 2015-06-25 | 2015-11-19 | 日立オートモティブシステムズ株式会社 | distance calculation device |
CN107392848A (en) * | 2017-06-14 | 2017-11-24 | 江西科技师范大学 | Panoramic image display method and device |
CN110415550A (en) * | 2019-07-31 | 2019-11-05 | 北京智行者科技有限公司 | The automatic parking method of view-based access control model |
CN110677640A (en) * | 2019-10-08 | 2020-01-10 | 吉林大学 | In-loop testing method and device for unmanned vehicle-mounted intelligent camera |
CN110877572A (en) * | 2018-09-06 | 2020-03-13 | 爱信精机株式会社 | Periphery monitoring device |
WO2021085128A1 (en) * | 2019-10-28 | 2021-05-06 | ソニーセミコンダクタソリューションズ株式会社 | Distance measurement device, measurement method, and distance measurement system |
CN112801880A (en) * | 2021-03-08 | 2021-05-14 | 广州敏视数码科技有限公司 | Vehicle-mounted panoramic image imaging and target detection fusion display method |
CN112907728A (en) * | 2021-01-27 | 2021-06-04 | 北京邮电大学 | Ship scene restoration and positioning method and system based on camera and edge calculation |
CN113093178A (en) * | 2021-04-21 | 2021-07-09 | 中国第一汽车股份有限公司 | Obstacle target detection method and device, domain controller and vehicle |
CN113313201A (en) * | 2021-06-21 | 2021-08-27 | 南京挥戈智能科技有限公司 | Multi-target detection and distance measurement method based on Swin transducer and ZED camera |
CN114325577A (en) * | 2022-01-06 | 2022-04-12 | 浙江大学 | Non-line-of-sight positioning error correction method and device |
CN114355409A (en) * | 2021-12-09 | 2022-04-15 | 中国空间技术研究院 | Water surface target motion estimation method |
CN114347905A (en) * | 2021-12-16 | 2022-04-15 | 深圳市智威视讯科技有限公司 | Vehicle-mounted driving-assistant panoramic image system |
CN114612546A (en) * | 2020-07-16 | 2022-06-10 | 华为技术有限公司 | Target distance determination method and device |
CN114913290A (en) * | 2022-05-24 | 2022-08-16 | 北京地平线信息技术有限公司 | Multi-view-angle fusion scene reconstruction method, perception network training method and device |
CN115497073A (en) * | 2022-09-23 | 2022-12-20 | 东风悦享科技有限公司 | Real-time obstacle camera detection method based on fusion of vehicle-mounted camera and laser radar |
CN115914815A (en) * | 2022-11-10 | 2023-04-04 | 中航华东光电(上海)有限公司 | Airborne panoramic all-round looking device and method |
CN115951369A (en) * | 2022-12-14 | 2023-04-11 | 畅加风行(苏州)智能科技有限公司 | Multi-sensor fusion positioning method for complex port environment |
CN116051379A (en) * | 2023-02-09 | 2023-05-02 | 长城汽车股份有限公司 | AVM image fusion method and device, vehicle and readable storage medium |
CN116184335A (en) * | 2022-12-25 | 2023-05-30 | 中国人民解放军63610部队 | Error modeling-based radar measurement data time rapid synchronization method |
CN116433544A (en) * | 2023-03-20 | 2023-07-14 | 东风柳州汽车有限公司 | Vehicle environment monitoring method, device, equipment and storage medium |
CN116486351A (en) * | 2023-03-17 | 2023-07-25 | 东风柳州汽车有限公司 | Driving early warning method, device, equipment and storage medium |
CN116495004A (en) * | 2023-06-28 | 2023-07-28 | 杭州鸿泉物联网技术股份有限公司 | Vehicle environment sensing method, device, electronic equipment and storage medium |
CN116630210A (en) * | 2023-03-20 | 2023-08-22 | 东风柳州汽车有限公司 | Vehicle environment sensing method, device, equipment and storage medium |
CN116823693A (en) * | 2023-07-06 | 2023-09-29 | 雷达新能源汽车(浙江)有限公司 | Trailer view imaging method, device, equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN118247765B (en) | 2024-10-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107240124B (en) | Cross-lens multi-target tracking method and device based on space-time constraint | |
JP4919036B2 (en) | Moving object recognition device | |
JP3279479B2 (en) | Video monitoring method and device | |
EP2150053B1 (en) | Vehicle periphery monitoring system, vehicle periphery monitoring program and vehicle periphery monitoring method | |
Yan et al. | Joint camera intrinsic and lidar-camera extrinsic calibration | |
US20090097708A1 (en) | Image-Processing System and Image-Processing Method | |
CN112102409B (en) | Target detection method, device, equipment and storage medium | |
CN111045000A (en) | Monitoring system and method | |
CN111144207B (en) | Human body detection and tracking method based on multi-mode information perception | |
US7260243B2 (en) | Intruding-object detection apparatus | |
CN110084133B (en) | Obstacle detection method, obstacle detection apparatus, vehicle, computer device, and storage medium | |
CN107590444A (en) | Detection method, device and the storage medium of static-obstacle thing | |
JP5027758B2 (en) | Image monitoring device | |
CN107886544A (en) | IMAQ control method and device for vehicle calibration | |
CN112017243A (en) | Medium visibility identification method | |
US9558406B2 (en) | Image processing apparatus including an object setting section, image processing method, and program using the same | |
CN115376109A (en) | Obstacle detection method, obstacle detection device, and storage medium | |
CN115359130B (en) | Radar and camera combined calibration method and device, electronic equipment and storage medium | |
JPH11257931A (en) | Object recognizing device | |
JPH11250252A (en) | Three-dimensional object recognizing device and method therefor | |
KR20050061115A (en) | Apparatus and method for separating object motion from camera motion | |
CN118247765B (en) | Panoramic object detection method, device, vehicle and storage medium | |
JP2006041939A (en) | Monitor device and monitor program | |
CN112016558A (en) | Medium visibility identification method based on image quality | |
CN111489384B (en) | Method, device, system and medium for evaluating shielding based on mutual viewing angle |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant |