
CN111914715B - Intelligent vehicle target real-time detection and positioning method based on bionic vision - Google Patents

Intelligent vehicle target real-time detection and positioning method based on bionic vision Download PDF

Info

Publication number
CN111914715B
CN111914715B (application CN202010721224.7A)
Authority
CN
China
Prior art keywords
image
target
focal length
lens
reference image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010721224.7A
Other languages
Chinese (zh)
Other versions
CN111914715A (en)
Inventor
安成刚
张立国
李巍
李会祥
吴程飞
张志强
王增志
张旗
史明亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Langfang Heyi Life Network Technology Co ltd
Original Assignee
Langfang Heyi Life Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Langfang Heyi Life Network Technology Co ltd
Priority to CN202010721224.7A
Publication of CN111914715A
Application granted
Publication of CN111914715B
Status: Active
Anticipated expiration

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10004 Still image; Photographic image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30248 Vehicle exterior or interior
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to an intelligent vehicle target real-time detection and positioning method based on bionic vision, which comprises the following steps: acquiring, in real time, the images of the different-focal-length lenses in the multi-view imaging device of the intelligent vehicle; detecting the category and position information of each target in each image; dividing the reference image into a common-view area and a non-common-view area according to the homogeneous coordinate-system transformation relation of the object space of each lens; for the common-view area of each image, performing three-dimensional reconstruction and positioning of the targets by a binocular different-focal-length stereoscopic vision reconstruction method to obtain the three-dimensional positioning information and category-label semantic information of the targets; and, for the non-common-view area of the reference image, acquiring the angular positioning information of the targets, so as to construct a vector map with semantic information for the intelligent vehicle. The method addresses the technical problems that, in the prior art, a three-dimensional semantic map cannot be constructed online in real time and the amount and complexity of visual point-cloud computation in unmanned driving cannot be reduced.

Description

Intelligent vehicle target real-time detection and positioning method based on bionic vision
Technical Field
The invention relates to the technical field of visual detection and positioning, in particular to an intelligent vehicle target real-time detection and positioning method based on bionic vision.
Background
In recent years, as research into bionics has continued to deepen, artificial intelligence technology has been advancing at an unprecedented pace and achieving breakthroughs. Building bionic vision sensors and efficient computer-vision solutions by imitating the human visual system and the compound-eye imaging mechanism of insects, and detecting and positioning targets in real time, are current research hotspots of bionic-vision artificial intelligence. Unlike a laser radar, which can only perceive distance, a vision sensor can also recognize complex semantic information such as traffic signs, traffic lights and lane lines, so research on unmanned-vehicle technology based on bionic vision has great theoretical significance and application value.
Existing unmanned-driving solutions based on visual three-dimensional positioning and map construction (SLAM), such as ORB-SLAM, VINS-MONO and DSO, build a sparse or dense three-dimensional point-cloud map from multiple image sequences by extracting and matching feature points over the whole image, which involves a large amount of point-cloud computation.
In addition, most current visual-SLAM research is based on monocular motion plus inertial navigation, or on a parallel binocular stereo-vision model; that is, three-dimensional reconstruction of the target is performed from the parallax of two images captured either by the same moving camera or by two cameras with identical parameters. The drawback of such single-view imaging is that it is difficult to balance field angle, resolution and detection speed.
Therefore, under the premise of a large field of view, high precision and fast perception, how to effectively reduce the complexity of the visual positioning algorithm and the amount of three-dimensional point-cloud data while constructing a semantic SLAM online has become a difficult point in the engineering of unmanned-driving technology.
Disclosure of Invention
Technical problem to be solved
In view of the defects and shortcomings of the prior art, the invention provides an intelligent vehicle target real-time detection and positioning method based on bionic vision, which solves the technical problem in the prior art that the computation load of visual data in unmanned driving cannot be reduced, so that a semantic SLAM cannot be constructed online in real time.
(II) technical scheme
In order to achieve the purpose, the invention adopts the main technical scheme that:
in a first aspect, an embodiment of the present invention provides a bionic vision-based method for detecting and positioning an intelligent vehicle target in real time, including:
s1, acquiring images of different focal length lenses in the multi-view imaging device of the intelligent vehicle in real time;
step S2, detecting the type and position information of each target in the image of each lens;
step S3, dividing the reference image into a common visual area and a non-common visual area according to the homogeneous coordinate system transformation relation of object space corresponding to the lenses with different focal lengths by taking the image of the lens with the smallest focal length as the reference image;
step S4, aiming at the common visual area of each image, performing three-dimensional reconstruction and positioning on the target in the common visual area by adopting a binocular different focal length stereoscopic vision reconstruction method to obtain three-dimensional positioning information and category label semantic information of all targets in the common visual area in the reference image;
for the non-common-view area of the reference image, dividing it into upper, lower, left and right areas according to the first-person viewing angle of the camera device to which the image belongs, and acquiring the angle positioning information of the target in each non-common-view area by means of perspective grid division;
and constructing a vector map with semantic information for the intelligent vehicle by using the three-dimensional positioning information, the category label semantic information and the angle positioning information corresponding to the non-common visual area.
Specifically, in practical applications, the step S3 belongs to a coordinate transformation relationship in a two-dimensional coordinate system, the step S4 belongs to a reconstruction method in a three-dimensional coordinate system, and the constructed vector map includes key information of objects in all focal length images, which realizes online real-time construction, and thus can be used for real engineering applications, and has high practicability.
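By way of illustration, the following minimal Python sketch shows how steps S1 to S5 fit together; every name in it (Detection, MapEntry, detect_targets, in_common_view, build_vector_map) is a hypothetical placeholder used only to make the flow concrete, not code defined by the patent.

    from dataclasses import dataclass
    from typing import Dict, List, Optional, Tuple

    @dataclass
    class Detection:
        label: str                          # category label from step S2, e.g. "pedestrian"
        box: Tuple[int, int, int, int]      # (x, y, w, h) in pixels of the reference image

    @dataclass
    class MapEntry:
        label: str
        xyz: Optional[Tuple[float, float, float]] = None   # filled for common-view targets (step S4)
        angle: Optional[float] = None                       # filled for non-common-view targets (step S4)

    def detect_targets(image) -> List[Detection]:
        # Step S2 placeholder: a real system would run a detector such as YOLOv5 here.
        return []

    def in_common_view(box, common_rect) -> bool:
        # Step S3 placeholder: is the detection box inside the central common-view rectangle?
        x, y, w, h = box
        x0, y0, x1, y1 = common_rect
        return x0 <= x and y0 <= y and x + w <= x1 and y + h <= y1

    def build_vector_map(frames: Dict[float, object], common_rect) -> List[MapEntry]:
        ref_focal = min(frames)             # the shortest focal length supplies the reference image (step S3)
        entries: List[MapEntry] = []
        for det in detect_targets(frames[ref_focal]):
            if in_common_view(det.box, common_rect):
                # Step S4 (common view): binocular different-focal-length reconstruction would go here.
                entries.append(MapEntry(det.label, xyz=(0.0, 0.0, 0.0)))
            else:
                # Step S4 (non-common view): perspective-grid angle estimation would go here.
                entries.append(MapEntry(det.label, angle=0.0))
        return entries                      # step S5 fuses these entries with the lidar point cloud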
Optionally, the step S2 includes:
detecting the type and position information of each target in the image of each lens by adopting a YOLOv5 target real-time detection algorithm;
and/or, the target comprises: traffic lights, speed limit signs, pedestrians, small animals, vehicles or lane lines, etc.;
the category and location information includes: position information, size information, and category information of the object in each image.
Optionally, the step S3 includes:
when the number of the focal length lenses is more than or equal to 3, dividing different focal length lenses into a long focal length lens and a short focal length lens;
regarding the short-focal-length lens, taking an image of a lens with the smallest focal length in the short-focal-length lens as a first reference image, and dividing imaging sections of field angles of other lenses in the short-focal-length lens, which correspond to the first reference image, into a first common visual area and a first non-common visual area;
and regarding the long-focus lens, taking the image of the lens with the minimum focal length in the long-focus lens as a second reference image, and dividing the imaging interval of the second reference image corresponding to the field angle of other lenses in the long-focus lens into a second common visual area and a second non-common visual area.
Optionally, the step S4 includes:
acquiring three-dimensional positioning information and category label semantic information of all targets in a common visual area in a first reference image by adopting a binocular different focal length stereoscopic vision reconstruction method;
acquiring angle positioning information of each target in a non-common visual area I in a reference image I in a perspective grid division mode;
acquiring three-dimensional positioning information and category label semantic information of all targets in a common visual area II in a reference image II by adopting a binocular different-focal-length stereoscopic vision reconstruction method;
acquiring angle positioning information of each target in a non-common visual area II in a reference image II in a perspective grid division mode;
accordingly, the method further comprises step S5,
step S5 includes:
and fusing all the acquired three-dimensional positioning information, category label semantic information, all the angle positioning information and the three-dimensional point cloud data in the multi-view imaging device to construct a three-dimensional semantic map for the intelligent vehicle.
Optionally, the step S4 includes:
acquiring three-dimensional positioning information and category label semantic information of all targets in a common visual area in a first reference image by adopting a binocular different focal length stereoscopic vision reconstruction method; acquiring angle positioning information of each target in a non-common visual area I in a reference image I in a perspective grid division mode;
acquiring three-dimensional positioning information and category label semantic information of all targets in a common visual area II in a reference image II by adopting a binocular different-focal-length stereoscopic vision reconstruction method; acquiring angle positioning information of each target in a non-common visual area II in a reference image II in a perspective grid division mode;
and converting the three-dimensional positioning information and the category label semantic information of all the targets in the common visual area of the first reference image and the second reference image and the angle positioning information of all the targets in the non-common visual area to a preset global coordinate system (latitude and longitude under a geodetic coordinate system) through a series of coordinate transformation.
Optionally, the step S3 includes:
if the same four target points detected in step S2 have corresponding corner points a, b, c, d in the image of a lens with any focal length and corresponding corner points a', b', c', d' in the image of the lens with the smallest focal length;
then, when the coordinate systems of the two lenses do not coincide and/or their focal lengths differ, based on the homogeneous coordinate-system transformation relation of the object-image space, the image of the lens with the smallest focal length is taken as the reference image and is divided into a "回"-shaped region (a rectangle nested inside a rectangle) defined by the eight corner points a, b, c, d and a', b', c', d';
wherein the central rectangle of the "回"-shaped region is the overlapping imaging area and serves as the common-view area;
and the region between the central rectangle a, b, c, d and the outer boundary a', b', c', d' is the non-overlapping imaging area and serves as the non-common-view area.
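As an illustration of this division, the following sketch computes the central common-view rectangle and the surrounding "回"-shaped non-common-view ring from the projected corner points; the helper name split_reference_image and the corner coordinates in the example call are made up for illustration, not calibration values from the patent.

    from typing import Dict, List, Tuple

    Point = Tuple[float, float]
    Rect = Tuple[float, float, float, float]     # (x0, y0, x1, y1) in pixel coordinates

    def split_reference_image(inner_corners: List[Point], width: int, height: int):
        xs = [p[0] for p in inner_corners]
        ys = [p[1] for p in inner_corners]
        # Axis-aligned rectangle spanned by the projected corners = common-view area.
        common: Rect = (min(xs), min(ys), max(xs), max(ys))
        x0, y0, x1, y1 = common
        # Everything between that rectangle and the image border is the non-common-view
        # ring, expressed here as four strips (top, bottom, left, right).
        non_common: Dict[str, Rect] = {
            "top":    (0.0, 0.0, float(width), y0),
            "bottom": (0.0, y1, float(width), float(height)),
            "left":   (0.0, y0, x0, y1),
            "right":  (x1, y0, float(width), y1),
        }
        return common, non_common

    # Illustrative corner coordinates for a 1920x1080 reference image:
    common, ring = split_reference_image(
        [(308.0, 180.0), (1612.0, 180.0), (1612.0, 900.0), (308.0, 900.0)], 1920, 1080)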
Optionally, the step S4 includes:
S4-1, establishing a one-to-one mapping relation between the coordinates of each target in the two images according to the image-plane coordinate relation (A1):
x2 = (f2/f1)·x1 − f2·l/Z,   y2 = (f2/f1)·y1    (A1)
S4-2, determining the depth value Z of the target, namely the space point M, in three-dimensional space by means of several cameras with different resolutions and focal lengths, according to the parallel binocular vision model;
specifically, the coordinates (X, Y, Z) of the spatial point M are:
X = x1·Z/f1,   Y = y1·Z/f1,   Z = f1·f2·l / (f2·x1 − f1·x2)    (A2)
wherein the focal lengths f1 and f2 of the two camera lenses and the baseline distance l are pre-calibrated parameters, and the corresponding pixel coordinates x1, y1 and x2, y2 are obtained through formula (A1);
And S4-3, reconstructing the detection target in the common view area of the reference image according to the three-dimensional coordinate relation (A2) of the space point M, and marking the average depth value of the detection target area.
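A minimal numerical sketch of relations (A1) and (A2) under the ideal parallel model with different focal lengths; f1 and f2 are taken here as pixel-unit (effective) focal lengths, and all values in the demo call are illustrative only, not the patent's calibration.

    def map_image1_to_image2(x1, y1, Z, f1, f2, l):
        """Relation (A1): given a target seen at (x1, y1) in image 1 at depth Z, predict its
        coordinates in image 2, so each target maps one-to-one without a search."""
        x2 = (f2 / f1) * x1 - f2 * l / Z
        y2 = (f2 / f1) * y1
        return x2, y2

    def reconstruct_point(x1, y1, x2, f1, f2, l):
        """Relation (A2): recover the space point M = (X, Y, Z) from matched pixel coordinates."""
        Z = f1 * f2 * l / (f2 * x1 - f1 * x2)   # depth from the different-focal-length disparity
        X = x1 * Z / f1
        Y = y1 * Z / f1
        return X, Y, Z

    # Illustrative round trip: a point 20 m ahead, 12 mm lens as image 1, 8 mm lens as image 2.
    f1, f2, l = 4000.0, 2666.0, 0.05            # pixel-unit focal lengths and baseline (example values)
    x1, y1 = 400.0, 120.0
    x2, _ = map_image1_to_image2(x1, y1, 20.0, f1, f2, l)
    print(reconstruct_point(x1, y1, x2, f1, f2, l))   # -> approximately (2.0, 0.6, 20.0)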
Optionally, the short-focus lens is an 8mm focal length lens and a 12mm focal length lens, and the long-focus lens is a 16mm focal length lens and a 25mm focal length lens;
the 8mm and 12mm short-focus lenses are used for monitoring short-distance targets of 5-35m in a large field angle of 90 degrees,
the 16mm and 25mm telephoto lenses are used to monitor a distant target of 10-60m within a small field angle of 40 °.
In a second aspect, an embodiment of the present invention provides an intelligent vehicle driving system, including: the multi-view image forming apparatus comprises a control device and a multi-view image forming device connected with the control device, wherein the multi-view image forming device comprises: at least one short focal length lens, at least one long focal length lens;
and after receiving the images collected by the short-focal-length lens and the long-focal-length lens, the control device adopts the real-time detection and positioning method of the intelligent vehicle of the first aspect to construct the three-dimensional semantic map of the intelligent vehicle on line in real time.
In a third aspect, an embodiment of the present invention further provides an intelligent vehicle, including the intelligent vehicle driving system in the second aspect.
(III) advantageous effects
The invention has the beneficial effects that: the method does not need to process all the point-cloud data; it obtains the data of the area where the target is located and derives the three-dimensional positioning information of the target through coordinate conversion and three-dimensional reconstruction. This effectively reduces the amount of data processed by the visual point-cloud matching and positioning algorithm and improves target-positioning accuracy. The method not only achieves accurate positioning of targets in different fields of view and at different distances, but also greatly improves the data-processing efficiency of the unmanned vehicle's visual point-cloud matching and positioning algorithm. It has the advantages of a large field angle, high positioning accuracy and high processing speed, and can meet the engineering requirements of current vision-based unmanned-driving technology on an embedded processor.
It should be noted that the method of the embodiment of the present invention imitates the fast-perception characteristics of the insect compound-eye imaging system. It uses a deep-learning method to classify and identify targets, and then uses the spatial transformation relation between the camera coordinate system and the world coordinate system to divide the image-plane space of each camera into a common-view area and a non-common-view area. In the common-view area, a multi-view different-focal-length three-dimensional reconstruction algorithm positions the target accurately; the object space corresponding to the non-common-view area is divided into perspective grids to position the target coarsely. This enlarges the observation angle, improves the processing efficiency of the intelligent vehicle's vision system, and enables online construction of a semantic map.
Drawings
Fig. 1 is a flowchart of a method for detecting and positioning an intelligent vehicle target in real time based on bionic vision according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a multi-view imaging device;
FIG. 3 is a diagram showing the real-time detection and segmentation effect of the YOLOv5 algorithm on a target;
FIG. 4 is a schematic diagram of the common view region and the non-common view region corresponding to the 8mm and 12mm short-focus lens groups;
FIG. 5 is a schematic diagram of the common view area and the non-common view area corresponding to the 16mm and 25mm telephoto lens groups;
FIG. 6 is a schematic view of binocular different-focal-length stereoscopic vision;
FIG. 7 is a diagram illustrating a variation curve of the short-focus camera image plane error Dx with the increase of the test distance L;
FIG. 8 is a diagram illustrating a mapping relationship between segmented target regions in an image established according to formula (4) in the second embodiment;
fig. 9 is a schematic flowchart of a method for real-time detection and positioning of an intelligent vehicle target based on bionic vision according to an embodiment of the present invention;
fig. 10 is a schematic diagram of the average depth of the detection target region in the three-dimensional space, which is obtained by reconstruction according to the formula (2) in the second embodiment;
fig. 11 is a schematic diagram of a real-time three-dimensional semantic map constructed by the method shown in fig. 1 according to an embodiment of the present invention.
Detailed Description
In order to better understand the above technical solutions, exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the invention are shown in the drawings, it should be understood that the invention can be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
In the following embodiments, "lens" may refer to the lens of a camera or of a video camera; in the detailed description some embodiments speak of lenses and others of cameras or video cameras, but all of these denote devices for capturing images, and each lens has a fixed focal length.
Example one
As shown in fig. 1, the present embodiment provides a real-time detection and positioning method for an intelligent vehicle target based on bionic vision, which specifically includes the following steps:
s1, acquiring images of different focal length lenses in the multi-view imaging device of the intelligent vehicle in real time;
Step S2, the category and position information of each target in the image of each lens is detected.
For example, a YOLOv5 real-time target detection algorithm may be employed to detect the category and location information of the targets in the image of each lens.
The targets in the present embodiment may include: traffic lights, speed-limit signs, pedestrians, small animals, vehicles, lane lines, etc.; this is merely an example and is not limiting, the actual targets being determined by the actual image.
The category and location information includes: the position information, size information and category information of the target in each image; the category information may be the category of each target, and the size information may be the size of the area where the target is located.
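As one possible realization of this step (an assumption about tooling: the patent names YOLOv5 but not a specific implementation), the publicly documented ultralytics/yolov5 torch.hub interface could be used roughly as follows; the pretrained COCO weights only demonstrate the interface, and a model fine-tuned on the traffic classes listed above would be needed in practice.

    import torch

    # First call downloads the model; 'yolov5s' is an arbitrary small variant chosen for the sketch.
    model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)

    def detect(image_bgr):
        """Return (label, (x, y, w, h), confidence) tuples for one lens image (BGR, as from OpenCV)."""
        results = model(image_bgr[..., ::-1])                 # the hub model expects RGB input
        detections = []
        for x1, y1, x2, y2, conf, cls in results.xyxy[0].tolist():
            detections.append((model.names[int(cls)], (x1, y1, x2 - x1, y2 - y1), conf))
        return detections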
And step S3, dividing the reference image into a common visual area and a non-common visual area according to the homogeneous coordinate system transformation relation of object space corresponding to the lenses with different focal lengths by taking the image of the lens with the smallest focal length as the reference image.
In this step, a target area of a two-dimensional coordinate system is mapped on the image.
For example, among lenses with the same resolution but different focal lengths, the long-focal-length and short-focal-length lenses image the same object space differently; the reference image is therefore determined based on the homogeneous coordinate-system transformation relation of the object-image spaces corresponding to the lenses with different focal lengths, and is divided into a common-view area and a non-common-view area according to the field angles of the other lenses.
For better understanding, referring to Fig. 4, suppose there are four target points (A, B, C, D) in space, imaged respectively at the four corner points (a, b, c, d) on the image plane of camera 1 and at the four corner points (a', b', c', d') on the image plane of camera 2. From the homogeneous coordinate-system transformation relation of the object-image space, if the coordinate systems of camera 1 and camera 2 do not coincide, or their focal lengths differ, the area enclosed by the image points corresponding to the four space points (A, B, C, D) divides the reference image (that of the camera with the smaller focal length) into the "回"-shaped region shown in Fig. 4. The central rectangle represents the imaging area corresponding to the overlapping fields of view of camera 1 and camera 2, i.e. the common-view area; the region from the periphery of the central rectangle to the image boundary represents the imaging area corresponding to the non-overlapping fields of view of camera 1 and camera 2, i.e. the non-common-view area.
Step S4, aiming at the common visual area of each image, performing three-dimensional reconstruction and positioning on the target in the common visual area by adopting a binocular different focal length stereoscopic vision reconstruction method to obtain three-dimensional positioning information and category label semantic information of all targets in the common visual area in the reference image;
for the non-common-view area of the reference image, dividing it into upper, lower, left and right areas according to the first-person viewing angle of the camera device to which the image belongs, and acquiring the angle positioning information of the target in each non-common-view area by means of perspective grid division;
and constructing a vector map with semantic information for the intelligent vehicle by using the three-dimensional positioning information, the category label semantic information and the angle positioning information corresponding to the non-common visual area.
For better understanding of step S4, the three-dimensional reconstructed location of the common viewing zone is described below with reference to the formula in the second embodiment.
S4-1, before three-dimensional reconstruction and positioning, the coordinate mapping relation of the target to be reconstructed in each image must be acquired; this is a multi-target detection and matching problem. The most robust method is a linear search that establishes the one-to-one correspondence of each of the multiple targets in one image with the targets in the other image: for example, if image 1 contains three targets A, B and C and image 2 contains three targets A', B' and C', the three one-to-one correspondences A-A', B-B' and C-C' must be established.
In the prior art, the similarity error values of the nine candidate pairs AA', AB', AC', BA', BB', BC', CA', CB' and CC' must be calculated separately, and as the number of targets N increases the time complexity is polynomial, O(N²), which is very time-consuming. Although the matching process can be accelerated with the nearest-neighbour search of a K-D tree, the time complexity is still linear-logarithmic, O(N log N), so the amount of calculation is not substantially reduced.
In the present application, the relation (3) between the two image-plane coordinates given in the second embodiment below is used to establish the one-to-one mapping of the multi-target image coordinates directly: knowing the coordinates (x1, y1) of a target in image coordinate system 1, the normalized focal lengths α and β of the cameras and the depth value Z of the detected target, the coordinates (x2, y2) of the same target in image coordinate system 2 are uniquely determined. The corresponding image-plane coordinates of the same target in the two images are thus obtained directly from the image-plane coordinate relation, without establishing a mapping from each target to every other target; the running time is independent of the number of matched targets (constant time O(1)), which effectively reduces the time complexity of the image matching process.
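The following toy comparison (illustrative only; targets are given as (x, y) pixel tuples, a single representative depth Z is used for simplicity, and the tolerance value is an assumption) contrasts the exhaustive pairing with the direct coordinate mapping of relation (3):

    def match_bruteforce(targets1, targets2):
        # O(N^2): score every (image-1, image-2) pair; kept only to contrast with the method below.
        pairs = []
        for t1 in targets1:
            best = min(targets2, key=lambda t2: (t1[0] - t2[0]) ** 2 + (t1[1] - t2[1]) ** 2)
            pairs.append((t1, best))
        return pairs

    def match_by_prediction(targets1, targets2, a1, b1, a2, b2, l, Z, tol=30.0):
        # Relation (3): each image-1 coordinate is mapped straight to its expected image-2
        # coordinate, so no pairwise similarity scores need to be computed.
        pairs = []
        for (x1, y1) in targets1:
            x2p = (a2 / a1) * x1 - a2 * l / Z
            y2p = (b2 / b1) * y1
            hit = next((t for t in targets2 if abs(t[0] - x2p) < tol and abs(t[1] - y2p) < tol), None)
            pairs.append(((x1, y1), hit))
        return pairs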
S4-2, three-dimensional reconstruction and positioning, which mainly consists in determining the depth value Z of the target in three-dimensional space.
There are mainly two stereo-vision reconstruction methods: one generates multiple views through monocular motion for three-dimensional reconstruction; the other performs three-dimensional reconstruction with two identical cameras placed side by side, i.e. the parallel binocular vision model. The drawback of the parallel binocular model is that, because two cameras with identical parameters are used for acquisition and computation, their resolution and field of view are exactly the same; to meet real-time requirements, a high-resolution camera must use a small field of view, while a large field of view forces a low resolution. To balance resolution, field of view and real-time performance, a multi-resolution camera scheme must be used, of which there are two kinds. One is the physical approach, which imitates the human retina to manufacture a space-variant-resolution sensor, i.e. the resolution distribution of the image is changed by changing the arrangement of the sensor's photosensitive elements; this approach of changing image resolution through the sensor manufacturing process is expensive, normal imaging can be recovered only through a polar-coordinate transformation, and it is difficult to use on a large scale. The other combines several cameras with different resolutions or focal lengths, which is low-cost, easy to operate and widely applicable. Therefore, the second approach can be used in this embodiment to balance resolution and field of view.
Specifically, with reference to the binocular different-focal-length three-dimensional reconstruction model shown in Fig. 6, once the focal lengths of the two cameras are known, the distance (depth value) of the spatial point M from the cameras follows from the triangle-similarity principle:
Z = f1·f2·l / (f2·x1 − f1·x2)    (1)
Equation (1) above shows that to calculate the target depth Z, the focal lengths f1 and f2 of the two cameras and the baseline distance l must be known; these prior parameters are obtained by calibrating the cameras. The corresponding pixel coordinates x1 and x2 in the two cameras must also be known. With these parameters, the three-dimensional coordinates of the target point M can be accurately reconstructed:
X = x1·Z/f1,   Y = y1·Z/f1,   Z = f1·f2·l / (f2·x1 − f1·x2)    (2)
In this embodiment, the point cloud of each target area segmented in the image is reconstructed; the X- and Y-axis coordinates of the point cloud within the segmented target area are averaged, and the depth value Z is taken as the most frequently occurring point-cloud depth value in the target area. That is, the three-dimensional point-cloud set corresponding to each detected target is represented by a single average three-dimensional coordinate value, as shown in fig. 10.
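A small sketch of this per-target summary, assuming the target's reconstructed points arrive as an N x 3 NumPy array; the 0.1 m depth-bin width used to pick the most frequent depth is an illustrative choice, not a value fixed by the patent.

    import numpy as np

    def summarize_target_cloud(points_xyz: np.ndarray, z_bin: float = 0.1) -> tuple:
        """Collapse the reconstructed point set of one detected target to a single (X, Y, Z)."""
        x_mean = float(points_xyz[:, 0].mean())
        y_mean = float(points_xyz[:, 1].mean())
        z = points_xyz[:, 2]
        # "Most frequent depth value": histogram the depths and take the centre of the densest bin.
        n_bins = max(int(np.ceil((z.max() - z.min()) / z_bin)), 1)
        hist, edges = np.histogram(z, bins=n_bins)
        k = int(hist.argmax())
        return x_mean, y_mean, float((edges[k] + edges[k + 1]) / 2)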
It should be noted that, in this embodiment, the above-mentioned manner of constructing the vector map is constructed in real time on line, and the constructed map is different from the point cloud semantic map in the prior art, and only the boundary and the category of the target are labeled, instead of reconstructing the specific color and shape features of the target with the point cloud data, as shown in fig. 11.
In this embodiment, based on the method shown in fig. 1, the following step S5 is further included:
step S5: and fusing all the acquired three-dimensional positioning information, category label semantic information, all the angle positioning information and the three-dimensional point cloud data in the multi-view imaging device to construct a three-dimensional semantic map for the intelligent vehicle.
For example, the point cloud data in the multi-view imaging device includes point cloud data acquired by a laser radar or point cloud data of visual imaging, and the like, and the point cloud data is selected and fused according to actual needs. In the embodiment, the three-dimensional semantic map is constructed in an online manner.
That is to say, the method of the present embodiment can construct the semantic map in real time, and the map constructed by the method of the present embodiment is the semantic map constructed on line, which is simple and convenient to operate, does not need subsequent manual labeling, can satisfy the engineering application of the embedded processor, and has strong practicability compared with the map constructed by the offline method based on the point cloud + manual labeling map construction method in the prior art.
In addition, it should be noted that the method of this embodiment may implement real-time online semantic SLAM construction, that is, a partial map or a complete map constructed by data acquired by the image acquisition device in real time.
In a specific implementation process, the step S3 may include:
when the number of the focal length lenses is more than or equal to 3, dividing different focal length lenses into a long focal length lens and a short focal length lens;
regarding the short-focal-length lens, taking an image of a lens with the smallest focal length in the short-focal-length lens as a first reference image, and dividing imaging sections of field angles of other lenses in the short-focal-length lens, which correspond to the first reference image, into a first common visual area and a first non-common visual area;
and regarding the long-focus lens, taking the image of the lens with the minimum focal length in the long-focus lens as a second reference image, and dividing the imaging interval of the second reference image corresponding to the field angle of other lenses in the long-focus lens into a second common visual area and a second non-common visual area. In the second embodiment, four focal length lenses are used for illustration.
In addition, when the number of the focal length lenses is greater than or equal to 3, the step S4 may include:
acquiring three-dimensional positioning information and category label semantic information of all targets in a common visual area in a first reference image by adopting a binocular different focal length stereoscopic vision reconstruction method;
acquiring angle positioning information of each target in a non-common visual area I in a reference image I in a perspective grid division mode;
acquiring three-dimensional positioning information and category label semantic information of all targets in a common visual area II in a reference image II by adopting a binocular different-focal-length stereoscopic vision reconstruction method;
acquiring angle positioning information of each target in a non-common visual area II in a reference image II in a perspective grid division mode;
and converting the three-dimensional positioning information and the category label semantic information of all the targets in the common visual area of the first reference image and the second reference image and the angle positioning information of all the targets in the non-common visual area to a preset global coordinate system through a series of coordinate transformation.
The global coordinate system is the coordinate system in which all targets of the constructed semantic map are located. Unlike virtual coordinate systems such as the image coordinate system and the camera coordinate system, the global coordinate system corresponds to a real, fixed place; it can be established at a fixed location and is obtained through GPS-RTK measurement.
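A hedged sketch of such a chain of coordinate transformations (camera frame to vehicle body frame to the preset global frame); the extrinsic values in the demo are placeholders, with the camera-to-vehicle transform coming from calibration and the vehicle pose from GPS-RTK in practice.

    import numpy as np

    def rigid(R: np.ndarray, t: np.ndarray) -> np.ndarray:
        """Build a 4x4 homogeneous transform from a 3x3 rotation and a 3-vector translation."""
        T = np.eye(4)
        T[:3, :3] = R
        T[:3, 3] = t
        return T

    def camera_to_global(p_cam, T_vehicle_cam, T_global_vehicle):
        """p_cam: (X, Y, Z) in the reference-camera frame -> the same point in the global frame."""
        p = np.array([*p_cam, 1.0])
        return (T_global_vehicle @ T_vehicle_cam @ p)[:3]

    # Placeholder example: camera mounted 1.5 m above and 2.0 m ahead of the body origin,
    # vehicle currently at (100, 50, 0) in the global frame with no rotation.
    T_vc = rigid(np.eye(3), np.array([2.0, 0.0, 1.5]))
    T_gv = rigid(np.eye(3), np.array([100.0, 50.0, 0.0]))
    print(camera_to_global((2.0, 0.6, 20.0), T_vc, T_gv))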
In addition, when step S5 is executed, all the acquired three-dimensional positioning information and category label semantic information, all the angle positioning information, and the three-dimensional point cloud data of the laser radar in the multi-view imaging device are fused to construct a three-dimensional semantic map for the smart vehicle.
Of course, if the method in fig. 1 does not execute step S5, the reference image two may be used as a reference image, and the three-dimensional positioning information and the category label semantic information of all the objects in the common view region one are three-dimensionally reconstructed and positioned in the common view region two, so as to obtain the three-dimensional positioning information and the category label semantic information of all the objects in the common view region in the reference image; and acquiring the angle positioning information of the target in each non-common visual area in a perspective grid division mode.
Therefore, the method of the embodiment can effectively reduce the calculated amount of data processing of the visual point cloud matching positioning algorithm and improve the target positioning precision, not only can realize the accurate positioning of targets with different fields of view and different distances, but also can greatly improve the working efficiency of data processing of the intelligent vehicle visual point cloud matching positioning algorithm, has the advantages of large field angle, high positioning precision and high processing speed, and can meet the engineering application requirements of the current vision-based unmanned technology on an embedded processor.
It should be noted that the semantic map constructed by the method of the present embodiment includes the category and the three-dimensional position information of the detection target. The category information is not filled in by specific color, shape and texture features, but is embodied by a boundary box with a label (the target category can be judged by semantic information through the label on the target boundary box). In addition, the three-dimensional information in the semantic map does not mean three-dimensional information for each feature point, but means average three-dimensional information in the detection target bounding box region. It is understood that the detection target image bounding box is treated as a centroid point. Therefore, the aim of real-time map construction is achieved by performing dimensionality reduction on the mass point cloud information of the high-precision map, namely only the essential meaning of the target is kept on the semantic map without accurately restoring the appearance shape.
Example two
The embodiment of the invention provides an intelligent vehicle target real-time detection and positioning method based on bionic vision, aiming at the engineering requirement of real-time detection and positioning of an unmanned embedded platform, and aiming at greatly improving the target detection and positioning efficiency of an unmanned visual information processing system, and is shown in figure 9.
The method provided by the embodiment of the invention simulates an insect compound eye imaging system to form a multi-target imaging system to enlarge the visual field range, classifies and identifies multiple targets by using a YOLOv5 deep learning target detection method, and segments and marks interested target positions in the whole image one by one; then, dividing the overlapped and non-overlapped areas of the sub-view field image surfaces of the multi-view imaging system into common-view and non-common-view areas, and respectively carrying out data coordinate conversion on the common-view and non-common-view areas.
Specifically, for a two-view image plane overlapping region, namely a common-view region: by utilizing a position relation formula between pixel coordinates corresponding to images (called as sub image surfaces) of each focal length lens of a multi-view imaging device, a one-to-one mapping relation (shown as a formula 4) of multi-target areas divided from two images is quickly established, namely, feature point extraction areas of two target images to be matched are positioned in advance, and the detection target traversal global search area is reduced; then, carrying out feature point matching on the pre-positioned target image feature point extraction area by utilizing an SURF algorithm; and finally, accurately positioning the detection target by adopting a multi-view different focal length three-dimensional reconstruction algorithm (as shown in the following formula 2).
Aiming at the non-overlapped area of the two view image planes, namely the non-common view area: dividing the first visual angle of the camera into four intervals, namely an upper interval, a lower interval, a left interval and a right interval; dividing each interval into angle perspective grids to roughly position the direction angle of the detection target; the subsequent laser radar can be conveniently and accurately positioned.
The method effectively reduces the calculated amount of data processing of the vision positioning algorithm, improves the positioning precision of the targets at different distances, enhances the real-time performance and the reliability of the vision system, and meets the engineering requirement of the unmanned technology based on vision.
In order to better understand the technical solution of the embodiment of the present invention, the above steps are explained in detail with reference to fig. 9:
step A1: as shown in fig. 2, image data is acquired for each vision sensor in the multi-view imaging device. That is, images of different focal length lenses in the multi-view imaging device of the intelligent vehicle are acquired in real time.
In this embodiment, four cameras with different focal lengths of 8mm, 12mm, 16mm and 25mm are adopted to form a multi-view imaging system with different fields of view: short-focus lenses of 8mm and 12mm are used to monitor close-range objects of 5-35m, for example pedestrians and vehicles, within a 90 deg. field of view. 16mm and 25mm telephoto lenses are used to monitor long-range objects of 10-60m within a 40 ° field angle, such as traffic lights and speed-limiting signs.
Step A2: and detecting the type and position information of each target in the image of each lens.
In the embodiment, in order to reduce the calculation amount between two or more frames of image point cloud data, a YOLOv5 target real-time detection algorithm is used to obtain the position and size information and the category information of different types of targets in each image.
Different types of targets can comprise dynamic targets such as pedestrians and vehicles and static targets such as lane lines and traffic lights in the image, each recognition target is divided in the image to be marked, irrelevant areas in the image can be removed in the process, and the number of calculation of feature points of the whole image is reduced. Fig. 3 is a graph showing the effect of YOLOv5 algorithm on real-time target detection and segmentation.
Step A3: and dividing the coordinates of each camera image plane corresponding to each overlapped area and each non-overlapped area of the lens view field of the multi-view imaging device into a common-view area and a non-common-view area.
Specifically, the image of the lens with the minimum focal length is taken as a reference image, and the reference image is divided into a common visual area and a non-common visual area according to the homogeneous coordinate system transformation relation of object space corresponding to lenses with different focal lengths.
In the present embodiment, different focal length lenses are divided into a long focal length lens and a short focal length lens; regarding the short-focal-length lens, taking an image of a lens with the smallest focal length in the short-focal-length lens as a first reference image, and dividing an imaging interval of the first reference image corresponding to the field angle of other lenses in the short-focal-length lens into a first common visual area and a first non-common visual area, as shown in fig. 4;
for the long-focus lens, the image of the lens with the smallest focal length in the long-focus lens is taken as a second reference image, and the imaging interval of the second reference image corresponding to the field angle of the other lenses in the long-focus lens is divided into a second common view area and a second non-common view area, as shown in fig. 5.
It can be understood that the short-focus and long-focus cameras have a known CMOS size of 1/2.7″ and a resolution of 1920 × 1080 pixels. Fig. 4 shows the common-view region I and non-common-view region I in the 90° large-field-of-view image plane corresponding to the 8mm and 12mm short-focus lens group. The endpoint coordinates of the common-view rectangular area are (−666, 360), (−666, −360), (638, 360) and (638, −360); common-view region I accounts for 67.6% and 66.7% of the whole image plane in the X-axis and Y-axis directions respectively, and for 45.28% of the image-plane area of the whole short-focus large field of view. It is not strictly symmetrical in the horizontal direction but slightly shifted along the X-axis owing to the baseline distance between the two cameras.
Fig. 5 shows a second common-view region and a second non-common-view region in the small-field-of-view image plane of 40 degrees corresponding to the 16mm and 25mm telephoto lens groups. The coordinates of the end points of the common view rectangular region are (-641,345), (-641, -345), (610,345), (610, -345), and the second common view region respectively accounts for 65.2% and 63.9% of the coordinates of the whole image plane in the X-axis direction and the Y-axis direction and accounts for 41.63% of the area of the image plane of the whole tele small field of view.
Specifically, the spatial transformation relationship between image coordinate system 1 (x1, y1) and the world coordinate system (X, Y, Z) can be expressed as:
x1 = α1·X / Z,   y1 = β1·Y / Z
The spatial transformation relationship between image coordinate system 2 (x2, y2) and the world coordinate system (X, Y, Z) can be expressed as:
x2 = α2·(X − l) / Z,   y2 = β2·Y / Z
where l is the baseline distance by which camera 2 is offset along the X-axis. Since the two image coordinate systems image the same target M (X, Y, Z), the relation between the image-plane coordinates of the two cameras can be deduced once the normalized focal lengths α and β are known.
For example, given that the resolutions of two cameras are 1280 × 1024, the sub-image plane can be divided into a non-common view region and a common view region according to the spatial transformation relationship between the two camera coordinate systems and the same world coordinate system (i.e. the derivation result of equation 3).
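To make the two projection relations and the derived relation (3) concrete, the following check uses the example parameter values quoted in this embodiment (α1 = β1 = 4000 px, α2 = β2 = 2666 px, l = 0.05 m); the chosen world point is arbitrary.

    a1 = b1 = 4000.0
    a2 = b2 = 2666.0
    l = 0.05

    def project_cam1(X, Y, Z):
        return a1 * X / Z, b1 * Y / Z                 # image coordinate system 1

    def project_cam2(X, Y, Z):
        return a2 * (X - l) / Z, b2 * Y / Z           # image coordinate system 2 (offset by the baseline)

    X, Y, Z = 3.0, 1.0, 15.0
    x1, y1 = project_cam1(X, Y, Z)
    x2, y2 = project_cam2(X, Y, Z)
    # Relation (3): x2 = (a2/a1)*x1 - a2*l/Z, y2 = (b2/b1)*y1
    assert abs(x2 - ((a2 / a1) * x1 - a2 * l / Z)) < 1e-9
    assert abs(y2 - (b2 / b1) * y1) < 1e-9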
Step A4: and aiming at the common visual area, the SURF characteristic point extraction and matching algorithm is adopted to realize target characteristic point matching in the designated area, and finally, the binocular disparity focus stereoscopic vision algorithm is utilized to carry out three-dimensional reconstruction on the detected target (as shown in the following formula 2).
The ideal parallel binocular camera imaging model with different focal lengths can be deduced according to the geometrical relationship between the image plane coordinates of the two cameras and the space point M in the image plane of the two cameras in the image plane of the.
Assuming the optical axes of the left and right cameras are parallel (their Y-directions aligned), the two image planes are not coplanar because the focal lengths differ. Let the effective focal lengths of the 12mm and 8mm lenses be f1 and f2 respectively. From the triangle-similarity principle, the distance (depth value) of the spatial point M from the cameras is:
Z = f1·f2·l / (f2·x1 − f1·x2)    (1)
Equation (1) above shows that to calculate the target depth Z, the focal lengths f1 and f2 of the two cameras and the baseline distance l must be known; these prior parameters are obtained by calibrating the cameras. The corresponding pixel coordinates x1 and x2 in the two cameras must also be known. With these parameters, the three-dimensional coordinates of the target point M can be accurately reconstructed:
X = x1·Z/f1,   Y = y1·Z/f1,   Z = f1·f2·l / (f2·x1 − f1·x2)    (2)
the position relation between the pixel coordinates corresponding to the two cameras can also be obtained according to the formula (1):
x2 = (α2/α1)·x1 − α2·l / Z,   y2 = (β2/β1)·y1    (3)
where α1 and α2, β1 and β2 are the scale factors (also called normalized focal lengths) along the horizontal axis u and the vertical axis v of the two images, respectively. When α2 < α1 and β2 < β1, formula (3) shows that the pixel-coordinate offset y1 of the 12mm-focal-length camera is larger than the corresponding offset y2 of the 8mm-focal-length camera, i.e. the same object moves faster in the image plane of the telephoto camera than in that of the short-focus camera. Meanwhile, the offset of the two cameras' image-plane coordinates along the X-axis also depends on the ratio of the baseline distance l to the depth Z, so the influence of the l/Z term in formula (3) on the positional relation between the pixel coordinates x1 and x2 of the two cameras needs to be examined further.
Let Dx = α2·l/Z denote the image-plane position error between the corresponding pixel coordinates of the two cameras. With the working distance Z of the 8mm lens in the range 5-25 m, the normalized focal lengths α1 = β1 = 4000 pixels and α2 = β2 = 2666 pixels, and the baseline distance l = 0.05 m, the variation of the pixel-coordinate image-plane position error Dx between the 8mm and 12mm cameras as the test distance L increases is shown in Fig. 7. When the monitoring distance is greater than 5 m, the Dx error value stays within 26 pixels; since only the average value of the pixel coordinates in the target detection area is of interest, the influence of the l/Z term on the pixel coordinate x2 can be ignored and treated as a constant D, so that formula (3) can be further approximated as:
x2 ≈ (α2/α1)·x1 − D,   y2 = (β2/β1)·y1    (4)
as shown in FIG. 8, the average position information x under the pixel coordinate system of the camera with the focal length of 12mm according to the detected object can be obtained by the formula (4)1The estimated approximate position x of the corresponding target under the pixel coordinate system of the 8mm focal camera2The method can avoid the linear search process of the one-to-one correspondence relationship of the multi-target areas and improve the working efficiency of the feature point matching algorithm. That is to say, the detection target can be effectively reduced from traversing the global search area, and the extraction and matching of the feature points are reducedAnd the calculation amount of the point cloud reconstruction algorithm is calculated, so that the matching index process is accelerated, and the calculation complexity of target feature point matching is reduced.
Step A5: and aiming at the non-common visual area, acquiring the angle positioning information of the target in each non-common visual area in a perspective grid division mode.
For example, for the non-common-view region of the multi-view imaging device shown in Fig. 4, the region is divided into four areas (upper, lower, left and right) according to the first-person viewing angle of the camera, and each area is divided into perspective angle grids to coarsely locate the direction angle of the detected target, which facilitates the subsequent accurate positioning and clustering of the laser-radar point-cloud data.
For example, the common-view area is not strictly symmetrical in the horizontal direction but is slightly shifted along the X-axis. The non-common-view area is divided into upper, lower, left and right parts, whose corresponding spatial viewing angles are [7.7°, 12.4°], [−7.7°, −12.4°], [−13.5°, −19.3°] and [13.5°, 20.3°], respectively. Each sub-viewing angle is divided into perspective grids so that the direction of a detected target can be roughly estimated, which facilitates the subsequent accurate positioning and clustering of the laser-radar point-cloud data.
It should be noted that, in the embodiment of the present invention, the angle value is used for positioning the detection target in the non-common view area in real time instead of the three-dimensional point cloud, so as to accelerate the processing speed of sensing the image target information. For example, the space view fields corresponding to the upper and lower regions are respectively used for detecting the traffic light and the approximate direction angle of the anti-collision barrel, and prior category and direction information is provided for subsequent clustering and positioning of laser radar point cloud data.
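A hedged sketch of this coarse angular localization: the pixel offset from the image centre is converted to a direction angle with the pinhole relation θ = atan(x/α) and snapped to a fixed angular grid. The grid step, the reference focal-length value and the helper names are illustrative choices, not values fixed by the patent.

    import math

    ALPHA_REF = 2666.0      # normalized focal length of the 8 mm reference lens, px (example value)
    GRID_STEP_DEG = 1.0     # width of one perspective-grid cell (assumption)

    def direction_angle_deg(x_pixel_from_center: float, alpha: float = ALPHA_REF) -> float:
        return math.degrees(math.atan2(x_pixel_from_center, alpha))

    def grid_cell(angle_deg: float, step: float = GRID_STEP_DEG) -> int:
        """Index of the perspective-grid cell the target falls into (coarse localization)."""
        return int(math.floor(angle_deg / step))

    # A detection centred 820 px to the right of the image centre, in the non-common-view strip:
    ang = direction_angle_deg(820.0)          # ~17.1 degrees, inside the [13.5°, 20.3°] right interval
    print(ang, grid_cell(ang))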
Step A6: static targets such as vehicles and pedestrians and dynamic target three-dimensional data such as traffic lights and lane lines, which are detected and positioned by the multi-view imaging device, are fused into the laser radar point cloud data, and a real-time three-dimensional semantic map is constructed, so that the bionic vision-based unmanned technology is applied to an embedded processing platform in an engineering mode.
It should be noted that the three-dimensional semantic map to be constructed in this embodiment is different from the point cloud semantic map in the prior art, in this embodiment, only the bounding box and the category of the target need to be labeled, and the point cloud data does not need to be used to reconstruct the specific color and shape features of the target, as shown in fig. 11.
The method can construct the semantic map in real time, and the map constructed by the method is the semantic map constructed on line, so that compared with the point cloud and manual labeling-based off-line map construction method in the prior art, the method is simple and convenient to operate, does not need subsequent manual labeling, can meet the engineering application of an embedded processor, and has strong practicability.
EXAMPLE III
According to another aspect of the embodiments of the present invention, there is also provided an intelligent vehicle driving system, including: the multi-view image forming apparatus comprises a control device and a multi-view image forming device connected with the control device, wherein the multi-view image forming device comprises: at least one short focal length lens, at least one long focal length lens;
and after receiving the images collected by the short-focal-length lens and the long-focal-length lens, the control device adopts the intelligent vehicle real-time detection and positioning method of the first embodiment or the second embodiment to construct the three-dimensional semantic map of the intelligent vehicle on line in real time.
In practical application, the short-focus lens of the multi-view imaging device in this embodiment is an 8 cm-focus lens and a 12 cm-focus lens, and the long-focus lens is a 16 cm-focus lens and a 25 cm-focus lens;
the 8cm and 12cm short-focus lenses are used for monitoring short-distance targets of 5-35m in a large field angle of 90 degrees,
16cm and 25cm telephoto lenses are used to monitor long-distance targets 10-60m within a small field angle of 40 deg..
In practical applications, the intelligent vehicle may be an unmanned or autonomous vehicle and may include the above intelligent vehicle driving system. Specifically, the above 8 mm, 12 mm, 16 mm and 25 mm focal-length lenses are placed side by side in the same horizontal plane in front of the vehicle cab, with the long-focal-length and short-focal-length lenses arranged alternately, and the inter-axial spacing of adjacent short-focal-length lenses is 12 cm.
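The lens parameters quoted in this embodiment can be summarised in a small configuration table. The sketch below uses the stated focal lengths, field angles, working ranges and the 12 cm short-lens baseline; the dictionary layout, key names and helper function are illustrative assumptions, not part of the patent.

```python
# Illustrative configuration sketch using the values given in this embodiment; the
# structure and key names are assumptions.

MULTI_VIEW_CONFIG = {
    "lenses": [
        # Long- and short-focal-length lenses are mounted alternately, side by side,
        # in the same horizontal plane in front of the cab.
        {"focal_mm": 8,  "type": "short", "fov_deg": 90, "range_m": (5, 35)},
        {"focal_mm": 16, "type": "long",  "fov_deg": 40, "range_m": (10, 60)},
        {"focal_mm": 12, "type": "short", "fov_deg": 90, "range_m": (5, 35)},
        {"focal_mm": 25, "type": "long",  "fov_deg": 40, "range_m": (10, 60)},
    ],
    "short_pair_baseline_m": 0.12,   # inter-axial spacing of adjacent short-focal-length lenses
}

def lenses_of(kind: str):
    """Return the focal lengths (mm) of the short or long lens group."""
    return [lens["focal_mm"] for lens in MULTI_VIEW_CONFIG["lenses"] if lens["type"] == kind]

print(lenses_of("short"), lenses_of("long"))   # [8, 12] [16, 25]
```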
The intelligent vehicle driving system can build the semantic map online in real time, which enables engineering application and improves practicability.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions.
It should be noted that in the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the terms first, second, third and the like is for convenience only and does not denote any order; these words are to be understood as part of the name of the component.
Furthermore, it should be noted that in the description of the present specification, reference to the terms "one embodiment", "some embodiments", "examples", "specific examples" or "some examples", etc., means that a specific feature, structure, material or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, the schematic representations of these terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, the various embodiments or examples described in this specification, and the features of different embodiments or examples, can be combined by one skilled in the art without contradiction.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, the claims should be construed to include preferred embodiments and all changes and modifications that fall within the scope of the invention.
It will be apparent to those skilled in the art that various modifications and variations can be made in the present invention without departing from the spirit or scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention should also include such modifications and variations.

Claims (9)

1. A bionic-vision-based intelligent vehicle target real-time detection and positioning method, characterized by comprising the following steps:
step S1, acquiring images from the lenses with different focal lengths in the multi-view imaging device of the intelligent vehicle in real time;
step S2, detecting the type and position information of each target in the image of each lens;
step S3, taking the image of the lens with the smallest focal length as the reference image, and dividing the reference image into a common view area and a non-common view area according to the homogeneous coordinate transformation relation of the object space corresponding to the lenses with different focal lengths;
step S4, for the common view area of each image, performing three-dimensional reconstruction and positioning of the targets in the common view area by a binocular different-focal-length stereoscopic vision reconstruction method, so as to obtain the three-dimensional positioning information and category label semantic information of all targets in the common view area of the reference image;
for the non-common view area of the reference image, dividing it into upper, lower, left and right areas according to the field angle of the camera device to which the image belongs, and acquiring the angle positioning information of the targets in each non-common view area by means of perspective grid division;
the three-dimensional positioning information, the category label semantic information and the angle positioning information corresponding to the non-common view area are used to construct a vector map with semantic information for the intelligent vehicle;
the step S4 includes:
S4-1, determining and establishing a one-to-one mapping relation between the coordinates of each target of the multi-target images in the two images according to the coordinate relation (A1) between the two image planes;
[Formula (A1) appears as an image (FDA0003061129430000011) in the original publication.]
wherein α1 and α2, β1 and β2 respectively denote the scale factors of the horizontal axis u and the vertical axis v of the image (also called the normalized focal lengths), and l is the baseline distance;
S4-2, detecting the depth information (Z value) of the target, namely the spatial point M, in three-dimensional space by means of a plurality of cameras with different resolutions and focal lengths, according to a binocular parallel vision model;
specifically, the coordinates (X, Y, Z) of the spatial point M are:
[Formula (A2) appears as an image (FDA0003061129430000021) in the original publication; an illustrative sketch of the underlying binocular model is given after the claims.]
wherein the focal lengths f1 and f2 of the two camera lenses and the baseline distance l are pre-calibrated parameters, and the corresponding pixel coordinates x1, y1 and x2, y2 are obtained through formula (A1);
S4-3, reconstructing the detected targets in the common view area of the reference image according to the three-dimensional coordinate relation (A2) of the spatial point M, and marking the average depth value of each detected target area.
2. The method according to claim 1, wherein the step S2 includes:
detecting the type and position information of each target in the image of each lens by adopting a YOLOv5 target real-time detection algorithm;
and/or, the target comprises: traffic lights, speed limit signs, pedestrians, small animals, vehicles or lane lines;
the category and location information includes: position information, size information, and category information of the object in each image.
3. The method according to claim 1, wherein the step S3 includes:
when the number of lenses with different focal lengths is greater than or equal to 3, dividing the lenses into long-focal-length lenses and short-focal-length lenses;
for the short-focal-length lenses, taking the image of the lens with the smallest focal length among the short-focal-length lenses as a first reference image, and dividing the imaging interval of the first reference image corresponding to the field angles of the other short-focal-length lenses into a first common view area and a first non-common view area;
for the long-focal-length lenses, taking the image of the lens with the smallest focal length among the long-focal-length lenses as a second reference image, and dividing the imaging interval of the second reference image corresponding to the field angles of the other long-focal-length lenses into a second common view area and a second non-common view area.
4. The method according to claim 3, wherein the step S4 includes:
acquiring the three-dimensional positioning information and category label semantic information of all targets in the first common view area of the first reference image by a binocular different-focal-length stereoscopic vision reconstruction method;
acquiring the angle positioning information of each target in the first non-common view area of the first reference image by means of perspective grid division;
acquiring the three-dimensional positioning information and category label semantic information of all targets in the second common view area of the second reference image by a binocular different-focal-length stereoscopic vision reconstruction method;
acquiring the angle positioning information of each target in the second non-common view area of the second reference image by means of perspective grid division;
accordingly, the method further comprises step S5,
step S5 includes:
and fusing all the acquired three-dimensional positioning information, category label semantic information, all the angle positioning information and the three-dimensional point cloud data in the multi-view imaging device to construct a three-dimensional semantic map for the intelligent vehicle.
5. The method according to claim 3, wherein the step S4 includes:
acquiring the three-dimensional positioning information and category label semantic information of all targets in the first common view area of the first reference image by a binocular different-focal-length stereoscopic vision reconstruction method; acquiring the angle positioning information of each target in the first non-common view area of the first reference image by means of perspective grid division;
acquiring the three-dimensional positioning information and category label semantic information of all targets in the second common view area of the second reference image by a binocular different-focal-length stereoscopic vision reconstruction method; acquiring the angle positioning information of each target in the second non-common view area of the second reference image by means of perspective grid division;
and transferring the three-dimensional positioning information and category label semantic information of all targets in the common view areas of the first and second reference images, together with the angle positioning information of all targets in the non-common view areas, to a preset global coordinate system through coordinate transformation.
6. The method according to claim 1, wherein the step S3 includes:
if it is detected in step S2 that the same target has corresponding corner points a, b, c, d in the image of a lens with any focal length and corresponding corner points a', b', c', d' in the image of the lens with the smallest focal length;
when the coordinate systems of the two lenses do not coincide and/or the focal lengths of the lenses differ, taking the image of the lens with the smallest focal length as the reference image based on the homogeneous coordinate transformation relation of the object-image space, and dividing the reference image into a square-ring-shaped region defined by the eight corner points a, b, c, d and a', b', c', d';
wherein the central square of the region, which is the overlapping imaging area, serves as the common view area;
and the band extending from the square a, b, c, d to the boundary a', b', c', d', which is the non-overlapping imaging area, serves as the non-common view area.
7. The method of claim 4, wherein the short focal length lenses are 8mm focal length lenses and 12mm focal length lenses, and the long focal length lenses are 16mm focal length lenses and 25mm focal length lenses;
8mm and 12mm short-focal-length lenses are used for monitoring short-distance targets of 5-35m in a large field angle of 90 degrees,
the 16mm and 25mm long-focus lenses are used for monitoring long-distance targets of 10-60m in a small field angle of 40 degrees.
8. An intelligent vehicle driving system, characterized by comprising: a control device and a multi-view imaging device connected to the control device, wherein the multi-view imaging device comprises: at least one short-focal-length lens and at least one long-focal-length lens;
after the control device receives the images collected by the short-focal-length lens and the long-focal-length lens, a three-dimensional semantic map of the intelligent vehicle is constructed on line in real time by adopting the intelligent vehicle target real-time detection and positioning method of any one of claims 1 to 7.
9. An intelligent vehicle, characterized by comprising the intelligent vehicle driving system of claim 8.
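Since formulas (A1) and (A2) referenced in claim 1 are published only as images, the following sketch illustrates a standard parallel binocular pinhole model with different focal lengths that is consistent with the surrounding definitions (normalized focal lengths α1, β1, α2 and baseline l). It is an assumption-based illustration, not a transcription of the patent's formulas.

```python
# Illustrative sketch of a parallel binocular model with two different focal lengths,
# written under standard pinhole assumptions; it is NOT a transcription of formulas
# (A1)/(A2), which appear only as images in the published claims.

def reconstruct_point(x1, y1, x2, alpha1, beta1, alpha2, baseline):
    """Recover (X, Y, Z) of a spatial point M from its (principal-point-centred)
    pixel coordinates in two parallel cameras with normalized focal lengths
    alpha1/beta1 and alpha2 (pixels), separated by `baseline` (metres) along X."""
    disparity = x1 / alpha1 - x2 / alpha2        # effective disparity in normalized units
    if disparity <= 0:
        raise ValueError("point must lie in front of both cameras with positive disparity")
    Z = baseline / disparity
    X = x1 * Z / alpha1
    Y = y1 * Z / beta1
    return X, Y, Z

# Quick self-check with a synthetic point M = (1.0, 0.5, 20.0) m and a 0.12 m baseline:
alpha1, beta1 = 2000.0, 2000.0       # e.g. short-focal-length camera
alpha2 = 4000.0                      # e.g. long-focal-length camera
Xm, Ym, Zm, l = 1.0, 0.5, 20.0, 0.12
x1, y1 = alpha1 * Xm / Zm, beta1 * Ym / Zm
x2 = alpha2 * (Xm - l) / Zm
print(reconstruct_point(x1, y1, x2, alpha1, beta1, alpha2, l))  # ~(1.0, 0.5, 20.0)
```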
CN202010721224.7A 2020-07-24 2020-07-24 Intelligent vehicle target real-time detection and positioning method based on bionic vision Active CN111914715B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010721224.7A CN111914715B (en) 2020-07-24 2020-07-24 Intelligent vehicle target real-time detection and positioning method based on bionic vision

Publications (2)

Publication Number Publication Date
CN111914715A CN111914715A (en) 2020-11-10
CN111914715B (en) 2021-07-16

Family

ID=73281418

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010721224.7A Active CN111914715B (en) 2020-07-24 2020-07-24 Intelligent vehicle target real-time detection and positioning method based on bionic vision

Country Status (1)

Country Link
CN (1) CN111914715B (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112419392A (en) * 2020-11-30 2021-02-26 广州博进信息技术有限公司 Method, apparatus and medium for calculating actual size of moving object based on machine vision
CN112699900A (en) * 2021-01-05 2021-04-23 东北林业大学 Improved traffic sign identification method of YOLOv4
CN112863183B (en) * 2021-01-14 2022-04-08 深圳尚桥信息技术有限公司 Traffic flow data fusion method and system
CN112926501A (en) * 2021-03-23 2021-06-08 哈尔滨理工大学 Traffic sign detection algorithm based on YOLOv5 network structure
CN113034671B (en) * 2021-03-23 2024-01-09 成都航空职业技术学院 Traffic sign three-dimensional reconstruction method based on binocular vision
CN112949595A (en) * 2021-04-01 2021-06-11 哈尔滨理工大学 Improved pedestrian and vehicle safety distance detection algorithm based on YOLOv5
CN113781539A (en) * 2021-09-06 2021-12-10 京东鲲鹏(江苏)科技有限公司 Depth information acquisition method and device, electronic equipment and computer readable medium
CN114022530B (en) * 2021-10-26 2024-09-10 华中科技大学 3D printing auxiliary method, system and application based on mixed reality technology
CN114445633B (en) * 2022-01-25 2024-09-06 腾讯科技(深圳)有限公司 Image processing method, apparatus and computer readable storage medium
CN114777672A (en) * 2022-04-29 2022-07-22 河北工程大学 Three-dimensional measurement equipment and method based on different-resolution visual fusion of multi-ocular structured light
CN115271200B (en) * 2022-07-25 2023-05-30 仲恺农业工程学院 Intelligent coherent picking system for famous tea
CN115797459B (en) * 2022-08-29 2024-02-13 南京航空航天大学 Binocular vision system ranging method with arbitrary focal length combination
CN115272618B (en) * 2022-09-20 2022-12-20 深圳市其域创新科技有限公司 Three-dimensional grid optimization method, equipment and storage medium
CN117876608B (en) * 2024-03-11 2024-06-28 魔视智能科技(武汉)有限公司 Three-dimensional image reconstruction method, three-dimensional image reconstruction device, computer equipment and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2545658A (en) * 2015-12-18 2017-06-28 Canon Kk Methods, devices and computer programs for tracking targets using independent tracking modules associated with cameras
CN111435539A (en) * 2019-01-15 2020-07-21 苏州沃迈智能科技有限公司 Multi-camera system external parameter calibration method based on joint optimization
CN110095111A (en) * 2019-05-10 2019-08-06 广东工业大学 A kind of construction method of map scene, building system and relevant apparatus

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103428275A (en) * 2013-07-30 2013-12-04 苏州两江科技有限公司 Indoor object activity routine tracking method based on WSN
CN108156419A (en) * 2017-12-22 2018-06-12 湖南源信光电科技股份有限公司 More focal length lens linkage imaging camera machine system based on multiple features combining and Camshift algorithms
CN108830905A (en) * 2018-05-22 2018-11-16 苏州敏行医学信息技术有限公司 The binocular calibration localization method and virtual emulation of simulating medical instrument cure teaching system
CN108805910A (en) * 2018-06-01 2018-11-13 海信集团有限公司 More mesh Train-borne recorders, object detection method, intelligent driving system and automobile
CN110866863A (en) * 2018-08-27 2020-03-06 天津理工大学 Automobile A-pillar perspective algorithm
CN109461211A (en) * 2018-11-12 2019-03-12 南京人工智能高等研究院有限公司 Semantic vector map constructing method, device and the electronic equipment of view-based access control model point cloud
CN110285793A (en) * 2019-07-08 2019-09-27 中原工学院 A kind of Vehicular intelligent survey track approach based on Binocular Stereo Vision System
CN110728715A (en) * 2019-09-06 2020-01-24 南京工程学院 Camera angle self-adaptive adjusting method of intelligent inspection robot

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
A Proposal for combining mapping, localization and target recognition; Gronwall, C. et al.; Conference on Electro-Optical Remote Sensing, Photonic Technologies, and Applications IX; 20150922; 96490G.1-96490G.11 *
Angle precision study on dual-aircraft cooperatively detecting remote target by passive locating method; Gao Xiang et al.; 2011 IEEE International Conference on Signal Processing, Communications and Computing (ICSPCC); 20111027; pp. 1-5 *
Research on simultaneous localization and mapping method based on multiple cameras; Yang Lixin; China Masters' Theses Full-text Database, Information Science and Technology; 20200615; I140-167 *
Research on image stitching and positioning algorithms based on local homography matrices; Chi Longyun et al.; Navigation Positioning and Timing; 20200531; Vol. 7, No. 3; pp. 62-69 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
PE01 Entry into force of the registration of the contract for pledge of patent right
Denomination of invention: A real-time detection and localization method for intelligent vehicle targets based on biomimetic vision
Effective date of registration: 20230920
Granted publication date: 20210716
Pledgee: Bank of China Limited Langfang Branch
Pledgor: Langfang Heyi Life Network Technology Co.,Ltd.
Registration number: Y2023980057780