
CN117475207B - 3D-based bionic visual target detection and identification method - Google Patents


Info

Publication number
CN117475207B
CN117475207B (application CN202311410176.XA)
Authority
CN
China
Prior art keywords
target
acquiring
target object
marking
recognition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311410176.XA
Other languages
Chinese (zh)
Other versions
CN117475207A (en)
Inventor
曹金刚
李�荣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Xingshen Technology Group Co ltd
Original Assignee
Jiangsu Xingshen Technology Group Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Xingshen Technology Group Co ltd filed Critical Jiangsu Xingshen Technology Group Co ltd
Priority to CN202311410176.XA priority Critical patent/CN117475207B/en
Publication of CN117475207A publication Critical patent/CN117475207A/en
Application granted granted Critical
Publication of CN117475207B publication Critical patent/CN117475207B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/602 Providing cryptographic facilities or services
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/54 Extraction of image or video features relating to texture
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computer Graphics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Geometry (AREA)
  • Bioethics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a 3D bionic visual target detection and identification method, which relates to the technical field of computer vision. The method acquires 3D sensor data of a target environment, extracts surface features of a target object through local feature descriptors, and establishes a 3D scene model; the target object is divided into a plurality of local areas, a surface feature descriptor of each local area is acquired, and matching and positioning are performed according to the surface feature descriptors of the local areas, thereby acquiring a preliminary target pose and target shape; a target classifier is trained through deep learning, the surface feature descriptors of the local areas are input into the target classifier for target recognition, the recognition results of the target object are fitted as a whole to obtain the final 3D spatial position and final pose of the target object, and the final 3D spatial position and final pose are entered into a database and stored in encrypted form, thereby realizing detection and recognition of the target.

Description

3D-based bionic visual target detection and identification method
Technical Field
The invention relates to the technical field of computer vision, in particular to a 3D bionic visual target detection and identification method.
Background
The target detection and recognition method based on 3D bionic vision is a target detection and recognition technology inspired by biological vision systems. It imitates the way the eyes and brain process information in a biological vision system and realizes the detection and identification of targets through the perception and analysis of three-dimensional scenes.
How to efficiently and accurately extract the surface features of the target object required for target detection and recognition, how to optimize the subsequent modeling once the surface features have been extracted, and how to improve the robustness of target recognition after modeling are all problems that need to be considered.
Disclosure of Invention
In order to solve the problems, the invention aims to provide a 3D bionic visual target detection and identification method.
The aim of the invention can be achieved by the following technical scheme: the 3D bionic visual target detection and identification method comprises the following steps:
Step S1: acquiring 3D sensor data of a target environment, extracting surface features of a target object through local feature descriptors, and establishing a 3D scene model;
Step S2: dividing the target object into a plurality of local areas, acquiring a surface feature descriptor of each local area, and matching and positioning according to the surface feature descriptors of the local areas, thereby acquiring a preliminary target pose and target shape;
Step S3: training a target classifier through deep learning, inputting the surface feature descriptors of the local areas into the target classifier for target recognition, performing overall fitting on the recognition results of the target object to obtain the final 3D spatial position and final pose of the target object, and performing database entry and encrypted storage of the final 3D spatial position and final pose of the target object.
Further, the process of acquiring 3D sensor data of the target environment includes:
Selecting a target environment, arranging different types of sensing acquisition equipment in the target environment, and acquiring corresponding 3D sensor data through the different types of sensing acquisition equipment, wherein the 3D sensor data comprises a target image, a depth map, a texture map and radar reflectivity; the radar reflectivity is denoted R, with R ∈ (0, 1), and the number of radar reflectivity groups is i, i = 1, 2, 3, …, n, where n is a natural number greater than 0.
Further, the process of extracting the surface features of the target object through the local feature descriptors and further establishing the 3D scene model includes:
The local feature descriptors are used for extracting surface features of the target object, the surface features being the edge feature data of the target object. The target image is associated with the corresponding target object, and the gradient directions and gradient strengths corresponding to the length and width dimensions of the target object are obtained through the HOG feature descriptor among the local feature descriptors. A transverse gradient strength interval and a longitudinal gradient strength interval are preset, the corresponding transverse edge information and longitudinal edge information are obtained, and the two are merged to generate the edge feature data of the target object. The texture map and depth map are processed through the HOG feature descriptor to obtain the corresponding texture information and depth information, and the edge feature data, texture information and depth information are used as the first, second and third modeling parameters, respectively, so as to construct the 3D scene model.
Further, the process of dividing the target object into a plurality of local areas and acquiring the surface feature descriptor of each local area includes:
Dividing the target object into a plurality of local areas and acquiring a surface feature descriptor of each local area; acquiring the plurality of groups of gradient sequence values included in the edge feature data and the one-dimensional vector (a, b, c) corresponding to each group of gradient sequence values; presetting corresponding dimension intervals, denoted WD1, WD2 and WD3 respectively; if (a, b, c) ∈ WD1, the corresponding local area is marked with dimension interval WD1; if (a, b, c) ∈ WD2, it is marked with WD2; and if (a, b, c) ∈ WD3, it is marked with WD3.
Further, the process of matching and positioning according to the surface feature descriptors of each local area to obtain the preliminary target pose and the target shape includes:
Different dimension intervals have corresponding pose parameters and shape parameters; target pose parameters and target shape parameters are preset and stored in a preset matching and positioning library, and the surface feature descriptors of the different local areas are imported into the matching and positioning library to obtain a corresponding first area parameter and second area parameter. The first area parameter is used for acquiring the target pose: when its pose parameter matches a target pose parameter in the matching and positioning library, the target pose corresponding to that target pose parameter is assigned as the preliminary target pose of the target object. The second area parameter is used for acquiring the target shape: when its shape parameter matches a target shape parameter in the matching and positioning library, the target shape corresponding to that target shape parameter is assigned as the preliminary target shape of the target object. If the pose parameter and shape parameter do not match the corresponding target pose parameter and target shape parameter, no matching and positioning is performed for that local area.
Further, the process of training the target classifier through deep learning and performing target recognition includes:
Setting a target classifier and acquiring an image dataset corresponding to the target object; dividing the image dataset into a corresponding training set and test set according to a preset proportion; inputting the training set into the target classifier for training and acquiring the training frequency; inputting the test set into the target classifier for testing and acquiring the recognition condition of the target classifier for the target object, the recognition condition comprising successful recognition and failed recognition; if recognition succeeds, marking the target classifier with a correct marking symbol, otherwise marking it with an error marking symbol; acquiring the number of times the correct marking symbol is marked, denoted n1, and the number of times the error marking symbol is marked, denoted n2; acquiring the marking ratio, denoted U, with U = n1/n2; presetting a training-stop threshold, denoted D; if U ≥ D, stopping the training of the target classifier, otherwise increasing the training frequency Num1 and the proportion of the training set and continuing to train the target classifier until U ≥ D; and inputting the surface feature descriptors corresponding to the target object into the trained target classifier for target recognition, thereby acquiring the spatially related information corresponding to the plurality of local areas, the spatially related information comprising spatial coordinates, spatial scale and spatial projection.
Further, the process of performing overall fitting on the recognition results of the target object and obtaining the final 3D spatial position and final pose of the target object includes:
Acquiring the spatially related information corresponding to the recognition result of the target object; marking the spatial coordinate value and spatial scale value in the spatially related information as ψ1 and ψ2 respectively; presetting a coordinate dimension cluster and a spatial scale classification interval, denoted τ1 and τ2 respectively; acquiring the spatial projection and generating a mapping space of the corresponding size; and fitting the whole according to ψ1, ψ2, τ1 and τ2 to generate the final 3D spatial position and final pose of the target object.
Further, the process of performing database entry and encrypted storage of the final 3D spatial position and final pose comprises the following steps:
Presetting a database and acquiring the read permission and write permission corresponding to the database; reading the final 3D spatial position and final pose through the read permission and synchronously generating a read packet; associating a read progress with the read packet; when the progress value reaches 100%, acquiring the write permission, writing the read packet into the database, and thus completing the database entry;
Numbering the plurality of entered data packets, setting an encryption time interval and an encryption frequency for each, and thereby constructing an encryption rule for each data packet; performing the corresponding encryption through the encryption rule corresponding to each data packet, thereby generating an encrypted data packet corresponding to each data packet, and storing the encrypted data packets.
Compared with the prior art, the invention has the following beneficial effects: the surface features of the target object are extracted through local feature descriptors, so the extracted data are more refined and targeted; both the transverse and the longitudinal gradients of the target image are considered in the 3D scene modeling process, and taking both dimensions into account improves modeling accuracy to a certain extent; matching and positioning are performed with the surface feature descriptors to generate the preliminary target pose and preliminary target shape, and further training and recognition are carried out through deep learning, which improves the accuracy of target recognition; the final 3D spatial position and final pose generated by target recognition are read and written into the database after the corresponding database permissions are acquired, and encryption time intervals and encryption frequencies are set to form different encryption rules, which improves the security of the data to a certain extent.
Drawings
FIG. 1 is a flow chart of the present invention.
Detailed Description
As shown in fig. 1, the 3D bionic visual target detection and recognition method comprises the following steps:
Step S1: acquiring 3D sensor data of a target environment, extracting surface features of a target object through local feature descriptors, and establishing a 3D scene model;
Step S2: dividing the target object into a plurality of local areas, acquiring a surface feature descriptor of each local area, and matching and positioning according to the surface feature descriptors of the local areas, thereby acquiring a preliminary target pose and target shape;
Step S3: training a target classifier through deep learning, inputting the surface feature descriptors of the local areas into the target classifier for target recognition, performing overall fitting on the recognition results of the target object to obtain the final 3D spatial position and final pose of the target object, and performing database entry and encrypted storage of the final 3D spatial position and final pose of the target object.
It should be further noted that, in the implementation process, the process of acquiring the 3D sensor data of the target environment includes:
Selecting a target environment and arranging different types of sensing acquisition equipment in the target environment, the types of sensing acquisition equipment comprising a laser radar, a panoramic camera and a depth camera, so that the corresponding 3D sensor data are acquired through the different types of sensing acquisition equipment;
the 3D sensor data comprise a target image, a depth map, a texture map and radar reflectivity, where the panoramic camera is used for collecting the target image and the texture map, the depth camera is used for collecting the depth map, and the laser radar is used for collecting a plurality of groups of radar reflectivity, the radar reflectivity being denoted R with R ∈ (0, 1);
the number of radar reflectivity groups is i, i = 1, 2, 3, …, n, where n is a natural number greater than 0, and the radar reflectivity is expressed in decibels (dB); the closer the value of R is to 1, the higher the reflectivity of the object detected by the laser radar, and conversely the lower the reflectivity; information about the material, shape or other properties of the relevant target object can be obtained preliminarily through the different reflectivities;
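As a non-limiting illustration, the following Python sketch shows one way the collected 3D sensor data could be organized and the reflectivity values interpreted; the container, the field names and the 0.5 cut-off are assumptions made for illustration and are not specified by the patent;

```python
from dataclasses import dataclass, field
from typing import List

import numpy as np


@dataclass
class SensorFrame:
    """One acquisition of the target environment (field names are illustrative)."""
    target_image: np.ndarray                 # panoramic camera, H x W x 3
    texture_map: np.ndarray                  # panoramic camera, H x W
    depth_map: np.ndarray                    # depth camera, H x W
    radar_reflectivity: List[float] = field(default_factory=list)  # n groups, each R in (0, 1)

    def reflectivity_hint(self, i: int) -> str:
        """Rough material/shape hint from the i-th reflectivity group (heuristic cut-off)."""
        return ("highly reflective surface" if self.radar_reflectivity[i] > 0.5
                else "weakly reflective surface")
```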
It should be further noted that, in the implementation process, the process of extracting the surface features of the target object through the local feature descriptors, and further establishing the 3D scene model includes:
acquiring 3D sensor data, and further acquiring corresponding target images, depth maps, texture maps and a plurality of groups of radar reflectivity R;
The local feature descriptors are HOG feature descriptors, and the HOG feature descriptors are used for extracting the surface features of the target object, the surface features being the edge feature data of the target object; the target image is associated with the corresponding target object, the target object is denoted Ch, the length and width of the target image are acquired and denoted Ch-L and Ch-W respectively, the gradient directions and gradient intensities corresponding to the two dimensions of length and width are acquired through the HOG feature descriptors, and the two gradient intensities are denoted Dz and Dc respectively;
the gradient direction corresponding to the length dimension of the target image is marked as the transverse gradient direction and the gradient direction corresponding to the width dimension as the longitudinal gradient direction, and a transverse gradient intensity interval and a longitudinal gradient intensity interval are preset, denoted Ω1 and Ω2 respectively;
if Dz ∈ Ω1, the value of the gradient intensity Dz corresponding to the transverse gradient direction is taken and denoted D1, a first coding number is set and denoted B, and a first coding sequence string is obtained, denoted Hc1, with Hc1 = D1 × B; the generated first coding sequence string Hc1 is converted into binary as the transverse edge information;
if Dc ∈ Ω2, the value of the gradient intensity Dc corresponding to the longitudinal gradient direction is taken and denoted D2, a second coding number is set and denoted C, and a second coding sequence string is obtained, denoted Hc2, with Hc2 = D2 × C; the generated second coding sequence string Hc2 is converted into binary as the longitudinal edge information;
the transverse edge information and the longitudinal edge information are merged to generate the edge feature data of the target object, the edge feature data comprising a plurality of groups of gradient sequence values, denoted T with T = <D1, D2>; each group of gradient sequence values is converted into a corresponding one-dimensional vector, denoted (a, b, c);
It should be noted that converting the edge feature data into the corresponding one-dimensional vectors reduces the data dimensionality, reduces the influence of noise on the features, and improves the efficiency of subsequent processing.
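As a non-limiting illustration, the following sketch outlines the transverse/longitudinal edge-information step described above; the gradient-intensity intervals Ω1 and Ω2, the coding numbers B and C, and the use of the mean absolute gradient as the gradient intensities Dz and Dc are assumptions made for illustration;

```python
import numpy as np


def edge_feature_data(gray, omega1=(5.0, 255.0), omega2=(5.0, 255.0), B=3, C=7):
    """Sketch of the transverse/longitudinal edge-information step.

    omega1 and omega2 stand in for the preset gradient-intensity intervals,
    and B and C for the coding numbers; all four are placeholder values.
    """
    gray = gray.astype(float)
    gx = np.gradient(gray, axis=1)            # gradient along the length (transverse)
    gy = np.gradient(gray, axis=0)            # gradient along the width (longitudinal)

    Dz, Dc = float(np.mean(np.abs(gx))), float(np.mean(np.abs(gy)))

    transverse_edge = longitudinal_edge = None
    if omega1[0] <= Dz <= omega1[1]:
        Hc1 = Dz * B                          # coding sequence string one, Hc1 = D1 x B
        transverse_edge = bin(int(round(Hc1)))
    if omega2[0] <= Dc <= omega2[1]:
        Hc2 = Dc * C                          # coding sequence string two, Hc2 = D2 x C
        longitudinal_edge = bin(int(round(Hc2)))

    T = (Dz, Dc)                              # gradient sequence value <D1, D2>
    return transverse_edge, longitudinal_edge, T
```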
Processing the texture map and the depth map through the HOG feature descriptor, and further obtaining corresponding texture information and depth information, wherein the texture information comprises texture directions, texture specifications and texture frequencies, and the depth information comprises distance information, spatial structure information, surface normal information and motion information;
Acquiring the texture frequency and texture direction corresponding to the same texture specification and packaging them to generate a first texture feature; acquiring the texture specification and texture frequency corresponding to the same texture direction and packaging them to generate a second texture feature; acquiring the texture direction and texture specification corresponding to the same texture frequency and packaging them to generate a third texture feature;
the first texture feature, the second texture feature and the third texture feature are collectively referred to as the texture features;
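A minimal sketch of how the three texture features could be packaged, assuming each texture sample is described by a (direction, specification, frequency) triple; the grouping keys and the dictionary layout are an interpretation, not values from the patent;

```python
from collections import defaultdict


def package_texture_features(texels):
    """texels: iterable of (direction, specification, frequency) tuples.

    Groups the attributes by a shared specification, a shared direction and a
    shared frequency, yielding the first, second and third texture features.
    """
    by_spec, by_dir, by_freq = defaultdict(list), defaultdict(list), defaultdict(list)
    for direction, spec, freq in texels:
        by_spec[spec].append((freq, direction))     # first texture feature
        by_dir[direction].append((spec, freq))      # second texture feature
        by_freq[freq].append((direction, spec))     # third texture feature
    return {"first": dict(by_spec), "second": dict(by_dir), "third": dict(by_freq)}
```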
The distance information corresponds to the distance values of a plurality of depth points in the target environment; the depth points are numbered j, j = 1, 2, 3, …, m, where m is a natural number greater than 0, and the distance value is JL[j], where j is the number of the corresponding depth point;
The spatial structure information corresponds to the surface shape and geometric characteristics of the plurality of depth points and their relative positions with respect to the other depth points. The surface normal information corresponding to the depth map is obtained as follows: select any depth point, denoted P, and obtain its depth value, denoted D_P; select a neighbourhood of fixed size with P as its centre and obtain the positions and depth values of the other depth points within the neighbourhood, the other depth points being denoted P'; calculate the position vectors between P and P' and normalize them to obtain a surface normal N; repeat the above steps to obtain all the surface normals within the neighbourhood;
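As an illustration of the surface-normal step, the sketch below estimates the normal N at a depth point P from a fixed one-pixel neighbourhood using the conventional cross product of two neighbour position vectors; the back-projection with placeholder intrinsics fx and fy is an assumption, since the patent does not give camera parameters;

```python
import numpy as np


def surface_normal(depth, u, v, fx=500.0, fy=500.0):
    """Estimate the surface normal N at the depth point P = (u, v).

    fx and fy are placeholder camera intrinsics, and (u, v) is assumed to be
    an interior pixel so that the one-pixel neighbourhood exists.
    """
    def backproject(uu, vv):
        d = float(depth[vv, uu])
        return np.array([uu * d / fx, vv * d / fy, d])   # 3D position of the depth point

    p = backproject(u, v)              # depth point P with depth value D_P
    p_right = backproject(u + 1, v)    # neighbour P' along the image x axis
    p_down = backproject(u, v + 1)     # neighbour P' along the image y axis

    n = np.cross(p_right - p, p_down - p)     # normal from the two position vectors
    norm = np.linalg.norm(n)
    return n / norm if norm > 0 else n        # normalised surface normal N
```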
The motion information is the motion speed and motion state of a plurality of target objects obtained through different radar reflectivities;
The edge feature data, texture information and depth information are used as the first modeling parameter, the second modeling parameter and the third modeling parameter respectively, so as to construct the 3D scene model;
it should be further noted that, in the implementation process, the process of dividing the target object into a plurality of local areas and obtaining the surface feature descriptor of each local area includes:
Dividing the target object into a plurality of local areas and numbering the local areas v, v = 1, 2, 3, …, e, where e is a natural number greater than 0, and acquiring the surface feature descriptor of each local area;
further acquiring the plurality of groups of gradient sequence values included in the edge feature data corresponding to the surface feature descriptors and the one-dimensional vector (a, b, c) corresponding to each group of gradient sequence values, and presetting the corresponding dimension intervals, the dimension intervals comprising dimension interval I, dimension interval II and dimension interval III;
marking the dimension coordinate sets corresponding to dimension interval I, dimension interval II and dimension interval III as WD1, WD2 and WD3 respectively; if (a, b, c) ∈ WD1, the corresponding local area is marked as corresponding to dimension interval I; if (a, b, c) ∈ WD2, it is marked as corresponding to dimension interval II; and if (a, b, c) ∈ WD3, it is marked as corresponding to dimension interval III;
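As a non-limiting illustration, the assignment of a local area to dimension interval I, II or III from its one-dimensional vector (a, b, c) could be sketched as follows, where WD1, WD2 and WD3 are modelled as per-component (low, high) bounds purely for illustration;

```python
def label_local_area(vec, wd1, wd2, wd3):
    """Assign a local area to dimension interval I, II or III.

    vec is the one-dimensional vector (a, b, c); wd1, wd2 and wd3 stand in
    for the dimension coordinate sets, modelled here as three (low, high)
    bounds, one per component.
    """
    def inside(v, bounds):
        return all(lo <= x <= hi for x, (lo, hi) in zip(v, bounds))

    if inside(vec, wd1):
        return "dimension interval I"
    if inside(vec, wd2):
        return "dimension interval II"
    if inside(vec, wd3):
        return "dimension interval III"
    return None   # no interval matched; the local area is left unmarked
```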
It should be further noted that, in the implementation process, the process of matching and positioning according to the surface feature descriptor of each local area, and further obtaining the preliminary target pose and the target shape includes:
Different dimension intervals have corresponding pose parameters and shape parameters; target pose parameters and target shape parameters are preset and stored in a preset matching and positioning library, and the surface feature descriptors of the different local areas are imported into the matching and positioning library to obtain a corresponding first area parameter and second area parameter;
the first area parameter is used for acquiring the target pose: when its pose parameter matches a target pose parameter in the matching and positioning library, the target pose corresponding to that target pose parameter is assigned as the preliminary target pose of the target object;
the second area parameter is used for acquiring the target shape: when its shape parameter matches a target shape parameter in the matching and positioning library, the target shape corresponding to that target shape parameter is assigned as the preliminary target shape of the target object;
if the pose parameter and shape parameter do not match the corresponding target pose parameter and target shape parameter, no matching and positioning is performed for that local area;
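A minimal sketch of the matching and positioning lookup, assuming the matching and positioning library is a mapping from the preset target pose parameters and target shape parameters to pose and shape labels; the dictionary structure is an assumption;

```python
def match_and_position(area_parameter_one, area_parameter_two, matching_library):
    """Look up the preliminary target pose and preliminary target shape.

    matching_library is assumed to be a dict of the form
    {"pose": {pose_parameter: pose_label, ...},
     "shape": {shape_parameter: shape_label, ...}}.
    """
    preliminary_pose = matching_library["pose"].get(area_parameter_one)
    preliminary_shape = matching_library["shape"].get(area_parameter_two)
    if preliminary_pose is None and preliminary_shape is None:
        return None   # neither parameter matches: skip matching for this local area
    return preliminary_pose, preliminary_shape
```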
It should be further noted that, in the implementation process, the process of training the target classifier through deep learning and inputting the surface feature descriptors of the local areas into the target classifier for target recognition includes:
Setting a target classifier, the target classifier having a corresponding use permission; acquiring the use permission and training the target classifier through deep learning; acquiring the image dataset corresponding to the target object and dividing it into a corresponding training set and test set according to a 2:8 proportion;
inputting the training set into the target classifier for training, obtaining the total number of training passes, denoted Num1, taking the total number of training passes Num1 as the training frequency of the target classifier, and inputting the test set into the target classifier for testing;
acquiring the recognition condition of the target classifier for the target object after the test starts, the recognition condition comprising successful recognition and failed recognition; if recognition succeeds, the target classifier is marked with a correct marking symbol, denoted Sign1, and if recognition fails, it is marked with an error marking symbol, denoted Sign2;
acquiring the number of times Sign1 is marked, denoted n1, and the number of times Sign2 is marked, denoted n2; acquiring the marking ratio, denoted U, with U = n1/n2; presetting a training-stop threshold, denoted D; if U ≥ D, the training of the target classifier is stopped; otherwise the training frequency Num1 is increased, the proportion of the training set is raised, and the target classifier continues to be trained until U ≥ D;
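As an illustration of the stop criterion U = n1/n2 ≥ D, the following sketch shows a possible training loop; the classifier interface (fit/predict), the initial 2:8 split, the threshold value and the increments to Num1 and the training-set share are all assumptions;

```python
def train_until_ratio(classifier, dataset, d_threshold=4.0,
                      train_ratio=0.2, num1=10, max_rounds=20):
    """Training-loop sketch for the stop criterion U = n1 / n2 >= D.

    `classifier` is assumed to expose fit(samples) and predict(x) -> label;
    dataset is a list of (x, y) pairs.  All numeric defaults are placeholders.
    """
    for _ in range(max_rounds):
        split = int(len(dataset) * train_ratio)
        train_set, test_set = dataset[:split], dataset[split:]

        for _ in range(num1):                          # Num1: training frequency
            classifier.fit(train_set)

        n1 = sum(1 for x, y in test_set if classifier.predict(x) == y)  # Sign1 count
        n2 = max(1, len(test_set) - n1)                # Sign2 count (avoid division by zero)
        if n1 / n2 >= d_threshold:                     # U = n1/n2 >= D: stop training
            return classifier

        num1 += 5                                      # increase the training frequency
        train_ratio = min(0.9, train_ratio + 0.1)      # raise the training-set share
    return classifier
```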
Inputting the surface feature descriptors of the plurality of local areas corresponding to the target object into the trained target classifier for target recognition, thereby generating different recognition results;
the recognition results are the spatially related information corresponding to the plurality of local areas, the spatially related information comprising spatial coordinates, spatial scale and spatial projection;
It should be further noted that, in the implementation process, the process of performing overall fitting on the recognition result of the target object, and further obtaining the final 3D spatial position and the final pose of the target object includes:
Acquiring the spatially related information corresponding to the recognition result of the target object, marking the spatial coordinate value and spatial scale value in the spatially related information as ψ1 and ψ2 respectively, and presetting a coordinate dimension cluster and a spatial scale classification interval, denoted τ1 and τ2 respectively;
acquiring the spatial projection and generating a mapping space of the corresponding size, and then generating the final 3D spatial position and final pose of the target object according to ψ1, ψ2, τ1 and τ2;
when τ1 = (0, 0.4), if ψ1 ∈ τ1, class-one dimension coordinates are generated; when τ1 = (0.4, 0.7), if ψ1 ∈ τ1, class-two dimension coordinates are generated; and when τ1 = (0.7, 1), if ψ1 ∈ τ1, class-three dimension coordinates are generated;
τ2 takes the values data1 and data2: if τ2 takes the value data1 and ψ2 ∈ τ2, the corresponding generated spatial scale is spatial pose one; otherwise, if τ2 takes the value data2 and ψ2 ∈ τ2, the corresponding generated spatial scale is spatial pose two;
the class-one, class-two and class-three dimension coordinates are summarized and input into the mapping space, the final 3D spatial position is fitted as a whole, and spatial pose one and spatial pose two are fitted as a whole into the final pose;
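A minimal sketch of the overall fitting, assuming each recognition result carries its spatial coordinate value ψ1, spatial scale value ψ2 and a 3D coordinate; the τ2 intervals for data1/data2, the centroid fit and the majority vote over the two spatial poses are illustrative choices, not steps stated in the patent;

```python
def fit_recognition_results(results, tau2=None):
    """Overall-fitting sketch.

    Each result is assumed to be a dict with keys 'psi1' (spatial coordinate
    value), 'psi2' (spatial scale value) and 'coordinate' (a 3D tuple).
    """
    tau2 = tau2 or {"data1": (0.0, 0.5), "data2": (0.5, 1.0)}   # placeholder intervals
    bins = {"class one": [], "class two": [], "class three": []}
    poses = []
    for r in results:
        psi1, psi2 = r["psi1"], r["psi2"]
        if 0.0 < psi1 <= 0.4:                         # tau1 = (0, 0.4)
            bins["class one"].append(r["coordinate"])
        elif 0.4 < psi1 <= 0.7:                       # tau1 = (0.4, 0.7)
            bins["class two"].append(r["coordinate"])
        elif 0.7 < psi1 < 1.0:                        # tau1 = (0.7, 1)
            bins["class three"].append(r["coordinate"])

        if tau2["data1"][0] <= psi2 <= tau2["data1"][1]:
            poses.append("spatial pose one")
        elif tau2["data2"][0] <= psi2 <= tau2["data2"][1]:
            poses.append("spatial pose two")

    coords = [c for group in bins.values() for c in group]
    if not coords or not poses:
        return None, None                             # nothing classified, nothing to fit
    # Fit the final 3D spatial position as the centroid of the classified coordinates.
    final_position = tuple(sum(axis) / len(coords) for axis in zip(*coords))
    final_pose = max(set(poses), key=poses.count)     # majority vote over the two poses
    return final_position, final_pose
```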
It should be further noted that, in the implementation process, the process of performing database entry and encryption storage on the final 3D spatial position and the final pose includes:
Acquiring the final 3D spatial position and final pose and presetting a database, the database having a corresponding read permission and write permission; acquiring the read permission, reading the final 3D spatial position and final pose through the read permission, and synchronously generating a read packet;
associating a read progress with the read packet, the read progress having a corresponding progress value with an initial value of 0% and a value range of 0%–100%; when the progress value reaches 100%, indicating that the read packet has read all of the final 3D spatial position and final pose, the write permission is acquired, the read packet is written into the database, and the database entry is completed;
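As a non-limiting illustration, the read-then-write database entry could look like the sketch below; SQLite, the single-column table and the way the progress value is computed are assumptions;

```python
import json
import sqlite3


def enter_into_database(final_position, final_pose, db_path="targets.db",
                        have_read=True, have_write=True):
    """Read the result into a read packet, track a read progress value and
    write only once the progress reaches 100 %."""
    if not have_read:
        raise PermissionError("read permission is required to build the read packet")

    fields = {"position": final_position, "pose": final_pose}
    read_packet = {}
    progress = 0
    for i, (name, value) in enumerate(fields.items(), start=1):
        read_packet[name] = value
        progress = int(100 * i / len(fields))          # read progress, 0 % -> 100 %

    if progress == 100 and have_write:                 # acquire write permission, then write
        conn = sqlite3.connect(db_path)
        conn.execute("CREATE TABLE IF NOT EXISTS targets (payload TEXT)")
        conn.execute("INSERT INTO targets VALUES (?)", (json.dumps(read_packet),))
        conn.commit()
        conn.close()
    return read_packet
```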
After the database entry, the plurality of entered data packets are numbered j, j = 1, 2, 3, …, m, where m is a natural number greater than 0; an encryption time interval is set, denoted T[j], and an encryption frequency is set, denoted F[j], where j is the number of the corresponding data packet; the encryption rule of each data packet is then constructed, denoted Ω, with Ω = <T[j], F[j]>, meaning that the encryption rule corresponding to the data packet numbered j is to perform encryption at the frequency corresponding to the encryption frequency F[j] within the encryption time interval T[j];
the corresponding encryption is performed through the encryption rule corresponding to each data packet, thereby generating an encrypted data packet corresponding to each data packet, and the encrypted data packets are stored;
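A minimal sketch of the per-packet encryption rule Ω = <T[j], F[j]>; the patent does not name a cipher, so Fernet from the `cryptography` package is used here purely as a stand-in, and treating the encryption frequency F[j] as repeated encryption rounds is an interpretation;

```python
from cryptography.fernet import Fernet


def encrypt_packets(packets, time_intervals, frequencies):
    """Apply the rule Omega = <T[j], F[j]> to each entered data packet:
    packet j is encrypted F[j] times, with T[j] recorded as its encryption
    time interval."""
    encrypted = []
    for j, packet in enumerate(packets):
        key = Fernet.generate_key()
        cipher = Fernet(key)
        token = packet if isinstance(packet, bytes) else str(packet).encode()
        for _ in range(frequencies[j]):               # F[j] rounds of encryption
            token = cipher.encrypt(token)
        encrypted.append({"number": j, "time_interval": time_intervals[j],
                          "key": key, "ciphertext": token})
    return encrypted
```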
The above embodiments are only for illustrating the technical method of the present invention and not for limiting the same, and it should be understood by those skilled in the art that the technical method of the present invention may be modified or substituted without departing from the spirit and scope of the technical method of the present invention.

Claims (4)

1. The 3D bionic visual target detection and identification method is characterized by comprising the following steps of:
Step S1: acquiring 3D sensor data of a target environment, extracting surface features of a target object through local feature descriptors, and establishing a 3D scene model;
Step S2: dividing the target object into a plurality of local areas, acquiring a surface feature descriptor of each local area, and matching and positioning according to the surface feature descriptors of the local areas, thereby acquiring a preliminary target pose and target shape;
Step S3: training a target classifier through deep learning, inputting the surface feature descriptors of the local areas into the target classifier for target recognition, performing overall fitting on the recognition results of the target object to obtain the final 3D spatial position and final pose of the target object, and performing database entry and encrypted storage of the final 3D spatial position and final pose of the target object;
The process of acquiring 3D sensor data of the target environment includes:
Selecting a target environment, arranging different types of sensing acquisition equipment in the target environment, and acquiring corresponding 3D sensor data through the different types of sensing acquisition equipment, wherein the 3D sensor data comprises a target image, a depth map, a texture map and radar reflectivity; the radar reflectivity is denoted R, with R ∈ (0, 1), and the number of radar reflectivity groups is i, i = 1, 2, 3, …, n, where n is a natural number greater than 0;
the process of extracting the surface features of the target object through the local feature descriptors and then establishing the 3D scene model comprises the following steps:
The local feature descriptors are used for extracting surface features of the target object, the surface features being the edge feature data of the target object; the target image is associated with the corresponding target object, and the gradient directions and gradient strengths corresponding to the length and width dimensions of the target object are obtained through the HOG feature descriptor among the local feature descriptors; a transverse gradient strength interval and a longitudinal gradient strength interval are preset, the corresponding transverse edge information and longitudinal edge information are obtained, and the two are merged to generate the edge feature data of the target object; the texture map and depth map are processed through the HOG feature descriptor to obtain the corresponding texture information and depth information, and the edge feature data, texture information and depth information are used as the first, second and third modeling parameters, respectively, so as to construct the 3D scene model;
The process of dividing the target object into a plurality of local areas and acquiring the surface feature descriptor of each local area comprises the following steps:
Dividing the target object into a plurality of local areas and acquiring a surface feature descriptor of each local area; acquiring the plurality of groups of gradient sequence values included in the edge feature data and the one-dimensional vector (a, b, c) corresponding to each group of gradient sequence values; presetting corresponding dimension intervals, denoted WD1, WD2 and WD3 respectively; if (a, b, c) ∈ WD1, the corresponding local area is marked with dimension interval WD1; if (a, b, c) ∈ WD2, it is marked with WD2; and if (a, b, c) ∈ WD3, it is marked with WD3;
The process of matching and positioning according to the surface feature descriptors of each local area and obtaining the preliminary target pose and target shape comprises the following steps:
Different dimension intervals have corresponding pose parameters and shape parameters; target pose parameters and target shape parameters are preset and stored in a preset matching and positioning library, and the surface feature descriptors of the different local areas are imported into the matching and positioning library to obtain a corresponding first area parameter and second area parameter; the first area parameter is used for acquiring the target pose: when its pose parameter matches a target pose parameter in the matching and positioning library, the target pose corresponding to that target pose parameter is assigned as the preliminary target pose of the target object; the second area parameter is used for acquiring the target shape: when its shape parameter matches a target shape parameter in the matching and positioning library, the target shape corresponding to that target shape parameter is assigned as the preliminary target shape of the target object; if the pose parameter and shape parameter do not match the corresponding target pose parameter and target shape parameter, no matching and positioning is performed for that local area.
2. The 3D-based bionic visual target detection and recognition method according to claim 1, wherein the process of training a target classifier through deep learning and performing target recognition comprises:
Setting a target classifier and acquiring an image dataset corresponding to the target object; dividing the image dataset into a corresponding training set and test set according to a preset proportion; inputting the training set into the target classifier for training and acquiring the training frequency; inputting the test set into the target classifier for testing and acquiring the recognition condition of the target classifier for the target object, the recognition condition comprising successful recognition and failed recognition; if recognition succeeds, marking the target classifier with a correct marking symbol, otherwise marking it with an error marking symbol; acquiring the number of times the correct marking symbol is marked, denoted n1, and the number of times the error marking symbol is marked, denoted n2; acquiring the marking ratio, denoted U, with U = n1/n2; presetting a training-stop threshold, denoted D; if U ≥ D, stopping the training of the target classifier, otherwise increasing the training frequency Num1 and the proportion of the training set and continuing to train the target classifier until U ≥ D; and inputting the surface feature descriptors corresponding to the target object into the trained target classifier for target recognition, thereby acquiring the spatially related information corresponding to the plurality of local areas, the spatially related information comprising spatial coordinates, spatial scale and spatial projection.
3. The 3D bionic visual target detection and recognition method according to claim 2, wherein the process of integrally fitting the recognition result of the target object to obtain the final 3D spatial position and the final pose of the target object comprises:
Acquiring the spatially related information corresponding to the recognition result of the target object; marking the spatial coordinate value and spatial scale value in the spatially related information as ψ1 and ψ2 respectively; presetting a coordinate dimension cluster and a spatial scale classification interval, denoted τ1 and τ2 respectively; acquiring the spatial projection and generating a mapping space of the corresponding size; and fitting the whole according to ψ1, ψ2, τ1 and τ2 to generate the final 3D spatial position and final pose of the target object.
4. The 3D-based bionic visual target detection and recognition method according to claim 3, wherein the process of database entry and encrypted storage of the final 3D spatial position and final pose comprises:
Presetting a database and acquiring the read permission and write permission corresponding to the database; reading the final 3D spatial position and final pose through the read permission and synchronously generating a read packet; associating a read progress with the read packet; when the progress value reaches 100%, acquiring the write permission, writing the read packet into the database, and thus completing the database entry;
Numbering the plurality of entered data packets, setting an encryption time interval and an encryption frequency for each, and thereby constructing an encryption rule for each data packet; performing the corresponding encryption through the encryption rule corresponding to each data packet, thereby generating an encrypted data packet corresponding to each data packet, and storing the encrypted data packets.
CN202311410176.XA, priority date 2023-10-27, filing date 2023-10-27: 3D-based bionic visual target detection and identification method (Active, granted as CN117475207B (en))

Priority Applications (1)

Application Number: CN202311410176.XA (CN117475207B (en))
Priority Date: 2023-10-27
Filing Date: 2023-10-27
Title: 3D-based bionic visual target detection and identification method

Applications Claiming Priority (1)

Application Number: CN202311410176.XA (CN117475207B (en))
Priority Date: 2023-10-27
Filing Date: 2023-10-27
Title: 3D-based bionic visual target detection and identification method

Publications (2)

Publication Number Publication Date
CN117475207A CN117475207A (en) 2024-01-30
CN117475207B (en) 2024-10-15

Family

Family ID: 89637213

Family Applications (1)

Application Number: CN202311410176.XA (Active, CN117475207B (en))
Priority Date: 2023-10-27
Filing Date: 2023-10-27
Title: 3D-based bionic visual target detection and identification method

Country Status (1)

Country Link
CN (1) CN117475207B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105005755A (en) * 2014-04-25 2015-10-28 北京邮电大学 Three-dimensional face identification method and system
CN106169082A (en) * 2015-05-21 2016-11-30 三菱电机株式会社 Training grader is with the method and system of the object in detection target environment image

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105022982B (en) * 2014-04-22 2019-03-29 北京邮电大学 Hand motion recognition method and apparatus
CN109636854A (en) * 2018-12-18 2019-04-16 重庆邮电大学 A kind of augmented reality three-dimensional Tracing Registration method based on LINE-MOD template matching
US11405761B2 (en) * 2019-05-23 2022-08-02 Connected Wise LLC On-board machine vision device for activating vehicular messages from traffic signs
CN110675442B (en) * 2019-09-23 2023-06-30 的卢技术有限公司 Local stereo matching method and system combined with target recognition technology
CN113033270B (en) * 2019-12-27 2023-03-17 深圳大学 3D object local surface description method and device adopting auxiliary axis and storage medium
CN111476250A (en) * 2020-03-24 2020-07-31 重庆第二师范学院 Image feature extraction and target identification method, system, storage medium and terminal
CN111144388B (en) * 2020-04-03 2020-07-14 速度时空信息科技股份有限公司 Monocular image-based road sign line updating method
CN114415173A (en) * 2022-01-17 2022-04-29 同济大学 Fog-penetrating target identification method for high-robustness laser-vision fusion
CN116434219A (en) * 2023-04-18 2023-07-14 西安工业大学 Three-dimensional target identification method based on laser radar

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105005755A (en) * 2014-04-25 2015-10-28 北京邮电大学 Three-dimensional face identification method and system
CN106169082A (en) * 2015-05-21 2016-11-30 三菱电机株式会社 Training grader is with the method and system of the object in detection target environment image

Also Published As

Publication number Publication date
CN117475207A (en) 2024-01-30

Similar Documents

Publication Publication Date Title
Passalis et al. Using facial symmetry to handle pose variations in real-world 3D face recognition
CN106845440B (en) Augmented reality image processing method and system
CN105243374A (en) Three-dimensional human face recognition method and system, and data processing device applying same
Al-Osaimi et al. Integration of local and global geometrical cues for 3D face recognition
CN111241989A (en) Image recognition method and device and electronic equipment
CN106408037A (en) Image recognition method and apparatus
CN114359553B (en) Signature positioning method and system based on Internet of things and storage medium
EP4016392A1 (en) Machine-learning for 3d object detection
CN105809113A (en) Three-dimensional human face identification method and data processing apparatus using the same
CN116091570B (en) Processing method and device of three-dimensional model, electronic equipment and storage medium
He et al. ContourPose: Monocular 6-D Pose Estimation Method for Reflective Textureless Metal Parts
CN111507431B (en) Medical image classification method, device, system and readable medium
CN109740674A (en) A kind of image processing method, device, equipment and storage medium
Perakis et al. Partial matching of interpose 3D facial data for face recognition
CN117475207B (en) 3D-based bionic visual target detection and identification method
AU2015204339A1 (en) Information processing apparatus and information processing program
CN117854155A (en) Human skeleton action recognition method and system
CN110264562A (en) Skull model characteristic point automatic calibration method
CN111680722B (en) Content identification method, device, equipment and readable storage medium
Deng et al. Adaptive feature selection based on reconstruction residual and accurately located landmarks for expression-robust 3D face recognition
CN114842543B (en) Three-dimensional face recognition method and device, electronic equipment and storage medium
Rezkiani et al. Logo Detection Using You Only Look Once (YOLO) Method
US20240320304A1 (en) Machine annotation of photographic images
KR102707927B1 (en) Table gnerating method and system
US11971953B2 (en) Machine annotation of photographic images

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant