
CN106228162B - A kind of quick object identification method of mobile robot based on deep learning - Google Patents

A kind of quick object identification method of mobile robot based on deep learning Download PDF

Info

Publication number
CN106228162B
CN106228162B, CN201610581928A
Authority
CN
China
Prior art keywords
picture
unit
data
depth
object identification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610581928.2A
Other languages
Chinese (zh)
Other versions
CN106228162A (en)
Inventor
王威
谈笑
胡义轩
袁泽寰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Fudian Culture Communication Co.,Ltd.
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to CN201610581928.2A priority Critical patent/CN106228162B/en
Publication of CN106228162A publication Critical patent/CN106228162A/en
Application granted granted Critical
Publication of CN106228162B publication Critical patent/CN106228162B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a quick object identification method for a mobile robot based on deep learning. The method comprises the following steps: 1) mobile picture acquisition; 2) picture data pre-processing; 3) picture feature extraction; 4) picture prediction output; 5) environmental-constraint optimization; 6) picture recognition output. By unifying detection and identification into one integrated result, the invention overcomes the complexity and instability of traditional object identification systems, which must chain a separate detector onto the identifier; through the multilayer residual network design and the generation of surrounding gravity-constraint conditions, it overcomes the poor accuracy of integrated object identification systems; and the integration of the detection and identification tasks also guarantees the processing efficiency of the system, improving the perception capability of the robot while it moves.

Description

A quick object identification method for a mobile robot based on deep learning
Technical field
The present invention relates to a quick object identification method for a mobile robot, and more particularly to a quick object identification method for a mobile robot based on deep learning.
Background technique
Existing object detection and identification systems generally first obtain a set of candidate object regions through sliding windows or object-region proposal techniques, and then use heuristic feature selection to identify the candidate regions in this set, where identification means that a classifier assigns each candidate region to the closest object category.
According to the features the classifier uses, current detection systems can be roughly divided into two types: those based on deep learning and those based on heuristic features. Schemes based on heuristic features generally represent candidate regions with features designed manually from experience, while schemes based on deep learning extract object features through a multilayer neural network, achieving a layered recombination of local object features and thereby classifying objects effectively. Because deep-learning features adapt to the distribution of the data, deep-learning techniques have shown some superiority in detection accuracy over heuristic-feature techniques in recent years. Heuristic techniques, however, are still used in some simple applications, such as face detection, because of their inherent speed and simplicity.
Both techniques, however, still have drawbacks that prevent them from being applied in real scenes. Although heuristic characterization methods are fast, their representational power is limited: in real scenes the object to be identified is usually embedded in a full scene, object sizes vary greatly at random, and multiple objects occlude one another, so detection accuracy is relatively low. Deep-learning techniques, on the other hand, can represent object regions very well, but they must classify every candidate object region, and the candidate set is usually huge, so classification takes a very long time; they also need carefully designed post-processing, such as non-maximum suppression, to filter the classified object regions. Moreover, neither technique exploits the global information of the image, so both remain deficient in complex-scene object detection and identification.
Summary of the invention
The purpose of the present invention is to provide a quick object identification method for a mobile robot based on deep learning. To counter the drop in running efficiency caused by separating the detection and identification components in object detection and identification systems, the method completes object detection prediction and category identification synchronously in one pass; and, so as not to lose detection accuracy, it introduces a multilayer residual-style deep network with stronger expressive power, which improves the accuracy of the integrated scheme and overcomes the shortcomings of the prior art.
The present invention is realized by the following technical scheme:
1. A quick object identification method for a mobile robot based on deep learning, characterized by comprising the following steps:
1) Mobile picture acquisition: the vision data perceived by the camera while the robot moves is acquired; because a depth picture contains depth information, it can be used to recover the physical system of the scene and thereby establish the constraint conditions within the scene.
2) Picture data pre-processing: two software units are used in this step, a color-picture pre-processing unit and a depth-picture pre-processing unit. The color-picture pre-processing unit performs block-wise processing on the input color picture, dividing the whole picture into a grid of blocks; if the center of an object falls in one of the grid blocks, that grid cell is marked as belonging to the object. For each grid cell the neural network predicts multiple object bounding boxes and generates a confidence score for the object category assigned to the cell, providing the data basis for object identification;
The depth-picture pre-processing unit generates the constraint conditions of the indoor environment: by processing the distance data and estimating the gravity vector it establishes the corresponding physical environment system, and from this system derives the corresponding environmental constraints, such as reference measurements between objects and planes, which the detection and identification modules use to eliminate erroneous solutions;
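The grid marking performed by the color-picture pre-processing unit can be sketched as follows; the grid size p = 7 and the helper name are illustrative assumptions, since the patent does not fix them:

```python
def assign_object_to_grid(obj_center_x, obj_center_y, img_w, img_h, p=7):
    """Return the (row, col) grid cell that owns an object whose
    center falls at (obj_center_x, obj_center_y) in a p*p grid."""
    col = int(obj_center_x / img_w * p)
    row = int(obj_center_y / img_h * p)
    # Clamp in case the center sits exactly on the right/bottom edge.
    col = min(col, p - 1)
    row = min(row, p - 1)
    return row, col

# An object centered at (320, 240) in a 640x480 picture falls in cell (3, 3).
print(assign_object_to_grid(320, 240, 640, 480))
```

Marking only the cell that contains the object's center point keeps the assignment unambiguous even when the object's bounding box spans several cells.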
3) Picture feature extraction: this is completed by building a multilayer residual neural network, which mainly comprises two software units, a convolution-kernel feature-extraction unit and a residual unit. The convolution-kernel feature-extraction unit and the residual unit are stacked as the building blocks of the neural network, forming a deep feature-extraction network that completes the multilayer distributed representation of image features;
A) Convolution-kernel feature extraction: given an image, the system extracts image features suitable for object detection and identification by repeatedly applying a variety of convolution kernel functions, batch-normalization units and non-linear rectification units. A convolution kernel function is a small data window, e.g. a 3*3 data window; the convolution operation slides the data window over the whole picture region, multiplies each element of the data window with the corresponding picture position, and sums the products over the window region. The corresponding convolution kernel functions and network parameters are obtained by training, and the non-linear units enhance the expressive power of the features by adding non-linearity. Local picture features enter the neural network as the data from which it completes the prediction and generation of bounding boxes and object categories.
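The sliding-window product-and-sum operation of the convolution kernel can be sketched as a naive "valid" convolution; the averaging kernel below is only for illustration, since the patent's kernels are obtained by training:

```python
def conv2d_valid(image, kernel):
    """Naive 'valid' 2D convolution: slide the kernel window over the
    image, take the element-wise product with each patch, and sum."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            out[i][j] = sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
    return out

# A 3*3 averaging kernel applied to a 4x4 image yields a 2x2 feature map.
img = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12],
       [13, 14, 15, 16]]
k = [[1 / 9.0] * 3 for _ in range(3)]
print(conv2d_valid(img, k))
```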
B) Residual unit: because of the deep structure of the neural network, a degradation phenomenon appears even as the output converges: as network depth increases, accuracy rises gradually at first but then drops rapidly once an extreme value is reached. This phenomenon is not caused by over-fitting, and it cannot be alleviated by continuing to increase network depth. In contrast to the traditional sequential scheme, which feeds the output of the previous feature-extraction unit (a specific combination of convolution kernels and non-linear units) directly into the next feature unit, the residual neural unit adds the output of the previous feature unit to its input and uses the sum as the input of the next feature-extraction unit;
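The identity shortcut of the residual unit can be sketched as follows; the toy feature-extraction unit (an affine map followed by rectification) and its weights are illustrative assumptions:

```python
def feature_unit(x, weight=0.5, bias=0.1):
    """A stand-in feature-extraction unit (illustrative): an affine
    map followed by a ReLU-style non-linear rectification."""
    return [max(0.0, weight * v + bias) for v in x]

def residual_unit(x):
    """Residual connection: the unit's output is added to its input,
    and the sum becomes the input of the next layer."""
    return [a + b for a, b in zip(feature_unit(x), x)]

print(residual_unit([1.0, -2.0, 3.0]))
```

Because the shortcut passes the input through unchanged, each stacked unit only needs to learn a residual correction, which is what counters the degradation described above.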
4) Picture prediction output:
For any given input picture, the algorithm outputs, for each potential object in the picture, the size and coordinates of its bounding box and the object's category and confidence. Since a picture may contain multiple objects, and predicting bounding boxes for a single object from the whole picture would increase model uncertainty, the present invention first divides the picture into a p*p grid and predicts, for each grid cell, whether an object is present, and if so the object's coordinates, size, category and confidence (6 dimensions). Because the same object may appear in multiple grid cells, the cooperative prediction of multiple cells increases the robustness and accuracy of object prediction. For each grid cell the neural network predicts k possible object bounding boxes, because the regions of several objects may coexist in one cell; a linear classifier thus predicts (p*p*k*6) outputs, and non-maximum suppression finally deletes the redundant and low-confidence object windows.
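The prediction-output size and the non-maximum suppression step can be sketched as follows; p = 7, k = 2 and the IoU threshold are illustrative assumptions, while the 6 values per box follow the text:

```python
def output_size(p=7, k=2, dims=6):
    """Number of predictions the linear classifier must produce:
    p*p grid cells, k boxes per cell, 6 values per box
    (coordinates, size, category, confidence)."""
    return p * p * k * dims

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def non_max_suppression(boxes, scores, thresh=0.5):
    """Keep the highest-scoring box, drop boxes that overlap it, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) < thresh]
    return keep

print(output_size())  # 7*7 cells * 2 boxes * 6 values = 588
```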
5) Environmental-constraint optimization: the environmental-constraint optimization module generates the corresponding environmental constraint conditions from the depth data. Because the depth data can recover the three-dimensional point-cloud structure of the space, the gravity vector of the environment can be computed from the point-cloud data; planes and objects are then described bottom-up along the gravity vector to form the constraint conditions;
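One way to compute a gravity vector from surface normals recovered from the point cloud can be sketched as below; the iterative averaging scheme, thresholds and initial guess are illustrative assumptions, since the patent only states that the gravity vector is estimated from the depth data and refined iteratively:

```python
import math

def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]

def estimate_gravity(normals, init=(0.0, -1.0, 0.0), iters=5, cos_thresh=0.8):
    """Illustrative gravity estimate: start from an initial guess, keep
    only surface normals roughly aligned with the current estimate
    (e.g. floor and table normals), average them, and iterate."""
    g = normalize(list(init))
    for _ in range(iters):
        aligned = [n for n in normals
                   if abs(sum(a * b for a, b in zip(n, g))) > cos_thresh]
        if not aligned:
            break
        acc = [0.0, 0.0, 0.0]
        for n in aligned:
            # Flip normals pointing away from g before averaging.
            s = 1.0 if sum(a * b for a, b in zip(n, g)) > 0 else -1.0
            acc = [a + s * c for a, c in zip(acc, n)]
        g = normalize(acc)
    return g
```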
6) Picture recognition output: after environmental-constraint optimization, the system exports the results that have passed through neural-network detection and identification and then through environmental-constraint optimization to the robot's receiving module for processing. The picture recognition output module outputs the object detection and identification results; the output data comprise the object category, the x-axis and y-axis coordinates of the center point of the bounding detection box, and the width and height of the bounding detection box. These data are used by the robot for assisted environment understanding and navigation mapping. While maintaining high accuracy, the whole system achieves real-time processing (> 24 FPS) as the robot moves, guaranteeing the timeliness and accuracy of the robot's corresponding commands.
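The output record exported to the robot's receiving module can be sketched as a simple data structure; the field names are illustrative assumptions, as the patent lists only the data items (category, center-point x/y coordinates, box width and height):

```python
from dataclasses import dataclass

@dataclass
class Detection:
    """One identified object as exported to the robot's receiving module."""
    category: str      # object class label
    center_x: float    # bounding-box center, x axis (pixels)
    center_y: float    # bounding-box center, y axis (pixels)
    width: float       # bounding-box width (pixels)
    height: float      # bounding-box height (pixels)

d = Detection("chair", 320.0, 300.0, 80.0, 120.0)
print(d)
```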
2. The quick object identification method for a mobile robot based on deep learning according to claim 1, characterized in that the camera is a depth camera, which, while obtaining the ordinary color picture, also obtains a depth picture based on range information.
The beneficial effects of the method are:
1) By unifying detection and identification into one integrated result, it overcomes the complexity and instability of traditional object identification systems, which must chain a separate detector onto the identifier; 2) through the multilayer residual network design and the generation of surrounding gravity-constraint conditions, it overcomes the poor accuracy of integrated object identification systems; 3) the integration of the detection and identification tasks also guarantees the processing efficiency of the system and improves the perception capability of the robot while it moves.
Description of the drawings
Fig. 1 is the overall flow chart of the identification method of the present invention.
Fig. 2 is a diagram of the multilayer residual neural network.
Fig. 3 is a diagram of the residual neural unit.
Specific embodiment
The following description of the embodiments will make the present invention easier for the public to understand, but the specific embodiments given by the applicant should not be considered a limitation of the technical solution of the present invention; any change to the definition of components or technical features and/or any formal but immaterial transformation of the overall structure shall be regarded as falling within the protection scope defined by the technical solution of the present invention.
Step 1, mobile picture acquisition: the mobile picture acquisition module acquires the vision data the robot perceives through the camera while moving; the vision data comprise two parts, a depth picture and an ordinary RGB color picture.
Step 2, picture data pre-processing: the picture data pre-processing module pre-processes the pictures obtained by the acquisition module. It first generates the corresponding color picture and depth picture; the color-picture pre-processing unit then divides the input color picture into blocks, partitioning the whole picture into a grid of blocks, and locates objects by means of the grid. The depth-picture pre-processing unit computes the gravity vector iteratively: it first computes the normal vectors of the picture planes, then iteratively classifies and optimizes the horizontal and vertical normal vectors, finally obtaining an accurate approximation of the gravity vector and establishing the bottom-up physical environment constraints.
Step 3, picture feature extraction: the picture feature extraction module highlights features layer by layer in the form of a multilayer residual neural network combined with convolution kernels. As the neural network goes deeper, the features of object edges become more and more obvious, appearing in light tones; the neural network combines the edge features again into higher-level features, thereby completing the prediction.
Step 4, picture prediction output: the picture prediction output module predicts, for each object in the input picture, the bounding-box size and coordinates and the object's category and confidence. The present invention generates multiple prediction boxes and then uses non-maximum suppression to delete the redundant and low-confidence object windows among the multiple prediction results.
Step 5, environmental-constraint optimization: the environmental-constraint optimization module uses the direction of the gravity vector to describe the heights of planes and objects and form constraint conditions, e.g. a person is higher than the seat plane, an object is higher than the desk plane, and so on. The results output by the prediction network are optimized against these constraints to remove solutions that are wrong under the environmental constraints; a bounding box of a human body that appears in the prediction results, for example, is judged correct by the constraint conditions.
Step 6, picture recognition output: the picture recognition output module outputs the recognition results, comprising the category, the x-axis and y-axis coordinates of the bounding-box center point, and the width and height of the bounding box, for use in the robot's assisted environment understanding and navigation mapping. While maintaining high accuracy, the whole system achieves real-time processing (> 24 FPS) as the robot moves, guaranteeing the timeliness and accuracy of the robot's corresponding commands.
Of course, the present invention can also have various other embodiments. Without departing from the spirit and essence of the present invention, those skilled in the art can make various corresponding changes and modifications according to the present invention, but all such corresponding changes and modifications shall fall within the protection scope of the appended claims of the present invention.

Claims (2)

1. A quick object identification method for a mobile robot based on deep learning, characterized by comprising the following steps:
1) Mobile picture acquisition: acquiring the vision data perceived by the camera while the robot moves;
2) Picture data pre-processing: two software units are used in this step, a color-picture pre-processing unit and a depth-picture pre-processing unit; the color-picture pre-processing unit performs block-wise processing on the input color picture, dividing the whole picture into a grid of blocks; if the center of an object falls in one of the grid blocks, that grid cell is marked as a part of the object; for each grid cell the neural network predicts multiple object bounding boxes and generates a confidence score for the object category assigned to the cell, providing the data basis for object identification;
the depth-picture pre-processing unit generates the constraint conditions of the indoor environment by processing the distance data and estimating the gravity vector, thereby establishing the corresponding physical environment system and deriving the corresponding environmental constraints from this system;
3) Picture feature extraction: completed by building a multilayer residual neural network mainly comprising two software units, a convolution-kernel feature-extraction unit and a residual unit, which are stacked as the building blocks of the neural network to form a deep feature-extraction network that completes the multilayer distributed representation of image features;
A) Convolution-kernel feature extraction: given an image, the system extracts image features suitable for object detection and identification by repeatedly applying a variety of convolution kernel functions, batch-normalization units and non-linear rectification units, wherein a convolution kernel function is a data window; the convolution operation slides the data window over the whole picture region, multiplies each element of the data window with the corresponding picture position, and sums the products over the window region; the corresponding convolution kernel functions and network parameters are obtained by training, and the non-linear units enhance the expressive power of the features by adding non-linearity; local picture features enter the neural network as the data from which it completes the prediction and generation of bounding boxes and object categories;
B) Residual unit: the residual neural unit adds the output of the previous feature unit to its input and uses the sum as the input of the next feature-extraction unit;
4) Picture prediction output: a linear classifier predicts the output, and non-maximum suppression finally deletes the redundant object windows;
5) Environmental-constraint optimization: the environmental-constraint optimization module generates the corresponding environmental constraints from the depth data;
6) Picture recognition output: after environmental-constraint optimization, the system exports the results that have passed through neural-network detection and identification and then through environmental-constraint optimization to the robot's receiving module for processing.
2. The quick object identification method for a mobile robot based on deep learning according to claim 1, characterized in that the camera is a depth camera, which, while obtaining the ordinary color picture, also obtains a depth picture based on range information.
CN201610581928.2A 2016-07-22 2016-07-22 A kind of quick object identification method of mobile robot based on deep learning Active CN106228162B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610581928.2A CN106228162B (en) 2016-07-22 2016-07-22 A kind of quick object identification method of mobile robot based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610581928.2A CN106228162B (en) 2016-07-22 2016-07-22 A kind of quick object identification method of mobile robot based on deep learning

Publications (2)

Publication Number Publication Date
CN106228162A CN106228162A (en) 2016-12-14
CN106228162B (en) 2019-05-17

Family

ID=57531308

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610581928.2A Active CN106228162B (en) 2016-07-22 2016-07-22 A kind of quick object identification method of mobile robot based on deep learning

Country Status (1)

Country Link
CN (1) CN106228162B (en)

Families Citing this family (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106919897B (en) * 2016-12-30 2020-05-22 华北电力大学(保定) Human face image age estimation method based on three-level residual error network
US10275687B2 (en) * 2017-02-16 2019-04-30 International Business Machines Corporation Image recognition with filtering of image classification output distribution
CN108509820B (en) * 2017-02-23 2021-12-24 百度在线网络技术(北京)有限公司 Obstacle segmentation method and device, computer equipment and readable medium
CN107123111B (en) * 2017-04-14 2020-01-24 惠州旭鑫智能技术有限公司 Deep residual error network construction method for mobile phone screen defect detection
KR20180134230A (en) * 2017-06-08 2018-12-18 삼성전자주식회사 Cleaning robot and controlling method of thereof
WO2018232754A1 (en) * 2017-06-23 2018-12-27 Microsoft Technology Licensing, Llc. Joint object detection based on collaborative information
CN108229296B (en) * 2017-09-30 2021-04-02 深圳市商汤科技有限公司 Face skin attribute identification method and device, electronic equipment and storage medium
CN108986022A (en) 2017-10-30 2018-12-11 上海寒武纪信息科技有限公司 Image beautification method and related product
CN107729885B (en) * 2017-11-23 2020-12-29 中电科新型智慧城市研究院有限公司 Face enhancement method based on multiple residual error learning
CN107967773A (en) * 2017-12-01 2018-04-27 旗瀚科技有限公司 A kind of supermarket self-help purchase method of view-based access control model identification
CN108229548A (en) * 2017-12-27 2018-06-29 华为技术有限公司 A kind of object detecting method and device
CN108279671B (en) * 2018-01-08 2021-12-14 深圳市易成自动驾驶技术有限公司 Terahertz-based environment sensing method and device and computer readable storage medium
CN110675324B (en) * 2018-07-02 2023-10-10 上海寰声智能科技有限公司 4K ultra-high definition image sharpening processing method
CN109658418A (en) * 2018-10-31 2019-04-19 百度在线网络技术(北京)有限公司 Learning method, device and the electronic equipment of scene structure
JP2022520019A (en) * 2019-02-15 2022-03-28 エスゼット ディージェイアイ テクノロジー カンパニー リミテッド Image processing methods, equipment, mobile platforms, programs
CN110163260B (en) * 2019-04-26 2024-05-28 平安科技(深圳)有限公司 Residual network-based image identification method, device, equipment and storage medium
CN110948492B (en) * 2019-12-23 2021-10-22 浙江大学 Three-dimensional grabbing platform and grabbing method based on deep learning
GB2609708B (en) * 2021-05-25 2023-10-25 Samsung Electronics Co Ltd Method and apparatus for video recognition
CN118297945B (en) * 2024-06-05 2024-08-13 江西师范大学 Defect detection method and system based on position constraint residual error and sliding window aggregation

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103218831A (en) * 2013-04-21 2013-07-24 北京航空航天大学 Video moving target classification and identification method based on outline constraint
CN104408469A (en) * 2014-11-28 2015-03-11 武汉大学 Firework identification method and firework identification system based on deep learning of image
CN105260749A (en) * 2015-11-02 2016-01-20 中国电子科技集团公司第二十八研究所 Real-time target detection method based on oriented gradient two-value mode and soft cascade SVM
CN105740758A (en) * 2015-12-31 2016-07-06 上海极链网络科技有限公司 Internet video face recognition method based on deep learning

Also Published As

Publication number Publication date
CN106228162A (en) 2016-12-14

Similar Documents

Publication Publication Date Title
CN106228162B (en) A kind of quick object identification method of mobile robot based on deep learning
CN107622244B (en) Indoor scene fine analysis method based on depth map
CN110163110B (en) Pedestrian re-recognition method based on transfer learning and depth feature fusion
CN109584248B (en) Infrared target instance segmentation method based on feature fusion and dense connection network
CN112288008B (en) Mosaic multispectral image disguised target detection method based on deep learning
CN104115192B (en) Three-dimensional closely interactive improvement or associated improvement
CN103971386B (en) A kind of foreground detection method under dynamic background scene
CN107292234B (en) Indoor scene layout estimation method based on information edge and multi-modal features
CN114782691A (en) Robot target identification and motion detection method based on deep learning, storage medium and equipment
CN111611874B (en) Face mask wearing detection method based on ResNet and Canny
CN106709568A (en) RGB-D image object detection and semantic segmentation method based on deep convolution network
CN108305260B (en) Method, device and equipment for detecting angular points in image
CN114758288A (en) Power distribution network engineering safety control detection method and device
CN106408030A (en) SAR image classification method based on middle lamella semantic attribute and convolution neural network
CN112070010B (en) Pedestrian re-recognition method for enhancing local feature learning by combining multiple-loss dynamic training strategies
CN106910202B (en) Image segmentation method and system for ground object of remote sensing image
CN105303163B (en) A kind of method and detection device of target detection
CN106991411B (en) Remote Sensing Target based on depth shape priori refines extracting method
CN114519819B (en) Remote sensing image target detection method based on global context awareness
CN103105924A (en) Man-machine interaction method and device
CN106503170B (en) It is a kind of based on the image base construction method for blocking dimension
CN105809716A (en) Superpixel and three-dimensional self-organizing background subtraction algorithm-combined foreground extraction method
CN110287798A (en) Vector network pedestrian detection method based on characteristic module and context fusion
CN115797736A (en) Method, device, equipment and medium for training target detection model and target detection
CN110245602A (en) A kind of underwater quiet target identification method based on depth convolution feature

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20240715

Address after: 9th Floor, Building C2, No. 88 Dazhou Road, Yuhuatai District, Nanjing City, Jiangsu Province, China 210012

Patentee after: Nanjing Fudian Culture Communication Co.,Ltd.

Country or region after: China

Address before: Room 1002, Unit 2, Building 16, Huangce Jiayuan, No. 19 Shuangtang Road, Qinhuai District, Nanjing City, Jiangsu Province 210000

Patentee before: Wang Wei

Country or region before: China
