CN107463952B - Object material classification method based on multi-mode fusion deep learning - Google Patents
- Publication number
- CN107463952B (application number CN201710599106.1A)
- Authority
- CN
- China
- Prior art keywords
- tactile
- matrix
- modality
- scale
- acceleration
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/2431—Multiple classes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Abstract
The invention relates to an object material classification method based on multi-mode fusion deep learning, and belongs to the technical fields of computer vision, artificial intelligence and material classification. The disclosed method is, in particular, a multi-modal fusion method based on an extreme learning machine with multi-scale local receptive fields. It fuses the perceptual information of different modalities of an object's material (visual images, tactile acceleration signals and tactile sound signals) to achieve correct classification of the object material. The method not only uses multi-scale local receptive fields to extract highly representative features of real, complex materials, but also effectively fuses the information of each modality so that the modalities complement one another. It improves the robustness and accuracy of classification of complex materials and therefore has broad applicability and generality.
Description
Technical Field
The invention relates to an object material classification method based on multi-mode fusion deep learning, and belongs to the technical field of computer vision, artificial intelligence and material classification.
Background
Materials in the world are diverse and can be divided into plastic, metal, ceramic, glass, wood, textile, stone, paper, rubber, foam and so on. Object material classification has recently attracted great attention from society, industry and academia. For example, material classification can be used effectively for material recycling. The four main categories of packaging material, namely paper, plastic, metal and glass, must be matched to different market demands: for long-distance transportation without special requirements on transport quality, paper, paperboard and packaging-box board are generally selected; food packaging must meet hygiene standards, so packaging that directly contacts food such as cake should use carton board, light-proof and moisture-proof goods such as salt should use cans, and fast-food boxes can be made of natural plant fibre; and the reasonable use of decorative materials is key to successful interior decoration. Given these requirements, it is necessary to develop a method that can automatically classify object materials.
The mainstream approach to object material classification uses visual images, which contain rich information, but two objects with extremely similar appearance cannot be distinguished by visual images alone. Suppose there are two objects, a piece of red rough paper and a red plastic foil: a visual image has little power to distinguish between them. The human brain, however, naturally fuses the perceptual features of different modalities of the same object and thereby classifies its material. Inspired by this, a computer can achieve automatic classification of object material by using information from different modalities of the object at the same time.
Techniques for object material classification have already been published, such as Chinese patent application CN105005787A, a material classification method based on joint sparse coding of dexterous-hand tactile information. That invention uses only the tactile sequence for material classification and does not combine the multi-modal information of the material. It has been observed that classifying object material using only visual images cannot robustly capture material features such as hardness or roughness. When a rigid tool is dragged or moved over the surfaces of different objects, the tool produces vibrations and sounds of different frequencies, so tactile information complementary to vision can be used to classify the material of the object. However, how to effectively combine the visual modality with the tactile modalities remains a challenging problem.
Disclosure of Invention
The invention aims to provide an object material classification method based on multi-mode fusion deep learning, which realizes multi-modal information fusion for object material classification on the basis of an extreme learning machine with multi-scale local receptive fields, so as to improve the robustness and accuracy of classification and effectively fuse the various modal information of object materials for material classification.
The invention provides an object material classification method based on multi-mode fusion deep learning, which comprises the following steps:
(1) Let the number of training samples be N1 and the number of training-sample material classes be M1, and assign each class of material training sample a class label, where 1 ≤ M1 ≤ N1. Separately collect the visual image I1, the tactile acceleration A1 and the tactile sound S1 of all N1 training samples, and establish a data set D1 containing I1, A1 and S1; the image size of I1 is 320 × 480.
Let the number of objects to be classified be N2 and the number of material classes of the objects to be classified be M2, and assign each class of object to be classified a class label, where 1 ≤ M2 ≤ M1. Separately collect the visual image I2, the tactile acceleration A2 and the tactile sound S2 of all N2 objects to be classified, and establish a data set D2 containing I2, A2 and S2; the image size of I2 is 320 × 480.
(2) For the data sets D1 and D2, perform visual image preprocessing on the visual images, tactile acceleration preprocessing on the tactile acceleration signals and tactile sound preprocessing on the tactile sound signals, obtaining visual images, tactile acceleration spectrograms and tactile sound spectrograms respectively, as follows:
(2-1) down-sample the 320 × 480 images I1 and I2 to obtain visual images of I1 and I2 with size 32 × 32 × 3;
(2-2) separately convert the tactile accelerations A1 and A2 to the frequency domain by short-time Fourier transform, with a Hamming window of length 500, a window offset of 100 and a sampling frequency of 10 kHz, to obtain the spectrograms of A1 and A2; select the first 500 low-frequency channels of each spectrogram as the spectrum image, and down-sample the spectrum images to obtain tactile acceleration spectrum images of A1 and A2 with size 32 × 32 × 3;
(2-3) separately convert the tactile sounds S1 and S2 to the frequency domain by short-time Fourier transform, with a Hamming window of length 500, a window offset of 100 and a sampling frequency of 10 kHz, to obtain the spectrograms of S1 and S2; select the first 500 low-frequency channels of each spectrogram as the spectrum image, and down-sample the spectrum images to obtain sound spectrum images of S1 and S2 with size 32 × 32 × 3;
(3) Obtain the convolution features of the visual modality, the tactile acceleration modality and the tactile sound modality through multi-scale feature mapping, as follows:
(3-1) Input the 32 × 32 × 3 visual images of I1 and I2, the 32 × 32 × 3 tactile acceleration spectrum images of A1 and A2 and the 32 × 32 × 3 sound spectrum images of S1 and S2 obtained in step (2) into the first layer of the neural network, i.e. the input layer. The size of an input image is d × d. The local receptive fields in the network have Ψ scale channels with sizes r1, r2, …, rΨ, and K different input weights are generated for each scale channel, so that Ψ × K feature maps are generated at random. Record the randomly generated initial weights of the Φ-th scale channel for the visual image, the tactile acceleration spectrogram and the sound spectrogram as Â_init^{I,Φ}, Â_init^{A,Φ} and Â_init^{S,Φ}, composed column by column of the weights a_ζ^{I,Φ}, a_ζ^{A,Φ} and a_ζ^{S,Φ} respectively, where the superscript I denotes the visual modality of the training samples and the objects to be classified, the superscript A denotes their tactile acceleration modality, the superscript S denotes their tactile sound modality, Â_init denotes an initial weight matrix, a_ζ denotes the initial weight that generates the ζ-th feature map, 1 ≤ Φ ≤ Ψ, 1 ≤ ζ ≤ K, and the size of the Φ-th scale local receptive field is rΦ × rΦ, so that Â_init^{Φ} has rΦ² rows and K columns.
The size of all K feature maps of the Φ-th scale channel is (d − rΦ + 1) × (d − rΦ + 1).
(3-2) Orthogonalize the initial weight matrices Â_init^{I,Φ}, Â_init^{A,Φ} and Â_init^{S,Φ} of the Φ-th scale channel by singular value decomposition to obtain the orthogonal matrices Â^{I,Φ}, Â^{A,Φ} and Â^{S,Φ}. Each column of Â^{I,Φ}, Â^{A,Φ} and Â^{S,Φ} is an orthogonal basis vector of Â_init^{I,Φ}, Â_init^{A,Φ} and Â_init^{S,Φ} respectively; the input weights a_ζ^{I,Φ}, a_ζ^{A,Φ} and a_ζ^{S,Φ} of the ζ-th feature map of the Φ-th scale channel are the ζ-th columns of the corresponding orthogonal matrices rearranged into rΦ × rΦ square matrices.
Calculate the convolution features of node (i, j) in the ζ-th feature map of the Φ-th scale channel for the visual, tactile acceleration and tactile sound modalities by
c^{I,Φ}_{i,j,ζ} = Σ_{m=1}^{rΦ} Σ_{n=1}^{rΦ} x^{I}_{i+m−1, j+n−1} · a_ζ^{I,Φ}(m, n), and analogously c^{A,Φ}_{i,j,ζ} and c^{S,Φ}_{i,j,ζ},
Φ = 1, 2, …, Ψ;  i, j = 1, …, (d − rΦ + 1);  ζ = 1, 2, …, K,
where c^{I,Φ}_{i,j,ζ}, c^{A,Φ}_{i,j,ζ} and c^{S,Φ}_{i,j,ζ} denote the convolution features of node (i, j) of the ζ-th feature map in the Φ-th scale channel for the visual modality, the tactile acceleration modality and the tactile sound modality respectively, and x is the input matrix corresponding to node (i, j);
(4) Perform multi-scale square-root pooling on the convolution features of the visual modality, the tactile acceleration modality and the tactile sound modality. The pooling sizes have Ψ scales e1, e2, …, eΨ, where the pooling size eΦ at the Φ-th scale denotes the distance between the pooling centre and the edge of the pooling window. The pooling map has the same size as the feature map, (d − rΦ + 1) × (d − rΦ + 1). Calculate the pooling features from the convolution features obtained in step (3) by
h^{I,Φ}_{p,q,ζ} = sqrt( Σ_{i=p−eΦ}^{p+eΦ} Σ_{j=q−eΦ}^{q+eΦ} (c^{I,Φ}_{i,j,ζ})² ), and analogously h^{A,Φ}_{p,q,ζ} and h^{S,Φ}_{p,q,ζ},
p, q = 1, …, (d − rΦ + 1);  Φ = 1, 2, …, Ψ;  ζ = 1, 2, …, K,
where, if node (i, j) lies outside the range 1, …, (d − rΦ + 1), the values c^{I,Φ}_{i,j,ζ}, c^{A,Φ}_{i,j,ζ} and c^{S,Φ}_{i,j,ζ} are all taken to be zero, and h^{I,Φ}_{p,q,ζ}, h^{A,Φ}_{p,q,ζ} and h^{S,Φ}_{p,q,ζ} denote the pooling features of node (p, q) of the ζ-th pooling map in the Φ-th scale channel for the visual modality, the tactile acceleration modality and the tactile sound modality respectively;
(5) Obtain the fully connected feature vectors of the three modalities from the pooling features, as follows:
(5-1) for the ω-th training sample, concatenate all pooling features of the pooling maps of the visual image modality, the tactile acceleration modality and the tactile sound modality from step (4) into the row vectors h_ω^{I}, h_ω^{A} and h_ω^{S} respectively, where 1 ≤ ω ≤ N1;
(5-2) traverse the N1 training samples, repeating step (5-1), and record the combined row vectors of the visual image modality, the tactile acceleration modality and the tactile sound modality of the N1 training samples as H^{I}, H^{A} and H^{S},
where H^{I} is the combined feature vector matrix of the visual modality, H^{A} is the feature vector matrix of the tactile acceleration modality, and H^{S} is the feature vector matrix of the tactile sound modality;
(6) Perform multi-modal fusion of the fully connected feature vectors of the three modalities to obtain a multi-modal fused mixing matrix, as follows:
(6-1) input the row vectors of the visual image modality, the tactile acceleration modality and the tactile sound modality of the N1 training samples from step (5) into the mixing layer and combine them to obtain the mixing matrix H = [H^{I}, H^{A}, H^{S}];
(6-2) rearrange the mixed row vector of each sample in the mixing matrix H of step (6-1) to generate a multi-modal fused two-dimensional mixing matrix of size d′ × d″, where d′ is the side length of the two-dimensional matrix and its value range is determined by the length of each mixed row vector (d′ × d″ must equal that length);
(7) Input the multi-modal fused mixing matrix obtained in step (6) into the mixing network layer of the neural network and obtain the multi-modal mixed convolution features through multi-scale feature mapping, as follows:
(7-1) Input the multi-modal fused mixing matrix obtained in step (6-2), of size d′ × d″, into the mixing network. The mixing network has Ψ′ scale channels with sizes r1, r2, …, rΨ′, and K′ different input weights are generated for each scale channel, so that Ψ′ × K′ mixed feature maps are generated at random. Record the randomly generated mixed initial weights of the Φ′-th scale channel as Â_init^{hybrid,Φ′}, composed column by column of the weights a_ζ′^{hybrid,Φ′}, where the superscript hybrid denotes the tri-modal fusion, Â_init^{hybrid} denotes an initial weight matrix of the mixing network, a_ζ′^{hybrid,Φ′} denotes the initial weight that generates the ζ′-th mixed feature map, 1 ≤ Φ′ ≤ Ψ′, 1 ≤ ζ′ ≤ K′, and the size of the Φ′-th scale local receptive field is rΦ′ × rΦ′, so that Â_init^{hybrid,Φ′} has rΦ′² rows and K′ columns.
The size of the ζ′-th feature map of the Φ′-th scale channel is therefore (d′ − rΦ′ + 1) × (d″ − rΦ′ + 1).
(7-2) Orthogonalize the initial weight matrix Â_init^{hybrid,Φ′} of the Φ′-th scale channel by singular value decomposition to obtain the orthogonal matrix Â^{hybrid,Φ′}. Each column of Â^{hybrid,Φ′} is an orthogonal basis vector of Â_init^{hybrid,Φ′}; the input weight a_ζ′^{hybrid,Φ′} of the ζ′-th feature map of the Φ′-th scale channel is the ζ′-th column of Â^{hybrid,Φ′} rearranged into an rΦ′ × rΦ′ square matrix.
Calculate the mixed convolution feature of convolution node (i′, j′) in the ζ′-th feature map of the Φ′-th scale channel by
c^{hybrid,Φ′}_{i′,j′,ζ′} = Σ_{m=1}^{rΦ′} Σ_{n=1}^{rΦ′} x′_{i′+m−1, j′+n−1} · a_ζ′^{hybrid,Φ′}(m, n),
Φ′ = 1, 2, …, Ψ′;  i′, j′ = 1, …, (d′ − rΦ′ + 1);  ζ′ = 1, 2, …, K′,
where c^{hybrid,Φ′}_{i′,j′,ζ′} is the mixed convolution feature of convolution node (i′, j′) in the ζ′-th feature map of the Φ′-th scale channel, and x′ is the matrix corresponding to node (i′, j′);
(8) Perform mixed multi-scale square-root pooling on the mixed convolution features. The pooling sizes have Ψ′ scales e1, e2, …, eΨ′; at the Φ′-th scale the pooling map has the same size as the feature map, (d′ − rΦ′ + 1) × (d″ − rΦ′ + 1). Calculate the mixed pooling features from the mixed convolution features obtained in step (7) by
h^{hybrid,Φ′}_{p′,q′,ζ′} = sqrt( Σ_{i′=p′−eΦ′}^{p′+eΦ′} Σ_{j′=q′−eΦ′}^{q′+eΦ′} (c^{hybrid,Φ′}_{i′,j′,ζ′})² ),
p′, q′ = 1, …, (d′ − rΦ′ + 1);  Φ′ = 1, 2, …, Ψ′;  ζ′ = 1, 2, …, K′,
where c^{hybrid,Φ′}_{i′,j′,ζ′} is taken to be zero when node (i′, j′) lies outside the feature map, and h^{hybrid,Φ′}_{p′,q′,ζ′} denotes the mixed pooling feature of the combined node (p′, q′) of the ζ′-th pooling map in the Φ′-th scale channel;
(9) Following the method of step (5), fully connect the mixed pooling feature vectors of the different scales according to the mixed pooling features to obtain the combined feature matrix H^{hybrid} of the mixing network, where K′ is the number of different feature maps generated by each scale channel;
(10) From the combined feature matrix H^{hybrid} of the mixing network obtained in step (9) and the number of training samples N1, compute the output weights β of the neural network by
β = (H^{hybrid})^T (I/C + H^{hybrid} (H^{hybrid})^T)^{−1} T,
where T is the label matrix of the training samples, I is the N1 × N1 identity matrix, C is a regularization coefficient that may take an arbitrary value (in one embodiment of the invention C = 5), and the superscript T on a matrix denotes matrix transposition;
(11) Using the orthogonal matrices Â^{I,Φ}, Â^{A,Φ} and Â^{S,Φ} obtained by orthogonalizing the initial weights of the three modalities in step (3), apply the method of steps (3) to (9) to the preprocessed data set D2 to be classified to obtain the tri-modal mixed feature matrix H_test of the samples to be classified;
(12) from the training-sample output weights β of step (10) and the tri-modal mixed feature matrix H_test of step (11), calculate the predicted labels μ_ε of the N2 samples to be classified by
μ_ε = H_test β,  1 ≤ ε ≤ M2,
thereby realizing object material classification based on multi-mode fusion deep learning.
The object material classification method based on multi-mode fusion deep learning provided by the invention has the following features and advantages:
1. The extreme learning machine method based on multi-scale local receptive fields can perceive the material with local receptive fields of several scales and extract diverse features, enabling classification of the materials of complex objects.
2. The deep learning method based on the extreme learning machine with multi-scale local receptive fields integrates feature learning and image classification, and does not rely on a manually designed feature extractor, so the algorithm is suitable for classifying most objects of different materials.
3. The multi-mode fusion deep learning method based on the extreme learning machine with multi-scale local receptive fields can effectively fuse the information of the three modalities of the object material, realizing information complementarity and improving the robustness and accuracy of material classification.
Drawings
FIG. 1 is a block flow diagram of the method of the present invention.
FIG. 2 is a flow block diagram of the extreme learning machine based on multi-scale local receptive fields in the method of the present invention.
FIG. 3 is a flow block diagram of the fusion of the different modalities in the extreme-learning-machine method based on multi-scale local receptive fields in the present invention.
Detailed Description
The flow of the object material classification method based on multi-mode fusion deep learning is shown in FIG. 1 and mainly comprises a visual image modality, a tactile acceleration modality, a tactile sound modality and a mixing network. The method comprises the following steps:
(1) Let the number of training samples be N1 and the number of training-sample material classes be M1, and assign each class of material training sample a class label, where 1 ≤ M1 ≤ N1. Separately collect the visual image I1, the tactile acceleration A1 and the tactile sound S1 of all N1 training samples, and establish a data set D1 containing I1, A1 and S1; the image size of I1 is 320 × 480.
Let the number of objects to be classified be N2 and the number of material classes of the objects to be classified be M2, and assign each class of object to be classified a class label, where 1 ≤ M2 ≤ M1. Separately collect the visual image I2, the tactile acceleration A2 and the tactile sound S2 of all N2 objects to be classified, and establish a data set D2 containing I2, A2 and S2; the image size of I2 is 320 × 480. Here the tactile accelerations A1 and A2 are one-dimensional signals acquired by a sensor while a rigid object slides over the material surface, and the tactile sounds S1 and S2 are one-dimensional signals recorded by a microphone while the rigid object slides over the surface of the object material.
(2) For the data sets D1 and D2, perform visual image preprocessing on the visual images, tactile acceleration preprocessing on the tactile acceleration signals and tactile sound preprocessing on the tactile sound signals, obtaining visual images, tactile acceleration spectrograms and tactile sound spectrograms respectively, as follows:
(2-1) down-sample the 320 × 480 images I1 and I2 to obtain visual images of I1 and I2 with size 32 × 32 × 3.
(2-2) Separately convert the tactile accelerations A1 and A2 to the frequency domain by short-time Fourier transform, with a Hamming window of length 500, a window offset of 100 and a sampling frequency of 10 kHz, to obtain the spectrograms of A1 and A2; select the first 500 low-frequency channels of each spectrogram as the spectrum image, which retains most of the energy of the haptic signal, and down-sample the spectrum images to obtain tactile acceleration spectrum images of A1 and A2 with size 32 × 32 × 3.
(2-3) Separately convert the tactile sounds S1 and S2 to the frequency domain by short-time Fourier transform, with a Hamming window of length 500, a window offset of 100 and a sampling frequency of 10 kHz, to obtain the spectrograms of S1 and S2; select the first 500 low-frequency channels of each spectrogram as the spectrum image, which retains most of the energy of the haptic signal, and down-sample the spectrum images to obtain sound spectrum images of S1 and S2 with size 32 × 32 × 3.
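The preprocessing of steps (2-2) and (2-3) can be illustrated with the following minimal Python sketch. It assumes the raw tactile signal is a one-dimensional NumPy array sampled at 10 kHz; the function name, the choice of SciPy/scikit-image and the FFT length of 1024 (needed so that 500 low-frequency bins exist) are illustrative assumptions, not part of the patented method.

```python
import numpy as np
from scipy.signal import stft
from skimage.transform import resize

def signal_to_spectrogram_image(signal, fs=10000, win_len=500, hop=100,
                                n_low_channels=500, out_size=(32, 32)):
    """Turn a 1-D haptic signal (acceleration or sound) into a 32x32x3
    spectrogram image, as in steps (2-2)/(2-3)."""
    # STFT with a Hamming window of length 500 and a window offset of 100.
    _, _, Z = stft(signal, fs=fs, window='hamming',
                   nperseg=win_len, noverlap=win_len - hop, nfft=1024)
    mag = np.abs(Z)[:n_low_channels, :]               # first 500 low-frequency channels
    img = resize(mag, out_size, anti_aliasing=True)   # down-sample to 32 x 32
    img = (img - img.min()) / (img.max() - img.min() + 1e-12)  # scale to [0, 1]
    return np.repeat(img[:, :, None], 3, axis=2)      # replicate into 3 channels
```

A call such as signal_to_spectrogram_image(acc_signal) would then be made once per recorded tactile acceleration or sound trace.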
(3) Obtain the convolution features of the visual modality, the tactile acceleration modality and the tactile sound modality through multi-scale feature mapping, as follows:
(3-1) Input the 32 × 32 × 3 visual images of I1 and I2, the 32 × 32 × 3 tactile acceleration spectrum images of A1 and A2 and the 32 × 32 × 3 sound spectrum images of S1 and S2 obtained in step (2) into the first layer of the neural network, i.e. the input layer. The size of an input image is d × d. The local receptive fields in the network have Ψ scale channels with sizes r1, r2, …, rΨ, and K different input weights are generated for each scale channel, so that Ψ × K feature maps are generated at random. Record the randomly generated initial weights of the Φ-th scale channel for the visual image, the tactile acceleration spectrogram and the sound spectrogram as Â_init^{I,Φ}, Â_init^{A,Φ} and Â_init^{S,Φ}, composed column by column of the weights a_ζ^{I,Φ}, a_ζ^{A,Φ} and a_ζ^{S,Φ} respectively, where the superscript I denotes the visual modality of the training samples and the objects to be classified, the superscript A denotes their tactile acceleration modality, the superscript S denotes their tactile sound modality, Â_init denotes an initial weight matrix, a_ζ denotes the initial weight that generates the ζ-th feature map, 1 ≤ Φ ≤ Ψ, 1 ≤ ζ ≤ K, and the size of the Φ-th scale local receptive field is rΦ × rΦ, so that Â_init^{Φ} has rΦ² rows and K columns.
The size of all K feature maps of the Φ-th scale channel is (d − rΦ + 1) × (d − rΦ + 1).
(3-2) Orthogonalize the initial weight matrices Â_init^{I,Φ}, Â_init^{A,Φ} and Â_init^{S,Φ} of the Φ-th scale channel by singular value decomposition to obtain the orthogonal matrices Â^{I,Φ}, Â^{A,Φ} and Â^{S,Φ}; the orthogonalized input weights can extract more complete features. Each column of Â^{I,Φ}, Â^{A,Φ} and Â^{S,Φ} is an orthogonal basis vector of Â_init^{I,Φ}, Â_init^{A,Φ} and Â_init^{S,Φ} respectively; the input weights a_ζ^{I,Φ}, a_ζ^{A,Φ} and a_ζ^{S,Φ} of the ζ-th feature map of the Φ-th scale channel are the ζ-th columns of the corresponding orthogonal matrices rearranged into rΦ × rΦ square matrices.
Calculate the convolution features of node (i, j) in the ζ-th feature map of the Φ-th scale channel for the visual, tactile acceleration and tactile sound modalities by
c^{I,Φ}_{i,j,ζ} = Σ_{m=1}^{rΦ} Σ_{n=1}^{rΦ} x^{I}_{i+m−1, j+n−1} · a_ζ^{I,Φ}(m, n), and analogously c^{A,Φ}_{i,j,ζ} and c^{S,Φ}_{i,j,ζ},
Φ = 1, 2, …, Ψ;  i, j = 1, …, (d − rΦ + 1);  ζ = 1, 2, …, K,
where c^{I,Φ}_{i,j,ζ}, c^{A,Φ}_{i,j,ζ} and c^{S,Φ}_{i,j,ζ} denote the convolution features of node (i, j) of the ζ-th feature map in the Φ-th scale channel for the visual modality, the tactile acceleration modality and the tactile sound modality respectively, and x is the input matrix corresponding to node (i, j).
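A minimal sketch of the multi-scale feature mapping of step (3) follows, assuming each 32 × 32 × 3 input is processed channel by channel as a d × d matrix; the use of NumPy/SciPy, the orthogonalization via the SVD factors and the helper names are assumptions made only for illustration.

```python
import numpy as np
from scipy.signal import correlate2d

def random_orthogonal_weights(r, K, rng):
    """Steps (3-1)/(3-2): draw K random r x r input weights, then
    orthogonalize the (r*r) x K initial weight matrix by SVD so that
    its columns form an orthogonal basis."""
    A_init = rng.standard_normal((r * r, K))
    U, _, Vt = np.linalg.svd(A_init, full_matrices=False)
    A_orth = U @ Vt                          # orthonormalized columns
    return [A_orth[:, k].reshape(r, r) for k in range(K)]

def multiscale_conv_features(x, scales, K, seed=0):
    """Step (3): for each scale r_Phi, produce K feature maps of size
    (d - r_Phi + 1) x (d - r_Phi + 1) by valid cross-correlation of the
    d x d input with the orthogonalized random weights."""
    rng = np.random.default_rng(seed)
    features = []
    for r in scales:                         # r_1, ..., r_Psi
        weights = random_orthogonal_weights(r, K, rng)
        maps = [correlate2d(x, w, mode='valid') for w in weights]
        features.append(np.stack(maps))      # shape (K, d-r+1, d-r+1)
    return features
```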
(4) Perform multi-scale square-root pooling on the convolution features of the visual modality, the tactile acceleration modality and the tactile sound modality. The pooling sizes have Ψ scales e1, e2, …, eΨ, where the pooling size eΦ at the Φ-th scale denotes the distance between the pooling centre and the edge of the pooling window. As shown in FIG. 2, the pooling map has the same size as the feature map, namely (d − rΦ + 1) × (d − rΦ + 1). Calculate the pooling features from the convolution features obtained in step (3) by
h^{I,Φ}_{p,q,ζ} = sqrt( Σ_{i=p−eΦ}^{p+eΦ} Σ_{j=q−eΦ}^{q+eΦ} (c^{I,Φ}_{i,j,ζ})² ), and analogously h^{A,Φ}_{p,q,ζ} and h^{S,Φ}_{p,q,ζ},
p, q = 1, …, (d − rΦ + 1);  Φ = 1, 2, …, Ψ;  ζ = 1, 2, …, K,
where, if node (i, j) lies outside the range 1, …, (d − rΦ + 1), the values c^{I,Φ}_{i,j,ζ}, c^{A,Φ}_{i,j,ζ} and c^{S,Φ}_{i,j,ζ} are all taken to be zero, and h^{I,Φ}_{p,q,ζ}, h^{A,Φ}_{p,q,ζ} and h^{S,Φ}_{p,q,ζ} denote the pooling features of node (p, q) of the ζ-th pooling map in the Φ-th scale channel for the visual modality, the tactile acceleration modality and the tactile sound modality respectively.
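The square-root pooling of step (4) can be sketched as follows for a single feature map; the zero treatment of nodes outside the map follows the description above, while the NumPy implementation and the function name are illustrative assumptions.

```python
import numpy as np

def square_root_pooling(conv_map, e):
    """Step (4): for every node (p, q), take the square root of the sum of
    squared convolution values inside the (2e+1) x (2e+1) window centred on
    (p, q); values outside the feature map count as zero, so the pooling
    map keeps the same size as the feature map."""
    n_rows, n_cols = conv_map.shape
    padded = np.zeros((n_rows + 2 * e, n_cols + 2 * e))
    padded[e:e + n_rows, e:e + n_cols] = conv_map ** 2
    pooled = np.empty_like(conv_map, dtype=float)
    for p in range(n_rows):
        for q in range(n_cols):
            pooled[p, q] = np.sqrt(padded[p:p + 2 * e + 1,
                                          q:q + 2 * e + 1].sum())
    return pooled
```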
(5) Obtain the fully connected feature vectors of the three modalities from the pooling features, as follows:
(5-1) for the ω-th training sample, concatenate all pooling features of the pooling maps of the visual image modality, the tactile acceleration modality and the tactile sound modality from step (4) into the row vectors h_ω^{I}, h_ω^{A} and h_ω^{S} respectively, where 1 ≤ ω ≤ N1;
(5-2) traverse the N1 training samples, repeating step (5-1), and record the combined row vectors of the visual image modality, the tactile acceleration modality and the tactile sound modality of the N1 training samples as H^{I}, H^{A} and H^{S},
where H^{I} is the combined feature vector matrix of the visual modality, H^{A} is the feature vector matrix of the tactile acceleration modality, and H^{S} is the feature vector matrix of the tactile sound modality;
(6) Perform multi-modal fusion of the fully connected feature vectors of the three modalities to obtain a multi-modal fused mixing matrix, as follows:
(6-1) input the row vectors of the visual image modality, the tactile acceleration modality and the tactile sound modality of the N1 training samples from step (5) into the mixing layer and combine them to obtain the mixing matrix H = [H^{I}, H^{A}, H^{S}];
(6-2) rearrange the mixed row vector of each sample in the mixing matrix H of step (6-1) to generate a multi-modal fused two-dimensional mixing matrix of size d′ × d″, as shown in FIG. 3, where d′ is the side length of the two-dimensional matrix and its value range is determined by the length of each mixed row vector (d′ × d″ must equal that length).
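Steps (5) and (6) amount to concatenating the pooled features of the three modalities into one row vector per sample and folding that vector into a two-dimensional mixing matrix. The sketch below assumes zero-padding when the vector length is not an exact product d′ × d″; that padding, and the near-square choice of d′, are assumptions of this illustration rather than requirements stated by the patent.

```python
import numpy as np

def fuse_to_mixing_matrix(vis_vec, acc_vec, snd_vec):
    """Steps (5)-(6): build the mixed row vector H = [H_I, H_A, H_S] for one
    sample and rearrange it into a d' x d'' two-dimensional mixing matrix."""
    h = np.concatenate([vis_vec, acc_vec, snd_vec])   # fully connected mixed vector
    d1 = int(np.floor(np.sqrt(h.size)))               # d'  ~ sqrt(length)
    d2 = int(np.ceil(h.size / d1))                    # d'' so that d' * d'' >= length
    padded = np.zeros(d1 * d2)
    padded[:h.size] = h                               # zero-pad the remainder (assumption)
    return padded.reshape(d1, d2)
```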
(7) Input the multi-modal fused mixing matrix obtained in step (6) into the mixing network layer of the neural network and obtain the multi-modal mixed convolution features through multi-scale feature mapping, as follows:
(7-1) Input the multi-modal fused mixing matrix obtained in step (6-2), of size d′ × d″, into the mixing network. The mixing network has Ψ′ scale channels with sizes r1, r2, …, rΨ′, and K′ different input weights are generated for each scale channel, so that Ψ′ × K′ mixed feature maps are generated at random. Record the randomly generated mixed initial weights of the Φ′-th scale channel as Â_init^{hybrid,Φ′}, composed column by column of the weights a_ζ′^{hybrid,Φ′}, where the superscript hybrid denotes the tri-modal fusion, Â_init^{hybrid} denotes an initial weight matrix of the mixing network, a_ζ′^{hybrid,Φ′} denotes the initial weight that generates the ζ′-th mixed feature map, 1 ≤ Φ′ ≤ Ψ′, 1 ≤ ζ′ ≤ K′, and the size of the Φ′-th scale local receptive field is rΦ′ × rΦ′, so that Â_init^{hybrid,Φ′} has rΦ′² rows and K′ columns.
The size of the ζ′-th feature map of the Φ′-th scale channel is therefore (d′ − rΦ′ + 1) × (d″ − rΦ′ + 1).
(7-2) Orthogonalize the initial weight matrix Â_init^{hybrid,Φ′} of the Φ′-th scale channel by singular value decomposition to obtain the orthogonal matrix Â^{hybrid,Φ′}; the orthogonalized input weights can extract more complete features. Each column of Â^{hybrid,Φ′} is an orthogonal basis vector of Â_init^{hybrid,Φ′}; the input weight a_ζ′^{hybrid,Φ′} of the ζ′-th feature map of the Φ′-th scale channel is the ζ′-th column of Â^{hybrid,Φ′} rearranged into an rΦ′ × rΦ′ square matrix.
Calculate the mixed convolution feature of convolution node (i′, j′) in the ζ′-th feature map of the Φ′-th scale channel by
c^{hybrid,Φ′}_{i′,j′,ζ′} = Σ_{m=1}^{rΦ′} Σ_{n=1}^{rΦ′} x′_{i′+m−1, j′+n−1} · a_ζ′^{hybrid,Φ′}(m, n),
Φ′ = 1, 2, …, Ψ′;  i′, j′ = 1, …, (d′ − rΦ′ + 1);  ζ′ = 1, 2, …, K′,
where c^{hybrid,Φ′}_{i′,j′,ζ′} is the mixed convolution feature of convolution node (i′, j′) in the ζ′-th feature map of the Φ′-th scale channel, and x′ is the matrix corresponding to node (i′, j′);
(8) Perform mixed multi-scale square-root pooling on the mixed convolution features. The pooling sizes have Ψ′ scales e1, e2, …, eΨ′; at the Φ′-th scale the pooling map has the same size as the feature map, (d′ − rΦ′ + 1) × (d″ − rΦ′ + 1). Calculate the mixed pooling features from the mixed convolution features obtained in step (7) by
h^{hybrid,Φ′}_{p′,q′,ζ′} = sqrt( Σ_{i′=p′−eΦ′}^{p′+eΦ′} Σ_{j′=q′−eΦ′}^{q′+eΦ′} (c^{hybrid,Φ′}_{i′,j′,ζ′})² ),
p′, q′ = 1, …, (d′ − rΦ′ + 1);  Φ′ = 1, 2, …, Ψ′;  ζ′ = 1, 2, …, K′,
where c^{hybrid,Φ′}_{i′,j′,ζ′} is taken to be zero when node (i′, j′) lies outside the feature map, and h^{hybrid,Φ′}_{p′,q′,ζ′} denotes the mixed pooling feature of the combined node (p′, q′) of the ζ′-th pooling map in the Φ′-th scale channel;
(9) Repeating the method of step (5) according to the mixed pooling features, fully connect the mixed pooling feature vectors of the different scales to obtain the combined feature matrix H^{hybrid} of the mixing network, where K′ is the number of different feature maps generated by each scale channel;
(10) From the combined feature matrix H^{hybrid} of the mixing network obtained in step (9) and the number of training samples N1, compute the output weights β of the neural network by
β = (H^{hybrid})^T (I/C + H^{hybrid} (H^{hybrid})^T)^{−1} T,
where T is the label matrix of the training samples, I is the N1 × N1 identity matrix, C is a regularization coefficient that may take an arbitrary value (in one embodiment of the invention C = 5), and the superscript T on a matrix denotes matrix transposition.
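Step (10) is the closed-form output-weight solution of a regularized extreme learning machine. The sketch below shows the standard formula; the case split on whether the number of training samples N1 exceeds the combined feature dimension is the usual ELM formulation and is an assumption here, since the patent text only states that the computation is based on N1 and uses C = 5 in one embodiment.

```python
import numpy as np

def elm_output_weights(H, T, C=5.0):
    """Step (10): H is the N1 x L combined feature matrix H^hybrid of the
    mixing network, T the N1 x M1 one-hot label matrix of the training
    samples, C the regularization coefficient (C = 5 in the described
    embodiment). Returns the output weights beta."""
    n_samples, n_features = H.shape
    if n_samples <= n_features:
        # beta = H^T (I/C + H H^T)^(-1) T
        return H.T @ np.linalg.solve(np.eye(n_samples) / C + H @ H.T, T)
    # beta = (I/C + H^T H)^(-1) H^T T
    return np.linalg.solve(np.eye(n_features) / C + H.T @ H, H.T @ T)
```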
(11) Using the orthogonal matrices Â^{I,Φ}, Â^{A,Φ} and Â^{S,Φ} obtained by orthogonalizing the initial weights of the three modalities in step (3), process the preprocessed data set D2 to be classified to obtain the tri-modal mixed feature matrix H_test of the samples to be classified. Specifically, step (3) yields the convolution feature vectors of the three modalities of the objects to be classified; step (4) yields their pooling feature vectors; step (5) yields their fully connected feature vectors; step (6) yields their multi-modal fused mixing matrix; step (7) yields their multi-modal mixed convolution features; step (8) yields their multi-modal mixed pooling features; and step (9) yields the tri-modal mixed feature matrix H_test of the objects to be classified.
(12) From the training-sample output weights β of step (10) and the tri-modal mixed feature matrix H_test of step (11), calculate the predicted labels μ_ε of the N2 samples to be classified by
μ_ε = H_test β,  1 ≤ ε ≤ M2,
thereby realizing object material classification based on multi-mode fusion deep learning.
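For step (12), prediction reduces to a single matrix product followed by label decoding. The argmax decoding of the score columns in the sketch below is an assumption about how the label μ is read off, since the patent only gives the product μ = H_test β.

```python
import numpy as np

def predict_labels(H_test, beta):
    """Step (12): mu = H_test * beta; each row of the N2 x M1 score matrix
    is decoded by taking the column with the largest score."""
    scores = H_test @ beta
    return np.argmax(scores, axis=1)   # predicted material class per sample
```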
Claims (1)
1. An object material classification method based on multi-mode fusion deep learning, characterized by comprising the following steps:
(1) Let the number of training samples be N1 and the number of training-sample material classes be M1, and assign each class of material training sample a class label, where 1 ≤ M1 ≤ N1. Separately collect the visual image I1, the tactile acceleration A1 and the tactile sound S1 of all N1 training samples, and establish a data set D1 containing I1, A1 and S1; the image size of I1 is 320 × 480.
Let the number of objects to be classified be N2 and the number of material classes of the objects to be classified be M2, and assign each class of object to be classified a class label, where 1 ≤ M2 ≤ M1. Separately collect the visual image I2, the tactile acceleration A2 and the tactile sound S2 of all N2 objects to be classified, and establish a data set D2 containing I2, A2 and S2; the image size of I2 is 320 × 480.
(2) For the data sets D1 and D2, perform visual image preprocessing on the visual images, tactile acceleration preprocessing on the tactile acceleration signals and tactile sound preprocessing on the tactile sound signals, obtaining visual images, tactile acceleration spectrograms and tactile sound spectrograms respectively, as follows:
(2-1) down-sample the 320 × 480 images I1 and I2 to obtain visual images of I1 and I2 with size 32 × 32 × 3;
(2-2) separately convert the tactile accelerations A1 and A2 to the frequency domain by short-time Fourier transform, with a Hamming window of length 500, a window offset of 100 and a sampling frequency of 10 kHz, to obtain the spectrograms of A1 and A2; select the first 500 low-frequency channels of each spectrogram as the spectrum image, and down-sample the spectrum images to obtain tactile acceleration spectrum images of A1 and A2 with size 32 × 32 × 3;
(2-3) separately convert the tactile sounds S1 and S2 to the frequency domain by short-time Fourier transform, with a Hamming window of length 500, a window offset of 100 and a sampling frequency of 10 kHz, to obtain the spectrograms of S1 and S2; select the first 500 low-frequency channels of each spectrogram as the spectrum image, and down-sample the spectrum images to obtain sound spectrum images of S1 and S2 with size 32 × 32 × 3;
(3) Obtain the convolution features of the visual modality, the tactile acceleration modality and the tactile sound modality through multi-scale feature mapping, as follows:
(3-1) Input the 32 × 32 × 3 visual images of I1 and I2, the 32 × 32 × 3 tactile acceleration spectrum images of A1 and A2 and the 32 × 32 × 3 sound spectrum images of S1 and S2 obtained in step (2) into the first layer of the neural network, i.e. the input layer. The size of an input image is d × d × 3. The local receptive fields in the network have Ψ scale channels with sizes r1, r2, …, rΨ, and K different input weights are generated for each scale channel, so that Ψ × K feature maps are generated at random. Record the randomly generated initial weights of the Φ-th scale channel for the visual image, the tactile acceleration spectrogram and the sound spectrogram as Â_init^{I,Φ}, Â_init^{A,Φ} and Â_init^{S,Φ}, composed column by column of the weights a_ζ^{I,Φ}, a_ζ^{A,Φ} and a_ζ^{S,Φ} respectively, where the superscript I denotes the visual modality of the training samples and the objects to be classified, the superscript A denotes their tactile acceleration modality, the superscript S denotes their tactile sound modality, Â_init denotes an initial weight matrix, a_ζ denotes the initial weight that generates the ζ-th feature map, 1 ≤ Φ ≤ Ψ, 1 ≤ ζ ≤ K, and the size of the Φ-th scale local receptive field is rΦ × rΦ, so that Â_init^{Φ} has rΦ² rows and K columns.
The size of all K feature maps of the Φ-th scale channel is (d − rΦ + 1) × (d − rΦ + 1).
(3-2) Orthogonalize the initial weight matrices Â_init^{I,Φ}, Â_init^{A,Φ} and Â_init^{S,Φ} of the Φ-th scale channel by singular value decomposition to obtain the orthogonal matrices Â^{I,Φ}, Â^{A,Φ} and Â^{S,Φ}. Each column of Â^{I,Φ}, Â^{A,Φ} and Â^{S,Φ} is an orthogonal basis vector of Â_init^{I,Φ}, Â_init^{A,Φ} and Â_init^{S,Φ} respectively; the input weights a_ζ^{I,Φ}, a_ζ^{A,Φ} and a_ζ^{S,Φ} of the ζ-th feature map of the Φ-th scale channel are the ζ-th columns of the corresponding orthogonal matrices rearranged into rΦ × rΦ square matrices.
Calculate the convolution features of node (i, j) in the ζ-th feature map of the Φ-th scale channel for the visual, tactile acceleration and tactile sound modalities by
c^{I,Φ}_{i,j,ζ} = Σ_{m=1}^{rΦ} Σ_{n=1}^{rΦ} x^{I}_{i+m−1, j+n−1} · a_ζ^{I,Φ}(m, n), and analogously c^{A,Φ}_{i,j,ζ} and c^{S,Φ}_{i,j,ζ}, with Φ = 1, 2, …, Ψ, i, j = 1, …, (d − rΦ + 1) and ζ = 1, 2, …, K,
where c^{I,Φ}_{i,j,ζ}, c^{A,Φ}_{i,j,ζ} and c^{S,Φ}_{i,j,ζ} denote the convolution features of node (i, j) of the ζ-th feature map in the Φ-th scale channel for the visual modality, the tactile acceleration modality and the tactile sound modality respectively, and x is the input matrix corresponding to node (i, j);
(4) Perform multi-scale square-root pooling on the convolution features of the visual modality, the tactile acceleration modality and the tactile sound modality. The pooling sizes have Ψ scales e1, e2, …, eΨ, where the pooling size eΦ at the Φ-th scale denotes the distance between the pooling centre and the edge of the pooling window. The pooling map has the same size as the feature map, (d − rΦ + 1) × (d − rΦ + 1). Calculate the pooling features from the convolution features obtained in step (3) by
h^{I,Φ}_{p,q,ζ} = sqrt( Σ_{i=p−eΦ}^{p+eΦ} Σ_{j=q−eΦ}^{q+eΦ} (c^{I,Φ}_{i,j,ζ})² ), and analogously h^{A,Φ}_{p,q,ζ} and h^{S,Φ}_{p,q,ζ},
p, q = 1, …, (d − rΦ + 1);  Φ = 1, 2, …, Ψ;  ζ = 1, 2, …, K,
where, if node i or node j lies outside the interval (0, (d − rΦ + 1)), the values c^{I,Φ}_{i,j,ζ}, c^{A,Φ}_{i,j,ζ} and c^{S,Φ}_{i,j,ζ} are all taken to be zero,
and h^{I,Φ}_{p,q,ζ}, h^{A,Φ}_{p,q,ζ} and h^{S,Φ}_{p,q,ζ} denote the pooling features of node (p, q) of the ζ-th pooling map in the Φ-th scale channel for the visual modality, the tactile acceleration modality and the tactile sound modality respectively;
(5) Obtain the fully connected feature vectors of the three modalities from the pooling features, as follows:
(5-1) for the ω-th training sample, concatenate all pooling features of the pooling maps of the visual image modality, the tactile acceleration modality and the tactile sound modality from step (4) into the row vectors h_ω^{I}, h_ω^{A} and h_ω^{S} respectively, where 1 ≤ ω ≤ N1;
(5-2) traverse the N1 training samples, repeating step (5-1), and record the combined row vectors of the visual image modality, the tactile acceleration modality and the tactile sound modality of the N1 training samples as H^{I}, H^{A} and H^{S},
where H^{I} is the combined feature vector matrix of the visual modality, H^{A} is the feature vector matrix of the tactile acceleration modality, and H^{S} is the feature vector matrix of the tactile sound modality;
(6) Perform multi-modal fusion of the fully connected feature vectors of the three modalities to obtain a multi-modal fused mixing matrix, as follows:
(6-1) input the row vectors of the visual image modality, the tactile acceleration modality and the tactile sound modality of the N1 training samples from step (5) into the mixing layer and combine them to obtain the mixing matrix H = [H^{I}, H^{A}, H^{S}];
(6-2) rearrange the mixed row vector of each sample in the mixing matrix H of step (6-1) to generate a multi-modal fused two-dimensional mixing matrix of size d′ × d″, where d′ is the side length of the two-dimensional matrix and its value range is determined by the length of each mixed row vector (d′ × d″ must equal that length);
(7) Input the multi-modal fused mixing matrix obtained in step (6) into the mixing network layer of the neural network and obtain the multi-modal mixed convolution features through multi-scale feature mapping, as follows:
(7-1) Input the multi-modal fused mixing matrix obtained in step (6-2), of size d′ × d″, into the mixing network. The mixing network has Ψ′ scale channels with sizes r1, r2, …, rΨ′, and K′ different input weights are generated for each scale channel, so that Ψ′ × K′ mixed feature maps are generated at random. Record the randomly generated mixed initial weights of the Φ′-th scale channel as Â_init^{hybrid,Φ′}, composed column by column of the weights a_ζ′^{hybrid,Φ′}, where the superscript hybrid denotes the tri-modal fusion, Â_init^{hybrid} denotes an initial weight matrix of the mixing network, a_ζ′^{hybrid,Φ′} denotes the initial weight that generates the ζ′-th mixed feature map, 1 ≤ Φ′ ≤ Ψ′, 1 ≤ ζ′ ≤ K′, and the size of the Φ′-th scale local receptive field is rΦ′ × rΦ′, so that Â_init^{hybrid,Φ′} has rΦ′² rows and K′ columns.
The size of the ζ′-th feature map of the Φ′-th scale channel is therefore (d′ − rΦ′ + 1) × (d″ − rΦ′ + 1).
(7-2) Orthogonalize the initial weight matrix Â_init^{hybrid,Φ′} of the Φ′-th scale channel by singular value decomposition to obtain the orthogonal matrix Â^{hybrid,Φ′}. Each column of Â^{hybrid,Φ′} is an orthogonal basis vector of Â_init^{hybrid,Φ′}; the input weight a_ζ′^{hybrid,Φ′} of the ζ′-th feature map of the Φ′-th scale channel is the ζ′-th column of Â^{hybrid,Φ′} rearranged into an rΦ′ × rΦ′ square matrix.
Calculate the mixed convolution feature of convolution node (i′, j′) in the ζ′-th feature map of the Φ′-th scale channel by
c^{hybrid,Φ′}_{i′,j′,ζ′} = Σ_{m=1}^{rΦ′} Σ_{n=1}^{rΦ′} x′_{i′+m−1, j′+n−1} · a_ζ′^{hybrid,Φ′}(m, n), with Φ′ = 1, 2, …, Ψ′, i′, j′ = 1, …, (d′ − rΦ′ + 1) and ζ′ = 1, 2, …, K′,
where c^{hybrid,Φ′}_{i′,j′,ζ′} is the mixed convolution feature of convolution node (i′, j′) in the ζ′-th feature map of the Φ′-th scale channel, and x′ is the matrix corresponding to node (i′, j′);
(8) Perform mixed multi-scale square-root pooling on the mixed convolution features. The pooling sizes have Ψ′ scales e1, e2, …, eΨ′; at the Φ′-th scale the pooling map has the same size as the feature map, (d′ − rΦ′ + 1) × (d″ − rΦ′ + 1). Calculate the mixed pooling features from the mixed convolution features obtained in step (7) by
h^{hybrid,Φ′}_{p′,q′,ζ′} = sqrt( Σ_{i′=p′−eΦ′}^{p′+eΦ′} Σ_{j′=q′−eΦ′}^{q′+eΦ′} (c^{hybrid,Φ′}_{i′,j′,ζ′})² ),
p′, q′ = 1, …, (d′ − rΦ′ + 1);  Φ′ = 1, 2, …, Ψ′;  ζ′ = 1, 2, …, K′,
where, if node i′ or node j′ lies outside the interval (0, (d′ − rΦ′ + 1)), the value c^{hybrid,Φ′}_{i′,j′,ζ′} is taken to be zero,
and h^{hybrid,Φ′}_{p′,q′,ζ′} denotes the mixed pooling feature of the combined node (p′, q′) of the ζ′-th pooling map in the Φ′-th scale channel;
(9) According to the mixed pooling features, use the method of step (5) to fully connect the mixed pooling feature vectors of the different scales to obtain the combined feature matrix H^{hybrid} of the mixing network, where K′ is the number of different feature maps generated by each scale channel;
(10) From the combined feature matrix H^{hybrid} of the mixing network obtained in step (9) and the number of training samples N1, compute the output weights β of the neural network by
β = (H^{hybrid})^T (I/C + H^{hybrid} (H^{hybrid})^T)^{−1} T,
where T is the label matrix of the training samples, I is the N1 × N1 identity matrix, C is a regularization coefficient that may take an arbitrary value, and the superscript T on a matrix denotes matrix transposition;
(11) Using the orthogonal matrices Â^{I,Φ}, Â^{A,Φ} and Â^{S,Φ} obtained by orthogonalizing the initial weights of the three modalities in step (3), apply the method of steps (3) to (9) to the preprocessed data set D2 to be classified to obtain the combined feature matrix H_test of the mixing network for the samples to be classified;
(12) from the training-sample output weights β of step (10) and the combined feature matrix H_test of the mixing network for the samples to be classified of step (11), calculate the predicted labels μ_ε of the N2 samples to be classified by
μ_ε = H_test β,  1 ≤ ε ≤ M2,
thereby realizing object material classification based on multi-mode fusion deep learning.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710599106.1A CN107463952B (en) | 2017-07-21 | 2017-07-21 | Object material classification method based on multi-mode fusion deep learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710599106.1A CN107463952B (en) | 2017-07-21 | 2017-07-21 | Object material classification method based on multi-mode fusion deep learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107463952A CN107463952A (en) | 2017-12-12 |
CN107463952B true CN107463952B (en) | 2020-04-03 |
Family
ID=60546004
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710599106.1A Active CN107463952B (en) | 2017-07-21 | 2017-07-21 | Object material classification method based on multi-mode fusion deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107463952B (en) |
Families Citing this family (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108734210B (en) * | 2018-05-17 | 2021-10-15 | 浙江工业大学 | Object detection method based on cross-modal multi-scale feature fusion |
CN108846375B (en) * | 2018-06-29 | 2019-06-18 | 山东大学 | A kind of multi-modal Cooperative Study method and device neural network based |
CN109190638A (en) * | 2018-08-09 | 2019-01-11 | 太原理工大学 | Classification method based on the online order limit learning machine of multiple dimensioned local receptor field |
EP3620978A1 (en) * | 2018-09-07 | 2020-03-11 | Ibeo Automotive Systems GmbH | Method and device for classifying objects |
CN109447124B (en) * | 2018-09-28 | 2019-11-19 | 北京达佳互联信息技术有限公司 | Image classification method, device, electronic equipment and storage medium |
CN109508740B (en) * | 2018-11-09 | 2019-08-13 | 郑州轻工业学院 | Object hardness identification method based on Gaussian mixed noise production confrontation network |
CN109902585B (en) * | 2019-01-29 | 2023-04-07 | 中国民航大学 | Finger three-mode fusion recognition method based on graph model |
CN110020596B (en) * | 2019-02-21 | 2021-04-30 | 北京大学 | Video content positioning method based on feature fusion and cascade learning |
CN110659427A (en) * | 2019-09-06 | 2020-01-07 | 北京百度网讯科技有限公司 | City function division method and device based on multi-source data and electronic equipment |
CN110942060B (en) * | 2019-10-22 | 2023-05-23 | 清华大学 | Material identification method and device based on laser speckle and modal fusion |
CN110909637A (en) * | 2019-11-08 | 2020-03-24 | 清华大学 | Outdoor mobile robot terrain recognition method based on visual-touch fusion |
CN111028204B (en) * | 2019-11-19 | 2021-10-08 | 清华大学 | Cloth defect detection method based on multi-mode fusion deep learning |
CN110861853B (en) * | 2019-11-29 | 2021-10-19 | 三峡大学 | Intelligent garbage classification method combining vision and touch |
CN111590611B (en) * | 2020-05-25 | 2022-12-02 | 北京具身智能科技有限公司 | Article classification and recovery method based on multi-mode active perception |
CN113111902B (en) * | 2021-01-02 | 2024-10-15 | 大连理工大学 | Pavement material identification method based on voice and image multi-mode collaborative learning |
CN112893180A (en) * | 2021-01-20 | 2021-06-04 | 同济大学 | Object touch classification method and system considering friction coefficient abnormal value elimination |
CN113780460A (en) * | 2021-09-18 | 2021-12-10 | 广东人工智能与先进计算研究院 | Material identification method and device, robot, electronic equipment and storage medium |
CN114723963B (en) * | 2022-04-26 | 2024-06-04 | 东南大学 | Task action and object physical attribute identification method based on visual touch signal |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104715260A (en) * | 2015-03-05 | 2015-06-17 | 中南大学 | Multi-modal fusion image sorting method based on RLS-ELM |
CN105512609A (en) * | 2015-11-25 | 2016-04-20 | 北京工业大学 | Multi-mode fusion video emotion identification method based on kernel-based over-limit learning machine |
CN105956351A (en) * | 2016-07-05 | 2016-09-21 | 上海航天控制技术研究所 | Touch information classified computing and modelling method based on machine learning |
CN106874961A (en) * | 2017-03-03 | 2017-06-20 | 北京奥开信息科技有限公司 | A kind of indoor scene recognition methods using the very fast learning machine based on local receptor field |
WO2017100903A1 (en) * | 2015-12-14 | 2017-06-22 | Motion Metrics International Corp. | Method and apparatus for identifying fragmented material portions within an image |
- 2017-07-21 CN CN201710599106.1A patent/CN107463952B/en active Active
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104715260A (en) * | 2015-03-05 | 2015-06-17 | 中南大学 | Multi-modal fusion image sorting method based on RLS-ELM |
CN105512609A (en) * | 2015-11-25 | 2016-04-20 | 北京工业大学 | Multi-mode fusion video emotion identification method based on kernel-based over-limit learning machine |
WO2017100903A1 (en) * | 2015-12-14 | 2017-06-22 | Motion Metrics International Corp. | Method and apparatus for identifying fragmented material portions within an image |
CN105956351A (en) * | 2016-07-05 | 2016-09-21 | 上海航天控制技术研究所 | Touch information classified computing and modelling method based on machine learning |
CN106874961A (en) * | 2017-03-03 | 2017-06-20 | 北京奥开信息科技有限公司 | A kind of indoor scene recognition methods using the very fast learning machine based on local receptor field |
Non-Patent Citations (3)
Title |
---|
Deep Learning for Surface Material Classification Using Haptic and Visual Information;Haitian Zheng et al.;《IEEE TRANSACTIONS ON MULTIMEDIA》;20161130;第2407-2416页 * |
Multi-Modal Local Receptive Field Extreme Learning Machine for Object Recognition;Fengxue Li et al.;《2016 International Joint Conference on Neural Networks (IJCNN)》;20161103;第1696-1701页 * |
Visual Feature Analysis of 3D Models Based on Neural Networks; Wei Wei; Computer Engineering and Applications; 2008-07-21; pp. 174-178 *
Also Published As
Publication number | Publication date |
---|---|
CN107463952A (en) | 2017-12-12 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
GR01 | Patent grant ||