
CN107463952B - Object material classification method based on multi-mode fusion deep learning - Google Patents


Info

Publication number
CN107463952B
CN107463952B
Authority
CN
China
Prior art keywords
tactile
matrix
modality
scale
acceleration
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710599106.1A
Other languages
Chinese (zh)
Other versions
CN107463952A (en)
Inventor
刘华平
方静
刘晓楠
孙富春
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University
Priority to CN201710599106.1A
Publication of CN107463952A
Application granted
Publication of CN107463952B
Status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/24: Classification techniques
    • G06F 18/243: Classification techniques relating to the number of classes
    • G06F 18/2431: Multiple classes
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00: Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)
  • Other Investigation Or Analysis Of Materials By Electrical Means (AREA)

Abstract

The invention relates to an object material classification method based on multi-mode fusion deep learning, and belongs to the technical fields of computer vision, artificial intelligence and material classification. The method is, in particular, a multimodal fusion method built on an extreme learning machine with multi-scale local receptive fields. It fuses the perception information of different modalities of an object's material (visual images, tactile acceleration signals and tactile sound signals) and thereby classifies the material correctly. The method not only uses multi-scale local receptive fields to extract highly representative features of real, complex materials, but also effectively fuses the information of each modality so that the modalities complement one another. It improves the robustness and accuracy of complex-material classification and therefore has broad applicability and generality.

Description

Object material classification method based on multi-mode fusion deep learning
Technical Field
The invention relates to an object material classification method based on multi-mode fusion deep learning, and belongs to the technical field of computer vision, artificial intelligence and material classification.
Background
Materials in the world are extremely varied and can be divided into plastic, metal, ceramic, glass, wood, textile, stone, paper, rubber, foam and so on. Object material classification has recently attracted great attention from society, industry and academia. For example, material classification can be used effectively for recycling. The four main categories of packaging material are paper, plastic, metal and glass, and different market demands call for different packaging. For long-distance transport without special requirements on transport quality, paper, paperboard and carton board are generally chosen; food packaging must meet hygiene standards, packaging for ready-to-eat food such as cakes should use carton board, light-proof and damp-proof products such as salt use cans, and fast-food boxes can be made of natural plant fibre; the reasonable use of decorative materials is key to successful interior decoration. These requirements make it necessary to develop a set of methods that can classify object materials automatically.
The mainstream approach to object material classification uses visual images, which contain rich information, but two objects with extremely similar appearances cannot be distinguished by visual images alone. Suppose there are two objects, a piece of rough red paper and a piece of red plastic foil: a visual image has little power to tell them apart. The human brain, however, naturally fuses the perceptual features of the different modalities of the same object and can therefore classify its material. Inspired by this, a computer can be enabled to classify object materials automatically by using information from different modalities of the object at the same time.
Techniques for object material classification have already been published, for example Chinese patent application CN105005787A, a material classification method based on joint sparse coding of dexterous-hand tactile information. That invention uses only tactile sequences for material classification and does not combine the multimodal information of the material. It has been observed that classifying object materials from visual images alone cannot robustly capture material features such as hardness or roughness. When a rigid tool is dragged or moved over the surfaces of different objects, it produces vibrations and sounds of different frequencies, so tactile information complementary to vision can be used to classify the material of the object. How to combine the visual modality with the tactile modalities effectively, however, remains a challenging problem.
Disclosure of Invention
The invention aims to provide an object material classification method based on multi-mode fusion deep learning, which realizes multimodal-information-fusion object material classification on the basis of an extreme learning machine with multi-scale local receptive fields, so as to improve the robustness and accuracy of classification and to fuse the various modal information of object materials effectively for material classification.
The invention provides an object material classification method based on multi-mode fusion deep learning, which comprises the following steps:
(1) Let the number of training samples be $N_1$ and the number of training-sample material classes be $M_1$, where $1 \le M_1 \le N_1$; each material class of training sample is given a label. For all $N_1$ training samples, collect the visual image $I_1$, the tactile acceleration $A_1$ and the tactile sound $S_1$, and build a data set $D_1$ containing $I_1$, $A_1$ and $S_1$; the image size of $I_1$ is 320 × 480.
Let the number of objects to be classified be $N_2$ and the number of material classes of the objects to be classified be $M_2$, where $1 \le M_2 \le M_1$; each class of object to be classified is given a label. For all $N_2$ objects to be classified, collect the visual image $I_2$, the tactile acceleration $A_2$ and the tactile sound $S_2$, and build a data set $D_2$ containing $I_2$, $A_2$ and $S_2$; the image size of $I_2$ is 320 × 480.
(2) For the data set $D_1$ and the data set $D_2$, preprocess the visual images, the tactile acceleration signals and the tactile sound signals to obtain, respectively, visual images, tactile acceleration spectrograms and tactile sound spectrograms, as follows:
(2-1) Down-sample the 320 × 480 images $I_1$ and $I_2$ to obtain visual images of $I_1$ and $I_2$ of size 32 × 32 × 3.
(2-2) Convert the tactile accelerations $A_1$ and $A_2$ to the frequency domain by short-time Fourier transform with a Hamming window of length 500, a window offset of 100 and a sampling frequency of 10 kHz, obtaining the spectrograms of $A_1$ and $A_2$; select the first 500 low-frequency channels of each spectrogram as the spectral image, and down-sample the spectral images to obtain tactile acceleration spectral images of $A_1$ and $A_2$ of size 32 × 32 × 3.
(2-3) Convert the tactile sounds $S_1$ and $S_2$ to the frequency domain by short-time Fourier transform with a Hamming window of length 500, a window offset of 100 and a sampling frequency of 10 kHz, obtaining the spectrograms of $S_1$ and $S_2$; select the first 500 low-frequency channels of each spectrogram as the spectral image, and down-sample the spectral images to obtain sound spectral images of $S_1$ and $S_2$ of size 32 × 32 × 3.
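The preprocessing in step (2) can be illustrated with a short NumPy/SciPy sketch. This is a minimal, non-authoritative example: the interpretation of the "first 500 low-frequency channels", the conversion of the single-channel spectrogram into a 32 × 32 × 3 image by channel replication, and the index-based down-sampling are assumptions made for illustration, not the patent's reference implementation.

```python
import numpy as np
from scipy.signal import stft

def spectrogram_image(signal, fs=10_000, win_len=500, offset=100, out_size=32):
    """Turn a 1-D tactile signal into a 32x32x3 spectral image (sketch of steps (2-2)/(2-3))."""
    # Short-time Fourier transform with a Hamming window; a window offset of 100 samples
    # corresponds to an overlap of win_len - 100 samples.
    _, _, Z = stft(signal, fs=fs, window='hamming',
                   nperseg=win_len, noverlap=win_len - offset)
    mag = np.abs(Z)                      # magnitude spectrogram (frequency x time)
    mag = mag[:500, :]                   # keep at most the first 500 low-frequency channels
    # Down-sample the spectrogram to out_size x out_size by index selection (assumption).
    rows = np.linspace(0, mag.shape[0] - 1, out_size).astype(int)
    cols = np.linspace(0, mag.shape[1] - 1, out_size).astype(int)
    small = mag[np.ix_(rows, cols)]
    # Replicate the single channel three times to obtain a 32x32x3 "image" (assumption).
    return np.repeat(small[:, :, None], 3, axis=2)

def downsample_image(img, out_size=32):
    """Down-sample a 320x480x3 visual image to 32x32x3 by index selection (step (2-1))."""
    rows = np.linspace(0, img.shape[0] - 1, out_size).astype(int)
    cols = np.linspace(0, img.shape[1] - 1, out_size).astype(int)
    return img[np.ix_(rows, cols)]
```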
(3) Obtain the convolution features of the visual modality, the tactile acceleration modality and the tactile sound modality through multi-scale feature mapping, as follows:
(3-1) Input the 32 × 32 × 3 visual images of $I_1$ and $I_2$, the 32 × 32 × 3 tactile acceleration spectral images of $A_1$ and $A_2$ and the 32 × 32 × 3 sound spectral images of $S_1$ and $S_2$ obtained in step (2) into the first layer (the input layer) of the neural network. The size of an input image is $d \times d$. The local receptive fields of the network have $\Psi$ scale channels with sizes $r_1, r_2, \dots, r_\Psi$; for each scale channel, $K$ different input weights are generated, so that $\Psi \times K$ feature maps are generated at random. Denote the randomly generated initial weights of the $\Phi$-th scale channel for the visual image, the tactile acceleration spectrogram and the sound spectrogram as $\hat{A}^{I,\mathrm{init}}_{\Phi}$, $\hat{A}^{A,\mathrm{init}}_{\Phi}$ and $\hat{A}^{S,\mathrm{init}}_{\Phi}$, composed column by column of $\hat{a}^{I,\mathrm{init}}_{\Phi,\zeta}$, $\hat{a}^{A,\mathrm{init}}_{\Phi,\zeta}$ and $\hat{a}^{S,\mathrm{init}}_{\Phi,\zeta}$, where the superscript I denotes the visual modality of the training samples and the objects to be classified, the superscript A denotes their tactile acceleration modality, the superscript S denotes their tactile sound modality, the superscript "init" marks an initial weight, and $\hat{a}^{\,\cdot,\mathrm{init}}_{\Phi,\zeta}$ is the initial weight that generates the $\zeta$-th feature map, with $1 \le \Phi \le \Psi$ and $1 \le \zeta \le K$. The local receptive field of the $\Phi$-th scale has size $r_\Phi \times r_\Phi$, so

$$\hat{A}^{I,\mathrm{init}}_{\Phi},\; \hat{A}^{A,\mathrm{init}}_{\Phi},\; \hat{A}^{S,\mathrm{init}}_{\Phi} \in \mathbb{R}^{r_\Phi^2 \times K}, \qquad \hat{a}^{I,\mathrm{init}}_{\Phi,\zeta},\; \hat{a}^{A,\mathrm{init}}_{\Phi,\zeta},\; \hat{a}^{S,\mathrm{init}}_{\Phi,\zeta} \in \mathbb{R}^{r_\Phi^2},$$

and all $K$ feature maps of the $\Phi$-th scale channel have size $(d - r_\Phi + 1) \times (d - r_\Phi + 1)$.
(3-2) Orthogonalize the initial weight matrices $\hat{A}^{I,\mathrm{init}}_{\Phi}$, $\hat{A}^{A,\mathrm{init}}_{\Phi}$ and $\hat{A}^{S,\mathrm{init}}_{\Phi}$ of the $\Phi$-th scale channel by singular value decomposition to obtain the orthogonal matrices $\hat{A}^{I}_{\Phi}$, $\hat{A}^{A}_{\Phi}$ and $\hat{A}^{S}_{\Phi}$; each column $\hat{a}^{I}_{\Phi,\zeta}$, $\hat{a}^{A}_{\Phi,\zeta}$, $\hat{a}^{S}_{\Phi,\zeta}$ is an orthogonal basis vector of the corresponding initial matrix. The input weight of the $\zeta$-th feature map of the $\Phi$-th scale channel, $a^{I}_{\Phi,\zeta}$, $a^{A}_{\Phi,\zeta}$, $a^{S}_{\Phi,\zeta} \in \mathbb{R}^{r_\Phi \times r_\Phi}$, is the square matrix formed from $\hat{a}^{I}_{\Phi,\zeta}$, $\hat{a}^{A}_{\Phi,\zeta}$ and $\hat{a}^{S}_{\Phi,\zeta}$, respectively.
The convolution feature at node $(i, j)$ of the $\zeta$-th feature map of the $\Phi$-th scale channel is computed for the visual, tactile acceleration and tactile sound modalities by

$$c^{\,\nu}_{i,j,\Phi,\zeta}(x) = \sum_{m=1}^{r_\Phi} \sum_{n=1}^{r_\Phi} x_{i+m-1,\; j+n-1} \cdot a^{\,\nu}_{\Phi,\zeta}(m, n), \qquad \nu \in \{I, A, S\},$$

$$\Phi = 1, 2, \dots, \Psi, \qquad i, j = 1, \dots, (d - r_\Phi + 1), \qquad \zeta = 1, 2, \dots, K,$$

where $c^{I}_{i,j,\Phi,\zeta}$, $c^{A}_{i,j,\Phi,\zeta}$ and $c^{S}_{i,j,\Phi,\zeta}$ are the convolution features at node $(i, j)$ of the $\zeta$-th feature map in the $\Phi$-th scale channel of the visual, tactile acceleration and tactile sound modalities, respectively, and $x$ is the input matrix corresponding to node $(i, j)$.
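A minimal NumPy sketch of the multi-scale random feature mapping of step (3), for a single modality and a single scale channel, is given below. It follows the standard construction of an extreme learning machine with local receptive fields; the Gaussian initialisation and the use of the left singular vectors for orthogonalisation are assumptions made for illustration.

```python
import numpy as np

def random_orthogonal_weights(r, K, rng=None):
    """Generate K random receptive-field weights of size r*r and orthogonalise them by SVD
    (steps (3-1)/(3-2)); returns an array of shape (K, r, r)."""
    rng = np.random.default_rng(rng)
    A_init = rng.standard_normal((r * r, K))      # initial weight matrix, one column per feature map
    U, _, _ = np.linalg.svd(A_init, full_matrices=False)
    A = U[:, :K]                                  # orthonormal columns (assumes r*r >= K)
    return A.T.reshape(K, r, r)                   # reshape each column into an r x r square matrix

def conv_feature_maps(x, weights):
    """Valid convolution of a d x d input with each r x r weight (the formula in step (3-2));
    returns an array of shape (K, d-r+1, d-r+1)."""
    K, r, _ = weights.shape
    d = x.shape[0]
    out = np.empty((K, d - r + 1, d - r + 1))
    for k in range(K):
        for i in range(d - r + 1):
            for j in range(d - r + 1):
                out[k, i, j] = np.sum(x[i:i + r, j:j + r] * weights[k])
    return out
```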
(4) Apply multi-scale square-root pooling to the convolution features of the visual, tactile acceleration and tactile sound modalities. There are $\Psi$ pooling scales of sizes $e_1, e_2, \dots, e_\Psi$; the pooling size $e_\Phi$ of the $\Phi$-th scale is the distance between the pooling centre and the edge of the pooling area. The pooling map has the same size as the feature map, namely $(d - r_\Phi + 1) \times (d - r_\Phi + 1)$. The pooling features are computed from the convolution features of step (3) by

$$h^{\,\nu}_{p,q,\Phi,\zeta} = \sqrt{\sum_{i=p-e_\Phi}^{p+e_\Phi} \;\sum_{j=q-e_\Phi}^{q+e_\Phi} \left(c^{\,\nu}_{i,j,\Phi,\zeta}\right)^2}, \qquad \nu \in \{I, A, S\}, \qquad p, q = 1, \dots, (d - r_\Phi + 1),$$

$$\Phi = 1, 2, \dots, \Psi, \qquad \zeta = 1, 2, \dots, K,$$

where $c^{I}_{i,j,\Phi,\zeta}$, $c^{A}_{i,j,\Phi,\zeta}$ and $c^{S}_{i,j,\Phi,\zeta}$ are taken to be zero whenever the node $(i, j)$ lies outside the range $1, \dots, (d - r_\Phi + 1)$, and $h^{I}_{p,q,\Phi,\zeta}$, $h^{A}_{p,q,\Phi,\zeta}$ and $h^{S}_{p,q,\Phi,\zeta}$ are the pooling features at node $(p, q)$ of the $\zeta$-th pooling map in the $\Phi$-th scale channel of the visual, tactile acceleration and tactile sound modalities, respectively.
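A sketch of the square-root pooling of step (4), again for one modality and one scale channel; treating out-of-range convolution nodes as zero is implemented here by zero-padding, following the convention stated above.

```python
import numpy as np

def sqrt_pool(conv_maps, e):
    """Multi-scale square-root pooling (step (4)).
    conv_maps: array of shape (K, s, s) with s = d - r + 1; e: pooling radius e_Phi.
    Returns pooling maps of the same shape."""
    K, s, _ = conv_maps.shape
    # Zero-pad by e on each side so that out-of-range convolution nodes contribute zero.
    padded = np.pad(conv_maps, ((0, 0), (e, e), (e, e)))
    pooled = np.empty_like(conv_maps)
    for k in range(K):
        for p in range(s):
            for q in range(s):
                window = padded[k, p:p + 2 * e + 1, q:q + 2 * e + 1]
                pooled[k, p, q] = np.sqrt(np.sum(window ** 2))
    return pooled
```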
(5) Obtain the fully connected feature vectors of the three modalities from the pooling features, as follows:
(5-1) For the $\omega$-th training sample, concatenate all pooling features of the pooling maps of its visual image modality, tactile acceleration modality and tactile sound modality from step (4) into row vectors $h^{I}_{\omega}$, $h^{A}_{\omega}$ and $h^{S}_{\omega}$, respectively, where $1 \le \omega \le N_1$.
(5-2) Traverse all $N_1$ training samples, repeating step (5-1) for each, and stack the row vectors of the visual image modality, the tactile acceleration modality and the tactile sound modality of the $N_1$ training samples into

$$H^{I} = \begin{bmatrix} h^{I}_{1} \\ \vdots \\ h^{I}_{N_1} \end{bmatrix}, \qquad H^{A} = \begin{bmatrix} h^{A}_{1} \\ \vdots \\ h^{A}_{N_1} \end{bmatrix}, \qquad H^{S} = \begin{bmatrix} h^{S}_{1} \\ \vdots \\ h^{S}_{N_1} \end{bmatrix},$$

where $H^{I}$ is the combined feature-vector matrix of the visual modality, $H^{A}$ is the feature-vector matrix of the tactile acceleration modality, and $H^{S}$ is the feature-vector matrix of the tactile sound modality.
(6) Perform multimodal fusion of the fully connected feature vectors of the three modalities to obtain a multimodal fused mixing matrix, as follows:
(6-1) Input the row vectors of the visual image modality, the tactile acceleration modality and the tactile sound modality of the $N_1$ training samples from step (5) into the mixing layer and combine them to obtain the mixing matrix $H = [H^{I}, H^{A}, H^{S}]$.
(6-2) Reshape the mixed row vector of each sample in the mixing matrix $H$ of step (6-1) to generate a multimodal fused two-dimensional mixing matrix of size $d' \times d''$, where $d'$ is the length of the two-dimensional matrix and $d''$ its width, chosen so that $d' \times d''$ matches the length of the mixed row vector.
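The fusion of steps (5) and (6) is essentially a concatenate-and-reshape operation. The sketch below assumes the three per-sample pooling vectors have already been computed and that d' and d'' are chosen by the user so that their product equals the length of the concatenated vector; that exact-factorisation requirement is an assumption about how the lost size formula is meant to be read.

```python
import numpy as np

def fuse_modalities(h_visual, h_accel, h_sound, d1, d2):
    """Concatenate the fully connected pooling vectors of the three modalities (step (6-1))
    and reshape the mixed row vector into a d1 x d2 two-dimensional mixing matrix (step (6-2))."""
    mixed = np.concatenate([h_visual.ravel(), h_accel.ravel(), h_sound.ravel()])
    if mixed.size != d1 * d2:
        raise ValueError("d1 * d2 must equal the length of the mixed row vector")
    return mixed.reshape(d1, d2)
```

Stacking one mixed row vector per training sample reproduces the mixing matrix H = [H^I, H^A, H^S] before the per-sample reshaping.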
(7) Input the multimodal fused mixing matrix obtained in step (6) into the mixing-network layer of the neural network and obtain the multimodal mixed convolution features through multi-scale feature mapping, as follows:
(7-1) Input the multimodal fused mixing matrix of step (6-2), of size $d' \times d''$, into the mixing network. The mixing network has $\Psi'$ scale channels of sizes $r_1, r_2, \dots, r_{\Psi'}$; for each scale channel, $K'$ different input weights are generated, so that $\Psi' \times K'$ mixed feature maps are generated at random. Denote the randomly generated mixed initial weight of the $\Phi'$-th scale channel as $\hat{A}^{\mathrm{hybrid},\mathrm{init}}_{\Phi'}$, composed column by column of $\hat{a}^{\mathrm{hybrid},\mathrm{init}}_{\Phi',\zeta'}$, where the superscript "hybrid" denotes the tri-modal fusion, $\hat{A}^{\mathrm{hybrid},\mathrm{init}}_{\Phi'}$ is an initial weight matrix of the mixing network, and $\hat{a}^{\mathrm{hybrid},\mathrm{init}}_{\Phi',\zeta'}$ is the initial weight that generates the $\zeta'$-th mixed feature map, with $1 \le \Phi' \le \Psi'$ and $1 \le \zeta' \le K'$. The local receptive field of the $\Phi'$-th scale channel has size $r_{\Phi'} \times r_{\Phi'}$, so

$$\hat{A}^{\mathrm{hybrid},\mathrm{init}}_{\Phi'} \in \mathbb{R}^{r_{\Phi'}^2 \times K'}, \qquad \hat{a}^{\mathrm{hybrid},\mathrm{init}}_{\Phi',\zeta'} \in \mathbb{R}^{r_{\Phi'}^2},$$

and the $\zeta'$-th feature map of the $\Phi'$-th scale channel has size $(d' - r_{\Phi'} + 1) \times (d'' - r_{\Phi'} + 1)$.
(7-2) Orthogonalize the initial weight matrix $\hat{A}^{\mathrm{hybrid},\mathrm{init}}_{\Phi'}$ of the $\Phi'$-th scale channel by singular value decomposition to obtain the orthogonal matrix $\hat{A}^{\mathrm{hybrid}}_{\Phi'}$; each column $\hat{a}^{\mathrm{hybrid}}_{\Phi',\zeta'}$ is an orthogonal basis vector of $\hat{A}^{\mathrm{hybrid},\mathrm{init}}_{\Phi'}$. The input weight $a^{\mathrm{hybrid}}_{\Phi',\zeta'} \in \mathbb{R}^{r_{\Phi'} \times r_{\Phi'}}$ of the $\zeta'$-th feature map of the $\Phi'$-th scale channel is the square matrix formed from $\hat{a}^{\mathrm{hybrid}}_{\Phi',\zeta'}$.
The mixed convolution feature at node $(i', j')$ of the $\zeta'$-th feature map of the $\Phi'$-th scale channel is computed by

$$c^{\mathrm{hybrid}}_{i',j',\Phi',\zeta'}(x') = \sum_{m=1}^{r_{\Phi'}} \sum_{n=1}^{r_{\Phi'}} x'_{i'+m-1,\; j'+n-1} \cdot a^{\mathrm{hybrid}}_{\Phi',\zeta'}(m, n),$$

$$\Phi' = 1, 2, \dots, \Psi', \qquad i' = 1, \dots, (d' - r_{\Phi'} + 1), \qquad j' = 1, \dots, (d'' - r_{\Phi'} + 1), \qquad \zeta' = 1, 2, \dots, K',$$

where $c^{\mathrm{hybrid}}_{i',j',\Phi',\zeta'}$ is the mixed convolution feature at node $(i', j')$ of the $\zeta'$-th feature map of the $\Phi'$-th scale channel and $x'$ is the input matrix corresponding to node $(i', j')$.
(8) Apply mixed multi-scale square-root pooling to the mixed convolution features. There are $\Psi'$ pooling scales of sizes $e_1, e_2, \dots, e_{\Psi'}$; at the $\Phi'$-th scale the pooling map has the same size as the feature map, namely $(d' - r_{\Phi'} + 1) \times (d'' - r_{\Phi'} + 1)$. The mixed pooling features are computed from the mixed convolution features of step (7) by

$$h^{\mathrm{hybrid}}_{p',q',\Phi',\zeta'} = \sqrt{\sum_{i'=p'-e_{\Phi'}}^{p'+e_{\Phi'}} \;\sum_{j'=q'-e_{\Phi'}}^{q'+e_{\Phi'}} \left(c^{\mathrm{hybrid}}_{i',j',\Phi',\zeta'}\right)^2}, \qquad p' = 1, \dots, (d' - r_{\Phi'} + 1), \qquad q' = 1, \dots, (d'' - r_{\Phi'} + 1),$$

$$\Phi' = 1, 2, \dots, \Psi', \qquad \zeta' = 1, 2, \dots, K',$$

where $c^{\mathrm{hybrid}}_{i',j',\Phi',\zeta'}$ is taken to be zero whenever the node $(i', j')$ lies outside the feature map, and $h^{\mathrm{hybrid}}_{p',q',\Phi',\zeta'}$ is the mixed pooling feature at node $(p', q')$ of the $\zeta'$-th pooling map of the $\Phi'$-th scale channel.
(9) From the mixed pooling features, repeat step (5): fully connect the mixed pooling feature vectors of the different scales to obtain the combined feature matrix $H^{\mathrm{hybrid}}$ of the mixing network, whose rows are the $N_1$ mixed feature vectors and whose number of columns is $L = K' \sum_{\Phi'=1}^{\Psi'} (d' - r_{\Phi'} + 1)(d'' - r_{\Phi'} + 1)$, where $K'$ is the number of different feature maps generated by each scale channel.
(10) From the combined feature matrix $H^{\mathrm{hybrid}}$ of the mixing network obtained in step (9), compute the training-sample output weights $\beta$ of the neural network according to the number of training samples $N_1$:
if $N_1 \le L$ (the feature dimension of $H^{\mathrm{hybrid}}$), then

$$\beta = (H^{\mathrm{hybrid}})^{T} \left( \frac{I}{C} + H^{\mathrm{hybrid}} (H^{\mathrm{hybrid}})^{T} \right)^{-1} T,$$

and if $N_1 > L$, then

$$\beta = \left( \frac{I}{C} + (H^{\mathrm{hybrid}})^{T} H^{\mathrm{hybrid}} \right)^{-1} (H^{\mathrm{hybrid}})^{T} T,$$

where $T$ is the label matrix of the training samples, $I$ is the identity matrix, $C$ is a regularization coefficient that may take an arbitrary value (in one embodiment of the invention $C = 5$), and the superscript $T$ on a matrix denotes transposition.
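The closed-form output weights of step (10) are the standard regularized least-squares solution used in extreme learning machines. The sketch below assumes one-hot label targets T and chooses between the two equivalent forms by comparing the number of samples with the feature dimension.

```python
import numpy as np

def elm_output_weights(H, T, C=5.0):
    """Compute the ELM output weights beta of step (10).
    H: hybrid feature matrix of shape (N1, L); T: target matrix of shape (N1, M1);
    C: regularization coefficient (C = 5 in one embodiment)."""
    n_samples, n_features = H.shape
    if n_samples <= n_features:
        # beta = H^T (I/C + H H^T)^-1 T
        G = H @ H.T + np.eye(n_samples) / C
        return H.T @ np.linalg.solve(G, T)
    # beta = (I/C + H^T H)^-1 H^T T
    G = H.T @ H + np.eye(n_features) / C
    return np.linalg.solve(G, H.T @ T)
```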
(11) Using the orthogonal matrices $\hat{A}^{I}_{\Phi}$, $\hat{A}^{A}_{\Phi}$ and $\hat{A}^{S}_{\Phi}$ obtained by orthogonalizing the initial weights of the three modalities in step (3), process the preprocessed data set $D_2$ to be classified with the method of steps (3) to (9) to obtain the tri-modal mixed feature matrix $H^{\mathrm{test}}$ of the samples to be classified.
(12) From the training-sample output weights $\beta$ of step (10) and the tri-modal mixed feature matrix $H^{\mathrm{test}}$ of step (11), compute the predicted labels $\mu_\varepsilon$ of the $N_2$ samples to be classified by

$$\mu_\varepsilon = H^{\mathrm{test}} \beta, \qquad 1 \le \varepsilon \le M_2,$$

which realizes object material classification based on multi-mode fusion deep learning.
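Classification of the samples to be classified (steps (11) and (12)) then reduces to a matrix product followed by a decision over the class scores. The argmax decision rule in the sketch below is an assumption; the patent text states only the product μ = H^test β.

```python
import numpy as np

def predict_labels(H_test, beta):
    """Predict material classes for the samples to be classified (step (12)).
    H_test: tri-modal mixed feature matrix of shape (N2, L); beta: output weights of shape (L, M1)."""
    scores = H_test @ beta            # mu = H_test * beta, one row of class scores per sample
    return np.argmax(scores, axis=1)  # assumed decision rule: class with the highest score
```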
The object material classification method based on multi-mode fusion deep learning provided by the invention has the following features and advantages:
1. The extreme learning machine method based on multi-scale local receptive fields can perceive the material with local receptive fields of several scales and extract a variety of features, enabling the classification of complex object materials.
2. The deep learning method of the extreme learning machine based on multi-scale local receptive fields integrates feature learning and image classification and does not rely on manually designed feature extractors, so the algorithm is suitable for classifying most objects of different materials.
3. The multimodal fusion deep learning method based on the extreme learning machine with multi-scale local receptive fields can effectively fuse the information of the three modalities of an object's material, achieving information complementarity and improving the robustness and accuracy of material classification.
Drawings
FIG. 1 is a flow block diagram of the method of the invention.
FIG. 2 is a flow block diagram of the extreme learning machine based on multi-scale local receptive fields used in the method of the invention.
FIG. 3 is a flow block diagram of the fusion of the different modalities in the extreme-learning-machine method based on multi-scale local receptive fields of the invention.
Detailed Description
The flow of the object material classification method based on multi-mode fusion deep learning of the invention is shown in FIG. 1; it mainly comprises a visual image modality, a tactile acceleration modality, a tactile sound modality and a mixing network. The method comprises the following steps:
(1) Let the number of training samples be $N_1$ and the number of training-sample material classes be $M_1$, where $1 \le M_1 \le N_1$; each material class of training sample is given a label. For all $N_1$ training samples, collect the visual image $I_1$, the tactile acceleration $A_1$ and the tactile sound $S_1$, and build a data set $D_1$ containing $I_1$, $A_1$ and $S_1$; the image size of $I_1$ is 320 × 480.
Let the number of objects to be classified be $N_2$ and the number of material classes of the objects to be classified be $M_2$, where $1 \le M_2 \le M_1$; each class of object to be classified is given a label. For all $N_2$ objects to be classified, collect the visual image $I_2$, the tactile acceleration $A_2$ and the tactile sound $S_2$, and build a data set $D_2$ containing $I_2$, $A_2$ and $S_2$; the image size of $I_2$ is 320 × 480. The tactile accelerations $A_1$ and $A_2$ are one-dimensional signals acquired by a sensor while a rigid object slides over the material surface, and the tactile sounds $S_1$ and $S_2$ are one-dimensional signals recorded by a microphone while the rigid object slides over the surface of the object material.
(2) For the data set $D_1$ and the data set $D_2$, preprocess the visual images, the tactile acceleration signals and the tactile sound signals to obtain, respectively, visual images, tactile acceleration spectrograms and tactile sound spectrograms, as follows:
(2-1) Down-sample the 320 × 480 images $I_1$ and $I_2$ to obtain visual images of $I_1$ and $I_2$ of size 32 × 32 × 3.
(2-2) Convert the tactile accelerations $A_1$ and $A_2$ to the frequency domain by short-time Fourier transform with a Hamming window of length 500, a window offset of 100 and a sampling frequency of 10 kHz, obtaining the spectrograms of $A_1$ and $A_2$; select the first 500 low-frequency channels of each spectrogram as the spectral image, which retains most of the energy of the haptic signal, and down-sample the spectral images to obtain tactile acceleration spectral images of $A_1$ and $A_2$ of size 32 × 32 × 3.
(2-3) Convert the tactile sounds $S_1$ and $S_2$ to the frequency domain by short-time Fourier transform with a Hamming window of length 500, a window offset of 100 and a sampling frequency of 10 kHz, obtaining the spectrograms of $S_1$ and $S_2$; select the first 500 low-frequency channels of each spectrogram as the spectral image, which retains most of the energy of the haptic signal, and down-sample the spectral images to obtain sound spectral images of $S_1$ and $S_2$ of size 32 × 32 × 3.
(3) Obtain the convolution features of the visual modality, the tactile acceleration modality and the tactile sound modality through multi-scale feature mapping, as follows:
(3-1) Input the 32 × 32 × 3 visual images of $I_1$ and $I_2$, the 32 × 32 × 3 tactile acceleration spectral images of $A_1$ and $A_2$ and the 32 × 32 × 3 sound spectral images of $S_1$ and $S_2$ obtained in step (2) into the first layer (the input layer) of the neural network. The size of an input image is $d \times d$. The local receptive fields of the network have $\Psi$ scale channels with sizes $r_1, r_2, \dots, r_\Psi$; for each scale channel, $K$ different input weights are generated, so that $\Psi \times K$ feature maps are generated at random. Denote the randomly generated initial weights of the $\Phi$-th scale channel for the visual image, the tactile acceleration spectrogram and the sound spectrogram as $\hat{A}^{I,\mathrm{init}}_{\Phi}$, $\hat{A}^{A,\mathrm{init}}_{\Phi}$ and $\hat{A}^{S,\mathrm{init}}_{\Phi}$, composed column by column of $\hat{a}^{I,\mathrm{init}}_{\Phi,\zeta}$, $\hat{a}^{A,\mathrm{init}}_{\Phi,\zeta}$ and $\hat{a}^{S,\mathrm{init}}_{\Phi,\zeta}$, where the superscript I denotes the visual modality of the training samples and the objects to be classified, the superscript A denotes their tactile acceleration modality, the superscript S denotes their tactile sound modality, the superscript "init" marks an initial weight, and $\hat{a}^{\,\cdot,\mathrm{init}}_{\Phi,\zeta}$ is the initial weight that generates the $\zeta$-th feature map, with $1 \le \Phi \le \Psi$ and $1 \le \zeta \le K$. The local receptive field of the $\Phi$-th scale has size $r_\Phi \times r_\Phi$, so

$$\hat{A}^{I,\mathrm{init}}_{\Phi},\; \hat{A}^{A,\mathrm{init}}_{\Phi},\; \hat{A}^{S,\mathrm{init}}_{\Phi} \in \mathbb{R}^{r_\Phi^2 \times K}, \qquad \hat{a}^{I,\mathrm{init}}_{\Phi,\zeta},\; \hat{a}^{A,\mathrm{init}}_{\Phi,\zeta},\; \hat{a}^{S,\mathrm{init}}_{\Phi,\zeta} \in \mathbb{R}^{r_\Phi^2},$$

and all $K$ feature maps of the $\Phi$-th scale channel have size $(d - r_\Phi + 1) \times (d - r_\Phi + 1)$.
(3-2) Orthogonalize the initial weight matrices $\hat{A}^{I,\mathrm{init}}_{\Phi}$, $\hat{A}^{A,\mathrm{init}}_{\Phi}$ and $\hat{A}^{S,\mathrm{init}}_{\Phi}$ of the $\Phi$-th scale channel by singular value decomposition to obtain the orthogonal matrices $\hat{A}^{I}_{\Phi}$, $\hat{A}^{A}_{\Phi}$ and $\hat{A}^{S}_{\Phi}$; the orthogonalized input weights can extract more complete features. Each column $\hat{a}^{I}_{\Phi,\zeta}$, $\hat{a}^{A}_{\Phi,\zeta}$, $\hat{a}^{S}_{\Phi,\zeta}$ is an orthogonal basis vector of the corresponding initial matrix. The input weight of the $\zeta$-th feature map of the $\Phi$-th scale channel, $a^{I}_{\Phi,\zeta}$, $a^{A}_{\Phi,\zeta}$, $a^{S}_{\Phi,\zeta} \in \mathbb{R}^{r_\Phi \times r_\Phi}$, is the square matrix formed from $\hat{a}^{I}_{\Phi,\zeta}$, $\hat{a}^{A}_{\Phi,\zeta}$ and $\hat{a}^{S}_{\Phi,\zeta}$, respectively.
The convolution feature at node $(i, j)$ of the $\zeta$-th feature map of the $\Phi$-th scale channel is computed for the visual, tactile acceleration and tactile sound modalities by

$$c^{\,\nu}_{i,j,\Phi,\zeta}(x) = \sum_{m=1}^{r_\Phi} \sum_{n=1}^{r_\Phi} x_{i+m-1,\; j+n-1} \cdot a^{\,\nu}_{\Phi,\zeta}(m, n), \qquad \nu \in \{I, A, S\},$$

$$\Phi = 1, 2, \dots, \Psi, \qquad i, j = 1, \dots, (d - r_\Phi + 1), \qquad \zeta = 1, 2, \dots, K,$$

where $c^{I}_{i,j,\Phi,\zeta}$, $c^{A}_{i,j,\Phi,\zeta}$ and $c^{S}_{i,j,\Phi,\zeta}$ are the convolution features at node $(i, j)$ of the $\zeta$-th feature map in the $\Phi$-th scale channel of the visual, tactile acceleration and tactile sound modalities, respectively, and $x$ is the input matrix corresponding to node $(i, j)$.
(4) Apply multi-scale square-root pooling to the convolution features of the visual, tactile acceleration and tactile sound modalities. There are $\Psi$ pooling scales of sizes $e_1, e_2, \dots, e_\Psi$; the pooling size $e_\Phi$ of the $\Phi$-th scale is the distance between the pooling centre and the edge of the pooling area. As shown in FIG. 2, the pooling map has the same size as the feature map, namely $(d - r_\Phi + 1) \times (d - r_\Phi + 1)$. The pooling features are computed from the convolution features of step (3) by

$$h^{\,\nu}_{p,q,\Phi,\zeta} = \sqrt{\sum_{i=p-e_\Phi}^{p+e_\Phi} \;\sum_{j=q-e_\Phi}^{q+e_\Phi} \left(c^{\,\nu}_{i,j,\Phi,\zeta}\right)^2}, \qquad \nu \in \{I, A, S\}, \qquad p, q = 1, \dots, (d - r_\Phi + 1),$$

$$\Phi = 1, 2, \dots, \Psi, \qquad \zeta = 1, 2, \dots, K,$$

where $c^{I}_{i,j,\Phi,\zeta}$, $c^{A}_{i,j,\Phi,\zeta}$ and $c^{S}_{i,j,\Phi,\zeta}$ are taken to be zero whenever the node $(i, j)$ lies outside the range $1, \dots, (d - r_\Phi + 1)$, and $h^{I}_{p,q,\Phi,\zeta}$, $h^{A}_{p,q,\Phi,\zeta}$ and $h^{S}_{p,q,\Phi,\zeta}$ are the pooling features at node $(p, q)$ of the $\zeta$-th pooling map in the $\Phi$-th scale channel of the visual, tactile acceleration and tactile sound modalities, respectively.
(5) Obtain the fully connected feature vectors of the three modalities from the pooling features, as follows:
(5-1) For the $\omega$-th training sample, concatenate all pooling features of the pooling maps of its visual image modality, tactile acceleration modality and tactile sound modality from step (4) into row vectors $h^{I}_{\omega}$, $h^{A}_{\omega}$ and $h^{S}_{\omega}$, respectively, where $1 \le \omega \le N_1$.
(5-2) Traverse all $N_1$ training samples, repeating step (5-1) for each, and stack the row vectors of the visual image modality, the tactile acceleration modality and the tactile sound modality of the $N_1$ training samples into

$$H^{I} = \begin{bmatrix} h^{I}_{1} \\ \vdots \\ h^{I}_{N_1} \end{bmatrix}, \qquad H^{A} = \begin{bmatrix} h^{A}_{1} \\ \vdots \\ h^{A}_{N_1} \end{bmatrix}, \qquad H^{S} = \begin{bmatrix} h^{S}_{1} \\ \vdots \\ h^{S}_{N_1} \end{bmatrix},$$

where $H^{I}$ is the combined feature-vector matrix of the visual modality, $H^{A}$ is the feature-vector matrix of the tactile acceleration modality, and $H^{S}$ is the feature-vector matrix of the tactile sound modality.
(6) Perform multimodal fusion of the fully connected feature vectors of the three modalities to obtain a multimodal fused mixing matrix, as follows:
(6-1) Input the row vectors of the visual image modality, the tactile acceleration modality and the tactile sound modality of the $N_1$ training samples from step (5) into the mixing layer and combine them to obtain the mixing matrix $H = [H^{I}, H^{A}, H^{S}]$.
(6-2) Reshape the mixed row vector of each sample in the mixing matrix $H$ of step (6-1) to generate a multimodal fused two-dimensional mixing matrix, as shown in FIG. 3, of size $d' \times d''$, where $d'$ is the length of the two-dimensional matrix and $d''$ its width, chosen so that $d' \times d''$ matches the length of the mixed row vector.
(7) Input the multimodal fused mixing matrix obtained in step (6) into the mixing-network layer of the neural network and obtain the multimodal mixed convolution features through multi-scale feature mapping, as follows:
(7-1) Input the multimodal fused mixing matrix of step (6-2), of size $d' \times d''$, into the mixing network. The mixing network has $\Psi'$ scale channels of sizes $r_1, r_2, \dots, r_{\Psi'}$; for each scale channel, $K'$ different input weights are generated, so that $\Psi' \times K'$ mixed feature maps are generated at random. Denote the randomly generated mixed initial weight of the $\Phi'$-th scale channel as $\hat{A}^{\mathrm{hybrid},\mathrm{init}}_{\Phi'}$, composed column by column of $\hat{a}^{\mathrm{hybrid},\mathrm{init}}_{\Phi',\zeta'}$, where the superscript "hybrid" denotes the tri-modal fusion, $\hat{A}^{\mathrm{hybrid},\mathrm{init}}_{\Phi'}$ is an initial weight matrix of the mixing network, and $\hat{a}^{\mathrm{hybrid},\mathrm{init}}_{\Phi',\zeta'}$ is the initial weight that generates the $\zeta'$-th mixed feature map, with $1 \le \Phi' \le \Psi'$ and $1 \le \zeta' \le K'$. The local receptive field of the $\Phi'$-th scale channel has size $r_{\Phi'} \times r_{\Phi'}$, so

$$\hat{A}^{\mathrm{hybrid},\mathrm{init}}_{\Phi'} \in \mathbb{R}^{r_{\Phi'}^2 \times K'}, \qquad \hat{a}^{\mathrm{hybrid},\mathrm{init}}_{\Phi',\zeta'} \in \mathbb{R}^{r_{\Phi'}^2},$$

and the $\zeta'$-th feature map of the $\Phi'$-th scale channel has size $(d' - r_{\Phi'} + 1) \times (d'' - r_{\Phi'} + 1)$.
(7-2) Orthogonalize the initial weight matrix $\hat{A}^{\mathrm{hybrid},\mathrm{init}}_{\Phi'}$ of the $\Phi'$-th scale channel by singular value decomposition to obtain the orthogonal matrix $\hat{A}^{\mathrm{hybrid}}_{\Phi'}$; the orthogonalized input weights can extract more complete features. Each column $\hat{a}^{\mathrm{hybrid}}_{\Phi',\zeta'}$ is an orthogonal basis vector of $\hat{A}^{\mathrm{hybrid},\mathrm{init}}_{\Phi'}$. The input weight $a^{\mathrm{hybrid}}_{\Phi',\zeta'} \in \mathbb{R}^{r_{\Phi'} \times r_{\Phi'}}$ of the $\zeta'$-th feature map of the $\Phi'$-th scale channel is the square matrix formed from $\hat{a}^{\mathrm{hybrid}}_{\Phi',\zeta'}$.
The mixed convolution feature at node $(i', j')$ of the $\zeta'$-th feature map of the $\Phi'$-th scale channel is computed by

$$c^{\mathrm{hybrid}}_{i',j',\Phi',\zeta'}(x') = \sum_{m=1}^{r_{\Phi'}} \sum_{n=1}^{r_{\Phi'}} x'_{i'+m-1,\; j'+n-1} \cdot a^{\mathrm{hybrid}}_{\Phi',\zeta'}(m, n),$$

$$\Phi' = 1, 2, \dots, \Psi', \qquad i' = 1, \dots, (d' - r_{\Phi'} + 1), \qquad j' = 1, \dots, (d'' - r_{\Phi'} + 1), \qquad \zeta' = 1, 2, \dots, K',$$

where $c^{\mathrm{hybrid}}_{i',j',\Phi',\zeta'}$ is the mixed convolution feature at node $(i', j')$ of the $\zeta'$-th feature map of the $\Phi'$-th scale channel and $x'$ is the input matrix corresponding to node $(i', j')$.
(8) Apply mixed multi-scale square-root pooling to the mixed convolution features. There are $\Psi'$ pooling scales of sizes $e_1, e_2, \dots, e_{\Psi'}$; at the $\Phi'$-th scale the pooling map has the same size as the feature map, namely $(d' - r_{\Phi'} + 1) \times (d'' - r_{\Phi'} + 1)$. The mixed pooling features are computed from the mixed convolution features of step (7) by

$$h^{\mathrm{hybrid}}_{p',q',\Phi',\zeta'} = \sqrt{\sum_{i'=p'-e_{\Phi'}}^{p'+e_{\Phi'}} \;\sum_{j'=q'-e_{\Phi'}}^{q'+e_{\Phi'}} \left(c^{\mathrm{hybrid}}_{i',j',\Phi',\zeta'}\right)^2}, \qquad p' = 1, \dots, (d' - r_{\Phi'} + 1), \qquad q' = 1, \dots, (d'' - r_{\Phi'} + 1),$$

$$\Phi' = 1, 2, \dots, \Psi', \qquad \zeta' = 1, 2, \dots, K',$$

where $c^{\mathrm{hybrid}}_{i',j',\Phi',\zeta'}$ is taken to be zero whenever the node $(i', j')$ lies outside the feature map, and $h^{\mathrm{hybrid}}_{p',q',\Phi',\zeta'}$ is the mixed pooling feature at node $(p', q')$ of the $\zeta'$-th pooling map of the $\Phi'$-th scale channel.
(9) From the mixed pooling features, repeat step (5): fully connect the mixed pooling feature vectors of the different scales to obtain the combined feature matrix $H^{\mathrm{hybrid}}$ of the mixing network, whose rows are the $N_1$ mixed feature vectors and whose number of columns is $L = K' \sum_{\Phi'=1}^{\Psi'} (d' - r_{\Phi'} + 1)(d'' - r_{\Phi'} + 1)$, where $K'$ is the number of different feature maps generated by each scale channel.
(10) From the combined feature matrix $H^{\mathrm{hybrid}}$ of the mixing network obtained in step (9), compute the training-sample output weights $\beta$ of the neural network according to the number of training samples $N_1$:
if $N_1 \le L$ (the feature dimension of $H^{\mathrm{hybrid}}$), then

$$\beta = (H^{\mathrm{hybrid}})^{T} \left( \frac{I}{C} + H^{\mathrm{hybrid}} (H^{\mathrm{hybrid}})^{T} \right)^{-1} T,$$

and if $N_1 > L$, then

$$\beta = \left( \frac{I}{C} + (H^{\mathrm{hybrid}})^{T} H^{\mathrm{hybrid}} \right)^{-1} (H^{\mathrm{hybrid}})^{T} T,$$

where $T$ is the label matrix of the training samples, $I$ is the identity matrix, $C$ is a regularization coefficient that may take an arbitrary value (in one embodiment of the invention $C = 5$), and the superscript $T$ on a matrix denotes transposition.
(11) Using the orthogonal matrices $\hat{A}^{I}_{\Phi}$, $\hat{A}^{A}_{\Phi}$ and $\hat{A}^{S}_{\Phi}$ obtained by orthogonalizing the initial weights of the three modalities in step (3), process the preprocessed data set $D_2$ to be classified to obtain the tri-modal mixed feature matrix $H^{\mathrm{test}}$ of the samples to be classified: step (3) yields the convolution feature vectors of the three modalities of the objects to be classified; step (4) yields their pooling feature vectors; step (5) yields their fully connected feature vectors; step (6) yields their multimodal fused mixing matrix; step (7) yields their multimodal mixed convolution features; step (8) yields their multimodal mixed pooling features; and step (9) finally yields the tri-modal mixed feature matrix $H^{\mathrm{test}}$ of the objects to be classified.
(12) From the training-sample output weights $\beta$ of step (10) and the tri-modal mixed feature matrix $H^{\mathrm{test}}$ of step (11), compute the predicted labels $\mu_\varepsilon$ of the $N_2$ samples to be classified by

$$\mu_\varepsilon = H^{\mathrm{test}} \beta, \qquad 1 \le \varepsilon \le M_2,$$

which realizes object material classification based on multi-mode fusion deep learning.
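To make the overall pipeline of the detailed description concrete, the sketch below strings the earlier helper sketches together for one sample, under the stated assumptions (one scale channel per network, hypothetical helper names introduced in the previous sketches, channel collapsing by averaging).

```python
import numpy as np

# Hypothetical helpers from the earlier sketches:
# random_orthogonal_weights, conv_feature_maps, sqrt_pool,
# fuse_modalities, elm_output_weights, predict_labels

def tri_modal_row(img, acc_spec, snd_spec, weights, e):
    """Steps (3)-(5) for one sample and one scale channel: convolve, square-root pool,
    flatten and concatenate the three modality branches."""
    parts = []
    for x in (img, acc_spec, snd_spec):
        x2d = x.mean(axis=2) if x.ndim == 3 else x   # collapse the 3 channels (assumption)
        parts.append(sqrt_pool(conv_feature_maps(x2d, weights), e).ravel())
    return np.concatenate(parts)

def hybrid_row(fused_matrix, weights_h, e_h):
    """Steps (7)-(9) for one sample: mixing-network convolution, pooling and flattening."""
    return sqrt_pool(conv_feature_maps(fused_matrix, weights_h), e_h).ravel()

# Training stacks one hybrid_row per training sample into H_hybrid and calls
# elm_output_weights(H_hybrid, T); prediction stacks the hybrid rows of the test
# samples into H_test and calls predict_labels(H_test, beta).
```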

Claims (1)

1. An object material classification method based on multi-mode fusion deep learning, characterized by comprising the following steps:
(1) Let the number of training samples be $N_1$ and the number of training-sample material classes be $M_1$, where $1 \le M_1 \le N_1$; each material class of training sample is given a label. For all $N_1$ training samples, collect the visual image $I_1$, the tactile acceleration $A_1$ and the tactile sound $S_1$, and build a data set $D_1$ containing $I_1$, $A_1$ and $S_1$; the image size of $I_1$ is 320 × 480.
Let the number of objects to be classified be $N_2$ and the number of material classes of the objects to be classified be $M_2$, where $1 \le M_2 \le M_1$; each class of object to be classified is given a label. For all $N_2$ objects to be classified, collect the visual image $I_2$, the tactile acceleration $A_2$ and the tactile sound $S_2$, and build a data set $D_2$ containing $I_2$, $A_2$ and $S_2$; the image size of $I_2$ is 320 × 480.
(2) For the data set $D_1$ and the data set $D_2$, preprocess the visual images, the tactile acceleration signals and the tactile sound signals to obtain, respectively, visual images, tactile acceleration spectrograms and tactile sound spectrograms, as follows:
(2-1) Down-sample the 320 × 480 images $I_1$ and $I_2$ to obtain visual images of $I_1$ and $I_2$ of size 32 × 32 × 3.
(2-2) Convert the tactile accelerations $A_1$ and $A_2$ to the frequency domain by short-time Fourier transform with a Hamming window of length 500, a window offset of 100 and a sampling frequency of 10 kHz, obtaining the spectrograms of $A_1$ and $A_2$; select the first 500 low-frequency channels of each spectrogram as the spectral image, and down-sample the spectral images to obtain tactile acceleration spectral images of $A_1$ and $A_2$ of size 32 × 32 × 3.
(2-3) Convert the tactile sounds $S_1$ and $S_2$ to the frequency domain by short-time Fourier transform with a Hamming window of length 500, a window offset of 100 and a sampling frequency of 10 kHz, obtaining the spectrograms of $S_1$ and $S_2$; select the first 500 low-frequency channels of each spectrogram as the spectral image, and down-sample the spectral images to obtain sound spectral images of $S_1$ and $S_2$ of size 32 × 32 × 3.
(3) Obtain the convolution features of the visual modality, the tactile acceleration modality and the tactile sound modality through multi-scale feature mapping, as follows:
(3-1) Input the 32 × 32 × 3 visual images of $I_1$ and $I_2$, the 32 × 32 × 3 tactile acceleration spectral images of $A_1$ and $A_2$ and the 32 × 32 × 3 sound spectral images of $S_1$ and $S_2$ obtained in step (2) into the first layer (the input layer) of the neural network. The size of an input image is $d \times d \times 3$. The local receptive fields of the network have $\Psi$ scale channels with sizes $r_1, r_2, \dots, r_\Psi$; for each scale channel, $K$ different input weights are generated, so that $\Psi \times K$ feature maps are generated at random. Denote the randomly generated initial weights of the $\Phi$-th scale channel for the visual image, the tactile acceleration spectrogram and the sound spectrogram as $\hat{A}^{I,\mathrm{init}}_{\Phi}$, $\hat{A}^{A,\mathrm{init}}_{\Phi}$ and $\hat{A}^{S,\mathrm{init}}_{\Phi}$, composed column by column of $\hat{a}^{I,\mathrm{init}}_{\Phi,\zeta}$, $\hat{a}^{A,\mathrm{init}}_{\Phi,\zeta}$ and $\hat{a}^{S,\mathrm{init}}_{\Phi,\zeta}$, where the superscript I denotes the visual modality of the training samples and the objects to be classified, the superscript A denotes their tactile acceleration modality, the superscript S denotes their tactile sound modality, the superscript "init" marks an initial weight, and $\hat{a}^{\,\cdot,\mathrm{init}}_{\Phi,\zeta}$ is the initial weight that generates the $\zeta$-th feature map, with $1 \le \Phi \le \Psi$ and $1 \le \zeta \le K$; the local receptive field of the $\Phi$-th scale has size $r_\Phi \times r_\Phi$, and all $K$ feature maps of the $\Phi$-th scale channel have size $(d - r_\Phi + 1) \times (d - r_\Phi + 1)$.
(3-2) Orthogonalize the initial weight matrices $\hat{A}^{I,\mathrm{init}}_{\Phi}$, $\hat{A}^{A,\mathrm{init}}_{\Phi}$ and $\hat{A}^{S,\mathrm{init}}_{\Phi}$ of the $\Phi$-th scale channel by singular value decomposition to obtain the orthogonal matrices $\hat{A}^{I}_{\Phi}$, $\hat{A}^{A}_{\Phi}$ and $\hat{A}^{S}_{\Phi}$; each column $\hat{a}^{I}_{\Phi,\zeta}$, $\hat{a}^{A}_{\Phi,\zeta}$, $\hat{a}^{S}_{\Phi,\zeta}$ is an orthogonal basis vector of the corresponding initial matrix. The input weight of the $\zeta$-th feature map of the $\Phi$-th scale channel, $a^{I}_{\Phi,\zeta}$, $a^{A}_{\Phi,\zeta}$, $a^{S}_{\Phi,\zeta}$, is the square matrix formed from $\hat{a}^{I}_{\Phi,\zeta}$, $\hat{a}^{A}_{\Phi,\zeta}$ and $\hat{a}^{S}_{\Phi,\zeta}$, respectively.
The convolution feature at node $(i, j)$ of the $\zeta$-th feature map of the $\Phi$-th scale channel is computed for the visual, tactile acceleration and tactile sound modalities by

$$c^{\,\nu}_{i,j,\Phi,\zeta}(x) = \sum_{m=1}^{r_\Phi} \sum_{n=1}^{r_\Phi} x_{i+m-1,\; j+n-1} \cdot a^{\,\nu}_{\Phi,\zeta}(m, n), \qquad \nu \in \{I, A, S\},$$

$$\Phi = 1, 2, \dots, \Psi, \qquad i, j = 1, \dots, (d - r_\Phi + 1), \qquad \zeta = 1, 2, \dots, K,$$

where $c^{I}_{i,j,\Phi,\zeta}$, $c^{A}_{i,j,\Phi,\zeta}$ and $c^{S}_{i,j,\Phi,\zeta}$ are the convolution features at node $(i, j)$ of the $\zeta$-th feature map in the $\Phi$-th scale channel of the visual, tactile acceleration and tactile sound modalities, respectively, and $x$ is the input matrix corresponding to node $(i, j)$.
(4) Apply multi-scale square-root pooling to the convolution features of the visual, tactile acceleration and tactile sound modalities. There are $\Psi$ pooling scales of sizes $e_1, e_2, \dots, e_\Psi$; the pooling size $e_\Phi$ of the $\Phi$-th scale is the distance between the pooling centre and the edge of the pooling area. The pooling map has the same size as the feature map, namely $(d - r_\Phi + 1) \times (d - r_\Phi + 1)$. The pooling features are computed from the convolution features of step (3) by

$$h^{\,\nu}_{p,q,\Phi,\zeta} = \sqrt{\sum_{i=p-e_\Phi}^{p+e_\Phi} \;\sum_{j=q-e_\Phi}^{q+e_\Phi} \left(c^{\,\nu}_{i,j,\Phi,\zeta}\right)^2}, \qquad \nu \in \{I, A, S\}, \qquad p, q = 1, \dots, (d - r_\Phi + 1),$$

$$\Phi = 1, 2, \dots, \Psi, \qquad \zeta = 1, 2, \dots, K,$$

where $c^{I}_{i,j,\Phi,\zeta}$, $c^{A}_{i,j,\Phi,\zeta}$ and $c^{S}_{i,j,\Phi,\zeta}$ are taken to be zero whenever $i$ or $j$ lies outside the range $1, \dots, (d - r_\Phi + 1)$, and $h^{I}_{p,q,\Phi,\zeta}$, $h^{A}_{p,q,\Phi,\zeta}$ and $h^{S}_{p,q,\Phi,\zeta}$ are the pooling features at node $(p, q)$ of the $\zeta$-th pooling map in the $\Phi$-th scale channel of the visual, tactile acceleration and tactile sound modalities, respectively.
(5) Obtain the fully connected feature vectors of the three modalities from the pooling features, as follows:
(5-1) For the $\omega$-th training sample, concatenate all pooling features of the pooling maps of its visual image modality, tactile acceleration modality and tactile sound modality from step (4) into row vectors $h^{I}_{\omega}$, $h^{A}_{\omega}$ and $h^{S}_{\omega}$, respectively, where $1 \le \omega \le N_1$.
(5-2) Traverse all $N_1$ training samples, repeating step (5-1) for each, and stack the row vectors of the visual image modality, the tactile acceleration modality and the tactile sound modality of the $N_1$ training samples into

$$H^{I} = \begin{bmatrix} h^{I}_{1} \\ \vdots \\ h^{I}_{N_1} \end{bmatrix}, \qquad H^{A} = \begin{bmatrix} h^{A}_{1} \\ \vdots \\ h^{A}_{N_1} \end{bmatrix}, \qquad H^{S} = \begin{bmatrix} h^{S}_{1} \\ \vdots \\ h^{S}_{N_1} \end{bmatrix},$$

where $H^{I}$ is the combined feature-vector matrix of the visual modality, $H^{A}$ is the feature-vector matrix of the tactile acceleration modality, and $H^{S}$ is the feature-vector matrix of the tactile sound modality.
(6) Perform multimodal fusion of the fully connected feature vectors of the three modalities to obtain a multimodal fused mixing matrix, as follows:
(6-1) Input the row vectors of the visual image modality, the tactile acceleration modality and the tactile sound modality of the $N_1$ training samples from step (5) into the mixing layer and combine them to obtain the mixing matrix $H = [H^{I}, H^{A}, H^{S}]$.
(6-2) Reshape the mixed row vector of each sample in the mixing matrix $H$ of step (6-1) to generate a multimodal fused two-dimensional mixing matrix of size $d' \times d''$, where $d'$ is the length of the two-dimensional matrix and $d''$ its width, chosen so that $d' \times d''$ matches the length of the mixed row vector.
(7) Input the multimodal fused mixing matrix obtained in step (6) into the mixing-network layer of the neural network and obtain the multimodal mixed convolution features through multi-scale feature mapping, as follows:
(7-1) Input the multimodal fused mixing matrix of step (6-2), of size $d' \times d''$, into the mixing network. The mixing network has $\Psi'$ scale channels of sizes $r_1, r_2, \dots, r_{\Psi'}$; for each scale channel, $K'$ different input weights are generated, so that $\Psi' \times K'$ mixed feature maps are generated at random. Denote the randomly generated mixed initial weight of the $\Phi'$-th scale channel as $\hat{A}^{\mathrm{hybrid},\mathrm{init}}_{\Phi'}$, composed column by column of $\hat{a}^{\mathrm{hybrid},\mathrm{init}}_{\Phi',\zeta'}$, where the superscript "hybrid" denotes the tri-modal fusion, $\hat{A}^{\mathrm{hybrid},\mathrm{init}}_{\Phi'}$ is an initial weight matrix of the mixing network, and $\hat{a}^{\mathrm{hybrid},\mathrm{init}}_{\Phi',\zeta'}$ is the initial weight that generates the $\zeta'$-th mixed feature map, with $1 \le \Phi' \le \Psi'$ and $1 \le \zeta' \le K'$; the local receptive field of the $\Phi'$-th scale channel has size $r_{\Phi'} \times r_{\Phi'}$, and the $\zeta'$-th feature map of the $\Phi'$-th scale channel has size $(d' - r_{\Phi'} + 1) \times (d'' - r_{\Phi'} + 1)$.
(7-2) Orthogonalize the initial weight matrix $\hat{A}^{\mathrm{hybrid},\mathrm{init}}_{\Phi'}$ of the $\Phi'$-th scale channel by singular value decomposition to obtain the orthogonal matrix $\hat{A}^{\mathrm{hybrid}}_{\Phi'}$; each column $\hat{a}^{\mathrm{hybrid}}_{\Phi',\zeta'}$ is an orthogonal basis vector of $\hat{A}^{\mathrm{hybrid},\mathrm{init}}_{\Phi'}$. The input weight $a^{\mathrm{hybrid}}_{\Phi',\zeta'}$ of the $\zeta'$-th feature map of the $\Phi'$-th scale channel is the square matrix formed from $\hat{a}^{\mathrm{hybrid}}_{\Phi',\zeta'}$.
The mixed convolution feature at node $(i', j')$ of the $\zeta'$-th feature map of the $\Phi'$-th scale channel is computed by

$$c^{\mathrm{hybrid}}_{i',j',\Phi',\zeta'}(x') = \sum_{m=1}^{r_{\Phi'}} \sum_{n=1}^{r_{\Phi'}} x'_{i'+m-1,\; j'+n-1} \cdot a^{\mathrm{hybrid}}_{\Phi',\zeta'}(m, n),$$

$$\Phi' = 1, 2, \dots, \Psi', \qquad i' = 1, \dots, (d' - r_{\Phi'} + 1), \qquad j' = 1, \dots, (d'' - r_{\Phi'} + 1), \qquad \zeta' = 1, 2, \dots, K',$$

where $c^{\mathrm{hybrid}}_{i',j',\Phi',\zeta'}$ is the mixed convolution feature at node $(i', j')$ of the $\zeta'$-th feature map of the $\Phi'$-th scale channel and $x'$ is the input matrix corresponding to node $(i', j')$.
(8) Apply mixed multi-scale square-root pooling to the mixed convolution features. There are $\Psi'$ pooling scales of sizes $e_1, e_2, \dots, e_{\Psi'}$; at the $\Phi'$-th scale the pooling map has the same size as the feature map, namely $(d' - r_{\Phi'} + 1) \times (d'' - r_{\Phi'} + 1)$. The mixed pooling features are computed from the mixed convolution features of step (7) by

$$h^{\mathrm{hybrid}}_{p',q',\Phi',\zeta'} = \sqrt{\sum_{i'=p'-e_{\Phi'}}^{p'+e_{\Phi'}} \;\sum_{j'=q'-e_{\Phi'}}^{q'+e_{\Phi'}} \left(c^{\mathrm{hybrid}}_{i',j',\Phi',\zeta'}\right)^2}, \qquad p' = 1, \dots, (d' - r_{\Phi'} + 1), \qquad q' = 1, \dots, (d'' - r_{\Phi'} + 1),$$

$$\Phi' = 1, 2, \dots, \Psi', \qquad \zeta' = 1, 2, \dots, K',$$

where $c^{\mathrm{hybrid}}_{i',j',\Phi',\zeta'}$ is taken to be zero whenever $i'$ or $j'$ lies outside the feature map, and $h^{\mathrm{hybrid}}_{p',q',\Phi',\zeta'}$ is the mixed pooling feature at node $(p', q')$ of the $\zeta'$-th pooling map of the $\Phi'$-th scale channel.
(9) From the mixed pooling features, adopt the method of step (5) to fully connect the mixed pooling feature vectors of the different scales, obtaining the combined feature matrix $H^{\mathrm{hybrid}}$ of the mixing network, whose rows are the $N_1$ mixed feature vectors and whose number of columns is $L = K' \sum_{\Phi'=1}^{\Psi'} (d' - r_{\Phi'} + 1)(d'' - r_{\Phi'} + 1)$, where $K'$ is the number of different feature maps generated by each scale channel.
(10) From the combined feature matrix $H^{\mathrm{hybrid}}$ of the mixing network obtained in step (9), compute the training-sample output weights $\beta$ of the neural network according to the number of training samples $N_1$:
if $N_1 \le L$ (the feature dimension of $H^{\mathrm{hybrid}}$), then

$$\beta = (H^{\mathrm{hybrid}})^{T} \left( \frac{I}{C} + H^{\mathrm{hybrid}} (H^{\mathrm{hybrid}})^{T} \right)^{-1} T,$$

and if $N_1 > L$, then

$$\beta = \left( \frac{I}{C} + (H^{\mathrm{hybrid}})^{T} H^{\mathrm{hybrid}} \right)^{-1} (H^{\mathrm{hybrid}})^{T} T,$$

where $T$ is the label matrix of the training samples, $I$ is the identity matrix, $C$ is a regularization coefficient that may take an arbitrary value, and the superscript $T$ on a matrix denotes transposition;
(11) Using the orthogonal matrices $\hat{A}^{I}_{\Phi}$, $\hat{A}^{A}_{\Phi}$ and $\hat{A}^{S}_{\Phi}$ obtained by orthogonalizing the initial weights of the three modalities in step (3), process the preprocessed data set $D_2$ to be classified with the method of steps (3) to (9) to obtain the combined feature matrix $H^{\mathrm{test}}$ of the mixing network for the samples to be classified.
(12) From the training-sample output weights $\beta$ of step (10) and the combined feature matrix $H^{\mathrm{test}}$ of the mixing network for the samples to be classified of step (11), compute the predicted labels $\mu_\varepsilon$ of the $N_2$ samples to be classified by

$$\mu_\varepsilon = H^{\mathrm{test}} \beta, \qquad 1 \le \varepsilon \le M_2,$$

which realizes object material classification based on multi-mode fusion deep learning.
CN201710599106.1A 2017-07-21 2017-07-21 Object material classification method based on multi-mode fusion deep learning Active CN107463952B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710599106.1A CN107463952B (en) 2017-07-21 2017-07-21 Object material classification method based on multi-mode fusion deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710599106.1A CN107463952B (en) 2017-07-21 2017-07-21 Object material classification method based on multi-mode fusion deep learning

Publications (2)

Publication Number Publication Date
CN107463952A CN107463952A (en) 2017-12-12
CN107463952B true CN107463952B (en) 2020-04-03

Family

ID=60546004

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710599106.1A Active CN107463952B (en) 2017-07-21 2017-07-21 Object material classification method based on multi-mode fusion deep learning

Country Status (1)

Country Link
CN (1) CN107463952B (en)

Families Citing this family (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108734210B (en) * 2018-05-17 2021-10-15 浙江工业大学 Object detection method based on cross-modal multi-scale feature fusion
CN108846375B (en) * 2018-06-29 2019-06-18 山东大学 A kind of multi-modal Cooperative Study method and device neural network based
CN109190638A (en) * 2018-08-09 2019-01-11 太原理工大学 Classification method based on the online order limit learning machine of multiple dimensioned local receptor field
EP3620978A1 (en) * 2018-09-07 2020-03-11 Ibeo Automotive Systems GmbH Method and device for classifying objects
CN109447124B (en) * 2018-09-28 2019-11-19 北京达佳互联信息技术有限公司 Image classification method, device, electronic equipment and storage medium
CN109508740B (en) * 2018-11-09 2019-08-13 郑州轻工业学院 Object hardness identification method based on Gaussian mixed noise production confrontation network
CN109902585B (en) * 2019-01-29 2023-04-07 中国民航大学 Finger three-mode fusion recognition method based on graph model
CN110020596B (en) * 2019-02-21 2021-04-30 北京大学 Video content positioning method based on feature fusion and cascade learning
CN110659427A (en) * 2019-09-06 2020-01-07 北京百度网讯科技有限公司 City function division method and device based on multi-source data and electronic equipment
CN110942060B (en) * 2019-10-22 2023-05-23 清华大学 Material identification method and device based on laser speckle and modal fusion
CN110909637A (en) * 2019-11-08 2020-03-24 清华大学 Outdoor mobile robot terrain recognition method based on visual-touch fusion
CN111028204B (en) * 2019-11-19 2021-10-08 清华大学 Cloth defect detection method based on multi-mode fusion deep learning
CN110861853B (en) * 2019-11-29 2021-10-19 三峡大学 Intelligent garbage classification method combining vision and touch
CN111590611B (en) * 2020-05-25 2022-12-02 北京具身智能科技有限公司 Article classification and recovery method based on multi-mode active perception
CN113111902B (en) * 2021-01-02 2024-10-15 大连理工大学 Pavement material identification method based on voice and image multi-mode collaborative learning
CN112893180A (en) * 2021-01-20 2021-06-04 同济大学 Object touch classification method and system considering friction coefficient abnormal value elimination
CN113780460A (en) * 2021-09-18 2021-12-10 广东人工智能与先进计算研究院 Material identification method and device, robot, electronic equipment and storage medium
CN114723963B (en) * 2022-04-26 2024-06-04 东南大学 Task action and object physical attribute identification method based on visual touch signal

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104715260A (en) * 2015-03-05 2015-06-17 中南大学 Multi-modal fusion image sorting method based on RLS-ELM
CN105512609A (en) * 2015-11-25 2016-04-20 北京工业大学 Multi-mode fusion video emotion identification method based on kernel-based over-limit learning machine
CN105956351A (en) * 2016-07-05 2016-09-21 上海航天控制技术研究所 Touch information classified computing and modelling method based on machine learning
CN106874961A (en) * 2017-03-03 2017-06-20 北京奥开信息科技有限公司 A kind of indoor scene recognition methods using the very fast learning machine based on local receptor field
WO2017100903A1 (en) * 2015-12-14 2017-06-22 Motion Metrics International Corp. Method and apparatus for identifying fragmented material portions within an image

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104715260A (en) * 2015-03-05 2015-06-17 中南大学 Multi-modal fusion image sorting method based on RLS-ELM
CN105512609A (en) * 2015-11-25 2016-04-20 北京工业大学 Multi-mode fusion video emotion identification method based on kernel-based over-limit learning machine
WO2017100903A1 (en) * 2015-12-14 2017-06-22 Motion Metrics International Corp. Method and apparatus for identifying fragmented material portions within an image
CN105956351A (en) * 2016-07-05 2016-09-21 上海航天控制技术研究所 Touch information classified computing and modelling method based on machine learning
CN106874961A (en) * 2017-03-03 2017-06-20 北京奥开信息科技有限公司 A kind of indoor scene recognition methods using the very fast learning machine based on local receptor field

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Deep Learning for Surface Material Classification Using Haptic and Visual Information; Haitian Zheng et al.; IEEE Transactions on Multimedia; 2016-11-30; pp. 2407-2416 *
Multi-Modal Local Receptive Field Extreme Learning Machine for Object Recognition; Fengxue Li et al.; 2016 International Joint Conference on Neural Networks (IJCNN); 2016-11-03; pp. 1696-1701 *
Visual feature analysis of 3D models based on neural networks (基于神经网络的三维模型视觉特征分析); Wei Wei (韦伟); Computer Engineering and Applications (计算机工程与应用); 2008-07-21; pp. 174-178 *

Also Published As

Publication number Publication date
CN107463952A (en) 2017-12-12

Similar Documents

Publication Publication Date Title
CN107463952B (en) Object material classification method based on multi-mode fusion deep learning
Myers et al. Affordance detection of tool parts from geometric features
Dering et al. A convolutional neural network model for predicting a product's function, given its form
CN110443293B (en) Zero sample image classification method for generating confrontation network text reconstruction based on double discrimination
CN109559758B (en) Method for converting texture image into tactile signal based on deep learning
Bleed Skill matters
CN108734138A (en) A melanoma skin disease image classification method based on ensemble learning
CN111639679A (en) Small sample learning method based on multi-scale metric learning
CN107798349A (en) A transfer learning method based on deep sparse autoencoders
CN101021900A (en) Method for making human face posture estimation utilizing dimension reduction method
CN103235947B (en) A kind of Handwritten Numeral Recognition Method and device
KR102488516B1 (en) Expansion authentication method of specimen
Beltramello et al. Artistic robotic painting using the palette knife technique
CN105917356A (en) Contour-based classification of objects
CN103218825A (en) Quick detection method of spatio-temporal interest points with invariable scale
Bednarek et al. Gaining a sense of touch object stiffness estimation using a soft gripper and neural networks
CN109447996A (en) Hand Segmentation in 3-D image
CN103745233A (en) Hyper-spectral image classifying method based on spatial information transfer
CN106529486A (en) Racial recognition method based on three-dimensional deformed face model
CN104809471A (en) Hyperspectral image residual error fusion classification method based on space spectrum information
Barbhuiya et al. Alexnet-CNN based feature extraction and classification of multiclass ASL hand gestures
CN108108652A (en) A kind of across visual angle Human bodys' response method and device based on dictionary learning
Mateo et al. 3D visual data-driven spatiotemporal deformations for non-rigid object grasping using robot hands
Wang et al. Improving generalization of deep networks for estimating physical properties of containers and fillings
Wang et al. Accelerometer-based gesture recognition using dynamic time warping and sparse representation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant