
CN113673567B - Panorama emotion recognition method and system based on multi-angle sub-region self-adaption - Google Patents

Panorama emotion recognition method and system based on multi-angle sub-region self-adaption

Info

Publication number
CN113673567B
CN113673567B
Authority
CN
China
Prior art keywords
emotion
feature
sub
module
panorama
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110816786.4A
Other languages
Chinese (zh)
Other versions
CN113673567A (en)
Inventor
青春美
黄容
徐向民
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN202110816786.4A priority Critical patent/CN113673567B/en
Publication of CN113673567A publication Critical patent/CN113673567A/en
Application granted granted Critical
Publication of CN113673567B publication Critical patent/CN113673567B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/06 Topological mapping of higher dimensional structures onto lower dimensional surfaces
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/04 Context-preserving transformations, e.g. by using an importance map
    • G06T3/047 Fisheye or wide-angle transformations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/12 Edge-based segmentation
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Computational Mathematics (AREA)
  • Evolutionary Biology (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Mathematical Analysis (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Algebra (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a panorama emotion recognition method and system based on multi-angle sub-region self-adaption, comprising a multi-angle rotation module, a feature extraction module, a sub-region adaptive module, a multi-scale fusion module and an emotion classification module, which together predict user emotion in an immersive virtual environment. A spherical multi-angle rotation algorithm generates a series of equirectangular-projection panoramic images, which are fed into a convolutional neural network to obtain the complementary feature advantages of different levels. Global features guide local features to adaptively establish the relevance among the context features of the current scale, capturing the global and local context dependencies of the feature maps of different levels. The feature maps of different levels are up-sampled and concatenated along the channel dimension to realize feature fusion and obtain the user's emotion classification label. The invention can correctly predict the emotion preference and distribution of users in various scenes and improve the user experience in VR.

Description

Panorama emotion recognition method and system based on multi-angle sub-region self-adaption
Technical Field
The invention relates to the field of emotion recognition, in particular to a panorama emotion recognition method and system based on multi-angle sub-region self-adaption.
Background
Emotion is a psychological and physiological state accompanied by cognitive and conscious processes, and the study of human emotion and cognition is a high-level stage of artificial intelligence. With the explosive development of artificial intelligence and deep learning, it has become possible to build emotion models with the ability to perceive, recognize and understand human emotion. Giving machines the ability to respond intelligently, sensitively and in a friendly way to user emotion ultimately builds a natural and harmonious human-machine environment, an attractive prospect that points future computer applications in a new direction.
Traditional emotion induction relies on pictures, text, speech, video and the like, and the actual prediction performance on the corresponding emotion recognition data sets is unsatisfactory. Virtual reality technology achieves emotion induction through an immersive, lifelike three-dimensional experience and is therefore a better emotion induction medium. In recent years deep learning has driven revolutionary progress in practice, but in emotion interaction, emotion label data collected under virtual-reality-induced states is scarce, and effective emotion research methods and models are lacking. The panorama stores omnidirectional, real spatial information on a two-dimensional plane and can serve as effective material for analyzing emotion in the VR immersive virtual environment.
Disclosure of Invention
In order to overcome the defects and shortcomings of the prior art, the invention provides a panorama emotion recognition method and system based on multi-angle sub-region self-adaption.
Based on how panoramic content is displayed in a head-mounted display and on the equirectangular projection format, the invention designs a spherical multi-angle rotation algorithm to obtain panoramic images at different angles and combines them with an adaptive-context convolutional neural network, effectively improving the accuracy of emotion classification labels.
The invention adopts the following technical scheme:
a panorama emotion recognition method based on multi-angle sub-region self-adaption comprises the following steps:
a multi-angle rotation step: the conversion from a three-dimensional omnibearing stereoscopic view to a two-dimensional plane panoramic view is realized by spherical multi-angle rotation and equidistant columnar projection;
and a feature extraction step: performing feature extraction on the two-dimensional plane panorama by using a pre-training convolutional neural network model to obtain feature maps of different layers;
and a sub-region self-adaption step: inputting feature graphs of different levels, searching for the relevance between the global and the local, adaptively establishing the context features of the current scale, and capturing the context dependence between the global and the local of the feature graphs of different levels;
a multi-scale fusion step: unifying the sizes of the feature graphs of different layers through an up-sampling step, and splicing the feature graphs in the channel dimension to realize multi-scale feature fusion;
and emotion classification: and determining target emotion according to the advantages of different layers of features, and outputting a corresponding emotion label.
Further, the spherical multi-angle rotation specifically includes:
establishing a three-dimensional spherical coordinate system with the user's head as the sphere center, and first projecting the 360-degree panorama presented to the user in the head-mounted display onto the sphere surface;
rotating the projection according to the content distribution characteristics of the panorama;
the rotation comprises horizontal rotation and vertical rotation: horizontal rotation moves the edge content cut off at the two sides into the central main-view area, and vertical rotation moves the severely distorted content at the two poles to near the equator.
Further, the equirectangular projection maps meridians to vertical lines of constant spacing and parallels to horizontal lines of constant spacing, projecting the three-dimensional stereoscopic view onto the two-dimensional panorama.
Further, the three-dimensional spherical coordinate system is a right-handed coordinate system with a 90-degree field of view; taking the user's straight-ahead binocular viewing direction as the horizontal axis, the center coordinate of the front view port is [0,0,0]; the center coordinate of the right view port is [90,0,0]; the center coordinate of the rear view port is [180,0,0]; the center coordinate of the left view port is [-90,0,0]; the center coordinate of the upper view port is [0,90,0]; the center coordinate of the lower view port is [0,-90,0]; these correspond to the six faces of the cube tangent to the sphere.
Further, the feature extraction step specifically includes:
inputting the two-dimensional panorama into a pre-trained convolutional neural network and extracting the hierarchical structure of the different general-purpose feature spaces of the visual world to form a feature vector set [X_1, X_2, ..., X_l], where each element in the set represents the feature map of the current level.
Further, the sub-region self-adaption step comprises two branches: a sub-region content representation branch and an emotion contribution representation branch;
the sub-region content representation branch applies an adaptive average pooling operation to the input feature map of size h×w×c to obtain the sub-region content representation y_s, where h, w, c and s respectively denote the height, width and channel number of the feature map and the preset size;
the emotion contribution representation branch specifically includes:
applying global pooling to each element of the feature vector set [X_1, X_2, ..., X_l] to obtain a global information representation g(X_l);
adding the global information representation g(X_l) element-wise to the input feature map using a broadcast mechanism to realize a residual connection, and converting the channel number to s² through a 1x1 convolution, thereby constructing the adaptive emotion contribution matrix a_s of size hw×s²;
multiplying the adaptive emotion contribution matrix a_s by the sub-region content representation y_s to obtain the context feature representation vector Z_l, which represents the degree of association between each pixel point i and each sub-region y_{s×s}.
Further, the adaptive average pooling divides the input feature map into s×s sub-regions, giving a sub-region representation set Y_{s×s} = [y_1, y_2, ..., y_{s×s}]; the feature map of size s×s×c is reshaped into the s²×c sub-region content representation y_s.
Further, constructing the emotion contribution matrix a_s specifically comprises: let a_i be the contribution of sub-region y_{s×s} to the emotion classification label at point i of the feature map; each point of the feature map then corresponds to s×s emotion contribution values a_i, which form a set A_i = [a_1, a_2, ..., a_{s×s}]; reshaping this set yields the emotion contribution matrix a_s of size hw×s².
Further, the multi-scale fusion step specifically comprises: using an up-sampling operation, such as deconvolution or interpolation, to bring the multi-scale feature maps of different levels to a uniform size, concatenating them along the channel dimension to complete feature fusion, and finally obtaining a fused representation of size H×W×(C_1+C_2+...+C_l) that combines low-level detail with high-level semantic information.
A system for realizing the panorama emotion recognition method based on multi-angle sub-region self-adaption comprises:
a multi-angle rotation module: used for converting the three-dimensional panoramic view into the two-dimensional panorama through multi-angle rotation and equirectangular projection;
a feature extraction module: used for extracting features of the two-dimensional panorama to obtain feature maps of different levels;
a sub-region self-adaption module: used for associating the regions with consistent emotion classification labels; the global features guide the local features to adaptively establish the relevance of the context features at the current scale and capture long-distance dependencies;
a multi-scale fusion module: used for unifying the sizes of the feature maps of different levels and concatenating them along the channel dimension to realize multi-scale feature fusion;
an emotion classification module: used for determining the target emotion according to the complementary advantages of the features of different levels and outputting the corresponding emotion label.
The invention has the following beneficial effects:
1. Aiming at the scarcity of emotion label data under virtual-reality-induced states, a spherical multi-angle rotation algorithm is proposed to realize data augmentation. A three-dimensional spherical coordinate system is established for the 360-degree view in the user's virtual environment, and equirectangular projection is applied after rotating the sphere along different coordinate axes to obtain expanded data samples, effectively improving the generalization ability of the model.
2. Equirectangular projection maps meridians and parallels equidistantly onto a rectangular plane, so panoramic content near the two poles is severely stretched and distorted. The data samples expanded by the spherical multi-angle rotation algorithm preserve rotation invariance and, while relieving the distortion, rotate the edge information at the two sides into the central main-view area, so that the emotion model can better capture and extract content features, improving recognition accuracy.
3. A pre-trained convolutional neural network extracts features of the panorama at different levels, exploiting the complementary advantages of low-level detail information and high-level semantic information. Global features guide local features to adaptively establish the relevance between different regions or objects of the feature map and capture long-distance dependencies, effectively improving the model's prediction of the emotion-inducing regions of the panorama.
4. The invention fills a gap in the field of panorama emotion recognition, facilitates the interpretation and collection of user emotion feedback in immersive virtual environments, and is critical to the development of VR application scenarios such as user behavior prediction and VR scene modeling.
Drawings
FIG. 1 is a flow chart of a general implementation method of the present invention.
FIG. 2 is a schematic diagram of a user wearing a display in a virtual environment.
Fig. 3 (a) and 3 (b) are respectively three-dimensional spherical coordinates and a two-dimensional plan view after projection.
FIG. 4 is a schematic diagram of the effect of the multi-angle rotation algorithm rotated 180 degrees along the x-axis.
Fig. 5 is a schematic diagram of a sub-region adaptive module of the present invention.
FIG. 6 is a schematic diagram of a model framework of the overall implementation of the present invention.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but embodiments of the present invention are not limited thereto.
Examples
As shown in fig. 1, a panorama emotion recognition method based on multi-angle sub-region self-adaption is used for recognizing and predicting user emotion in an immersive virtual environment, and comprises the following steps:
the multi-angle rotation module, for an interactive 360 degree view presented to a user of an immersive virtual environment, acquires a series of data expansion samples using a spherical multi-angle rotation algorithm as shown in fig. 2. And the warp yarns are mapped into vertical lines with constant intervals by utilizing equidistant columnar projection, and the weft yarns are mapped into horizontal lines with constant intervals, so that the conversion from the three-dimensional omnibearing stereoscopic view to the two-dimensional plane panoramic view is completed.
HMD in fig. 2 represents a head mounted display.
The spherical multi-angle rotation algorithm specifically comprises the following steps: and establishing a three-dimensional Cartesian coordinate system with the head of the user as the sphere center. The spheres are sequentially rotated by a certain angle along the horizontal axis, so that the object which is seriously distorted at two poles is rotated to the vicinity of the equator at multiple angles to improve distortion. And simultaneously, the sphere is sequentially rotated by a certain angle along a vertical axis, and the edge contents cut at two sides are rotated to a central main visual area.
The multi-angle rotation algorithm is used for enabling the region of the panoramic image inducing emotion to rotate to a position close to the equator of the main view according to the content distribution characteristics of the panoramic image, reducing adverse effects caused by distortion projection, and facilitating capturing of relevant features by the model.
The rotation comprises horizontal rotation and vertical rotation, and the horizontal rotation realizes that the edge content cut at two sides rotates to an intermediate main vision area; vertical rotation achieves rotation of the bipolar severely distorted content to near the equator.
Further, the spherical multi-angle rotation algorithm specifically comprises the following steps:
A three-dimensional spherical coordinate system with the user's head as the origin o is constructed as a right-handed coordinate system, as shown in fig. 3 (a). Using the spherical multi-angle rotation algorithm, the sphere is rotated by 90 degrees in the horizontal direction, repeated 2 times, so that the edge content cut off at the two sides rotates into the central main-view area, as shown in fig. 4. The sphere is then rotated by 45 degrees in the vertical direction, repeated 4 times, so that objects originally severely distorted at the two poles rotate to near the equator to relieve the distortion. Each panorama thus yields 2×4 = 8 data-augmented results.
Let the panorama have height H and width W, let (u, v) be the coordinates of any point on the plane, let (x, y, z) be the corresponding point in three-dimensional spherical coordinates, and let λ and φ be the longitude and latitude values. The relationship between longitude/latitude and the spherical coordinates (on the unit sphere) is:

x = cos φ · cos λ, y = cos φ · sin λ, z = sin φ

The conversion formula between the same point in three-dimensional space and on the two-dimensional plane is:

λ = 2π · (u/W - 1/2), φ = π · (1/2 - v/H)

Meridians are thereby mapped to vertical lines of constant spacing and parallels to horizontal lines of constant spacing, as shown in fig. 3 (b).
In the emotion recognition field, because content in the panorama's ERP storage format is distorted, the multi-angle algorithm needs to rotate the emotion-inducing object or region to the main-view position near the equator so that the equirectangular projection places it at the center of the two-dimensional plane, which makes it easier for the model to capture the relevant features. However, different panoramas require different rotation angles, and customizing each panorama manually is impractical. In general, rotating the sphere horizontally by 90 degrees, repeated 2 times, and rotating it by 45 degrees around the x-axis, repeated 4 times, basically meets the above requirement, yielding 2×4 = 8 results per panorama.
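To make the rotation concrete, here is a minimal NumPy sketch of the augmentation just described (not taken from the patent; the function names, nearest-neighbour resampling and exact angle sets are illustrative assumptions). Horizontal rotation of an ERP panorama reduces to a circular shift along the width, while vertical rotation about the x-axis remaps every pixel through the spherical coordinates:

```python
import numpy as np

def rotate_erp(img: np.ndarray, yaw_deg: float = 0.0, pitch_deg: float = 0.0) -> np.ndarray:
    """Rotate an equirectangular (ERP) panorama of shape (H, W, C) on the sphere.

    Yaw (about the vertical axis) is equivalent to a circular shift in width;
    pitch (about the horizontal x-axis) moves pole content toward the equator
    and requires remapping every pixel through spherical coordinates.
    """
    h, w = img.shape[:2]
    v, u = np.mgrid[0:h, 0:w]                       # target pixel grid
    lon = (u / w - 0.5) * 2.0 * np.pi               # longitude in [-pi, pi)
    lat = (0.5 - v / h) * np.pi                     # latitude in [-pi/2, pi/2]
    # Longitude/latitude -> Cartesian coordinates on the unit sphere.
    x = np.cos(lat) * np.cos(lon)
    y = np.cos(lat) * np.sin(lon)
    z = np.sin(lat)
    # Inverse rotation about the x-axis by the pitch angle.
    p = np.deg2rad(pitch_deg)
    y2 = np.cos(p) * y + np.sin(p) * z
    z2 = -np.sin(p) * y + np.cos(p) * z
    # Back to source longitude/latitude; yaw is a plain longitude offset.
    lon_s = np.arctan2(y2, x) + np.deg2rad(yaw_deg)
    lat_s = np.arcsin(np.clip(z2, -1.0, 1.0))
    # Source pixel coordinates (nearest-neighbour resampling for brevity).
    us = ((lon_s / (2.0 * np.pi) + 0.5) * w).astype(int) % w
    vs = np.clip(((0.5 - lat_s / np.pi) * h).astype(int), 0, h - 1)
    return img[vs, us]

def multi_angle_augment(img: np.ndarray) -> list:
    """2 horizontal x 4 vertical rotations = 8 expanded samples per panorama
    (the exact angle sets are an illustrative reading of the patent)."""
    return [rotate_erp(img, yaw_deg=yaw, pitch_deg=pitch)
            for yaw in (0.0, 90.0)                  # horizontal: 90 deg, 2 times
            for pitch in (0.0, 45.0, 90.0, 135.0)]  # vertical: 45 deg, 4 times
```

Calling multi_angle_augment on one panorama returns the 2×4 = 8 expanded samples described above.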
The feature extraction module realizes feature extraction with a convolutional neural network pre-trained on a large-scale image classification task. For an input image I, the formula X_l = f(Σ k_l · X_{l-1} + b_l) extracts the hierarchical structure of the different general-purpose feature spaces of the visual world, forming the feature map vector set [X_1, X_2, ..., X_l], where k_l is the convolution kernel of layer l, X_{l-1} is the output feature map of layer l-1, and b_l is a bias term. Each element of the set represents the feature map of the current level and serves as input to the sub-region adaptive module, so that the complementary advantages of information at different levels can be exploited.
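As one way to realize this module, the sketch below uses torchvision's feature-extraction utility on an ImageNet-pretrained ResNet-50; the patent does not name a specific backbone, so the network choice and node names are assumptions. The four residual stages play the role of the feature vector set [X_1, X_2, X_3, X_4]:

```python
import torch
from torchvision.models import resnet50, ResNet50_Weights
from torchvision.models.feature_extraction import create_feature_extractor

# Assumed backbone: ImageNet-pretrained ResNet-50; its four residual stages
# stand in for the conv layers 2-5 used in the embodiment.
backbone = resnet50(weights=ResNet50_Weights.IMAGENET1K_V2)
extractor = create_feature_extractor(
    backbone,
    return_nodes={"layer1": "X1", "layer2": "X2", "layer3": "X3", "layer4": "X4"},
)

panorama = torch.randn(1, 3, 512, 1024)    # one 2:1 ERP panorama (batch form)
features = extractor(panorama)              # {"X1": ..., ..., "X4": ...}
for name, fmap in features.items():
    print(name, tuple(fmap.shape))          # e.g. X1 -> (1, 256, 128, 256)
```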
The sub-region adaptive module, as shown in fig. 5, adaptively establishes the context features of the current scale by searching for global-local relevance and captures the global and local context dependencies of the feature maps of different levels. The module consists of two branches, a sub-region content representation branch and an emotion contribution representation branch, as follows:
sub-region content characterization branch pair feature vector set [ X ] 1 ,X 2 ,...,X l ]Each element in the (c) is subjected to adaptive average pooling, and an adaptive average pooling function is defined as follows:
kernel_size=(input_size+2×padding)-(output_size-1)×stride
i.e. the input size, the output size, the boundary filling and the step size of the movement determine the size of the current convolution kernel. Feature map X with size of h multiplied by w multiplied by c l Converted into sxsxc where h, w, c, s represent the height, width, number of channels, and preset size of the feature map, respectively. Then the adaptive averaging pooling divides the input feature map into sxs subregions, resulting in a set of subregions representing Y s×s =[y 1 ,y 2 ,...,y s×s ]. Transforming a feature map of size sxsxc to s 2 X c sub-region content characterization y s
The emotion contribution representation branch applies global average pooling to each element of the feature vector set [X_1, X_2, ..., X_l] to obtain the global information representation g(X_l). Using a broadcast mechanism, the 1×1×c global information representation is added pixel-wise to the input feature map to realize a residual connection, giving a feature map of size h×w×c.
Let a_i be the contribution of sub-region y_{s×s} to the emotion classification label at point i of the feature map. The channel number is converted to s² through a 1x1 convolution, so that each point of the feature map corresponds to s×s emotion contribution values a_i, which form a set A_i = [a_1, a_2, ..., a_{s×s}]; reshaping yields the adaptive emotion contribution matrix a_s of size hw×s².
The emotion contribution matrix a_s output by the emotion contribution branch is multiplied by the sub-region content representation y_s from the sub-region content branch, defined as:

Z_l = a_s · y_s

The resulting context feature representation vector Z_l expresses the degree of association between each pixel point i and each sub-region y_{s×s}; the implicit emotion contribution vector A_i characterizes the global-local connection weights and is optimized automatically as the network iterates.
Further, the dependency refers to the correlation between two or more emotion subjects. The feature extraction module can identify different regions or objects, such as an emotion subject and a cat, using the global and local features of the panorama, but this alone is insufficient as a criterion for emotion prediction. The sub-region adaptive module must also adaptively establish the association between the person and the cat, namely that the person is teasing or stroking the cat, in order to give the correct positive emotion label.
The multi-scale fusion module realizes feature fusion of the feature maps of different levels. An up-sampling operation brings the feature maps of different levels to a uniform size, the size-unified feature maps are concatenated along the channel dimension, and a fused representation of size H×W×(C_1+C_2+...+C_l) is finally obtained, combining low-level detail with high-level semantic information.
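A minimal PyTorch sketch of this fusion step, assuming bilinear interpolation as the up-sampling operation (the patent also allows deconvolution) and taking the first map's resolution as the target size:

```python
import torch
import torch.nn.functional as F

def multi_scale_fusion(feature_maps: list[torch.Tensor]) -> torch.Tensor:
    """Upsample every level to the first (largest) map's spatial size and
    concatenate along the channel dimension: H x W x (C1 + C2 + ... + Cl)."""
    target = feature_maps[0].shape[-2:]
    upsampled = [F.interpolate(f, size=target, mode="bilinear", align_corners=False)
                 for f in feature_maps]
    return torch.cat(upsampled, dim=1)
```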
The emotion classification module achieves a strong emotion classification effect on panoramas both with and without a salient subject. Because a fully connected layer has redundant parameters, global average pooling replaces the fully connected layer in the role of the classifier. Panoramas with a salient subject are emotion-recognized with deep features, which attend more to abstract semantic information; panoramas without a salient subject are emotion-recognized with shallow features, which provide detail-perception information about edges, textures and colors. Emotion classification labels of higher accuracy are thereby obtained; the overall framework of the model is shown in fig. 6.
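A sketch of such a classifier head; the patent only states that global average pooling replaces the fully connected layer, so the 1x1 projection to class logits and the two-class (positive/negative) output are assumptions consistent with the embodiment:

```python
import torch
import torch.nn as nn

class EmotionHead(nn.Module):
    """Classifier head: global average pooling in place of a fully connected
    layer; a 1x1 convolution maps the fused channels to class logits."""

    def __init__(self, fused_channels: int, num_classes: int = 2):
        super().__init__()
        self.proj = nn.Conv2d(fused_channels, num_classes, kernel_size=1)
        self.gap = nn.AdaptiveAvgPool2d(1)

    def forward(self, fused: torch.Tensor) -> torch.Tensor:
        return self.gap(self.proj(fused)).flatten(1)   # (batch, num_classes)
```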
The features extracted by the different convolution layers of the feature extraction module differ: lower convolution layers such as conv layers 1 and 2 extract visual-level features such as color, texture and contour, while higher convolution layers such as conv layers 4 and 5 extract object-level and concept-level features, i.e. abstract semantic information. Predicting the emotion regions of different (or the same) panoramas requires combining the feature advantages of different levels: if the panorama content is a single plain natural scene, low-level color and texture information is the key to correct classification; if the panorama content is a complex multi-object interaction scene, high-level semantic information matters most. By establishing the relevance between different regions and objects of the feature map, the sub-region adaptive module helps capture the emotion-inducing region and thus give the correct emotion label.
In this embodiment, the feature extraction module extracts the 4 levels of feature maps from conv layers 2, 3, 4 and 5, and the feature map of each level is sent to the sub-region adaptive module, which establishes the relevance of different regions at different scales s = 1, 2, 4, ..., n (the choice of S is not limited; the combination 1, 2, 4 generally works best). Because the feature maps of different levels differ in size, the multi-scale fusion module first unifies their sizes and then concatenates all feature maps along the channel dimension; the concatenated overall feature serves as the basis for emotion classification, finally giving the emotion polarity of the input panorama, namely positive or negative.
The embodiments described above are preferred embodiments of the present invention, but the embodiments of the present invention are not limited thereto; any other changes, modifications, substitutions, combinations and simplifications that do not depart from the spirit and principles of the present invention shall be regarded as equivalent replacements and are included within the scope of protection of the present invention.

Claims (6)

1. A panorama emotion recognition method based on multi-angle sub-region self-adaption is characterized by comprising the following steps:
a multi-angle rotation step: converting the three-dimensional omnidirectional stereoscopic view into a two-dimensional planar panorama through spherical multi-angle rotation and equirectangular projection;
a feature extraction step: extracting features of the two-dimensional panorama with a pre-trained model to obtain feature maps of different levels;
a sub-region self-adaption step: taking the feature maps of different levels as input, searching for the relevance between the global and the local, adaptively establishing the context features of the current scale, and capturing the global and local context dependencies of the feature maps of different levels;
a multi-scale fusion step: concatenating the feature maps of different levels along the channel dimension to realize multi-scale feature fusion;
an emotion classification step: determining the target emotion according to the complementary advantages of the features of different levels and outputting the corresponding emotion label;
the spherical multi-angle rotation specifically comprises:
establishing a three-dimensional spherical coordinate system with the user's head as the sphere center, and first projecting the 360-degree panorama presented to the user in the head-mounted display onto the sphere surface;
rotating the projection according to the content distribution characteristics of the panorama;
the rotation comprises horizontal rotation and vertical rotation: horizontal rotation moves the edge content cut off at the two sides into the central main-view area, and vertical rotation moves the severely distorted content at the two poles to near the equator;
the equirectangular projection maps meridians to vertical lines of constant spacing and parallels to horizontal lines of constant spacing, projecting the three-dimensional stereoscopic view onto the two-dimensional panorama;
the sub-region self-adaption step comprises two branches, a sub-region content representation branch and an emotion contribution representation branch;
the sub-region content representation branch applies an adaptive average pooling operation to the input feature map of size h×w×c to obtain the sub-region content representation y_s, where h, w, c and s respectively denote the height, width and channel number of the feature map and the preset size;
the emotion contribution representation branch specifically comprises:
applying global pooling to each element of the feature vector set [X_1, X_2, ..., X_l] to obtain a global information representation g(X_l);
adding the global information representation g(X_l) element-wise to the input feature map using a broadcast mechanism to realize a residual connection, and converting the channel number to s² through a 1x1 convolution, thereby constructing the adaptive emotion contribution matrix a_s of size hw×s²;
multiplying the adaptive emotion contribution matrix a_s by the sub-region content representation y_s to obtain the context feature representation vector Z_l, which represents the degree of association between each pixel point i and each sub-region y_{s×s}.
2. The panorama emotion recognition method according to claim 1, wherein the three-dimensional spherical coordinate system is a right-handed coordinate system with a 90-degree field of view; taking the user's straight-ahead binocular viewing direction as the horizontal axis, the center coordinate of the front view port is [0,0,0]; the center coordinate of the right view port is [90,0,0]; the center coordinate of the rear view port is [180,0,0]; the center coordinate of the left view port is [-90,0,0]; the center coordinate of the upper view port is [0,90,0]; the center coordinate of the lower view port is [0,-90,0]; these correspond to the six faces of the cube tangent to the sphere.
3. The panorama emotion recognition method according to claim 1, wherein the feature extraction step specifically comprises:
inputting the two-dimensional panorama into a pre-trained convolutional neural network and extracting the hierarchical structure of the different general-purpose feature spaces of the visual world to form a feature vector set [X_1, X_2, ..., X_l], where each element of the set represents the feature map of the current level.
4. The panorama emotion recognition method according to claim 1, wherein constructing the adaptive emotion contribution matrix a_s of size hw×s² specifically comprises: letting a_i be the contribution of sub-region y_{s×s} to the emotion classification label at point i of the feature map, converting the channel number to s² through a 1x1 convolution so that each point of the feature map corresponds to s×s emotion contribution values a_i, which form a set A_i = [a_1, a_2, ..., a_{s×s}], and reshaping to obtain the adaptive emotion contribution matrix a_s of size hw×s².
5. The panorama emotion recognition method according to claim 1, wherein the adaptive average pooling divides the input feature map into s×s sub-regions, giving a sub-region representation set Y_{s×s} = [y_1, y_2, ..., y_{s×s}]; the feature map of size s×s×c is reshaped into the s²×c sub-region content representation y_s.
6. A system for implementing the panorama emotion recognition method based on multi-angle sub-region self-adaption according to any one of claims 1-5, comprising:
a multi-angle rotation module: used for converting the three-dimensional panoramic view into the two-dimensional panorama through multi-angle rotation and equirectangular projection;
a feature extraction module: used for extracting features of the two-dimensional panorama to obtain feature maps of different levels and capturing the global and local context dependencies of the feature maps;
a sub-region self-adaption module: used for associating the regions with consistent emotion classification labels and adaptively establishing the context features of the current scale by searching for the relevance between the global and the local;
a multi-scale fusion module: used for concatenating the feature maps along the channel dimension and performing multi-scale feature fusion;
an emotion classification module: used for determining the target emotion according to the complementary advantages of the features of different levels and outputting the corresponding emotion label.
CN202110816786.4A 2021-07-20 2021-07-20 Panorama emotion recognition method and system based on multi-angle sub-region self-adaption Active CN113673567B (en)

Priority Applications (1)

Application Number: CN202110816786.4A (CN113673567B) | Priority Date: 2021-07-20 | Filing Date: 2021-07-20 | Title: Panorama emotion recognition method and system based on multi-angle sub-region self-adaption

Applications Claiming Priority (1)

Application Number: CN202110816786.4A (CN113673567B) | Priority Date: 2021-07-20 | Filing Date: 2021-07-20 | Title: Panorama emotion recognition method and system based on multi-angle sub-region self-adaption

Publications (2)

Publication Number Publication Date
CN113673567A (en) 2021-11-19
CN113673567B (en) 2023-07-21

Family

ID=78539860

Family Applications (1)

Application Number: CN202110816786.4A (CN113673567B, Active) | Priority Date: 2021-07-20 | Filing Date: 2021-07-20 | Title: Panorama emotion recognition method and system based on multi-angle sub-region self-adaption

Country Status (1)

Country Link
CN (1) CN113673567B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114201970A * 2021-11-23 2022-03-18 East China Branch of State Grid Corporation of China Method and device for power grid dispatching event detection and capture based on semantic features
CN114827749A * 2022-04-21 2022-07-29 Tianjin Fire Research Institute of the Ministry of Emergency Management Method for seamless switching and playing of multi-view panoramic video

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107506722A * 2017-08-18 2017-12-22 China University of Geosciences (Wuhan) Face emotion recognition method based on a deep sparse convolutional neural network
CN111832620A * 2020-06-11 2020-10-27 Guilin University of Electronic Technology Image emotion classification method based on double-attention multilayer feature fusion
CN112784764A * 2021-01-27 2021-05-11 Nanjing University of Posts and Telecommunications Expression recognition method and system based on local and global attention mechanism
CN112800875A * 2021-01-14 2021-05-14 Beijing Institute of Technology Multi-mode emotion recognition method based on mixed feature fusion and decision fusion
CN113011504A * 2021-03-23 2021-06-22 South China University of Technology Virtual reality scene emotion recognition method based on visual angle weight and feature fusion


Also Published As

Publication number Publication date
CN113673567A (en) 2021-11-19

Similar Documents

Publication Publication Date Title
CN110599605B (en) Image processing method and device, electronic equipment and computer readable storage medium
Henderson et al. Unsupervised object-centric video generation and decomposition in 3D
CN111563502A (en) Image text recognition method and device, electronic equipment and computer storage medium
CN112954292B (en) Digital museum navigation system and method based on augmented reality
CN110796593A (en) Image processing method, device, medium and electronic equipment based on artificial intelligence
WO2022197431A1 (en) Methods and systems for personalized 3d head model deformation
US11823415B2 (en) 3D pose estimation in robotics
CN113673567B (en) Panorama emotion recognition method and system based on multi-angle sub-region self-adaption
CN113822965A (en) Image rendering processing method, device and equipment and computer storage medium
Li et al. Three-dimensional traffic scenes simulation from road image sequences
Karakottas et al. 360 surface regression with a hyper-sphere loss
CN117333645A (en) Annular holographic interaction system and equipment thereof
CN117557714A (en) Three-dimensional reconstruction method, electronic device and readable storage medium
CN116097316A (en) Object recognition neural network for modeless central prediction
CN115222917A (en) Training method, device and equipment for three-dimensional reconstruction model and storage medium
CN115115805A (en) Training method, device and equipment for three-dimensional reconstruction model and storage medium
Yao et al. Neural Radiance Field-based Visual Rendering: A Comprehensive Review
CN117218246A (en) Training method and device for image generation model, electronic equipment and storage medium
WO2021151380A1 (en) Method for rendering virtual object based on illumination estimation, method for training neural network, and related products
CN116740261A (en) Image reconstruction method and device and training method and device of image reconstruction model
CN117635801A (en) New view synthesis method and system based on real-time rendering generalizable nerve radiation field
CN117115398A (en) Virtual-real fusion digital twin fluid phenomenon simulation method
CN117079313A (en) Image processing method, device, equipment and storage medium
CN116994148A (en) Building identification method, device and equipment based on building identification model
CN115393471A (en) Image processing method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant