CN115022617A - Video quality evaluation method based on electroencephalogram signal and space-time multi-scale combined network - Google Patents
Video quality evaluation method based on electroencephalogram signal and space-time multi-scale combined network
- Publication number
- CN115022617A (application number CN202210601991.3A)
- Authority
- CN
- China
- Prior art keywords
- space
- distortion
- network
- scale
- electroencephalogram
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N17/00—Diagnosis, testing or measuring for television systems or their details
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/049—Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
Abstract
The invention provides a video quality evaluation method based on electroencephalogram signals and a spatio-temporal multi-scale joint network, implemented in the following steps: acquiring distorted videos with different distortion levels; collecting electroencephalogram signals; intercepting electroencephalogram signal segments; generating a training sample set from the electroencephalogram signal segments; constructing a spatio-temporal multi-scale joint network; training the spatio-temporal multi-scale joint network; and evaluating video quality with the trained network. The spatio-temporal multi-scale joint network constructed by the invention can effectively learn electroencephalogram signal features, and it addresses the failure of the prior art to consider the brain mechanism by which video quality perception is reflected in electroencephalogram signals and the periodic temporal perception characteristic of human vision; the method therefore achieves high accuracy, automatic batch processing of electroencephalogram data, and high efficiency.
Description
Technical Field
The invention belongs to the technical field of image processing, and more specifically to a video quality evaluation method based on electroencephalogram signals and a spatio-temporal multi-scale joint network within the field of image and video quality evaluation. The method can be used to analyze electroencephalogram signals collected while a subject observes a video, yielding an evaluation of the video's quality.
Background
Compression and imperfect transmission inevitably introduce distortion into video, degrading its appearance, so evaluating video quality has become an important and general problem. Video quality evaluation can be divided into objective quality evaluation and subjective quality evaluation. Objective quality evaluation obtains a quality score by building a mathematical model that simulates how the human eye perceives video. Such methods can be implemented entirely in software and offer batch processing, reproducible results, and low cost. However, it remains uncertain whether the quality score computed by an objective model represents the quality actually perceived by a human viewer. Subjective quality assessment usually requires subjects to report whether they can detect distortion or to grade its intensity; this, however, is time-consuming and labor-intensive, depends on subjective judgment, and is susceptible to the subjects' strategies and biases. Electroencephalography, as a non-invasive electrophysiological technique, can directly acquire electroencephalogram signals reflecting neural electrical activity through scalp electrodes, and can therefore be used for video quality evaluation in a simple, safe, and reliable way. It overcomes the deficiencies that objective methods cannot fully reflect subjective perceptual quality and that subjective methods are time-consuming and costly, and it has important theoretical significance and practical value for obtaining the true perceived quality of video.
The patent document "Video quality evaluation method based on electroencephalogram signals and space-time distortion" (application number CN202010341014.5, publication number CN111510710A), filed by Xidian University, discloses a video quality evaluation method based on electroencephalogram signals and spatio-temporal distortion. The method first selects a spatio-temporally distorted video of a fluctuating water surface as the visual stimulus; it then acquires continuous electroencephalogram signals and subjective evaluations and computes the subjective detection rate; finally, it segments the electroencephalogram signals, classifies the segments, and computes the classification accuracy so as to evaluate video quality. Although the method produces video quality evaluation results that agree better with human subjective evaluation and are more accurate, it has the drawback that electroencephalogram features are obtained by dimensionality reduction, considering only single-scale temporal features of the signal and ignoring the spatio-temporal features that characterize the visual-perception components of the electroencephalogram. The extracted features are therefore not sufficiently representative, which affects the classification result and makes the quality evaluation inaccurate.
The published paper "An EEG-Based Study on Perception of Video Distortion Under Different Content Motion Conditions" from Tsinghua University discloses a method for studying video distortion perception under different content motion conditions based on electroencephalogram signals. The method first records electroencephalogram signals while subjects watch distorted videos and, from a feature analysis of those signals, selects the P300 component evoked by perceived changes in video quality as an index of perceived distortion. Classification based on linear discriminant analysis shows that the separability of the P300 component is positively correlated with the perceptibility of distortion, and regression analysis shows an S-shaped quantitative relation between the two; based on this relation, the electroencephalogram signals are used to calibrate distortion perception thresholds for content moving at different speeds. The drawback of this method is that the linear discriminant classifier is a traditional machine learning algorithm: electroencephalogram features must be extracted manually before classification, which is time-consuming and labor-intensive and prevents newly acquired electroencephalogram signals from being processed efficiently.
Disclosure of Invention
The purpose of the invention is to provide a video quality evaluation method based on electroencephalogram signals and a spatio-temporal multi-scale joint network that overcomes the deficiencies of the prior art: it addresses the problem that effective features cannot be extracted because the prior art ignores the brain mechanism by which electroencephalogram signals reflect perceived video quality and the periodic temporal perception characteristic of human vision, which degrades the classification result and makes quality evaluation inaccurate; and it addresses the problem that traditional machine learning requires manual feature extraction and is therefore inefficient.
The specific idea for realizing this purpose is as follows. To address the inaccuracy of existing electroencephalogram-based video quality evaluation, distorted videos with different distortion levels are generated; electroencephalogram signals of subjects watching these videos are collected; electroencephalogram signal segments corresponding to each viewing are intercepted and labelled to generate a training set; the training set is used to train the constructed spatio-temporal multi-scale joint network; and the trained network is then used to evaluate video quality, yielding a more accurate evaluation result.
In order to achieve the purpose, the technical scheme adopted by the invention comprises the following steps:
(1) acquiring distortion videos with different distortion levels;
(2) acquiring electroencephalogram signals of a subject when watching distorted videos with different distortion levels;
(3) intercepting electroencephalogram signal segments of a subject when watching distorted videos with different distortion levels;
(4) generating a training sample set and a testing sample set by utilizing the electroencephalogram signal segments;
(5) constructing a space-time multi-scale combined network;
(6) training a space-time multi-scale joint network;
(7) and evaluating the video quality by adopting a trained spatio-temporal multi-scale joint network.
Compared with the prior art, the invention has the following advantages:
First, in view of the fact that electroencephalogram signals carry both temporal and spatial information, the invention constructs a spatio-temporal multi-scale joint network that extracts multi-scale features from the temporal and spatial information of the signal. This overcomes the deficiency of prior electroencephalogram-based video quality evaluation models that consider only temporal features, allows the network to learn electroencephalogram features that represent the visual-perception components, and improves classification accuracy; the method therefore has the advantage of high accuracy.
Second, the multi-scale deep neural network provided by the invention is an end-to-end model: the video quality evaluation result is obtained directly from the preprocessed electroencephalogram signals. This avoids the manual feature extraction required by traditional methods before classification, reduces operational complexity, and allows electroencephalogram data to be processed automatically in batches; the method therefore has the advantage of high efficiency.
Drawings
FIG. 1 is a flow chart of an implementation of the present invention.
FIG. 2 is a schematic structural diagram of a multi-scale spatiotemporal feature extraction network constructed by the present invention.
Detailed Description
The invention is further described below with reference to the figures and examples.
The implementation steps of the present invention are further described with reference to fig. 1 and the embodiment.
Step 1, obtaining distortion videos with different distortion levels;
step 1.1, in this embodiment, the "Frozen Worlds" episode of the documentary Our Planet (2019) is used; a clip of 150 frames with a duration of 5 s, starting at 31 minutes 29 seconds of the episode, is taken as the undistorted stimulus video, with a resolution of 1920 × 1080;
step 1.2, in this embodiment, the Quality parameter of the MATLAB VideoWriter tool is set to each value in {7, 14, 20, 30, 100}; these five values correspond to five distortion parameters representing different distortion levels, with a value of 100 representing no distortion;
step 1.3, the 150-frame undistorted stimulus video is input to MATLAB, frames 60 to 89 of the 150 frames are image-compressed with the VideoWriter tool, and the resulting distorted video stimuli are output, giving 5 spatio-temporally distorted videos corresponding to the distortion parameters as the distorted-stimulus video set;
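The distortion generation of step 1 can be reproduced outside MATLAB. The sketch below is a minimal Python approximation, assuming per-frame JPEG compression stands in for the VideoWriter Quality setting; the quality values, frame indices and helper name are illustrative, not the patent's implementation.

```python
# Hedged sketch: approximate the MATLAB VideoWriter "Quality" distortion by
# JPEG-compressing frames 60-89 of the 150-frame stimulus clip.
import cv2

QUALITIES = [7, 14, 20, 30, 100]          # 100 behaves as effectively undistorted

def make_distorted_video(frames, quality):
    """frames: list of 150 HxWx3 uint8 arrays; returns a new frame list."""
    out = []
    for idx, frame in enumerate(frames):
        if 59 <= idx <= 88:               # frames 60..89 (1-based) get distorted
            ok, buf = cv2.imencode(".jpg", frame,
                                   [cv2.IMWRITE_JPEG_QUALITY, quality])
            frame = cv2.imdecode(buf, cv2.IMREAD_COLOR)
        out.append(frame)
    return out
```

Calling make_distorted_video(frames, q) for each q in QUALITIES and re-encoding the returned frame lists yields one 5 s clip per distortion level.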
step 2, acquiring electroencephalogram signals of a subject when watching distorted videos with different distortion levels:
Continuous electroencephalogram signals generated by each subject while watching the distorted-stimulus video set are acquired with a 64-channel Neuroscan electroencephalograph, yielding an electroencephalogram signal set containing one sample for each viewing of each video by each subject. In this embodiment, 8 subjects are recruited and each subject watches each video 40 times, with a sampling frequency of 1000 Hz and 64 sampling channels, so each subject contributes 200 electroencephalogram signal samples, of which 40 correspond to the undistorted video and 160 to distorted videos. In total there are 1600 electroencephalogram signal samples, of which 320 correspond to the undistorted video and 1280 to distorted videos;
step 3, intercepting electroencephalogram signal segments of a subject when the subject watches distorted videos with different distortion levels:
The acquired electroencephalogram signals are passed through a band-pass filter with lower and upper cut-off frequencies of 0.2 Hz and 30 Hz, respectively, to obtain band-pass-filtered single-trial electroencephalogram signals, and a single-trial segment is intercepted from each filtered signal spanning 200 ms before to 1000 ms after the onset of the distorted stimulus;
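A minimal sketch of this preprocessing is given below, assuming a SciPy Butterworth band-pass in place of the Curry7 filtering and a known distortion-onset sample index; the filter order and function names are assumptions.

```python
# Hedged sketch of step 3: 0.2-30 Hz band-pass followed by epoching from
# 200 ms before to 1000 ms after distortion onset, at fs = 1000 Hz.
import numpy as np
from scipy.signal import butter, filtfilt

FS = 1000                                   # sampling rate in Hz

def bandpass(eeg, low=0.2, high=30.0, order=4):
    """eeg: (n_channels, n_samples) array."""
    b, a = butter(order, [low, high], btype="bandpass", fs=FS)
    return filtfilt(b, a, eeg, axis=-1)

def epoch(eeg, onset_sample, pre_ms=200, post_ms=1000):
    """Cut a single-trial segment around the distortion onset."""
    start = onset_sample - pre_ms * FS // 1000
    stop = onset_sample + post_ms * FS // 1000
    return eeg[:, start:stop]               # shape (64, 1200) at 1000 Hz
```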
step 4, generating a training set and a testing set by utilizing the electroencephalogram signal segments:
step 4.1, 75% of the samples in each subject's set of single-trial electroencephalogram signal segments are taken as the training sample set, and the remaining segments are taken as the test sample set;
4.2, the single-trial electroencephalogram signal segments in the training and test sample sets are labelled: a segment recorded while watching the undistorted video is labelled 1, and a segment recorded while watching a distorted video is labelled 0;
4.3, the single-trial electroencephalogram signal segments in the training sample set are combined with their labels to generate the training set; in this embodiment, each training sample is used as an input for training the spatio-temporal multi-scale joint network model. The single-trial segments in the test sample set are combined with their labels to generate the test set; in this embodiment, each test sample is used, after model training is finished, as an input to the quality evaluation model for distortion detection and prediction;
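The labelling and 75/25 split of step 4 can be sketched as follows; the array shapes and helper name are assumptions, with label 1 for undistorted and 0 for distorted viewings as in step 4.2.

```python
# Hedged sketch of step 4: per-subject 75% / 25% split of labelled segments.
import numpy as np

def split_subject(segments, labels, train_frac=0.75, seed=0):
    """segments: (n_trials, 64, 1200); labels: (n_trials,) in {0, 1}."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(segments))
    n_train = int(train_frac * len(segments))
    tr, te = idx[:n_train], idx[n_train:]
    return (segments[tr], labels[tr]), (segments[te], labels[te])
```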
step 5, constructing a space-time multi-scale combined network:
the embodiment of the invention builds a space-time multi-scale combined network formed by connecting a space-time multi-scale feature extraction sub-network and a space-time multi-scale feature fusion sub-network in series, wherein the space-time multi-scale feature extraction sub-network is formed by connecting a time domain feature extraction module and a space domain feature extraction module in parallel.
The structure of the spatio-temporal multi-scale joint network constructed by the invention is further described with reference to FIG. 2.
Step 5.1, the temporal feature extraction module is built from two sub-modules connected in series; the first sub-module comprises three parallel bidirectional LSTM layers, and the second sub-module is formed by connecting a feature splicing (concatenation) layer, a BN layer, a fully connected layer and a ReLU activation layer in series;
The parameters of each layer of the first sub-module of the temporal feature extraction module are set as follows:
the number of stacked layers of each of the first to third bidirectional LSTM layers is set to 1, and the number of hidden-layer nodes is set to 128;
The parameters of each layer of the second sub-module of the temporal feature extraction module are set as follows:
the feature splicing layer is implemented with the concat function;
the output dimension of the BN layer is set to 6;
the number of neurons of the fully connected layer is set to 2;
the ReLU activation layer is implemented with the ReLU function, and its inplace parameter is set to True;
Step 5.2, the spatial feature extraction module is built from two sub-modules connected in series; the first sub-module consists of three convolutional layers connected in parallel, and the second sub-module is formed by connecting, in order, a feature splicing layer, a BN layer, a fully connected layer and a ReLU activation layer in series;
The parameters of each layer of the first sub-module of the spatial feature extraction module are set as follows:
the convolution kernel sizes of the first, second and third convolutional layers are set to 64 × 1, 32 × 1 and 8 × 1 respectively, the number of convolution kernels in each layer is set to 6, and the strides are set to 1, 32 and 8 respectively;
The parameters of each layer of the second sub-module of the spatial feature extraction module are set as follows:
the output dimension of the BN layer is set to 6;
the number of neurons of the fully connected layer is set to 2;
the ReLU activation layer is implemented with the ReLU function, and its inplace parameter is set to True;
the Dropout layer is implemented with Dropout, and its drop_rate parameter is set to 0.05;
Step 5.3, the spatio-temporal multi-scale feature fusion sub-network is built with the following structure, in order: a feature splicing layer, a fully connected layer, a ReLU activation layer, a Dropout layer and a fully connected layer;
The parameters of each layer of the spatio-temporal multi-scale feature fusion sub-network are set as follows:
the number of neurons of the first fully connected layer is set to 2;
the ReLU activation layer is implemented with ReLU, and its inplace parameter is set to True;
the Dropout layer is implemented with Dropout, and its drop_rate parameter is set to 0.05;
the number of neurons of the second fully connected layer is set to 2. A code-level sketch of the structure set out in steps 5.1 to 5.3 follows below;
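The layer types and the sizes fixed in steps 5.1 to 5.3 (bidirectional LSTMs with 128 hidden nodes, convolution kernels of 64×1, 32×1 and 8×1 with 6 kernels each and strides 1, 32 and 8, two-neuron fully connected layers, ReLU with inplace=True, dropout 0.05) can be assembled as in the following PyTorch sketch. The patent does not specify how the multi-scale temporal inputs are formed or how the convolution outputs are pooled before the fully connected layer, so the down-sampling factors, the adaptive pooling and the BatchNorm dimensions below are assumptions rather than the claimed implementation.

```python
# Hedged PyTorch sketch of the spatio-temporal multi-scale joint network (step 5).
import torch
import torch.nn as nn


class TemporalBranch(nn.Module):
    """Three parallel bidirectional LSTMs over the EEG at three time scales."""

    def __init__(self, n_channels=64, hidden=128, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales
        self.lstms = nn.ModuleList(
            nn.LSTM(n_channels, hidden, num_layers=1,
                    batch_first=True, bidirectional=True)
            for _ in scales)
        self.head = nn.Sequential(                     # splice -> BN -> FC -> ReLU
            nn.BatchNorm1d(2 * hidden * len(scales)),
            nn.Linear(2 * hidden * len(scales), 2),
            nn.ReLU(inplace=True))

    def forward(self, x):                              # x: (B, 64, T)
        feats = []
        for s, lstm in zip(self.scales, self.lstms):
            xs = x[:, :, ::s].transpose(1, 2)          # (B, T/s, 64), down-sampled
            out, _ = lstm(xs)
            feats.append(out[:, -1, :])                # last hidden state per scale
        return self.head(torch.cat(feats, dim=1))


class SpatialBranch(nn.Module):
    """Three parallel convolutions whose kernels span 64 / 32 / 8 electrodes."""

    def __init__(self):
        super().__init__()
        self.convs = nn.ModuleList([
            nn.Conv2d(1, 6, kernel_size=(64, 1), stride=(1, 1)),
            nn.Conv2d(1, 6, kernel_size=(32, 1), stride=(32, 1)),
            nn.Conv2d(1, 6, kernel_size=(8, 1), stride=(8, 1))])
        self.pool = nn.AdaptiveAvgPool2d((1, 1))       # assumption: pool over space/time
        self.head = nn.Sequential(                     # splice -> BN -> FC -> ReLU -> Dropout
            nn.BatchNorm1d(18),
            nn.Linear(18, 2),
            nn.ReLU(inplace=True),
            nn.Dropout(0.05))

    def forward(self, x):                              # x: (B, 64, T)
        x = x.unsqueeze(1)                             # (B, 1, 64, T)
        feats = [self.pool(conv(x)).flatten(1) for conv in self.convs]
        return self.head(torch.cat(feats, dim=1))


class SpatioTemporalNet(nn.Module):
    """Parallel temporal/spatial branches followed by the fusion sub-network."""

    def __init__(self):
        super().__init__()
        self.temporal = TemporalBranch()
        self.spatial = SpatialBranch()
        self.fusion = nn.Sequential(                   # splice -> FC -> ReLU -> Dropout -> FC
            nn.Linear(4, 2), nn.ReLU(inplace=True),
            nn.Dropout(0.05), nn.Linear(2, 2))

    def forward(self, x):
        return self.fusion(torch.cat([self.temporal(x), self.spatial(x)], dim=1))
```

Calling SpatioTemporalNet() on a (batch, 64, 1200) tensor returns two-class logits, matching the two-neuron output of the fusion sub-network.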
step 6, training a space-time multi-scale combined network:
step 6.1, the spatio-temporal multi-scale joint network parameters are initialized: the regularization coefficient is set to 1 × 10⁻⁵, the initial learning rate is set to 2 × 10⁻⁴ and is decayed to 1/10 of its value every 50 iterations, the number of iterations is set to 150, the batch size is set to 64, and the weight decay is set to 2 × 10⁻³;
Step 6.2, the training set is input into the spatio-temporal multi-scale joint network, and the network parameters are updated iteratively by back-propagation until the loss function converges, yielding the trained spatio-temporal multi-scale joint network;
the loss function is as follows:
wherein,representing a cross entropy loss function, n representing the total number of classification results after all samples in the training set are input into the multi-scale joint neural network, Σ representing summation operation, i representing the serial number of the multi-scale joint neural network classification result corresponding to the ith sample in the training set, y (i) representing the real value of the multi-scale joint neural network ith classification result corresponding to the ith sample in the training set, and log representing logarithm operation with 10 as a base,the predicted value of the ith classification result of the multi-scale joint neural network corresponding to the ith sample in the training set is represented, | | represents the operation of taking the absolute value,denotes the introduced L1 regularization term, k denotes the L1 regularization coefficient,ω i The weight value of the ith classification result of the multi-scale joint neural network corresponding to the ith sample in the training set is represented;
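Step 6 can be sketched as the following training loop, assuming the mini-batch stochastic gradient descent mentioned in the simulation section, the hyperparameters of step 6.1 (with the 150 iterations interpreted as training epochs), and an explicit L1 penalty added to the cross-entropy loss; the data-loading details are illustrative.

```python
# Hedged sketch of step 6: cross-entropy plus L1 penalty (coefficient 1e-5),
# lr 2e-4 decayed by 10x every 50 epochs, 150 epochs, batch 64, weight decay 2e-3.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def train(model, segments, labels, l1_coef=1e-5, epochs=150):
    ds = TensorDataset(torch.as_tensor(segments, dtype=torch.float32),
                       torch.as_tensor(labels, dtype=torch.long))
    loader = DataLoader(ds, batch_size=64, shuffle=True)
    opt = torch.optim.SGD(model.parameters(), lr=2e-4, weight_decay=2e-3)
    sched = torch.optim.lr_scheduler.StepLR(opt, step_size=50, gamma=0.1)
    ce = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            loss = ce(model(x), y)
            loss = loss + l1_coef * sum(p.abs().sum() for p in model.parameters())
            opt.zero_grad()
            loss.backward()
            opt.step()
        sched.step()
    return model
```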
Step 7, evaluating the video quality with the trained spatio-temporal multi-scale joint network:
step 7.1, the test set of electroencephalogram signal segments with distortion labels is input into the trained spatio-temporal multi-scale joint network, and each classification output of the network is judged: if the output is 1, the video corresponding to the electroencephalogram signal segment is evaluated as an undistorted stimulus video; if the output is 0, it is evaluated as a distorted stimulus video;
step 7.2, the classification results of the spatio-temporal multi-scale joint network over all electroencephalogram signal segments are tallied according to whether each judgment agrees with the actual condition. Using the label of each segment, the classification result for the distorted and undistorted stimulus videos corresponding to the electroencephalogram signals is obtained and taken as the video quality evaluation result.
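A sketch of the evaluation in step 7 follows, assuming NumPy label arrays and the two-class network sketched above; the helper name and return values are illustrative.

```python
# Hedged sketch of step 7: map output 1 -> undistorted, 0 -> distorted,
# and tally agreement with the labels as the quality-evaluation result.
import torch

@torch.no_grad()
def evaluate(model, segments, labels):
    model.eval()
    x = torch.as_tensor(segments, dtype=torch.float32)
    pred = model(x).argmax(dim=1).numpy()
    correct = (pred == labels)
    per_class = {c: correct[labels == c].mean() for c in (0, 1)}
    return pred, per_class, correct.mean()   # predictions, per-class acc, overall acc
```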
The implementation process of the present invention is further described below with reference to simulation experiments:
1, simulation conditions:
the hardware test platform of the simulation experiment is as follows: the CPU is Intel (R) core (TM) i7-8700, the dominant frequency is 3.2GHz, the memory is 16GB, and the GPU is NVIDIA GeForce GT 710.
The software testing platform of the simulation experiment is as follows: the system comprises a Widows7 operating system, professional electroencephalogram acquisition and analysis software Curry7, a psychological experiment operating platform E-Prime 2.0 and mathematical software MATLAB R2019 a.
2, simulation content and result analysis:
the simulation experiment of the invention is to adopt the method of the invention and two prior arts to classify the electroencephalogram signals respectively and calculate the average accuracy of classification.
The two prior-art methods are:
Prior art 1 refers to the electroencephalogram classification method proposed by Vernon J. Lawhern et al. in the paper "EEGNet: a compact convolutional neural network for EEG-based brain-computer interfaces".
Prior art 2 refers to the electroencephalogram classification method proposed by Robin Tibor Schirrmeister et al. in the paper "Deep learning with convolutional neural networks for EEG decoding and visualization".
The simulation experiment collects electroencephalogram signals from 8 subjects repeatedly watching the distorted and undistorted stimulus videos as follows. The distortion levels of the stimulus videos are 7, 14, 20, 30 and 100; after the undistorted stimulus video has played to frame 59, frames 60 to 89 of the distorted stimulus video generated at one of the five distortion levels are played, and the subject's electroencephalogram signals are collected. The acquisition procedure consists of four computer screen interfaces. The first is an introduction interface, which explains the requirements of the simulation experiment. The second is a black-screen interface with a white "+" sign in the middle of the black background, separating successive experimental videos. The third is a video presentation interface, which presents a stimulus video that changes from undistorted to distorted; after it is presented, the procedure returns to the second interface in preparation for the next undistorted-to-distorted presentation. The stimulus video corresponding to each distortion level is played 40 times, in random order. The fourth is an end interface, which is entered after all distorted stimulus videos have been presented.
The collected electroencephalogram signals are classified in the simulation experiment as follows. First, the acquired signals undergo re-referencing, baseline correction, filtering and segmentation, and single-trial electroencephalogram signal segments are extracted. Second, the segmented signals are input into the temporal bidirectional long short-term memory network and the spatial convolutional network, respectively, to obtain the temporal and spatial features of each single-trial segment. The temporal and spatial features are then input into the feature fusion module to obtain spatio-temporal features; two fully connected operations are applied to the spatio-temporal features, the classification result is output with a SoftMax function, the error gradient is computed from the cross-entropy loss function, and the parameters of the spatio-temporal multi-scale joint network are optimized with mini-batch stochastic gradient descent, thereby training the network. Finally, the trained spatio-temporal multi-scale joint network is used to classify the electroencephalogram signals, and the accuracy is computed.
The classification results of the three methods are evaluated with two indices: the per-class classification accuracy and the average accuracy AA. The average accuracy AA is computed with the following formula, and all results are listed in Table 1:
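By its usual definition, the average accuracy AA is the mean of the per-class classification accuracies; with the two classes used here this is

$$\mathrm{AA} = \frac{1}{C}\sum_{c=1}^{C}\frac{N_c^{\mathrm{correct}}}{N_c}, \qquad C = 2,$$

where $N_c$ is the number of test segments of class $c$ and $N_c^{\mathrm{correct}}$ is the number of those classified correctly.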
TABLE 1
As shown in Table 1, the average classification accuracy AA of the proposed method is 87.5%, higher than that of the two prior-art methods, indicating that the proposed method classifies the electroencephalogram signals more accurately.
The above simulation experiments show that the proposed method can extract multi-scale spatio-temporal features of electroencephalogram signals with the constructed spatio-temporal multi-scale joint network. It solves the problem that the prior art cannot extract effective features because it ignores the brain mechanism by which electroencephalogram signals reflect perceived video quality and the staged temporal perception characteristic of human vision, which degrades the classification result and makes quality evaluation inaccurate; it is therefore a highly practical video quality evaluation method.
Claims (4)
1. A video quality evaluation method based on electroencephalogram signals and a spatio-temporal multi-scale joint network, characterized in that a training set and a test set are generated from electroencephalogram signal segments corresponding to distorted stimulus videos, and a multi-scale spatio-temporal feature extraction network is constructed and trained; the evaluation method comprises the following steps:
step 1, generating distortion videos with different distortion levels;
selecting a natural video with a duration of at least 4 s and a frame rate of 30 frames per second as the stimulus video, and applying distortion processing to the stimulus video at 5 distortion levels representing different degrees of distortion to obtain distorted videos with different distortion levels;
step 2, collecting single electroencephalogram signals of a subject when the subject watches distorted videos with different distortion levels:
acquiring, with an electroencephalogram signal collector, single-trial electroencephalogram signals from at least 8 subjects, each repeatedly watching each distorted video at least 30 times, with a sampling frequency of 1000 Hz and 64 sampling channels;
step 3, intercepting electroencephalogram signal segments of a subject when the subject watches distorted videos with different distortion levels:
band-pass filtering the acquired single-trial electroencephalogram signals with a band-pass filter whose lower cut-off frequency lies between 0.2 and 0.5 Hz and whose upper cut-off frequency lies between 5 and 30 Hz to obtain band-pass-filtered single-trial electroencephalogram signals, and intercepting single-trial electroencephalogram signal segments with a duration between 800 and 1200 ms;
step 4, generating a training set and a testing set by utilizing the single electroencephalogram signal segment:
selecting at least 75% of the electroencephalogram signal segments from the segment set as the training set and the remaining segments as the test set, and labelling the segments in both sets, wherein the training set is used to train the spatio-temporal multi-scale joint network and the test set is used as the input of the trained network for distortion prediction;
step 5, constructing a space-time multi-scale combined network:
constructing a space-time multi-scale combined network formed by connecting a space-time multi-scale feature extraction sub-network and a space-time multi-scale feature fusion sub-network in series, wherein the space-time multi-scale feature extraction sub-network is formed by connecting a time domain feature extraction module and a space domain feature extraction module in parallel;
step 5.1, a time domain feature extraction module is set up, the time domain feature extraction module is formed by connecting two sub-modules in series, the first sub-module comprises three parallel bidirectional LSTM layers, and the second sub-module is formed by connecting a feature splicing layer, a BN layer, a full connection layer and a ReLU activation layer in series in sequence;
step 5.2, a space domain feature extraction module is built, the space domain feature extraction module is formed by connecting two sub-modules in series, the first sub-module is formed by connecting three convolution layers in parallel, and the second sub-module sequentially has the following structure: the characteristic splicing layer, the BN layer, the full connecting layer and the ReLU activation layer are connected in series;
step 5.3, building a space-time multi-scale feature fusion sub-network, wherein the structure of the sub-network sequentially comprises the following steps: the device comprises a characteristic splicing layer, a full connection layer, a ReLU activation function layer, a Dropout layer and a full connection layer;
step 6, training a space-time multi-scale combined network:
initializing parameters of the space-time multi-scale joint network, inputting a training sample set into the space-time multi-scale joint network, and iteratively updating the parameters of the space-time multi-scale joint network by using a back propagation method until a cross entropy loss function is converged to obtain the trained space-time multi-scale joint network;
and 7, evaluating the video quality by adopting a trained space-time multi-scale combined network:
inputting the electroencephalogram signal segments with distortion labels in the test set into the trained spatio-temporal multi-scale joint network, judging each classification output of the network, and tallying the classification results of the network over all electroencephalogram signal segments according to whether each judgment agrees with the actual condition; obtaining, from the label of each electroencephalogram signal segment, the classification result of the distorted and undistorted stimulus videos corresponding to the electroencephalogram signals, and taking that classification result as the video quality evaluation result.
2. The video quality evaluation method based on electroencephalogram signals and a spatio-temporal multi-scale joint network according to claim 1, characterized in that the distortion levels in step 1 refer to: K image distortion levels $q = \{q_1, q_2, \ldots, q_k, \ldots, q_K\}$ set in the vicinity of the human-eye distortion perception threshold, the spacing between adjacent distortion levels being a just-noticeable distortion change; the human-eye distortion perception threshold refers to the degree of image distortion at which the human eye can just observe the distortion, and a just-noticeable distortion change refers to gradually increasing or decreasing the degree of distortion from the current distortion until the human eye can just perceive the change, where $4 \le K \le 6$.
3. The video quality evaluation method based on electroencephalogram signals and a spatio-temporal multi-scale joint network according to claim 1, characterized in that the distortion processing in step 1 refers to: taking 30 consecutive frames from the stimulus video, compressing each of these frames to a degree determined by each distortion level, and recombining them with the remaining unprocessed frames of the stimulus video to obtain the distorted videos corresponding to the different distortion levels.
4. The video quality evaluation method based on electroencephalogram signals and a spatio-temporal multi-scale joint network according to claim 1, characterized in that the cross-entropy loss function in step 6 is as follows:

$$L = -\frac{1}{n}\sum_{i=1}^{n} y^{(i)}\log\hat{y}^{(i)} + k\sum_{i}\left|\omega_{i}\right|$$

where $L$ denotes the cross-entropy loss function, $n$ denotes the total number of classification results after all samples in the training set are input into the multi-scale joint neural network, $\Sigma$ denotes summation, $i$ denotes the index of the classification result corresponding to the $i$-th training sample, $y^{(i)}$ denotes the true value of the $i$-th classification result, $\log$ denotes the base-10 logarithm, $\hat{y}^{(i)}$ denotes the predicted value of the $i$-th classification result, $|\cdot|$ denotes the absolute-value operation, $k\sum_{i}|\omega_{i}|$ denotes the introduced L1 regularization term, $k$ denotes the L1 regularization coefficient, and $\omega_{i}$ denotes the weight associated with the $i$-th classification result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210601991.3A CN115022617B (en) | 2022-05-30 | 2022-05-30 | Video quality evaluation method based on electroencephalogram signal and space-time multi-scale combined network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115022617A true CN115022617A (en) | 2022-09-06 |
CN115022617B CN115022617B (en) | 2024-04-19 |
Family
ID=83070314
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210601991.3A Active CN115022617B (en) | 2022-05-30 | 2022-05-30 | Video quality evaluation method based on electroencephalogram signal and space-time multi-scale combined network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115022617B (en) |
Patent Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170034024A1 (en) * | 2015-07-28 | 2017-02-02 | Futurewei Technologies, Inc. | Parametric model for video scoring |
CN106408037A (en) * | 2015-07-30 | 2017-02-15 | 阿里巴巴集团控股有限公司 | Image recognition method and apparatus |
WO2018200734A1 (en) * | 2017-04-28 | 2018-11-01 | Pcms Holdings, Inc. | Field-of-view prediction method based on non-invasive eeg data for vr video streaming services |
CN107220599A (en) * | 2017-05-16 | 2017-09-29 | 北京信息科技大学 | Image quality evaluating method based on EEG signal |
CN107609492A (en) * | 2017-08-25 | 2018-01-19 | 西安电子科技大学 | Distorted image quality based on EEG signals perceives evaluation method |
CN108090902A (en) * | 2017-12-30 | 2018-05-29 | 中国传媒大学 | A kind of non-reference picture assessment method for encoding quality based on multiple dimensioned generation confrontation network |
CN111510710A (en) * | 2020-04-27 | 2020-08-07 | 西安电子科技大学 | Video quality evaluation method based on electroencephalogram signals and space-time distortion |
US20220095988A1 (en) * | 2020-09-30 | 2022-03-31 | Tsinghua University | Method and apparatus for determining quality grade of video data |
CN112288657A (en) * | 2020-11-16 | 2021-01-29 | 北京小米松果电子有限公司 | Image processing method, image processing apparatus, and storage medium |
CN113255789A (en) * | 2021-05-31 | 2021-08-13 | 西安电子科技大学 | Video quality evaluation method based on confrontation network and multi-tested electroencephalogram signals |
CN113662565A (en) * | 2021-08-09 | 2021-11-19 | 清华大学 | Video playing quality evaluation method and device based on electroencephalogram characteristics |
Non-Patent Citations (2)
Title |
---|
- ELENI KROUPI et al.: "EEG correlates during video quality perception", 2014 22nd European Signal Processing Conference, 13 November 2014 (2014-11-13) *
- 武天妍: "Visual perception characteristics and image quality evaluation based on electroencephalogram signals" (基于脑电信号的视觉感知特性与影像质量评价), China Master's Theses Full-text Database, 15 May 2021 (2021-05-15) *
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116636815A (en) * | 2023-06-06 | 2023-08-25 | 中国人民解放军海军特色医学中心 | Electroencephalogram signal-based sleeping quality assessment method and system for underwater operators |
CN116636815B (en) * | 2023-06-06 | 2024-03-01 | 中国人民解放军海军特色医学中心 | Electroencephalogram signal-based sleeping quality assessment method and system for underwater operators |
Also Published As
Publication number | Publication date |
---|---|
CN115022617B (en) | 2024-04-19 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||