Summary of the invention
The purpose of the present invention is to assess panoramic video quality in virtual reality systems. To this end, a panoramic video assessment method and system based on multi-level quality factors are proposed. The system takes as input one lossless panoramic video and one impaired video of the same content, and outputs a quality assessment result for the impaired video, thereby realizing automatic assessment of the impaired video.
The idea of the invention is to compute multiple quality factors based on regions of interest at multiple levels, so as to account for the fact that important regions of a panoramic video have a greater influence on perceived video quality; the multi-level quality factors are then combined by a fusion model whose parameters can be learned from subjective data, so as to account for the subjective behavior of panoramic video viewers.
The purpose of the present invention is achieved by the following technical scheme: a panoramic video assessment method and system based on multi-level quality factors, comprising a panoramic video assessment method based on multi-level quality factors and a panoramic video assessment system based on multi-level quality factors.
In the following, the panoramic video assessment method based on multi-level quality factors is referred to as "this method", and the panoramic video assessment system based on multi-level quality factors is referred to as "this system".
This system comprises a panoramic video input module, a region-of-interest extraction module, a multi-level quality factor computation module, a temporal processing module, and a multi-level quality factor fusion module.
The modules of this system are connected as follows:
The panoramic video input module is connected to the region-of-interest extraction module; the region-of-interest extraction module is connected to the multi-level quality factor computation module; the multi-level quality factor computation module is connected to the temporal processing module; and the temporal processing module is connected to the multi-level quality factor fusion module.
The functions of the modules of this system are as follows:
The panoramic video input module decodes the input video files to obtain pairs of panoramic frame images; the region-of-interest extraction module extracts the multi-level region-of-interest matrices of a panoramic image; the multi-level quality factor computation module computes the quality factors of a panoramic image from the region-of-interest matrices; the temporal processing module computes the quality factors of the panoramic video from the quality factors of the individual panoramic images; and the multi-level quality factor fusion module fuses the quality factors of the panoramic video into the automatic assessment result for the impaired video.
A panoramic video assessment method based on multi-level quality factors comprises the following steps:
Step 1: the panoramic video input module performs video processing and decoding on a pair of panoramic video source files input to this system, obtaining pairs of panoramic frame images.
The pair of input panoramic video source files consists of a lossless reference video S' and an impaired video S with the same content as the reference video. The impairments in the impaired video S include artificially introduced degradations, chiefly blurring, added noise, and coding distortion, as well as degradations arising during network transmission, chiefly from packet loss and bit errors.
The lossless reference video is also simply called the reference video.
Step 1.1: judge whether the pair of panoramic video source files input to this system have the same resolution, frame rate, and duration, and the same mapping format (the mapping formats include equirectangular (longitude-latitude) mapping, cube mapping, and pyramid mapping), and proceed according to the result:
1.1A: if the pair of panoramic video source files input to this system have the same resolution, frame rate, and duration, and the same mapping format, skip to step 1.2;
1.1B: if the pair of panoramic video source files input to this system do not have the same resolution, frame rate, and duration, and the same mapping format, the panoramic video input module processes the impaired video, chiefly by pixel interpolation, frame duplication, and mapping conversion, so that the impaired video and the reference video have the same resolution, frame rate, and duration, and the same mapping format.
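The consistency check of step 1.1 and the interpolation fallback of case 1.1B can be sketched as follows. This is a minimal sketch under stated assumptions: frames are represented as numpy arrays, and nearest-neighbour interpolation stands in for whatever pixel interpolation the input module actually applies.

```python
import numpy as np

def check_pair(ref_frames, imp_frames, ref_fps, imp_fps):
    """Step 1.1: True if the two decoded sequences already match in
    frame rate, frame count (duration), and per-frame resolution."""
    return (ref_fps == imp_fps
            and len(ref_frames) == len(imp_frames)
            and all(r.shape == d.shape for r, d in zip(ref_frames, imp_frames)))

def resize_nearest(img, h, w):
    """Case 1.1B: nearest-neighbour pixel interpolation, a minimal
    stand-in for the interpolation the input module would apply."""
    rows = (np.arange(h) * img.shape[0] / h).astype(int)
    cols = (np.arange(w) * img.shape[1] / w).astype(int)
    return img[rows][:, cols]
```

When `check_pair` fails only on resolution, mapping each impaired frame through `resize_nearest` restores the match; frame duplication and mapping conversion would be handled analogously.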
Step 1.2: using an ffmpeg-based decoding tool, decode the pair of panoramic video source files input to this system according to their coded format, so that each panoramic video is decoded into a sequence of frame images, yielding pairs of panoramic frame images. Where each panoramic video source file has N frames, N pairs of panoramic frame images are obtained, comprising the N reference frame images from the reference video and the N impaired frame images from the impaired video; each panoramic frame image has width W and height H.
Step 2: the region-of-interest extraction module applies image processing and computer vision algorithms to the panoramic frame images output by step 1 to perform region-of-interest extraction, and outputs multi-level region-of-interest matrix sets.
Specifically, region-of-interest (ROI) extraction is performed on the reference frame image I' of each panoramic frame image pair output by step 1.
The multi-level region-of-interest matrix sets are all the matrices in the low-level region-of-interest matrix set {M_l^1, …, M_l^(n_l)}, the middle-level region-of-interest matrix set {M_m^1, …, M_m^(n_m)}, the high-level region-of-interest matrix set {M_h^1, …, M_h^(n_h)}, the temporal-level region-of-interest matrix set {M_t^1, …, M_t^(n_t)}, and the mapping-level region-of-interest matrix M_p. Here M denotes a two-dimensional matrix of size H × W, i.e. one region-of-interest matrix of the image I'; the elements of M take values in the range [0, 1], and the larger the value of M(i, j), i.e. the element in row i and column j of the matrix, the more easily the pixel I'(i, j) at the corresponding position of the reference frame image I' is noticed by the viewer, and the greater its influence on video quality.
The subscripts l, m, h, t, p of M indicate that the matrix is obtained by the low-level, middle-level, high-level, temporal-level, or mapping-level region-of-interest extraction method, respectively; the superscripts 1, 2, …, n of M indicate that the matrix is obtained by the n-th method of the given level. Here n_l, n_m, n_h, n_t are integers greater than or equal to 1, meaning that the low, middle, high, and temporal levels may each use one or more methods to obtain one or more region-of-interest matrices, while the mapping level may select only one method to obtain its region-of-interest matrix.
The above counts of region-of-interest matrices are given for a single reference frame image I'; for the N reference frame images of the N panoramic frame image pairs output by step 1, the number of region-of-interest matrices output by step 2 is (n_l + n_m + n_h + n_t + 1) × N.
The multi-level region-of-interest matrices are generated by steps 2.1 to 2.5, respectively, as follows:
Step 2.1: compute the low-level regions of interest of the reference frame image using pixel-level image processing methods, and output the low-level region-of-interest matrix set {M_l^1, …, M_l^(n_l)}.
The pixel-level image processing methods are chiefly based on color contrast and edge detection.
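One pixel-level method of step 2.1 can be sketched with edge detection. This is a minimal sketch, not the authors' exact method: a hand-rolled Sobel filter scores each pixel by edge strength and normalises the result to [0, 1], matching the value range required of every M.

```python
import numpy as np

def sobel_edge_roi(gray):
    """Low-level ROI matrix from edge strength (step 2.1).
    `gray` is an H x W float array; output values lie in [0, 1]."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)
    ky = kx.T
    pad = np.pad(gray, 1, mode="edge")
    h, w = gray.shape
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for i in range(3):           # correlate with both Sobel kernels
        for j in range(3):
            win = pad[i:i + h, j:j + w]
            gx += kx[i, j] * win
            gy += ky[i, j] * win
    mag = np.hypot(gx, gy)       # gradient magnitude per pixel
    return mag / mag.max() if mag.max() > 0 else mag
```

A color-contrast saliency map, the other option the text names, would be produced the same way: per-pixel score, then normalisation to [0, 1].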
Step 2.2: compute the middle-level regions of interest of the reference frame image using superpixel processing methods, and output the middle-level region-of-interest matrix set {M_m^1, …, M_m^(n_m)}.
The superpixel processing methods are chiefly based on ranking superpixel blocks by saliency.
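The superpixel saliency ranking of step 2.2 can be sketched crudely as follows. This is an illustrative simplification, not the actual method: a fixed grid of square blocks stands in for real superpixels (which a SLIC-style segmentation would produce), and each block is scored by the contrast between its mean intensity and the global mean.

```python
import numpy as np

def block_saliency_roi(gray, block=4):
    """Middle-level ROI sketch (step 2.2): grid blocks stand in for
    superpixels; each block's saliency is its contrast against the
    global mean, normalised to [0, 1]."""
    h, w = gray.shape
    roi = np.zeros((h, w))
    gmean = gray.mean()
    for y in range(0, h, block):
        for x in range(0, w, block):
            patch = gray[y:y + block, x:x + block]
            roi[y:y + block, x:x + block] = abs(patch.mean() - gmean)
    m = roi.max()
    return roi / m if m > 0 else roi
```

Replacing the grid with an actual superpixel segmentation changes only how the pixel groups are formed; the score-and-normalise structure stays the same.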
Step 2.3: compute the high-level regions of interest of the reference frame image using computer vision methods, typically covering the regions viewers attend to most readily, chiefly those containing people, animals, and vehicles, and output the high-level region-of-interest matrix set {M_h^1, …, M_h^(n_h)}.
The computer vision methods are chiefly based on object segmentation and semantic segmentation.
Step 2.4: compute the temporal-level regions of interest from two adjacent reference frames using image processing methods, typically covering the moving objects that viewers tend to follow, and output the temporal-level region-of-interest matrix set {M_t^1, …, M_t^(n_t)}.
The image processing methods are chiefly based on optical flow estimation and motion estimation.
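The temporal-level extraction of step 2.4 can be sketched as follows. Plain frame differencing is used here as a lightweight stand-in for the optical-flow and motion-estimation methods the text names; it highlights the same moving regions, just with less precision.

```python
import numpy as np

def temporal_roi(prev_gray, cur_gray):
    """Temporal-level ROI sketch (step 2.4): absolute difference
    between two adjacent reference frames, normalised to [0, 1].
    Static content maps to 0; the strongest motion maps to 1."""
    diff = np.abs(cur_gray.astype(float) - prev_gray.astype(float))
    m = diff.max()
    return diff / m if m > 0 else diff
```

Note that for a static pair of frames this yields the zero matrix, consistent with embodiment 1, where M_t is the zero matrix when inter-frame motion information is unused.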
Step 2.5: select the weight matrix corresponding to the mapping format of the pair of panoramic video source files input to this system, and output this weight matrix as the mapping-level region-of-interest matrix M_p.
For example, for the equirectangular mapping format, the polar weights of the corresponding weight matrix are smaller than the equatorial weights; for the pyramid mapping format, the base-face weights of the corresponding weight matrix are larger than the side-face weights.
The mapping-level region-of-interest matrix output by step 2.5 depends only on the video mapping format and not on the frame image itself; once the mapping format of the input is determined, the region-of-interest matrix is identical for every frame.
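For the equirectangular case of step 2.5, the weight matrix can be sketched as below. The cosine-of-latitude rule used here is an assumption for illustration; the text only requires that polar weights be smaller than equatorial weights, which the cosine rule satisfies while also compensating for the projection's oversampling of the poles.

```python
import numpy as np

def equirect_weights(h, w):
    """Mapping-level ROI matrix M_p for the equirectangular format
    (step 2.5): every row is weighted by the cosine of its latitude,
    so polar rows get small weights and the equator the largest."""
    # row centres mapped to latitudes in (-pi/2, pi/2)
    lat = (np.arange(h) + 0.5) / h * np.pi - np.pi / 2
    row_w = np.cos(lat)
    return np.tile(row_w[:, None], (1, w))   # constant along each row
```

Because the matrix depends only on the mapping format, it is computed once and reused for all N frames, exactly as the text states.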
Step 3: the multi-level quality factor computation module applies a quality assessment algorithm to compute, based on the multi-level region-of-interest matrix sets output by step 2, the weighted differences of the panoramic frame image pairs output by step 1, and outputs the multi-level quality factor sets of the N frame image pairs.
The multi-level quality factor sets are all the values in the low-level quality factor set {f_l^1, …, f_l^(n_l)}, the middle-level quality factor set {f_m^1, …, f_m^(n_m)}, the high-level quality factor set {f_h^1, …, f_h^(n_h)}, the temporal-level quality factor set {f_t^1, …, f_t^(n_t)}, and the mapping-level quality factor f_p, where f denotes a positive number whose subscript and superscript are consistent with those of M in step 2, indicating that the quality factor is obtained from the corresponding region-of-interest matrix. This processing is completed by the following steps:
Step 3.1: from the panoramic frame image pairs output by step 1 and the low-, middle-, high-, temporal-, and mapping-level region-of-interest matrices output by step 2, form N groups in frame order, where each group comprises one lossless panoramic image, one impaired panoramic image, and the multi-level region-of-interest matrix set.
Step 3.2: compute the quality difference matrix D of the lossless and impaired panoramic images using a pixel-difference assessment method; D is a two-dimensional matrix of size H × W, and D(i, j) denotes the color/luminance difference between the lossless and impaired panoramic images at pixel position (i, j), which may be computed by the Euclidean distance method.
Step 3.3: multiply each region-of-interest matrix M element-wise with the difference matrix D to obtain the set of weighted difference matrices, each carrying the same subscript and superscript as its M.
Step 3.4: map each weighted difference matrix to a quality factor of the impaired image using a traditional objective image quality assessment method, yielding the multi-level quality factor set.
The traditional objective image quality assessment methods are chiefly based on MSE, PSNR, and SSIM.
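Steps 3.2 to 3.4 can be sketched in one function, using PSNR as the traditional metric (the choice embodiment 1 also makes). The normalisation by the mean ROI weight is an assumption for illustration; it makes an all-ones ROI reduce exactly to ordinary PSNR.

```python
import numpy as np

def weighted_psnr(ref, imp, roi, peak=255.0):
    """Steps 3.2-3.4 in one sketch: squared pixel differences (step
    3.2), weighted element-wise by an ROI matrix (step 3.3), collapsed
    to a PSNR-style quality factor (step 3.4)."""
    ref = ref.astype(float)
    imp = imp.astype(float)
    d = (ref - imp) ** 2 * roi                  # weighted difference matrix D
    wmse = d.sum() / max(roi.sum(), 1e-12)      # ROI-normalised MSE
    if wmse == 0:
        return float("inf")                     # identical images
    return 10.0 * np.log10(peak ** 2 / wmse)
```

Calling this once per region-of-interest matrix in a group yields that frame pair's multi-level quality factor set.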
Step 4: the temporal processing module takes as input the N groups of multi-level quality factor sets obtained in step 3 and fuses them into a single group according to a temporal processing method, outputting the multi-level quality factor set of the video S.
The temporal processing methods are chiefly based on averaging and weighted averaging.
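The plain-averaging variant of step 4 can be sketched as follows, assuming the per-frame factors are arranged as an N × K array (one row of K factors per frame):

```python
import numpy as np

def temporal_average(per_frame_factors):
    """Step 4 sketch: average each of the K quality factors over the
    N frames, fusing N per-frame sets into one video-level set.  A
    weighted average over frames is the other option the text names."""
    return np.asarray(per_frame_factors, float).mean(axis=0)
```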
Step 5: the multi-level quality factor fusion module takes as input the multi-level quality factors obtained in step 4 and fuses them by a fusion model into a single quality assessment result, outputting the result Q, i.e. the quality assessment result of the video S.
The fusion model is chiefly based on linear regression, nonlinear regression, or a neural network model.
The parameters of the fusion model can be obtained by empirical design or trained by machine learning. The machine-learning approach is chiefly completed by the following steps: first design a BP neural network structure, then train the parameters of the BP network on training data so that the fused result of the quality factors approximates the subjective scores.
The training data consist of the quality scores of a number of panoramic videos obtained by subjective experiments, together with the video quality factors of those videos obtained by steps 1 to 4.
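The BP-network fusion of step 5 can be sketched as below, using the 6-10-1 layout described in embodiment 1 (6 input nodes, 10 hidden nodes, 1 output node). The training loop is plain batch gradient descent on squared error against subjective scores; the actual BP training procedure and hyperparameters of the invention are not specified, so these are assumptions.

```python
import numpy as np

class BPFusion:
    """Step 5 sketch: a 6-10-1 BP network fusing the six video-level
    quality factors into one score in [0, 1]."""

    def __init__(self, n_in=6, n_hidden=10, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0, 0.5, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.w2 = rng.normal(0, 0.5, (n_hidden, 1))
        self.b2 = np.zeros(1)

    @staticmethod
    def _sig(x):
        return 1.0 / (1.0 + np.exp(-x))

    def forward(self, x):
        self.h = self._sig(x @ self.w1 + self.b1)   # hidden activations
        return self._sig(self.h @ self.w2 + self.b2)

    def train(self, x, y, lr=0.5, epochs=2000):
        """Batch gradient descent on squared error (backprop)."""
        for _ in range(epochs):
            out = self.forward(x)                    # N x 1 predictions
            g_out = (out - y) * out * (1 - out)      # output-layer delta
            g_h = (g_out @ self.w2.T) * self.h * (1 - self.h)
            self.w2 -= lr * self.h.T @ g_out / len(x)
            self.b2 -= lr * g_out.mean(axis=0)
            self.w1 -= lr * x.T @ g_h / len(x)
            self.b1 -= lr * g_h.mean(axis=0)
```

The sigmoid output node keeps every fused score inside [0, 1], matching the output range the embodiment specifies.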
Thus, through steps 1 to 5, this method, i.e. a panoramic video assessment method based on multi-level quality factors, is completed.
Beneficial effects
Compared with the prior art, the panoramic video assessment method and system based on multi-level quality factors of the present invention have the following beneficial effects:
This method is well suited to panoramic video quality assessment: compared with existing quality assessment methods for ordinary video and existing quality assessment methods for panoramic video, the method of the invention considers and fuses the influence of the user's regions of interest at multiple levels on video quality, so its quality assessment of the impaired video agrees more closely with the results of subjective experiments, making it better suited to automatic assessment of panoramic video quality.
Embodiment 1
This embodiment illustrates the method and system of the invention on two 4K-resolution panoramic videos: the lossless panoramic video concert.mp4 and the impaired panoramic video concert_3M.mp4.
Fig. 1 is the module diagram of the panoramic video quality assessment system based on multi-level quality factors of the present invention.
As can be seen from Fig. 1, this system feeds the reference video and the impaired video into the panoramic video input module for decoding, then into the region-of-interest extraction module to extract the low-level, middle-level, high-level, temporal-level, and mapping-level regions of interest. Based on these region-of-interest matrices, the multi-level quality factor computation module computes the low-level, middle-level, high-level, temporal-level, and mapping-level quality factor sets of the panoramic image pairs. These quality factors are then fed into the temporal processing module to obtain the multi-level quality factor set of the panoramic video, and finally the multi-level quality factor fusion module fuses these quality factors into a single quality score, which is output as the automatic assessment result for the impaired video.
The panoramic video assessment method based on multi-level quality factors, as implemented by this system, processes the lossless 4K panoramic video concert.mp4 and the impaired panoramic video concert_3M.mp4 of this embodiment through the following steps:
Step A: the panoramic video input module decodes the pair of input panoramic video source files. Both videos are 10-second, 30 fps, equirectangular-format panoramic videos with resolution 4096*2048. The impaired video was obtained from the lossless video by H.264 compression; the bit rate of the lossless video is 50 Mbps and that of the impaired video is 3 Mbps. After decoding, 300 pairs of panoramic images are obtained, each image 4096 pixels wide and 2048 pixels high. Fig. 2(A) shows the 5th frame panoramic image of the lossless video.
Step B: the region-of-interest extraction module performs region-of-interest extraction on the 300 lossless images. This processing is completed by the following steps:
Step B.1: using a color-contrast saliency method, compute the 300 low-level region-of-interest matrices M_l^1 of the 300 images, each of size 2048 × 4096; the result for the 5th frame, mapped to image space (the values in the range [0, 1] multiplied by 256), is shown in Fig. 2(B). In addition, a second low-level region-of-interest matrix M_l^2 is added: an all-ones matrix of size 2048 × 4096.
Step B.2: segment the image into superpixels, then use two superpixel saliency-ranking methods to compute the middle-level region-of-interest matrices M_m^1 and M_m^2 of the reference frame image; mapped to image space, they are shown in Fig. 2(C, D).
Step B.3: using a fully convolutional neural network, perform semantic object segmentation on the reference frame image and take the resulting mask as the high-level region-of-interest matrix M_h, shown mapped to a binary image in Fig. 2(E); matrix elements equal to 1 belong to target regions, chiefly people, animals, and vehicles, and elements equal to 0 belong to the background.
Step B.4: this embodiment does not use inter-frame motion information, so the temporal-level region-of-interest matrix M_t in this embodiment is the zero matrix.
Step B.5: the mapping format of the input video is equirectangular, so the corresponding weight matrix M_p is selected; mapped to [0, 255], it is shown in Fig. 2(F). The value of each element of the matrix is determined by its latitude, as given by formula (1).
Step B.6: steps B.1 to B.5 of this embodiment yield the 6 region-of-interest matrices of each frame image, 1800 matrices in total.
Step C: the multi-level quality factor computation module of this example uses the PSNR quality assessment algorithm; based on the multi-level region-of-interest matrix sets output by step B, it computes the weighted difference matrix sets of the 300 frame image pairs and outputs the multi-level quality factor sets. This processing is completed by the following steps:
Step C.1: from the panoramic image pairs output by step A and the multiple region-of-interest matrices output by step B, form 300 groups in frame order, where each group comprises one lossless panoramic image, one impaired panoramic image, and 6 region-of-interest matrices.
Step C.2: compute the weighted difference matrices between the pixels of the two images as shown in formula (2), where I(i, j), I'(i, j), and M(i, j) are the values of the corresponding elements of the impaired image, the lossless image, and the weighting matrix, respectively; if the image has three channels, the weighted difference matrix is computed separately for each channel:
D(i, j) = (I(i, j) - I'(i, j))^2 × M(i, j)    (2)
Step C.3: compute the quality factor set from the weighted difference matrices using the PSNR calculation, as shown in formula (3); for a three-channel image, this embodiment takes the average of the three per-channel quality factors as the quality factor of the impaired image.
Step C.4: steps C.1 to C.3 yield the 6 quality factors of each impaired frame image; the 300 resulting sets are the output of this module.
Step D: the temporal processing module takes as input the 300 multi-level quality factor sets obtained in step C. Following the temporal averaging method, this example averages the quality factors at corresponding positions across the sets, as shown in formula (4), where x and y denote the level index of the quality factor and the index of the region-of-interest method within that level, respectively; the output is the multi-level quality factor set of the impaired video concert_3M.mp4.
Step E: the multi-level quality factor fusion module takes as input the multi-level quality factor set obtained in step D and fuses it with a BP neural network, obtaining the final quality assessment score Q(I, I') of the video concert_3M.mp4.
Step E.1: the BP neural network used is shown in Fig. 3. The network has 6 input nodes, connected to the 6 quality factors obtained in step D, 10 hidden nodes, and 1 output node, which outputs a quality assessment result in the range [0, 1].
Step E.2: the parameters of the fusion model are obtained by training on panoramic video data that does not include the test video concert_3M.mp4.
In this example, the quality assessment values obtained by fusing the 6 multi-level quality factors are more linearly correlated with the subjective results than any single-factor result. As shown in Table 1, removing the quality factors of any one level in turn yields a Spearman rank-order correlation coefficient (SROCC) with the subjective scores that is smaller than the SROCC obtained using the quality factors of all levels. The values in the table were obtained by training the BP network parameters on 12 original videos and the 288 impaired videos of corresponding content, then testing on another 4 original videos and the 96 impaired videos of corresponding content; the larger the SROCC, the better the automatic assessment method.
Table 1: comparison of the full multi-level quality factors with the quality factors after removing each level in turn
The above detailed description further explains the purpose, technical scheme, and beneficial effects of the invention. It should be understood that the above is only a specific embodiment of the present invention and is not intended to limit the scope of protection of the present invention; any modification, equivalent substitution, improvement, and the like made within the spirit and principles of the present invention shall be included within the scope of protection of the present invention.