
Unified Quality Assessment of in-the-Wild Videos with Mixed Datasets Training

Published: 01 April 2021

Abstract

Video quality assessment (VQA) is an important problem in computer vision. The videos in computer vision applications are usually captured in the wild. We focus on automatically assessing the quality of in-the-wild videos, which is challenging due to the absence of reference videos, the complexity of distortions, and the diversity of video content. Moreover, the video content and distortions of existing datasets differ considerably, which leads to poor performance of data-driven methods in the cross-dataset evaluation setting. To improve the performance of quality assessment models, we borrow intuitions from human perception, specifically the content dependency and temporal-memory effects of the human visual system. To address the cross-dataset evaluation challenge, we explore a mixed-datasets training strategy for training a single VQA model on multiple datasets. The proposed unified framework explicitly comprises three stages: a relative quality assessor, a nonlinear mapping, and a dataset-specific perceptual scale alignment, which jointly predict relative quality, perceptual quality, and subjective quality. Experiments are conducted on four publicly available datasets for VQA in the wild, i.e., LIVE-VQC, LIVE-Qualcomm, KoNViD-1k, and CVD2014. The experimental results verify the effectiveness of the mixed-datasets training strategy and demonstrate the superior performance of the unified model in comparison with state-of-the-art models. For reproducible research, we make the PyTorch implementation of our method available at https://github.com/lidq92/MDTVSFA.
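The three-stage pipeline summarized above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the paper's exact formulation: the linear feature scorer stands in for the learned relative quality assessor, the four-parameter logistic is one common choice of nonlinear mapping, and the per-dataset (scale, shift) values are invented for illustration.

```python
import math

def relative_quality(feats, weights):
    # Hypothetical stand-in for the learned relative quality assessor:
    # here, a simple linear score over pooled video features.
    return sum(f * w for f, w in zip(feats, weights))

def nonlinear_mapping(r, a=1.0, b=0.0, c=1.0, d=0.0):
    # A four-parameter logistic is a common way to map an unbounded
    # relative quality onto a bounded perceptual quality.
    return b + a / (1.0 + math.exp(-c * (r - d)))

def scale_alignment(q, scale, shift):
    # Dataset-specific linear rescaling: aligns perceptual quality with
    # the subjective score range of a particular dataset.
    return scale * q + shift

# Illustrative per-dataset alignment parameters (assumed score ranges).
DATASET_PARAMS = {
    "LIVE-VQC": (100.0, 0.0),   # subjective scores in [0, 100]
    "KoNViD-1k": (4.0, 1.0),    # MOS in [1, 5]
}

def predict(feats, weights, dataset):
    r = relative_quality(feats, weights)               # stage 1: relative quality
    q = nonlinear_mapping(r)                           # stage 2: perceptual quality
    s = scale_alignment(q, *DATASET_PARAMS[dataset])   # stage 3: subjective quality
    return r, q, s
```

The point of the split is that stages 1 and 2 can be shared across all training datasets, while stage 3 keeps one (scale, shift) pair per dataset, so a single model can be fit to subjective scores reported on incompatible scales.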




Published In

International Journal of Computer Vision, Volume 129, Issue 4
April 2021, 535 pages

Publisher

Kluwer Academic Publishers

United States

Publication History

Published: 01 April 2021
Accepted: 18 November 2020
Received: 21 December 2019

Author Tags

  1. Content dependency
  2. In-the-wild videos
  3. Mixed datasets training
  4. Temporal-memory effect
  5. Video quality assessment

Qualifiers

  • Research-article

Funding Sources

  • Natural Science Foundation of China


Cited By

  • (2024) Using Spatial-Temporal Attention for Video Quality Evaluation. International Journal of Intelligent Systems. doi:10.1155/2024/5514627
  • (2024) Dual-Criterion Quality Loss for Blind Image Quality Assessment. Proceedings of the 32nd ACM International Conference on Multimedia, pp. 7823–7832. doi:10.1145/3664647.3681250
  • (2024) Highly Efficient No-reference 4K Video Quality Assessment with Full-Pixel Covering Sampling and Training Strategy. Proceedings of the 32nd ACM International Conference on Multimedia, pp. 9913–9922. doi:10.1145/3664647.3680907
  • (2024) QPT-V2: Masked Image Modeling Advances Visual Scoring. Proceedings of the 32nd ACM International Conference on Multimedia, pp. 2709–2718. doi:10.1145/3664647.3680721
  • (2024) Semantic-Aware and Quality-Aware Interaction Network for Blind Video Quality Assessment. Proceedings of the 32nd ACM International Conference on Multimedia, pp. 9970–9979. doi:10.1145/3664647.3680598
  • (2024) A Dataset and Model for the Visual Quality Assessment of Inversely Tone-Mapped HDR Videos. IEEE Transactions on Image Processing, 33, pp. 366–381. doi:10.1109/TIP.2023.3343099
  • (2024) A multi-scale no-reference video quality assessment method based on transformer. Multimedia Systems, 30(4). doi:10.1007/s00530-024-01403-y
  • (2024) PromptIQA: Boosting the Performance and Generalization for No-Reference Image Quality Assessment via Prompts. Computer Vision – ECCV 2024, pp. 247–264. doi:10.1007/978-3-031-73232-4_14
  • (2023) Vulnerabilities in video quality assessment models. Proceedings of the 37th International Conference on Neural Information Processing Systems, pp. 51477–51490. doi:10.5555/3666122.3668363
  • (2023) 2BiVQA: Double Bi-LSTM based Video Quality Assessment of UGC Videos. ACM Transactions on Multimedia Computing, Communications, and Applications. doi:10.1145/3632178
