DOI: 10.1145/3469877.3490608
research-article

Intra- and Inter-frame Iterative Temporal Convolutional Networks for Video Stabilization

Published: 10 January 2022

Abstract

Video jitter is an uncomfortable artifact of irregular lens motion over time. A central problem in video stabilization is how to extract motion state information from a run of consecutive video frames. In this paper, we propose a novel sequence model, Intra- and Inter-frame Iterative Temporal Convolutional Networks (I3TC-Net), which alternately transfers the spatial-temporal correlation of motion within and between frames. We hypothesize that the motion state information can be represented by transmission states. Specifically, we employ a combination of Convolutional Long Short-Term Memory (ConvLSTM) and an embedded encoder-decoder to generate a latent stable frame. This latent frame is used to update the transmission states iteratively and to learn a global homography transformation for each unstable frame, which produces the corresponding stabilized result along the time axis. Furthermore, we create a video dataset to address the lack of stable data and improve training. Experimental results show that our method outperforms state-of-the-art methods on publicly available videos, including a 5.4-point improvement in stability score. The project page is available at https://github.com/root2022IIITC/IIITC.
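
The abstract states that a global homography transformation is learned for each unstable frame. As a rough illustration of what applying such a transformation means, the sketch below maps pixel coordinates through a 3x3 homography with NumPy. It is not the authors' implementation: in the paper the matrix is predicted by the network, whereas here it is a hand-built translation, and the helper names `apply_homography` and `translation_homography`, the 640x360 frame size, and the jitter offset are all assumptions chosen for the example.

```python
import numpy as np


def apply_homography(H, points):
    """Map an (N, 2) array of (x, y) pixel coordinates through a 3x3 homography H."""
    points = np.asarray(points, dtype=float)
    ones = np.ones((points.shape[0], 1))
    homogeneous = np.hstack([points, ones])   # lift to homogeneous coordinates, shape (N, 3)
    mapped = homogeneous @ H.T                # apply H to every point at once
    return mapped[:, :2] / mapped[:, 2:3]     # perspective divide back to (x, y)


def translation_homography(dx, dy):
    """Hypothetical helper: a homography that only shifts the frame by (dx, dy)."""
    return np.array([[1.0, 0.0, dx],
                     [0.0, 1.0, dy],
                     [0.0, 0.0, 1.0]])


# Assumed toy setup: the corners of a 640x360 frame displaced by a jitter of (+3, -2) pixels.
corners = np.array([[0, 0], [639, 0], [639, 359], [0, 359]], dtype=float)
jitter = translation_homography(3.0, -2.0)

# In this toy case the stabilizing transform is simply the inverse of the jitter.
stabilize = np.linalg.inv(jitter)

shaken = apply_homography(jitter, corners)
restored = apply_homography(stabilize, shaken)
```

A pure translation keeps the example easy to check by hand; a general homography also encodes rotation, scale, shear, and perspective, which is why the perspective divide in `apply_homography` is needed.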

        Published In

        MMAsia '21: Proceedings of the 3rd ACM International Conference on Multimedia in Asia
        December 2021
        508 pages
        ISBN:9781450386074
        DOI:10.1145/3469877
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Author Tags

        1. Intra- and Inter-frame Iteration
        2. Temporal Convolutional
        3. Transmission States
        4. Video Stabilization

        Qualifiers

        • Research-article
        • Research
        • Refereed limited

        Conference

        MMAsia '21: ACM Multimedia Asia
        December 1 - 3, 2021
        Gold Coast, Australia

        Acceptance Rates

        Overall Acceptance Rate 59 of 204 submissions, 29%

        Article Metrics

        • Total Citations: 0
        • Total Downloads: 92
        • Downloads (Last 12 months): 12
        • Downloads (Last 6 weeks): 5
        Reflects downloads up to 01 Jan 2025
