DOI: 10.1145/3503161.3548324
Research Article

Dynamic Scene Graph Generation via Temporal Prior Inference

Published: 10 October 2022

Abstract

Real-world videos are composed of complex actions with inherent temporal continuity (e.g., "person-touching-bottle" is usually followed by "person-holding-bottle"). In this work, we propose a novel method, Temporal Prior Inference (TPI), to mine such temporal continuity for dynamic scene graph generation (DSGG). In contrast to current DSGG methods, which capture the temporal dependence of each video individually by refining representations, we make the first attempt to explore temporal continuity by extracting the co-occurrence patterns of action categories across the full variety of videos in the Action Genome (AG) dataset. These inherent patterns are organized as Temporal Prior Knowledge (TPK), which serves as prior knowledge for the model's learning and inference. Given this prior knowledge, human-object relationships in the current frame can be effectively inferred from adjacent frames via the robust Temporal Prior Inference algorithm at negligible computational cost. Specifically, to efficiently guide the generation of temporally consistent dynamic scene graphs, we incorporate temporal prior inference into a DSGG framework by introducing frame enhancement, a continuity loss, and fast inference. The proposed model-agnostic strategies significantly boost the performance of existing state-of-the-art models on the Action Genome dataset, achieving 69.7 and 72.6 for R@10 and R@20 on PredCLS. In addition, fast inference reduces inference time by 41% with an acceptable drop in R@10 (from 69.7 to 66.8).
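The page does not include code, but the idea of extracting temporal prior knowledge from predicate co-occurrence across frames can be illustrated with a small sketch. The input format, function name, and smoothing scheme below are assumptions for illustration, not the authors' implementation:

```python
from collections import defaultdict

def build_temporal_prior(sequences, smoothing=1e-6):
    """Estimate P(next predicate | current predicate) from labeled videos.

    `sequences` is a hypothetical input format: one predicate-label sequence
    per tracked human-object pair, e.g. ["touching", "holding", "holding"].
    """
    counts = defaultdict(lambda: defaultdict(float))
    for seq in sequences:
        # Count transitions between predicates in adjacent frames
        for cur, nxt in zip(seq, seq[1:]):
            counts[cur][nxt] += 1.0
    prior = {}
    for cur, nxts in counts.items():
        total = sum(nxts.values())
        # Normalize counts into a smoothed transition distribution
        prior[cur] = {
            nxt: (c + smoothing) / (total + smoothing * len(nxts))
            for nxt, c in nxts.items()
        }
    return prior

# Toy data: "touching" is always followed by "holding" here
prior = build_temporal_prior([
    ["touching", "holding", "holding"],
    ["touching", "holding", "drinking"],
])
```

A table like `prior` could then bias a model's predictions for the current frame toward transitions that are frequent in the training videos, which is the kind of co-occurrence pattern the abstract describes.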

Supplementary Material

MP4 File (MM22_fp2536.mp4)
This video presents the main content of the paper "Dynamic Scene Graph Generation via Temporal Prior Inference". First, we concisely analyze the motivation of the paper. Second, we present the method used to solve the problem, i.e., the Temporal Prior Inference (TPI) algorithm. Then, building on the TPI algorithm, the Temporal Prior Frame Enhancement (TPFE), Temporal Prior Continuity Loss (TPCL), and Temporal Prior Fast Inference (TPFI) modules are proposed to improve the accuracy of the generated scene graphs. Finally, based on the experimental results, we analyze the function of each module in detail and demonstrate the effectiveness of the proposed method.
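As a rough, hedged sketch of what a continuity-style objective in the spirit of TPCL might compute (the interface, distribution format, and cross-entropy formulation are hypothetical, not taken from the paper):

```python
import math

def continuity_loss(prev_probs, cur_probs, prior, eps=1e-12):
    """Hypothetical continuity-style objective (not the paper's TPCL code).

    prev_probs / cur_probs: predicate -> probability for one human-object
    pair in two adjacent frames. `prior`: predicate -> transition
    probabilities estimated from training data. The previous frame's
    distribution is propagated through the prior, and the loss is the
    cross-entropy between that expectation and the current prediction.
    """
    expected = {}
    for p_prev, w in prev_probs.items():
        for p_cur, t in prior.get(p_prev, {}).items():
            expected[p_cur] = expected.get(p_cur, 0.0) + w * t
    # Cross-entropy H(expected, cur_probs); eps guards against log(0)
    return -sum(w * math.log(cur_probs.get(p, 0.0) + eps)
                for p, w in expected.items())
```

Under this sketch, a prediction that agrees with the prior-propagated expectation incurs near-zero loss, while predictions that ignore the temporal prior are penalized.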





Published In

MM '22: Proceedings of the 30th ACM International Conference on Multimedia
October 2022
7537 pages
ISBN:9781450392037
DOI:10.1145/3503161

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. dynamic scene graph generation
  2. temporal prior knowledge
  3. vision and language


Conference

MM '22

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%


Article Metrics

  • Downloads (Last 12 months)182
  • Downloads (Last 6 weeks)28
Reflects downloads up to 12 Dec 2024


Cited By

  • (2024) Caption-Aware Multimodal Relation Extraction with Mutual Information Maximization. Proceedings of the 32nd ACM International Conference on Multimedia, 1148-1157. DOI: 10.1145/3664647.3681219. Online publication date: 28 Oct 2024.
  • (2024) Spatial-Temporal Knowledge-Embedded Transformer for Video Scene Graph Generation. IEEE Transactions on Image Processing, Vol. 33, 556-568. DOI: 10.1109/TIP.2023.3345652. Online publication date: 1 Jan 2024.
  • (2024) Dynamic Scene Graph Generation with Unified Temporal Modeling. 2024 IEEE International Conference on Multimedia and Expo (ICME), 1-6. DOI: 10.1109/ICME57554.2024.10687612. Online publication date: 15 Jul 2024.
  • (2024) FloCoDe: Unbiased Dynamic Scene Graph Generation with Temporal Consistency and Correlation Debiasing. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2516-2526. DOI: 10.1109/CVPRW63382.2024.00258. Online publication date: 17 Jun 2024.
  • (2024) OED: Towards One-stage End-to-End Dynamic Scene Graph Generation. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 27938-27947. DOI: 10.1109/CVPR52733.2024.02639. Online publication date: 16 Jun 2024.
  • (2023) Iterative Learning with Extra and Inner Knowledge for Long-tail Dynamic Scene Graph Generation. Proceedings of the 31st ACM International Conference on Multimedia, 4707-4715. DOI: 10.1145/3581783.3612430. Online publication date: 26 Oct 2023.
  • (2023) Prior Knowledge-driven Dynamic Scene Graph Generation with Causal Inference. Proceedings of the 31st ACM International Conference on Multimedia, 4877-4885. DOI: 10.1145/3581783.3612249. Online publication date: 26 Oct 2023.
  • (2023) Improving Scene Graph Generation with Superpixel-Based Interaction Learning. Proceedings of the 31st ACM International Conference on Multimedia, 1809-1820. DOI: 10.1145/3581783.3611889. Online publication date: 26 Oct 2023.
  • (2023) Unbiased Scene Graph Generation in Videos. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 22803-22813. DOI: 10.1109/CVPR52729.2023.02184. Online publication date: Jun 2023.
