Google Scholar

Content-based video parsing and indexing based on audio-visual interaction

S Tsekeridou, I Pitas - IEEE transactions on circuits and systems …, 2001 - ieeexplore.ieee.org

… Audio-source parsing and indexing leads to the extraction of a speaker … parsing and
indexing results in the extraction of a talking-face shot mapping over time. Integration of the audio …

Cited by 127

[BOOK] Content-based audio classification and retrieval for audiovisual data parsing

T Zhang, CCJ Kuo - 2013 - books.google.com

… PARSING Based on observations described above, we propose a scheme for video content
parsing by using video models together with audio … demultiplexed into the audio data stream …

Cited by 86

Unified multisensory perception: Weakly-supervised audio-visual video parsing

Y Tian, D Li, C Xu - Computer Vision–ECCV 2020: 16th European …, 2020 - Springer

… Thus, we introduce a Look, Listen, and Parse dataset for audio-visual video scene parsing,
which contains 11,849 YouTube video clips spanning over 25 categories for a total of 32.9 h …

Cited by 186

Exploring heterogeneous clues for weakly-supervised audio-visual video parsing

Y Wu, Y Yang - Proceedings of the IEEE/CVF Conference …, 2021 - openaccess.thecvf.com

… In addition, we also evaluate the overall audio-visual scene parsing performance of our
method by computing aggregated results, i.e., “Type@AV” and “Event@AV”. Specifically, Type@…

Cited by 117

Video content parsing based on combined audio and visual information

T Zhang, CCJ Kuo - Multimedia Storage and Archiving …, 1999 - spiedigitallibrary.org

… for automatic parsing of video data based on both audio and visual content analysis. The
accompanying audio signal in video is segmented and classified into basic audio types such …

Cited by 39

Multi-modal grouping network for weakly-supervised audio-visual video parsing

S Mo, Y Tian - Advances in Neural Information Processing …, 2022 - proceedings.neurips.cc

… For these non-aligned cases, audio signals … audio-visual video parsing (AVVP) task [3] that
aims to parse a video into temporal event segments and predict the audible, visible, or audio-…

Cited by 47

Exploring cross-video and cross-modality signals for weakly-supervised audio-visual video parsing

YB Lin, HY Tseng, HY Lee, YY Lin… - Advances in Neural …, 2021 - proceedings.neurips.cc

… The audio-visual video parsing task aims to temporally parse a video into … audio and visual
parsing branches for the weakly-supervised audio-visual video parsing task as a baseline. …

Cited by 78

Label-anticipated event disentanglement for audio-visual video parsing

J Zhou, D Guo, Y Mao, Y Zhong, X Chang… - European Conference on …, 2025 - Springer

… improved performances across audio, visual, and audio-visual event parsing, in contrast to
… metrics, indicative of the comprehensive audio and visual event parsing performance, at the …

Cited by 6

Improving audio-visual video parsing with pseudo visual labels

J Zhou, D Guo, Y Zhong, M Wang - arXiv preprint arXiv:2303.02344, 2023 - arxiv.org

… in parsing the audio events and audio-visual events. … comparable performance in audio
event parsing but generally … for the visual event and audio-visual event parsing. These results …

Cited by 14

DHHN: Dual hierarchical hybrid network for weakly-supervised audio-visual video parsing

X Jiang, X Xu, Z Chen, J Zhang, J Song… - Proceedings of the 30th …, 2022 - dl.acm.org

… Similar to the previous works [19, 31, 36], we evaluate our method with F-score by parsing
all types of events (audio, visual, and audio-visual events) under both segment-level and event…

Cited by 31