Content-based video parsing and indexing based on audio-visual interaction
S Tsekeridou, I Pitas - IEEE transactions on circuits and systems …, 2001 - ieeexplore.ieee.org
… Audio-source parsing and indexing leads to the extraction of a speaker … parsing and
indexing results in the extraction of a talking-face shot mapping over time. Integration of the audio …
[BOOK] Content-based audio classification and retrieval for audiovisual data parsing
T Zhang, CCJ Kuo - 2013 - books.google.com
… PARSING Based on observations described above, we propose a scheme for video content
parsing by using video models together with audio … demultiplexed into the audio data stream …
Unified multisensory perception: Weakly-supervised audio-visual video parsing
… Thus, we introduce a Look, Listen, and Parse dataset for audio-visual video scene parsing,
which contains 11,849 YouTube video clips spanning over 25 categories for a total of 32.9 h …
Exploring heterogeneous clues for weakly-supervised audio-visual video parsing
… In addition, we also evaluate the overall audio-visual scene parsing performance of our
method by computing aggregated results, i.e., “Type@AV” and “Event@AV”. Specifically, Type@…
Video content parsing based on combined audio and visual information
T Zhang, CCJ Kuo - Multimedia Storage and Archiving …, 1999 - spiedigitallibrary.org
… for automatic parsing of video data based on both audio and visual content analysis. The
accompanying audio signal in video is segmented and classified into basic audio types such …
Multi-modal grouping network for weakly-supervised audio-visual video parsing
S Mo, Y Tian - Advances in Neural Information Processing …, 2022 - proceedings.neurips.cc
… For these non-aligned cases, audio signals … audio-visual video parsing (AVVP) task [3] that
aims to parse a video into temporal event segments and predict the audible, visible, or audio-…
Exploring cross-video and cross-modality signals for weakly-supervised audio-visual video parsing
… The audio-visual video parsing task aims to temporally parse a video into … audio and visual
parsing branches for the weakly-supervised audio-visual video parsing task as a baseline. …
Label-anticipated event disentanglement for audio-visual video parsing
… improved performances across audio, visual, and audio-visual event parsing, in contrast to
… metrics, indicative of the comprehensive audio and visual event parsing performance, at the …
Improving audio-visual video parsing with pseudo visual labels
… in parsing the audio events and audio-visual events. … comparable performance in audio
event parsing but generally … for the visual event and audio-visual event parsing. These results …
DHHN: Dual hierarchical hybrid network for weakly-supervised audio-visual video parsing
… Similar to the previous works [19, 31, 36], we evaluate our method with F-score by parsing
all types of events (audio, visual, and audio-visual events) under both segment-level and event…