Stars
NYU Course Notes & Resources
Temporal Reasoning via Audio Question Answering
Official repository for "Boosting Audio Visual Question Answering via Key Semantic-Aware Cues" in ACM MM 2024.
🦇 Encoder of BAT (Learning to Reason about Spatial Sounds with Large Language Models)
Speech, Language, Audio, Music Processing with Large Language Model
A curated list of resources in audio-visual question answering and related areas. :-)
ACM MM 2022 paper "AVQA: A Dataset for Audio-Visual Question Answering on Videos"
Code for the Interspeech 2021 paper "AST: Audio Spectrogram Transformer".
PyTorch code and models for V-JEPA self-supervised learning from video.
A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning
Measuring compositionality in representation learning
Official codebase for I-JEPA, the Image-based Joint-Embedding Predictive Architecture. First outlined in the CVPR paper, "Self-supervised learning from images with a joint-embedding predictive arch…
Learning audio concepts from natural language supervision
A tool to visualize DCASE format SELD labels and predictions
Baseline method for sound event localization task of DCASE 2023 challenge
Example code to help people follow along with the tutorials
Go binary for setting up Singularity containers with Miniconda
Visualisation of VISOR Segmentations with Annotations and Relations
Instructional notebooks on music information retrieval.