Wuhan University
Stars
A list of tools, papers and code related to Fake Audio Detection.
Neural Network Distiller by Intel AI Lab: a Python package for neural network compression research. https://intellabs.github.io/distiller
Awesome Knowledge-Distillation: a categorized collection of knowledge distillation papers (2014-2021).
This repository contains a collection of resources and papers on Detecting Multimedia Generated by Large AI Models
Research progress on speech deepfake detection: relevant datasets and publicly available code aggregated from the survey literature.
Code repository for the paper "Audio-Visual Representation Learning via Knowledge Distillation from Speech Foundation Models".
The official PyTorch implementation of the Interspeech 2024 paper "Reshape Dimensions Network for Speaker Recognition".
A simple tool to update bib entries with their official information (e.g., DBLP or the ACL anthology).
A PyTorch implementation of the Deep Audio-Visual Speech Recognition paper.
Unified-Modal Speech-Text Pre-Training for Spoken Language Processing
MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation
An introductory PyTorch Lightning tutorial in Chinese; please credit the source when reposting. (Originally written for fun; it is recommended to work through the MNIST example before getting started.)
[TIP 2022] End-to-end Temporal Action Detection with Transformer
A large-scale Chinese data benchmark for face video anti-forgery identification.
PyTorch implementations of Grad-CAM and Grad-CAM++ that visualize the Class Activation Map (CAM) of any classification network, including custom networks; CAM visualization is also implemented for the Faster R-CNN and RetinaNet object detectors. Trying it out, starring, and reporting issues are welcome.
Open source implementation of "Vision Transformers Need Registers"
PyTorch code for Vision Transformers training with the Self-Supervised learning method DINO
Official codebase for "Unveiling the Power of Audio-Visual Early Fusion Transformers with Dense Interactions through Masked Modeling".
Code and Pretrained Models for ICLR 2023 Paper "Contrastive Audio-Visual Masked Autoencoder".
Vision Transformers are Parameter-Efficient Audio-Visual Learners
Official implementation of RAVEn (ICLR 2023) and BRAVEn (ICASSP 2024)
A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models (ICASSP 2024)
A self-supervised learning framework for audio-visual speech
[CVPR 2024] Official implementation of the paper "TE-TAD: Towards Full End-to-End Temporal Action Detection via Time-Aligned Coordinate Expression"
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
[ACM MM'23] UMMAFormer: A Universal Multimodal-adaptive Transformer Framework For Temporal Forgery Localization
[CVIU, DICTA Award] Glitch in the Matrix: A Large Scale Benchmark for Content Driven Audio-Visual Forgery Detection and Localization