Stars
Multilingual Automatic Speech Recognition with word-level timestamps and confidence
Code repository for the paper "PFDial: A Structured Dialogue Instruction Fine-tuning Method Based on UML Flowcharts"
The Python library for real-time communication
VoiceBench: Benchmarking LLM-Based Voice Assistants
Paper, Code and Resources for Speech Language Model and End2End Speech Dialogue System.
This is the code for SpeechTokenizer, presented in "SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models". Samples are presented on
Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.
The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.
Drag & drop UI to build your customized LLM flow
Fork of ConcurrentLogHandler
Using modified BiSeNet for face parsing in PyTorch
Fast neural radiance field training with free camera trajectories
CVPR2023 talking face implementation for Identity-Preserving Talking Face Generation With Landmark and Appearance Priors
Code to accompany "A Method for Animating Children's Drawings of the Human Figure"
[ECCV2022] The implementation for "Learning Dynamic Facial Radiance Fields for Few-Shot Talking Head Synthesis".
The official PyTorch implementation of Towards Fast, Accurate and Stable 3D Dense Face Alignment, ECCV 2020.
Official repository for Diffused Heads: Diffusion Models Beat GANs on Talking-Face Generation
A PyTorch implementation of NeRF (Neural Radiance Fields) that reproduces the results.
GeneFace: Generalized and High-Fidelity 3D Talking Face Synthesis; ICLR 2023; Official code
[ICCV2023] Delicate Textured Mesh Recovery from NeRF via Adaptive Surface Refinement
Real-time Neural Radiance Talking Portrait Synthesis via Audio-spatial Decomposition
WebRTC and ORTC implementation for Python using asyncio
Apache JMeter open-source load testing tool for analyzing and measuring the performance of a variety of services