- Shanghai
- https://x.com/FeitengLi
- @FeitengLi
Stars
A novel cross-modal decoupling and alignment framework for multimodal representation learning.
MAGI-1: Autoregressive Video Generation at Scale
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and support state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorR…
Official repo for "GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation"
Full-featured MP4 format, MPEG DASH, HLS, CMAF SDK and tools
A Python binding for FFmpeg which provides sync and async APIs
No-GIL Python environment featuring NVIDIA Deep Learning libraries.
AudioBench: A Universal Benchmark for Audio Large Language Models
VoiceBench: Benchmarking LLM-Based Voice Assistants
An easy-to-use, fast, and easily integrable tool for evaluating audio LLMs
The official repository of Dynamic-SUPERB.
TokenBridge: Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation. https://yuqingwang1029.github.io/TokenBridge
Tools for handling speech data in machine learning projects.
Unified high-performance Python client for object and file stores.
Terminal string styling done right, in Python
LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM
The Python library for real-time communication
PyTorch implementation of "CleanMel: Mel-Spectrogram Enhancement for Improving Both Speech Quality and ASR".
UniCodec: a unified audio codec with a single codebook to support multi-domain audio data, including speech, music, and sound
FlashMLA: Efficient MLA decoding kernels
Qwen2.5-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.