- Shanghai, China
- https://jiaxin-ye.github.io/
Stars
Get up and running with Llama 3.3, DeepSeek-R1, Phi-4, Gemma 3, Mistral Small 3.1 and other large language models.
21 Lessons, Get Started Building with Generative AI 🔗 https://microsoft.github.io/generative-ai-for-beginners/
Now we have become very big, different from the original idea. Collects premium software in various categories.
🧑🏫 60+ Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), ga…
Clone a voice in 5 seconds to generate arbitrary speech in real-time
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
🔊 Text-Prompted Generative Audio Model
Instant voice cloning by MIT and MyShell. Audio foundation model.
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
Unlock your displays on your Mac! Flexible HiDPI scaling, XDR/HDR extra brightness, virtual screens, DDC control, extra dimming, PIP/streaming, EDID override and lots more!
A generative world for general-purpose robotics & embodied AI learning.
Fully open reproduction of DeepSeek-R1
Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud.
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
Fast and memory-efficient exact attention
🔠Foreign language reading and translation assistant based on copy and translate.
PyTorch implementations of Generative Adversarial Networks.
✨✨Latest Advances on Multimodal Large Language Models
"Hung-yi Lee Deep Learning Tutorial" (recommended by Professor Hung-yi Lee 👍, the "Apple Book" 🍎). PDF download: https://github.com/datawhalechina/leedl-tutorial/releases
Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
This repository contains the source code for the paper First Order Motion Model for Image Animation
📋 A list of open LLMs available for commercial use.
This repository contains the code for "A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild", published at ACM Multimedia 2020. For the HD commercial model, please try out Sync Labs
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
Foundational Models for State-of-the-Art Speech and Text Translation
[CVPR 2024] Official repository for "MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model"