thanhlt998 (Le Tuan Thanh) / Starred · GitHub

Delayed Streams Modeling (DSM) is a flexible formulation for streaming, multimodal sequence-to-sequence learning.

Python 279 22 Updated Jun 23, 2025

Stream-Omni is a GPT-4o-like language-vision-speech chatbot that simultaneously supports interaction across various modality combinations.

Python 149 5 Updated Jun 17, 2025

Dockerized FastAPI wrapper for Kokoro-82M text-to-speech model w/CPU ONNX and NVIDIA GPU PyTorch support, handling, and auto-stitching

Python 3,036 452 Updated Jun 20, 2025
Python 12 1 Updated Jun 17, 2025

Pydoll is a library for automating chromium-based browsers without a WebDriver, offering realistic interactions.

Python 4,793 245 Updated Jun 22, 2025

The AI framework that adds the engineering to prompt engineering (Python/TS/Ruby/Java/C#/Rust/Go compatible)

Rust 4,079 179 Updated Jun 22, 2025

Large datasets for conversational AI

Python 1,345 171 Updated Nov 16, 2019

🦉 OWL: Optimized Workforce Learning for General Multi-Agent Assistance in Real-World Task Automation

Python 17,155 2,005 Updated Jun 20, 2025

No fortress, purely open ground. OpenManus is Coming.

Python 47,121 8,232 Updated Jun 16, 2025
Python 5,654 907 Updated Mar 16, 2025

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

Python 52,817 6,466 Updated Jun 23, 2025

Codec for paper: LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis

Python 279 33 Updated Jun 15, 2025

FlashMLA: Efficient MLA decoding kernels

Cuda 11,623 870 Updated Apr 29, 2025

Pretraining code for a large-scale depth-recurrent language model

Python 783 65 Updated Jun 12, 2025

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 2 1 Updated Jun 22, 2025

A generative world for general-purpose robotics & embodied AI learning.

Python 25,329 2,280 Updated Jun 23, 2025

Python tool for converting files and office documents to Markdown.

Python 59,428 3,093 Updated Jun 4, 2025
Jsonnet 11 Updated Nov 27, 2022

First base model for full-duplex conversational audio

Python 1,749 111 Updated Jan 5, 2025

Large Reasoning Models

Python 804 45 Updated Dec 3, 2024

Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.

Python 8,475 715 Updated Jun 23, 2025

LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.

Python 2,943 198 Updated May 19, 2025

Vision infrastructure to turn complex documents into RAG/LLM-ready data

Rust 2,231 134 Updated Jun 20, 2025

Fast and memory-efficient exact attention

Python 7 Updated Oct 2, 2024

Gemma 2B with 10M context length using Infini-attention.

Python 948 61 Updated May 12, 2024

Mixture-of-Experts (MoE) Language Model

Python 189 41 Updated Sep 9, 2024