HCMUS - Ho Chi Minh
Stars
LiteRT is the new name for TensorFlow Lite (TFLite). While the name is new, it's still the same trusted, high-performance runtime for on-device AI, now with an expanded vision.
Supporting PyTorch models with the Google AI Edge TFLite runtime.
🦜🔗 Build context-aware reasoning applications
Port of OpenAI's Whisper model in C/C++
Chat language model that can use tools and interpret the results
c/ua is the Docker Container for Computer-Use AI Agents.
ngxson / llama.cpp
Forked from ggml-org/llama.cpp
MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX.
A high-performance API server that provides OpenAI-compatible endpoints for MLX models. Developed in Python and powered by the FastAPI framework, it offers an efficient, scalable, and user-fri…
🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
FlashMLA: Efficient MLA decoding kernels
SGLang is a fast serving framework for large language models and vision language models.
A high-throughput and memory-efficient inference and serving engine for LLMs
This repository tracks all modules and projects from my internship at VNG.
A toolbox for Vietnamese Optical Character Recognition.