Stars
A private repository for development
The simplest, fastest repository for training/finetuning small-sized VLMs.
🤖 AgentVerse 🪐 is designed to facilitate the deployment of multiple LLM-based agents in various applications, which primarily provides two frameworks: task-solving and simulation
A curated list of publications on image and video segmentation leveraging Multimodal Large Language Models (MLLMs), highlighting state-of-the-art methods, innovative applications, and key advanceme…
[CSUR 2025] Continual Learning of Large Language Models: A Comprehensive Survey
Multimodal Large Language Model (MLLM) Tuning Survey: Keeping Yourself is Important in Downstream Tuning Multimodal Large Language Model
An Easy-to-use, Scalable and High-performance RLHF Framework based on Ray (PPO & GRPO & REINFORCE++ & vLLM & Ray & Dynamic Sampling & Async Agent RL)
Finetune Qwen3, Llama 4, TTS, DeepSeek-R1 & Gemma 3 LLMs 2x faster with 70% less memory! 🦥
Lightweight coding agent that runs in your terminal
mycfhs / finetune-LISA
Forked from dvlab-research/LISA
finetune code for "LISA: Reasoning Segmentation via Large Language Model"
Personalized Fragrance Recommendation for Aromatherapy: A Machine Learning Approach Based on Personality Traits and Electrodermal Activity
KernelBench: Can LLMs Write GPU Kernels? - Benchmark with Torch -> CUDA problems
Hackable and optimized Transformers building blocks, supporting a composable construction.
[CVPR 2024] Aligning and Prompting Everything All at Once for Universal Visual Perception
Tracking and collecting papers/projects/others related to Segment Anything.
Train transformer language models with reinforcement learning.
📚 A collection of papers about Referring Image Segmentation.
💻 A curated list of papers and resources for multi-modal Graphical User Interface (GUI) agents.
PaSa -- an advanced paper search agent powered by large language models. It can autonomously make a series of decisions, including invoking search tools, reading papers, and selecting relevant refe…
This is the repository for the Tool Learning survey.
RAG Web UI is an intelligent dialogue system based on RAG (Retrieval-Augmented Generation) technology.
🍒 Cherry Studio is a desktop client that supports multiple LLM providers.
A Comprehensive Evaluation Benchmark for Open-Vocabulary Detection (AAAI 2024)