Stars
[TGRS'25] AirSpatialBot: A Spatially-Aware Aerial Agent for Fine-Grained Vehicle Attribute Recognization and Retrieval
[IGARSS 2025 Oral] A Simple Aerial Detection Baseline of Multimodal Language Models.
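[Numbered musical notation tools] je jianpu (numbered musical notation) processing tools, including transposition, playback, score engraving, and MIDI extraction (conversion) and creation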
Solve Visual Understanding with Reinforced VLMs
GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks
GeoGround: A Unified Large Vision-Language Model for Remote Sensing Visual Grounding
[ICLR2025] Text4Seg: Reimagining Image Segmentation as Text Generation
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen2.5-VL, Qwen2.5-Omni, Qwen2-Audio, Ovis2, InternVL3, Llava, GLM4…
Accelerating the development of large multimodal models (LMMs) with one-click evaluation module - lmms-eval.
TextGrad: Automatic "Differentiation" via Text -- using large language models to backpropagate textual gradients.
[NeurIPS 2024 Spotlight ⭐️] Parameter-Inverted Image Pyramid Networks (PIIP)
[ICLR 2025 Spotlight] OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
[TPAMI] Oriented object detection on STAR dataset.
A Survey on Vision-Language Geo-Foundation Models (VLGFMs)
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
PyTorch Implementation of "V* : Guided Visual Search as a Core Mechanism in Multimodal LLMs"
An efficient, flexible and full-featured toolkit for fine-tuning LLM (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)
Official release of InternLM series (InternLM, InternLM2, InternLM2.5, InternLM3).
Official implementation of the CVPR23 paper: Dynamic Coarse-to-Fine Learning for Oriented Tiny Object Detection
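Provides a practical interactive interface for large language models such as GPT/GLM, specially optimized for reading, polishing, and writing papers. Modular design with support for custom shortcut buttons and function plugins; project analysis and self-explanation for Python, C++, and other codebases; PDF/LaTeX paper translation and summarization; parallel querying of multiple LLMs; and local models such as chatglm3. Integrates Tongyi Qianwen, deepseekcoder, iFlytek Spark, ERNIE Bot, llama2, rwkv, claude2, m…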
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
The repository provides code for running inference with the SegmentAnything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
JDet is an object detection benchmark based on Jittor. It mainly focuses on object detection in aerial images (oriented object detection).