Stars
Lightweight PDF Q&A tool powered by RAG (Retrieval-Augmented Generation), with MCP (Model Context Protocol) support.
Scripts and functions shared for the OPUS-PALA article and the LOTUS software. All functions are usable with their owners' agreement.
RF-ULM: Ultrasound Localization Microscopy Learned from Radio-Frequency Wavefronts
Chinese Traffic Police Gesture Recognizer (中国交通警察指挥手势识别), PyTorch version
[ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters
Qwen2.5-VL is the multimodal large language model series developed by the Qwen team at Alibaba Cloud.
The official repo of Qwen-VL (通义千问-VL), the chat & pretrained large vision-language model proposed by Alibaba Cloud.
[ECCV 2024 Oral] DriveLM: Driving with Graph Visual Question Answering
[AAAI 2024] NuScenes-QA: A Multi-modal Visual Question Answering Benchmark for Autonomous Driving Scenario.
This is the official implementation of our publication "Deep learning enables fast and dense single-molecule localization with high accuracy" (Nature Methods)
TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models
Official repository of NeXt-TDNN for speaker verification
An open-source audio wake word (or phrase) detection framework with a focus on performance and simplicity.
MiniWoB++: a web interaction benchmark for reinforcement learning
Collecting awesome papers of RAG for AIGC. We propose a taxonomy of RAG foundations, enhancements, and applications in paper "Retrieval-Augmented Generation for AI-Generated Content: A Survey".
An Autonomous LLM Agent for Complex Task Solving
DeepSeek Coder: Let the Code Write Itself
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks.
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"