Root970103

Molly Root970103

Stars

Darwin-lfl / langmanus

A community-driven AI automation framework that builds upon the incredible work of the open source community. Our goal is to combine language models with specialized tools for tasks like web search…

Python 792 1,077 Updated Mar 17, 2025

microsoft / markitdown

Python tool for converting files and office documents to Markdown.

Python 57,525 2,948 Updated Apr 13, 2025

modelscope / evalscope

A streamlined and customizable framework for efficient large model evaluation and performance benchmarking

Python 982 107 Updated May 19, 2025

harleyszhang / llm_note

LLM notes, including model inference, transformer model structure, and llm framework code analysis notes.

Python 772 78 Updated May 18, 2025

songquanpeng / one-api

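LLM API management and distribution system supporting mainstream models including OpenAI, Azure, Anthropic Claude, Google Gemini, DeepSeek, ByteDance Doubao, ChatGLM, ERNIE Bot, iFlytek Spark, Tongyi Qianwen, 360 Zhinao, and Tencent Hunyuan. It provides unified API adaptation and can be used for key management and redistribution. Single executable, Docker image provided, one-click deployment, works out of the box…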

JavaScript 25,295 5,178 Updated Feb 21, 2025

VITA-MLLM / VITA

✨✨VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

Python 2,295 169 Updated Mar 28, 2025

dvlab-research / Lyra

Official Implementation for "Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition"

Python 285 28 Updated Jan 9, 2025

usefulsensors / moonshine

Fast and accurate automatic speech recognition (ASR) for edge devices

Python 2,705 142 Updated May 12, 2025

modelscope / FunASR

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.

Python 10,538 1,052 Updated May 8, 2025

lifeiteng / OmniSenseVoice

Omni SenseVoice: High-Speed Speech Recognition with word timestamps 🗣️🎯

Python 845 34 Updated Mar 7, 2025

MME-Benchmarks / Video-MME

✨✨[CVPR 2025] Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

551 20 Updated May 8, 2025

NVlabs / VILA

VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.

Python 3,243 263 Updated May 6, 2025

EvolvingLMMs-Lab / LongVA

Long Context Transfer from Language to Vision

Python 374 18 Updated Mar 18, 2025

open-compass / VLMEvalKit

Open-source evaluation toolkit of large multi-modality models (LMMs), support 220+ LMMs, 80+ benchmarks

Python 2,377 363 Updated May 17, 2025

OpenGVLab / Ask-Anything

[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.

Python 3,236 262 Updated Jan 18, 2025

PKU-YuanGroup / Video-LLaVA

【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection

Python 3,243 234 Updated Dec 3, 2024

DAMO-NLP-SG / VideoLLaMA2

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs

Python 1,160 80 Updated Jan 23, 2025

AnswerDotAI / byaldi

Use late-interaction multi-modal models such as ColPali in just a few lines of code.

Python 785 84 Updated Jan 28, 2025

opendatalab / MinerU

A one-stop, open-source, high-quality data extraction tool for converting PDF to Markdown and JSON.

Python 33,698 2,708 Updated May 19, 2025

OpenBMB / VisRAG

Parsing-free RAG supported by VLMs

Python 707 57 Updated Feb 19, 2025

microsoft / unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Python 21,256 2,622 Updated Mar 4, 2025

illuin-tech / colpali

The code used to train and run inference with the ColVision models, e.g. ColPali, ColQwen2, and ColSmol.

Python 1,860 161 Updated May 15, 2025

Ucas-HaoranWei / GOT-OCR2.0

Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

Python 7,562 660 Updated Feb 10, 2025

connectaman / RAGAlchamy

Summarize and perform RAG on PPTx/PPT file formats

Jupyter Notebook 17 2 Updated Oct 14, 2024

tesseract-ocr / tesseract

Tesseract Open Source OCR Engine (main repository)

C++ 66,927 9,891 Updated May 2, 2025

ocrmypdf / OCRmyPDF

OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched

Python 28,944 1,969 Updated Apr 28, 2025

infiniflow / ragflow

RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.

TypeScript 52,848 5,059 Updated May 19, 2025

FlagOpen / FlagEmbedding

Retrieval and Retrieval-augmented LLMs

Python 9,660 700 Updated May 19, 2025

LLMServe / DistServe

Disaggregated serving system for Large Language Models (LLMs).

Jupyter Notebook 588 61 Updated Apr 6, 2025

jeinlee1991 / chinese-llm-benchmark

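Currently covers 232 large models, including commercial models such as chatgpt, gpt-4o, o3-mini, Google gemini, Claude3.5, Zhipu GLM-Zero, ERNIE Bot, qwen-max, Baichuan, iFlytek Spark, SenseTime senseChat, and minimax, as well as DeepSeek-R1, qwq-32b, deepseek-v3, qwen2.5, llama3.3, phi-4, glm4, gemma3, mistral, 书生in…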

4,229 176 Updated May 17, 2025
