Root970103 (Molly) / Starred · GitHub

A community-driven AI automation framework that builds upon the incredible work of the open source community. Our goal is to combine language models with specialized tools for tasks like web search…

Python 792 1,077 Updated Mar 17, 2025

Python tool for converting files and office documents to Markdown.

Python 57,525 2,948 Updated Apr 13, 2025

A streamlined and customizable framework for efficient large model evaluation and performance benchmarking

Python 982 107 Updated May 19, 2025

LLM notes, including model inference, transformer model structure, and llm framework code analysis notes.

Python 772 78 Updated May 18, 2025

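LLM API management and distribution system supporting mainstream models including OpenAI, Azure, Anthropic Claude, Google Gemini, DeepSeek, ByteDance Doubao, ChatGLM, ERNIE Bot, iFlytek Spark, Tongyi Qianwen, 360 Zhinao, and Tencent Hunyuan. Provides unified API adaptation and can be used for key management and redistribution. Ships as a single executable with a Docker image for one-click, out-of-the-box deployment. LLM API management & k…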

JavaScript 25,295 5,178 Updated Feb 21, 2025

✨✨VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

Python 2,295 169 Updated Mar 28, 2025

Official Implementation for "Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition"

Python 285 28 Updated Jan 9, 2025

Fast and accurate automatic speech recognition (ASR) for edge devices

Python 2,705 142 Updated May 12, 2025

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.

Python 10,538 1,052 Updated May 8, 2025

Omni SenseVoice: High-Speed Speech Recognition with words timestamps 🗣️🎯

Python 845 34 Updated Mar 7, 2025

✨✨[CVPR 2025] Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

551 20 Updated May 8, 2025

VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.

Python 3,243 263 Updated May 6, 2025

Long Context Transfer from Language to Vision

Python 374 18 Updated Mar 18, 2025

Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks

Python 2,377 363 Updated May 17, 2025

[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.

Python 3,236 262 Updated Jan 18, 2025

【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection

Python 3,243 234 Updated Dec 3, 2024

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs

Python 1,160 80 Updated Jan 23, 2025

Use late-interaction multi-modal models such as ColPali in just a few lines of code.

Python 785 84 Updated Jan 28, 2025

A high-quality tool for converting PDF to Markdown and JSON. A one-stop, open-source, high-quality data extraction tool that converts PDF into Markdown and JSON formats.

Python 33,698 2,708 Updated May 19, 2025

Parsing-free RAG supported by VLMs

Python 707 57 Updated Feb 19, 2025

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Python 21,256 2,622 Updated Mar 4, 2025

The code used to train and run inference with the ColVision models, e.g. ColPali, ColQwen2, and ColSmol.

Python 1,860 161 Updated May 15, 2025

Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

Python 7,562 660 Updated Feb 10, 2025

Summarize and perform RAG on PPTx/PPT file formats

Jupyter Notebook 17 2 Updated Oct 14, 2024

Tesseract Open Source OCR Engine (main repository)

C++ 66,927 9,891 Updated May 2, 2025

OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched

Python 28,944 1,969 Updated Apr 28, 2025

RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.

TypeScript 52,848 5,059 Updated May 19, 2025

Retrieval and Retrieval-augmented LLMs

Python 9,660 700 Updated May 19, 2025

Disaggregated serving system for Large Language Models (LLMs).

Jupyter Notebook 588 61 Updated Apr 6, 2025

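Currently covers 232 large models, including commercial models such as chatgpt, gpt-4o, o3-mini, Google gemini, Claude3.5, Zhipu GLM-Zero, ERNIE Bot, qwen-max, Baichuan, iFlytek Spark, SenseTime senseChat, and minimax, as well as DeepSeek-R1, qwq-32b, deepseek-v3, qwen2.5, llama3.3, phi-4, glm4, gemma3, mistral, 书生in…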

4,229 176 Updated May 17, 2025