Starred repositories
🌐 WebWalker [ACL2025] & WebDancer [Preprint]
SWE-agent takes a GitHub issue and tries to automatically fix it, using your LM of choice. It can also be employed for offensive cybersecurity or competitive coding challenges. [NeurIPS 2024]
[TMLR] A curated list of language modeling research for code (and other software engineering activities), plus related datasets.
This repository contains the official implementation of "FastVLM: Efficient Vision Encoding for Vision Language Models" - CVPR 2025
This is the homepage of a new book entitled "Mathematical Foundations of Reinforcement Learning."
🪢 Open source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with OpenTelemetry, Langchain, OpenAI SDK, LiteLLM, and more. 🍊YC W23
Recipes to train reward models for RLHF.
Seed-Coder is a family of lightweight open-source code LLMs comprising base, instruct and reasoning models, developed by ByteDance Seed.
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA/Tensor Cores Kernels, HGEMM, FA-2 MMA etc.🔥
Autonomous coding agent right in your IDE, capable of creating/editing files, executing commands, using the browser, and more with your permission every step of the way.
Code for the paper "Fine-Tuning Language Models from Human Preferences"
The official Python SDK for Model Context Protocol servers and clients
Damn Vulnerable MCP Server
✨First Open-Source R1-like Video-LLM [2025/02/18]
Codebase for VidHal: Benchmarking Hallucinations in Vision LLMs
This is the official implementation of our paper "Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension"
[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted fo…
A Text-guided Protein Design Framework, Nat Mach Intell 2025 (https://www.nature.com/articles/s42256-025-01011-z)
Video-R1: Reinforcing Video Reasoning in MLLMs [🔥the first paper to explore R1 for video]
[ICLR 2025] AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
Playwright Model Context Protocol Server - Tool to automate Browsers and APIs in Claude Desktop, Cline, Cursor IDE and More 🔌
Qwen2.5-Omni is an end-to-end multimodal model by the Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, and video, and of performing real-time speech generation.
[NeurIPS 2024] Official code for HourVideo: 1-Hour Video Language Understanding
Wan: Open and Advanced Large-Scale Video Generative Models
🦉 OWL: Optimized Workforce Learning for General Multi-Agent Assistance in Real-World Task Automation