crazyofapple (Dongfang Li) / Starred · GitHub
Starred repositories

Python 11 1 Updated May 16, 2025

ZeroSearch: Incentivize the Search Capability of LLMs without Searching

Python 819 75 Updated May 13, 2025

The official implementation of Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free

Jupyter Notebook 33 1 Updated May 13, 2025

IKEA: Reinforced Internal-External Knowledge Synergistic Reasoning for Efficient Adaptive Search Agent

Python 53 2 Updated May 13, 2025

User-friendly implementation of the Mixture-of-Sparse-Attention (MoSA). MoSA selects distinct tokens for each head with expert-choice routing, providing a content-based sparse attention mechanism.

Python 14 Updated May 3, 2025

Awesome RL Reasoning Recipes ("Triple R")

540 31 Updated May 8, 2025

SkyRL-v0: Train Real-World Long-Horizon Agents via Reinforcement Learning

Python 281 21 Updated May 15, 2025

The code for Consistent In-Context Editing, an approach for tuning language models through contextual distributions, overcoming the limitations of traditional fine-tuning methods that learn towards…

Python 29 2 Updated Apr 2, 2025

[ICLR 2025 Spotlight] MagicPIG: LSH Sampling for Efficient LLM Generation

Python 207 16 Updated Dec 16, 2024

Accelerate LLM preference tuning via prefix sharing with a single line of code

Python 41 Updated Apr 30, 2025

[ICLR 2025🔥] D2O: Dynamic Discriminative Operations for Efficient Long-Context Inference of Large Language Models

Python 14 Updated Mar 18, 2025

Paper list for Efficient Reasoning.

435 14 Updated May 14, 2025

Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models

384 11 Updated May 13, 2025

📰 Must-read papers and blogs on LLM-based Long Context Modeling 🔥

1,478 48 Updated May 15, 2025
Python 13 Updated Oct 3, 2024

Efficient LLM Inference over Long Sequences

Python 372 19 Updated Apr 29, 2025

SpargeAttention: A training-free sparse attention mechanism that can accelerate inference for any model.

Cuda 549 34 Updated May 14, 2025

Quantized Attention achieves a 2-3x speedup over FlashAttention and a 3-5x speedup over xformers, without losing end-to-end metrics across language, image, and video models.

Cuda 1,493 105 Updated May 2, 2025

[NAACL 2025] Official Implementation of "HMT: Hierarchical Memory Transformer for Long Context Language Processing"

Python 69 5 Updated Feb 6, 2025

Code repo for "CritiPrefill: A Segment-wise Criticality-based Approach for Prefilling Acceleration in LLMs".

Python 14 1 Updated Sep 15, 2024

Trains Transformer model variants. Data isn't shuffled between batches.

Python 143 18 Updated Oct 5, 2022

Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models

Python 286 21 Updated Feb 23, 2025
Python 14 Updated Apr 11, 2025
Python 12 Updated Jan 16, 2025

KV cache compression for high-throughput LLM inference

Python 127 5 Updated Feb 5, 2025

[ICLR 2025] TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention

Python 37 3 Updated Apr 18, 2025

[ICLR'25] Data and code for our paper "Why Does the Effective Context Length of LLMs Fall Short?"

Python 75 3 Updated Nov 25, 2024

Qwen2.5-Omni is an end-to-end multimodal model from the Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, and video, and of generating speech in real time.

Jupyter Notebook 2,948 222 Updated May 16, 2025

[ICLR 2025] Official PyTorch implementation of "Forgetting Transformer: Softmax Attention with a Forget Gate"

Python 101 3 Updated May 15, 2025