Starred repositories
ZeroSearch: Incentivize the Search Capability of LLMs without Searching
The official implementation for Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free
IKEA: Reinforced Internal-External Knowledge Synergistic Reasoning for Efficient Adaptive Search Agent
User-friendly implementation of Mixture-of-Sparse-Attention (MoSA). MoSA selects distinct tokens for each head via expert-choice routing, providing a content-based sparse attention mechanism.
Awesome RL Reasoning Recipes ("Triple R")
SkyRL-v0: Train Real-World Long-Horizon Agents via Reinforcement Learning
The code for Consistent In-Context Editing, an approach for tuning language models through contextual distributions, overcoming the limitations of traditional fine-tuning methods that learn towards…
[ICLR2025 Spotlight] MagicPIG: LSH Sampling for Efficient LLM Generation
Accelerate LLM preference tuning via prefix sharing with a single line of code
[ICLR 2025🔥] D2O: Dynamic Discriminative Operations for Efficient Long-Context Inference of Large Language Models
Paper list for Efficient Reasoning.
Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models
📰 Must-read papers and blogs on LLM based Long Context Modeling 🔥
Efficient LLM Inference over Long Sequences
SpargeAttention: A training-free sparse attention that can accelerate any model inference.
Quantized Attention achieves speedups of 2-3x and 3-5x compared to FlashAttention and xformers, respectively, without losing end-to-end metrics across language, image, and video models.
[NAACL 2025] Official Implementation of "HMT: Hierarchical Memory Transformer for Long Context Language Processing"
Code repo for "CritiPrefill: A Segment-wise Criticality-based Approach for Prefilling Acceleration in LLMs".
Trains Transformer model variants without shuffling data between batches.
Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models
KV cache compression for high-throughput LLM inference
[ICLR 2025] TidalDecode: A Fast and Accurate LLM Decoding with Position Persistent Sparse Attention
[ICLR'25] Data and code for our paper "Why Does the Effective Context Length of LLMs Fall Short?"
Qwen2.5-Omni is an end-to-end multimodal model by the Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, and video, and performing real-time speech generation.
[ICLR 2025] Official PyTorch implementation of "Forgetting Transformer: Softmax Attention with a Forget Gate"