- Shanghai Jiao Tong University
- Shanghai
- https://aster2024.github.io/
- https://orcid.org/0009-0001-0699-9164
Stars
《EasyOffer》 (an LLM interview-experience collection) is a summer-internship offer guide tailored for LLM learners. It records common big-tech live-coding problems, interview experiences, and discussion questions useful for preparing for LLM summer internships and autumn recruitment. I'm just a beginner still learning; corrections are always welcome, and I hope everyone lands their dream offer!
Offline Markdown to PDF: choose -> edit -> transform 🥂
A state-of-the-art open visual language model | multimodal pretrained model
Tips for releasing research code in Machine Learning (with official NeurIPS 2020 recommendations)
FlashMLA: Efficient MLA decoding kernels
Official code for the paper "Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning"
A recipe for online RLHF and online iterative DPO.
[ICML 2025] Teaching Language Models to Critique via Reinforcement Learning
Rigorous evaluation of LLM-synthesized code - NeurIPS 2023 & COLM 2024
APPS: Automated Programming Progress Standard (NeurIPS 2021)
Synthetic question-answering dataset to formally analyze the chain-of-thought output of large language models on a reasoning task.
Democratizing Reinforcement Learning for LLMs
Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them
The first differentially-private diffusion model for tabular data
Implementation of the paper: "FedTabDiff: Federated Learning of Diffusion Models for Synthetic Mixed-Type Tabular Data Generation"
Official implementation of the paper "Process Reward Model with Q-value Rankings"
WildEval / ZeroEval
Forked from allenai/WildBench. A simple unified framework for evaluating LLMs
Implementation of self-certainty as an extension of the ZeroEval project
Recipes to train reward model for RLHF.
No fortress, purely open ground. OpenManus is Coming.
This is the homepage of a new book entitled "Mathematical Foundations of Reinforcement Learning."