Stars
A simple viewer to see Yad2 listings on a graph.
Influence Functions with (Eigenvalue-corrected) Kronecker-Factored Approximate Curvature
Code for the paper "Rethinking Benchmark and Contamination for Language Models with Rephrased Samples"
DSIR large-scale data selection framework for language model training
Author implementation of the paper "Representing Schema Structure with Graph Neural Networks for Text-to-SQL Parsing"
Author implementation of Global Reasoning over Database Structures for Text-to-SQL Parsing
Efficient Scaling laws and collaborative pretraining.
Official repository for MATES: Model-Aware Data Selection for Efficient Pretraining with Data Influence Models [NeurIPS 2024]
[ICML 2025] Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale
Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?"
Diverse Demonstrations Improve In-context Compositional Generalization
Modeling, training, eval, and inference code for OLMo
Leveraging Code to Improve In-context Learning for Semantic Parsing
Code for the paper "A high-performance speech neuroprosthesis"
Implementation of the paper: "Answering Questions by Meta-Reasoning over Multiple Chains of Thought"
COVR dataset for evaluation of compositional generalization
Text-to-SQL in the Wild: A Naturally-Occurring Dataset Based on Stack Exchange Data
Code to reproduce the LREC paper "Simplifying Semantic Annotations of SMCalFlow"
ML Collections is a library of Python Collections designed for ML use cases.
The official code of EMNLP 2022, "SCROLLS: Standardized CompaRison Over Long Language Sequences".
Code for the paper "Finding needles in a haystack: Sampling Structurally-diverse Training Sets from Synthetic Data for Compositional Generalization"
[TACL 2021] Code and data for the framework in "Multimodal Pretraining Unmasked: A Meta-Analysis and a Unified Framework of Vision-and-Language BERTs"
Linguistics tree drawing to SVG in Python, aimed at Jupyter