Stars
CUDA benchmarks for measuring GPU utilization and interference
A prototype that uses ibis-substrait to compile against a Substrait extension
Distributed Communication-Optimal LU-factorization Algorithm
A cross-platform way to express data transformations, relational algebra, and standardized record expressions and plans.
RMG is an Open Source code for electronic structure calculations and modeling of materials and molecules. It is based on density functional theory and uses a real space basis and pseudopotentials.
Google Research
Zero-copy MPI communication of JAX arrays, for turbo-charged HPC applications in Python ⚡ (a minimal usage sketch follows this list)
Long Range Arena for Benchmarking Efficient Transformers
Flax is a neural network library for JAX that is designed for flexibility.
Making large AI models cheaper, faster and more accessible
PyTorch3D is FAIR's library of reusable components for deep learning with 3D data
Model parallel transformers in JAX and Haiku
Fast and memory-efficient exact attention
Training and serving large-scale neural networks with auto parallelization.
Code for "Heterogenity-Aware Cluster Scheduling Policies for Deep Learning Workloads", which appeared at OSDI 2020
MLPerf HPC WG implementation of Mesh-TensorFlow, plus build scripts for TensorFlow with MPI
Matrix multiplication on GPUs for matrices stored in CPU memory. Similar to cublasXt, but ported to both NVIDIA and AMD GPUs.
Distributed Communication-Optimal Shuffle and Transpose Algorithm
Distributed Communication-Optimal Matrix-Matrix Multiplication Algorithm
Parallelformers: An Efficient Model Parallelization Toolkit for Deployment
Transformer-related optimizations, including BERT and GPT
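As a concrete illustration of the zero-copy JAX/MPI entry above, here is a minimal sketch following the allreduce pattern from the mpi4jax README. The function name mean_across_ranks and the array shape are illustrative choices, not part of the library, and details such as the returned token may vary between mpi4jax versions.

    from mpi4py import MPI
    import jax
    import jax.numpy as jnp
    import mpi4jax

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    @jax.jit
    def mean_across_ranks(x):
        # Sum the array across all MPI ranks without copying it
        # out of device memory; mpi4jax returns the reduced array
        # plus a token used to order communication operations.
        total, _ = mpi4jax.allreduce(x, op=MPI.SUM, comm=comm)
        return total / comm.Get_size()

    local = jnp.ones((4, 4)) * rank
    print(mean_across_ranks(local))

Launched as, e.g., mpirun -n 4 python script.py, each rank contributes its local array and every rank receives the mean; because the allreduce is registered as a JAX primitive, it can sit inside a jit-compiled function.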