Stars
CUDA benchmarks for measuring GPU utilization and interference
A prototype of using ibis-substrait to compile against a Substrait extension
Distributed Communication-Optimal LU-factorization Algorithm
A cross-platform way to express data transformations, relational algebra, standardized record expressions, and plans.
RMG is an open-source code for electronic structure calculations and modeling of materials and molecules. It is based on density functional theory and uses a real-space basis and pseudopotentials.
Zero-copy MPI communication of JAX arrays, for turbo-charged HPC applications in Python ⚡
Flax is a neural network library for JAX that is designed for flexibility.
Making large AI models cheaper, faster, and more accessible
PyTorch3D is FAIR's library of reusable components for deep learning with 3D data
Model parallel transformers in JAX and Haiku
Fast and memory-efficient exact attention
Code for "Heterogeneity-Aware Cluster Scheduling Policies for Deep Learning Workloads", which appeared at OSDI 2020
MLPerf HPC WG implementation of Mesh-TensorFlow, plus build scripts for TensorFlow with MPI
Matrix multiplication on GPUs for matrices stored on a CPU. Similar to cublasXt, but ported to both NVIDIA and AMD GPUs.
Distributed Communication-Optimal Shuffle and Transpose Algorithm
Distributed Communication-Optimal Matrix-Matrix Multiplication Algorithm
Parallelformers: An Efficient Model Parallelization Toolkit for Deployment
Transformer-related optimizations, including BERT and GPT
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.