kabicm / Starred · GitHub

91 results for source starred repositories

PyTorch training at CSCS

Jupyter Notebook 15 13 Updated Jun 4, 2025

Compression for Foundation Models

Jupyter Notebook 32 3 Updated Mar 26, 2025

Dirigent: Lightweight Serverless Orchestration

Go 37 5 Updated Dec 8, 2024

CUDA benchmarks for measuring GPU utilization and interference

Cuda 10 1 Updated Feb 11, 2025

A prototype of using ibis-substrait to compile against a substrait extension

Python 2 Updated Apr 11, 2023

Distributed Communication-Optimal LU-factorization Algorithm

C++ 12 3 Updated Aug 1, 2021

A cross platform way to express data transformation, relational algebra, standardized record expression and plans.

Python 1,326 171 Updated Jun 3, 2025

RMG is an Open Source code for electronic structure calculations and modeling of materials and molecules. It is based on density functional theory and uses a real space basis and pseudopotentials.

C++ 48 14 Updated Jun 4, 2025

Neovim config for the lazy

Lua 21,107 1,491 Updated May 12, 2025

Spiking neuron integration for PyTorch

Python 41 6 Updated Mar 18, 2025

Google Research

Jupyter Notebook 35,693 8,096 Updated Jun 5, 2025
Python 74 5 Updated May 4, 2021

Zero-copy MPI communication of JAX arrays, for turbo-charged HPC applications in Python ⚡

Python 478 31 Updated Mar 18, 2025
Jupyter Notebook 60 3 Updated Mar 4, 2022

Extending JAX with custom C++ and CUDA code

Python 395 23 Updated Aug 18, 2024

Flax is a neural network library for JAX that is designed for flexibility.

Jupyter Notebook 6,602 705 Updated Jun 5, 2025

Making large AI models cheaper, faster and more accessible

Python 40,931 4,520 Updated Jun 5, 2025

PyTorch3D is FAIR's library of reusable components for deep learning with 3D data

Python 9,292 1,367 Updated Jun 4, 2025

Model parallel transformers in JAX and Haiku

Python 6,334 886 Updated Jan 21, 2023

Fast and memory-efficient exact attention

Python 17,686 1,721 Updated Jun 4, 2025

Code for "Heterogeneity-Aware Cluster Scheduling Policies for Deep Learning Workloads", which appeared at OSDI 2020

Jupyter Notebook 128 32 Updated Jul 25, 2024

Trax — Deep Learning with Clear Code and Speed

Python 8,219 828 Updated Apr 10, 2025

ML-Perf HPC WG Implementation of Mesh-Tensorflow and (buildscripts) for Tensorflow with MPI

Python 4 1 Updated Oct 18, 2019

Matrix multiplication on GPUs for matrices stored on a CPU. Similar to cublasXt, but ported to both NVIDIA and AMD GPUs.

C++ 32 10 Updated Apr 2, 2025

Distributed Communication-Optimal Shuffle and Transpose Algorithm

C++ 13 4 Updated May 6, 2025

Distributed Communication-Optimal Matrix-Matrix Multiplication Algorithm

C++ 206 29 Updated May 8, 2025

Parallelformers: An Efficient Model Parallelization Toolkit for Deployment

Python 788 61 Updated Apr 24, 2023

Transformer related optimization, including BERT, GPT

C++ 6,183 904 Updated Mar 27, 2024

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Python 38,727 4,408 Updated Jun 5, 2025