kabicm / Starred · GitHub

Compression for Foundation Models

Jupyter Notebook 31 3 Updated Mar 26, 2025

Dirigent: Lightweight Serverless Orchestration

Go 37 5 Updated Dec 8, 2024

CUDA benchmarks for measuring GPU utilization and interference

Cuda 9 1 Updated Feb 11, 2025

A prototype of using ibis-substrait to compile against a substrait extension

Python 2 Updated Apr 11, 2023

Distributed Communication-Optimal LU-factorization Algorithm

C++ 12 3 Updated Aug 1, 2021

A cross platform way to express data transformation, relational algebra, standardized record expression and plans.

Python 1,318 171 Updated May 21, 2025

RMG is an Open Source code for electronic structure calculations and modeling of materials and molecules. It is based on density functional theory and uses a real space basis and pseudopotentials.

C++ 48 13 Updated May 16, 2025

Neovim config for the lazy

Lua 20,899 1,477 Updated May 12, 2025

Spiking neuron integration for PyTorch

Python 41 6 Updated Mar 18, 2025

Google Research

Jupyter Notebook 35,585 8,089 Updated May 13, 2025
Python 72 5 Updated May 4, 2021

Zero-copy MPI communication of JAX arrays, for turbo-charged HPC applications in Python ⚡

Python 477 31 Updated Mar 18, 2025
Jupyter Notebook 59 3 Updated Mar 4, 2022

Extending JAX with custom C++ and CUDA code

Python 394 23 Updated Aug 18, 2024

Long Range Arena for Benchmarking Efficient Transformers

Python 756 85 Updated Dec 16, 2023

Flax is a neural network library for JAX that is designed for flexibility.

Jupyter Notebook 6,560 700 Updated May 19, 2025

Making large AI models cheaper, faster and more accessible

Python 40,891 4,509 Updated May 21, 2025

PyTorch3D is FAIR's library of reusable components for deep learning with 3D data

Python 9,253 1,367 Updated Mar 28, 2025

Model parallel transformers in JAX and Haiku

Python 6,332 886 Updated Jan 21, 2023

Fast and memory-efficient exact attention

Python 17,440 1,691 Updated May 19, 2025

Training and serving large-scale neural networks with auto parallelization.

Python 3,131 359 Updated Dec 9, 2023

Code for "Heterogeneity-Aware Cluster Scheduling Policies for Deep Learning Workloads", which appeared at OSDI 2020

Jupyter Notebook 128 32 Updated Jul 25, 2024

Trax — Deep Learning with Clear Code and Speed

Python 8,207 826 Updated Apr 10, 2025

MLPerf HPC WG implementation of Mesh-TensorFlow and build scripts for TensorFlow with MPI

Python 4 1 Updated Oct 18, 2019

Matrix multiplication on GPUs for matrices stored on a CPU. Similar to cublasXt, but ported to both NVIDIA and AMD GPUs.

C++ 32 10 Updated Apr 2, 2025

Distributed Communication-Optimal Shuffle and Transpose Algorithm

C++ 13 4 Updated May 6, 2025

Distributed Communication-Optimal Matrix-Matrix Multiplication Algorithm

C++ 205 29 Updated May 8, 2025

Parallelformers: An Efficient Model Parallelization Toolkit for Deployment

Python 785 61 Updated Apr 24, 2023

Transformer related optimization, including BERT, GPT

C++ 6,161 905 Updated Mar 27, 2024