Measures the latency between CPU cores
A community-oriented list of useful NUMA-related libraries, tools, and other resources
Loop Kernel Analysis and Performance Modeling Toolkit
Parallel solvers for sparse linear systems featuring multigrid methods.
[MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention
Must-read research papers and links to tools and datasets related to using machine learning for compilers and systems optimisation
A tutorial on RDMA-based programming using code examples
Large Language Model Text Generation Inference
Universal LLM Deployment Engine with ML Compilation
High Performance Linpack for Next-Generation AMD HPC Accelerators
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
Port of OpenAI's Whisper model in C/C++
Scheduler for sub-node tasks on HPC systems with batch scheduling
Scripts to run the Graph500 benchmark on the Salomon cluster
The simplest way to run LLaMA on your local machine
High accuracy RAG for answering questions from scientific documents with citations
Zstandard - Fast real-time compression algorithm
GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.
GPGPU-Sim provides a detailed simulation model of contemporary NVIDIA GPUs running CUDA and/or OpenCL workloads. It includes support for features such as TensorCores and CUDA Dynamic Parallelism.
An online course for learning and mastering low-level performance analysis and tuning.
The book "Performance Analysis and Tuning on Modern CPUs"
A benchmark for low-level CPU micro-architectural features
🦜🔗 Build context-aware reasoning applications
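One entry above describes a tool that measures the latency between CPU cores. As a rough sketch of the ping-pong idea such benchmarks are built on, the snippet below bounces a signal between two threads and reports the average round trip. It uses Python's `threading.Event`, so it measures OS thread wake-up latency rather than raw cache-line transfer time; real core-to-core benchmarks use pinned threads spinning on shared atomics instead. All names here are illustrative, not taken from any listed repository.

```python
import threading
import time

ITERS = 10_000  # round trips to average over

# Two events form the ping-pong channel between the threads.
ping = threading.Event()
pong = threading.Event()

def responder():
    """Wait for a ping, then immediately answer with a pong."""
    for _ in range(ITERS):
        ping.wait()
        ping.clear()
        pong.set()

t = threading.Thread(target=responder)
t.start()

start = time.perf_counter()
for _ in range(ITERS):
    ping.set()      # send ping
    pong.wait()     # block until the responder answers
    pong.clear()
end = time.perf_counter()
t.join()

avg_ns = (end - start) / ITERS * 1e9
print(f"avg round-trip: {avg_ns:.0f} ns")
```

Because `Event.wait` parks the thread in the kernel, the numbers here are dominated by scheduler wake-up cost; spin-waiting on a shared atomic flag with pinned threads is what isolates the actual inter-core cache-coherence latency.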