windz0629 / Starred · GitHub

windz0629

🎯

Focusing

Interested in smart things, including robotics, autonomous cars, etc.

2 followers · 0 following

Baidu
Beijing

Stars

HazyResearch / ThunderKittens

Tile primitives for speedy kernels

Cuda 2,497 158 Updated Jul 3, 2025

dpilger26 / NumCpp

C++ implementation of the Python Numpy library

C++ 3,834 579 Updated May 30, 2025

IST-DASLab / marlin

FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.

Python 851 70 Updated Sep 4, 2024

vdumoulin / conv_arithmetic

A technical report on convolution arithmetic in the context of deep learning

TeX 14,384 2,296 Updated Jun 8, 2023

CodedotAl / code-clippy-vscode

Forked from hieunc229/copilot-clone

VSCode extension for code suggestion

JavaScript 481 47 Updated Jul 1, 2023

NVIDIA-AI-IOT / tensorrt_plugin_generator

A simple tool that can generate TensorRT plugin code quickly.

Python 232 36 Updated Jul 11, 2023

facebookincubator / AITemplate

AITemplate is a Python framework which renders neural network into high performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.

Python 4,654 381 Updated Apr 1, 2025

Heteroflow / Heteroflow

Concurrent CPU-GPU Programming using Task Models

C++ 103 11 Updated Dec 19, 2019

alibaba / heterogeneity-aware-lowering-and-optimization

heterogeneity-aware-lowering-and-optimization

C++ 255 75 Updated Jan 20, 2024

pku-liang / AMOS

Automatic Mapping Generation, Verification, and Exploration for ISA-based Spatial Accelerators

Python 112 12 Updated Oct 26, 2022

masahi / tvm-cutlass-eval

Python 40 7 Updated Mar 31, 2022

NVIDIA / DALI

A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.

C++ 5,452 641 Updated Jul 2, 2025

NVIDIA / DL4AGX

Deep Learning tools and applications for NVIDIA AGX platforms.

Shell 231 48 Updated Jun 19, 2025

NVIDIA / thrust

[ARCHIVED] The C++ parallel algorithms library. See https://github.com/NVIDIA/cccl

C++ 4,975 763 Updated Feb 8, 2024

xmartlabs / cuda-calculator

Forked from karthikeyann/cuda-calculator

Online CUDA Occupancy Calculator

CoffeeScript 77 12 Updated Oct 12, 2021

intel / intel-cmt-cat

User space software for Intel(R) Resource Director Technology

C 719 187 Updated Jun 24, 2025

pytorch / TensorRT

PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT

Python 2,794 364 Updated Jul 4, 2025

microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator

C++ 17,106 3,318 Updated Jul 4, 2025

keylase / nvidia-patch

This patch removes restriction on maximum number of simultaneous NVENC video encoding sessions imposed by Nvidia to consumer-grade GPUs.

Python 4,092 321 Updated Jul 4, 2025

open-mmlab / mmcv

OpenMMLab Computer Vision Foundation

Python 6,174 1,701 Updated Apr 25, 2025

autowarefoundation / autoware

Autoware - the world's leading open-source software project for autonomous driving

Dockerfile 10,137 3,289 Updated Jul 3, 2025

gunthercox / ChatterBot

ChatterBot is a machine learning, conversational dialog engine for creating chat bots

Python 14,358 4,471 Updated Jul 1, 2025

dblalock / bolt

10x faster matrix and vector operations

C++ 2,491 174 Updated Oct 12, 2022

dnouri / cuda-convnet

My fork of Alex Krizhevsky's cuda-convnet from 2013 where I added dropout, among other features.

Cuda 260 147 Updated Jan 23, 2015

benhoyt / protothreads-cpp

Protothread.h, a C++ port of Adam Dunkels' protothreads library

C++ 196 42 Updated Sep 15, 2023

wjakob / tbb

Intel TBB with CMake build system

C++ 387 168 Updated Jun 24, 2022

isocpp / CppCoreGuidelines

The C++ Core Guidelines are a set of tried-and-true guidelines, rules, and best practices about coding in C++

CSS 43,871 5,493 Updated May 8, 2025

ArekSredzki / dstar-lite

A basic implementation of the D* lite algorithm

C++ 123 43 Updated Sep 30, 2015

ucla-rlcourse / DeepRL-Tutorials

Forked from qfettes/DeepRL-Tutorials

Contains high quality implementations of Deep Reinforcement Learning algorithms written in PyTorch

Jupyter Notebook 255 91 Updated Oct 1, 2020

ucla-rlcourse / RLexample

Some basic examples of playing with RL

Python 1,243 303 Updated Jan 9, 2025
