windz0629 / Starred · GitHub
🎯
Focusing
  • Baidu
  • Beijing


Tile primitives for speedy kernels

Cuda 2,497 158 Updated Jul 3, 2025

C++ implementation of the Python Numpy library

C++ 3,834 579 Updated May 30, 2025

FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batchsizes of 16-32 tokens.

Python 851 70 Updated Sep 4, 2024

A technical report on convolution arithmetic in the context of deep learning

TeX 14,384 2,296 Updated Jun 8, 2023

VSCode extension for code suggestion

JavaScript 481 47 Updated Jul 1, 2023

A simple tool that can generate TensorRT plugin code quickly.

Python 232 36 Updated Jul 11, 2023

AITemplate is a Python framework which renders neural network into high performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.

Python 4,654 381 Updated Apr 1, 2025

Concurrent CPU-GPU Programming using Task Models

C++ 103 11 Updated Dec 19, 2019

heterogeneity-aware-lowering-and-optimization

C++ 255 75 Updated Jan 20, 2024

Automatic Mapping Generation, Verification, and Exploration for ISA-based Spatial Accelerators

Python 112 12 Updated Oct 26, 2022
Python 40 7 Updated Mar 31, 2022

A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.

C++ 5,452 641 Updated Jul 2, 2025

Deep Learning tools and applications for NVIDIA AGX platforms.

Shell 231 48 Updated Jun 19, 2025

[ARCHIVED] The C++ parallel algorithms library. See https://github.com/NVIDIA/cccl

C++ 4,975 763 Updated Feb 8, 2024

Online CUDA Occupancy Calculator

CoffeeScript 77 12 Updated Oct 12, 2021

User space software for Intel(R) Resource Director Technology

C 719 187 Updated Jun 24, 2025

PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT

Python 2,794 364 Updated Jul 4, 2025

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator

C++ 17,106 3,318 Updated Jul 4, 2025

This patch removes restriction on maximum number of simultaneous NVENC video encoding sessions imposed by Nvidia to consumer-grade GPUs.

Python 4,092 321 Updated Jul 4, 2025

OpenMMLab Computer Vision Foundation

Python 6,174 1,701 Updated Apr 25, 2025

Autoware - the world's leading open-source software project for autonomous driving

Dockerfile 10,137 3,289 Updated Jul 3, 2025

ChatterBot is a machine learning, conversational dialog engine for creating chat bots

Python 14,358 4,471 Updated Jul 1, 2025

10x faster matrix and vector operations

C++ 2,491 174 Updated Oct 12, 2022

My fork of Alex Krizhevsky's cuda-convnet from 2013 where I added dropout, among other features.

Cuda 260 147 Updated Jan 23, 2015

Protothread.h, a C++ port of Adam Dunkels' protothreads library

C++ 196 42 Updated Sep 15, 2023

Intel TBB with CMake build system

C++ 387 168 Updated Jun 24, 2022

The C++ Core Guidelines are a set of tried-and-true guidelines, rules, and best practices about coding in C++

CSS 43,871 5,493 Updated May 8, 2025

A basic implementation of the D* lite algorithm

C++ 123 43 Updated Sep 30, 2015

Contains high quality implementations of Deep Reinforcement Learning algorithms written in PyTorch

Jupyter Notebook 255 91 Updated Oct 1, 2020

Some basic examples of playing with RL

Python 1,243 303 Updated Jan 9, 2025