kennykguo (Kenny Guo) / Starred · GitHub
♠️

Highlights

  • Pro


Pipelined FPGA design to accelerate encoder-only transformer inference, implemented on a Xilinx board.

Python 7 Updated May 3, 2025

From the Transistor to the Web Browser, a rough outline for a 12 week course

C++ 229 8 Updated May 21, 2024

A detailed guide to the xv6 code.

269 30 Updated May 2, 2023

📚 Learn to write an embedded OS in Rust 🦀

Rust 14,285 837 Updated Feb 10, 2024

A driving dataset for the development and validation of fused pose estimators and mapping algorithms

Jupyter Notebook 603 123 Updated May 20, 2024

An experimental pure-Rust x86 bootloader

Rust 1,509 215 Updated Jun 6, 2025

minimal cross-platform standalone C headers

C 8,417 568 Updated Jun 30, 2025

Efficient Triton Kernels for LLM Training

Python 5,287 361 Updated Jul 2, 2025

Just coding 100 CUDA kernels

Cuda 1 Updated May 25, 2025

LSi - Autonomous RL Agent for Microgrid Management using Proximal Policy Optimization - https://autogrid-dashboard.vercel.app/dashboard

Jupyter Notebook 3 Updated Jun 20, 2025

FlashMLA: Efficient MLA decoding kernels

Cuda 11,637 872 Updated Apr 29, 2025

Must read research papers and links to tools and datasets that are related to using machine learning for compilers and systems optimisation

1,564 166 Updated May 29, 2025

A 16-bit RISC CPU with 32 instructions built with Digital for running on an FPGA.

Verilog 122 18 Updated Sep 24, 2022

Transformer Architecture written with CUDA, C++ and LibTorch.

C++ 7 Updated May 1, 2025

Here's all my Python/Numba (CUDA) code for the encoder block I made :)

Python 65 10 Updated Apr 28, 2025

An open source GPU based off of the AMD Southern Islands ISA.

Verilog 1,187 247 Updated Sep 25, 2017

OpenSource GPU, in Verilog, loosely based on RISC-V ISA

SystemVerilog 1,023 117 Updated Nov 22, 2024

bare metal neural network

Python 4 Updated Dec 9, 2024

Rendering rudimentary 3D meshes on a DE1-SoC FPGA to a VGA display, written in Verilog.

Verilog 2 1 Updated Dec 3, 2023

Ghidra Plugin for AMD's F32 Processor

C 5 Updated Apr 6, 2025

build-once run-anywhere C library

C 19,512 690 Updated May 21, 2025

Collection of leaked system prompts

11,171 1,441 Updated Jun 28, 2025
C 601 202 Updated Aug 12, 2014

C-based/Cached/Core Computer Vision Library, A Modern Computer Vision Library

C 7,144 1,712 Updated Jun 10, 2025

Marvin: A Minimalist GPU-only N-Dimensional ConvNets Framework

C++ 425 137 Updated Mar 21, 2018

Repository for the book

Python 1,817 694 Updated May 20, 2024

Fast and memory-efficient exact attention

Python 18,118 1,778 Updated Jul 2, 2025

Development repository for the Triton language and compiler

MLIR 16,003 2,086 Updated Jul 2, 2025

A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.

Python 558 30 Updated Jun 17, 2025