GitHub - maitiSoutrik/100-days-of-cuda: My journey through the 100 Days of CUDA Challenge

maitiSoutrik/100-days-of-cuda

100 Days of CUDA Challenge

Deploy to Jetson Nano

This repository tracks my journey through the 100 Days of CUDA Challenge. Each day, I'll be coding CUDA kernels and documenting my progress.

About the Challenge

The 100 Days of CUDA Challenge is about consistently coding CUDA kernels for 100 days without any gaps. The challenge encourages learning and practicing GPU programming using NVIDIA's CUDA platform.

Resources

Hardware

I'll be developing code on my laptop and running it on a Jetson Nano for testing and execution.

Progress

| Day | Date | Description | Code |
|-----|------|-------------|------|
| 1 | 2025-03-10 | Getting Started with CUDA - Vector Addition | Link |
| 2 | 2025-03-11 | Matrix Addition in CUDA | Link |
| 3 | 2025-03-12 | Matrix Multiplication in CUDA | Link |
| 4 | 2025-03-13 | Parallel Reduction - Partial Sum | Link |
| 5 | 2025-03-14 | Layer Normalization in CUDA | Link |
| 6 | 2025-03-15 | Matrix Transpose with CPU/GPU Benchmarking | Link |
| 7 | 2025-03-16 | 1D & 2D Convolution in CUDA | Link |
| 8 | 2025-03-17 | Parallel Prefix Sum - Exclusive Scan | Link |
| 9 | 2025-03-18 | Flash Attention Forward Pass | Link |
| 10 | 2025-03-19 | Sparse Matrix-Vector Multiplication (SpMV) | Link |
| 11 | 2025-03-20 | Merge Sort with CUDA | Link |
| 12 | 2025-03-21 | Breadth-First Search (BFS) with CUDA | Link |
| 13 | 2025-03-22 | Optimized BFS with Shared Memory | Link |
| 14 | 2025-03-23 | Fractional Hausdorff Distance (FHD) for Image Processing | Link |
| 15 | 2025-03-24 | Convolutional Neural Network (CNN) in CUDA | Link |
| 16 | 2025-03-25 | Parallel Particle System Simulation | Link |
| 17 | 2025-03-26 | Naive Bayes Classifier Training | Link |
| 18 | 2025-03-27 | Matrix Multiplication using CUBLAS | Link |
| 19 | 2025-03-28 | Fast Fourier Transform (FFT) Implementation | Link |
| 20 | 2025-03-29 | Monte Carlo Option Pricing with CUDA | Link |
| 21 | 2025-03-30 | Particle Swarm Optimization (PSO) with CUDA | Link |
| 22 | 2025-03-31 | CUDA-accelerated Reinforcement Learning (Q-Learning) | Link |
| 23 | 2025-04-01 | Genetic Algorithm Optimization with CUDA | Link |
| 24 | 2025-04-02 | Gated Linear Unit (GLU) Implementation | Link |
| 25 | 2025-04-03 | Parallel Point Cloud PassThrough Filter | Link |
| 26 | 2025-04-04 | Kernel Density Estimation (KDE) | Link |
| 27 | 2025-04-05 | Mirror Descent (STE) for Quantization | Link |
| 28 | 2025-04-06 | Mini-Batch SGD for Linear Regression | Link |
| 29 | 2025-04-07 | K-Means Assignment Step (File Input) | Link |
| 30 | 2025-04-08 | Headless Camera Processing (Grayscale + Avg Intensity) | Link |
| 31 | 2025-04-09 | 2D Heat Simulation (Basic vs Shared Memory) | Link |
| 32 | 2025-04-10 | CUDA Streams for Overlap (Matrix Multiply) | Link |
| 33 | 2025-04-11 | Parallel Reduction Optimization (Warp Shuffle) | Link |
| 34 | 2025-04-12 | Point Cloud Voxel Grid Filter (Atomics) | Link |
| 35 | 2025-04-13 | Kalman Filter Prediction Step (cuBLAS) | Link |
| 36 | 2025-04-14 | SpMV with cuSPARSE | Link |
| 37 | 2025-04-15 | Simple NN Forward Pass (GEMM + Activation) | Link |
| 38 | 2025-04-16 | Batch Normalization Kernel (Forward Pass) | Link |
| 39 | 2025-04-17 | Thrust Library Basics | Link |
| 40 | 2025-04-18 | Image Interpolation (Texture Memory) | Link |
| 41 | 2025-04-19 | Parallel Radix Sort (Basic Single Pass) | Link |
| 42 | 2025-04-20 | N-Body Simulation Optimization (Shared Memory) | Link |
| 43 | 2025-04-21 | Simple cuDNN Convolution (Forward) | Link |
| 44 | 2025-04-22 | Occupancy Grid Mapping Update | Link |
| 45 | 2025-04-23 | Optical Flow Gradient Step (Lucas-Kanade) | Link |
| 46 | 2025-04-24 | Simple Backpropagation Step (Fully Connected Layer) | Link |
| 47 | 2025-04-25 | Dynamic Parallelism (Simple Example) | Link |
| 48 | 2025-04-26 | Parallel AABB Collision Detection | Link |
| 49 | 2025-04-27 | Mini-Project: Perception Pipeline (Grayscale -> Blur -> Sobel -> Reduction) | Link |
| 50 | 2025-04-28 | Unit Testing CUDA Kernels with Google Test | Link |
| 51 | 2025-04-29 | Exploring TensorRT (Simple ONNX Inference) | Link |
| 52 | 2025-04-30 | Minimal GRU (minGRU) with Parallel Scan | Link |
| 53 | 2025-05-01 | Bidirectional LSTM Implementation | Link |
| 54 | 2025-05-02 | AdaHessian Optimizer Kernel | Link |
| 55 | 2025-05-03 | Quantization Comparison (FP32/FP16/SimFP8) | Link |
| 56 | 2025-05-04 | Mish Activation Function Benchmark | Link |
| 57 | 2025-05-05 | Conjugate Gradient Method (CGM) using cuBLAS | Link |
| 58 | 2025-05-06 | Bitonic Sort with Shared Memory Optimization | Link |
| 59 | 2025-05-07 | Basic Ray Tracing with CUDA | Link |
| 60 | 2025-05-08 | Muon Optimization - Newton-Schulz Iteration | Link |
| 61 | 2025-05-09 | Fisher Information Matrix | Link |
| 62 | 2025-05-10 | Batched Vector L2 Norm (Shared Memory Reduction) | Link |
| 63 | 2025-05-11 | Parallel Markov Chain Clustering for Robot Localization | Link |
| 64 | 2025-05-12 | Spectral Normalization in GANs (cuBLAS Power Iteration) | Link |
| 65 | 2025-05-13 | GEGLU Activation Function Implementation | Link |
| 66 | 2025-05-14 | GPU-Accelerated MFCC Feature Extraction | Link |
| 67 | 2025-05-15 | SwiGLU Activation and Gradient Computation | Link |
| 68 | 2025-05-16 | LoRA Implementation and Benchmarking | Link |
| 69 | 2025-05-17 | Parallel Password Cracking (FNV-1a) | Link |
| 70 | 2025-05-18 | Mean Squared Error (MSE) Calculation | Link |
| 71 | 2025-05-19 | Group Normalization Forward Pass | Link |
| 72 | 2025-05-20 | Total Variation Distance (TVD) Loss | Link |
| 73 | 2025-05-21 | 1D Rotary Positional Embedding (RoPE) | Link |
| 74 | 2025-05-22 | 2D Rotary Positional Embeddings (RoPE-2D) in CUDA | Link |
| 75 | 2025-05-23 | Fused Linear Transformation and Softmax Cross-Entropy Loss | Link |
| 76 | 2025-05-24 | Contrastive Loss (Forward & Backward) | Link |
| 77 | 2025-05-25 | Huber Loss Implementation in CUDA | Link |
| 78 | 2025-05-26 | Dynamic Tanh (DyT) Operation | Link |
| 79 | 2025-05-28 | Upper Triangular Matrix Multiplication | Link |
| 80 | 2025-05-29 | Matrix Multiplication with Swish Activation and Scaling | Link |
| 81 | 2025-05-30 | Generalized Jensen-Shannon Divergence Loss (Forward & Backward) | Link |
| 82 | 2025-05-31 | Negative Cosine Similarity (Cosine Distance) | Link |
| 83 | 2025-05-31 | Minimum Reduction Over a Specific Dimension | Link |
| 84 | 2025-06-01 | Cumulative Product (Prefix Product / Scan) | Link |
| 85 | 2025-06-02 | Tensor-Matrix Multiplication | Link |
| 86 | 2025-06-03 | Hard Sigmoid Activation Function | Link |
| 87 | 2025-06-03 | Softplus Activation Function | Link |
| 88 | 2025-06-05 | Warp-Level Programming - Warp Sum Reduction | Link |
| 89 | 2025-06-05 | Memory Coalescing Demonstration | Link |
| 90 | 2025-06-06 | Frobenius Norm in CUDA | Link |
| 91 | 2025-06-07 | Hinge Loss Implementation in CUDA | Link |
| 92 | 2025-06-08 | ELU Activation Function Implementation in CUDA | Link |
| 93 | 2025-06-09 | RMS Normalization Implementation in CUDA | Link |
| 94 | 2025-06-10 | CUDA Implementation of Forward and Simplified Reverse Diffusion Steps | Link |
| 95 | 2025-06-12 | Barnsley Fern Fractal Generator (PPM Output) | Link |
| 96 | 2025-06-15 | Product Reduction along a Tensor Dimension (Bugfix & Test) | Link |
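
For readers new to CUDA, the Day 1 topic in the table above (vector addition) might look roughly like the sketch below. This is not the code from this repository: the names `vectorAdd`, `check`, and `addOnGpu`, the block size of 256, and the file name in the build command are all assumptions made for illustration.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Each thread adds one pair of elements. The bounds check guards the last
// block, which may extend past the end of the arrays.
__global__ void vectorAdd(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        c[i] = a[i] + b[i];
    }
}

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
static void check(cudaError_t err, const char* what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// Hypothetical host wrapper: copies the inputs to the device, launches the
// kernel, and copies the result back. Assumes a and b have the same length.
std::vector<float> addOnGpu(const std::vector<float>& a,
                            const std::vector<float>& b) {
    const int n = static_cast<int>(a.size());
    std::vector<float> c(n);
    if (n == 0) {
        return c;  // nothing to launch
    }
    const size_t bytes = static_cast<size_t>(n) * sizeof(float);

    float* dA = nullptr;
    float* dB = nullptr;
    float* dC = nullptr;
    check(cudaMalloc(&dA, bytes), "cudaMalloc(dA)");
    check(cudaMalloc(&dB, bytes), "cudaMalloc(dB)");
    check(cudaMalloc(&dC, bytes), "cudaMalloc(dC)");
    check(cudaMemcpy(dA, a.data(), bytes, cudaMemcpyHostToDevice), "copy a");
    check(cudaMemcpy(dB, b.data(), bytes, cudaMemcpyHostToDevice), "copy b");

    const int threads = 256;                         // assumed block size
    const int blocks = (n + threads - 1) / threads;  // round up to cover n
    vectorAdd<<<blocks, threads>>>(dA, dB, dC, n);
    check(cudaGetLastError(), "kernel launch");

    // This blocking copy also waits for the kernel to finish.
    check(cudaMemcpy(c.data(), dC, bytes, cudaMemcpyDeviceToHost), "copy c");

    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return c;
}

int main() {
    const int n = 1 << 20;
    std::vector<float> a(n, 1.0f), b(n, 2.0f);
    std::vector<float> c = addOnGpu(a, b);

    for (int i = 0; i < n; ++i) {
        if (c[i] != 3.0f) {
            std::fprintf(stderr, "mismatch at index %d: %f\n", i, c[i]);
            return EXIT_FAILURE;
        }
    }
    std::printf("vector add OK (%d elements)\n", n);
    return EXIT_SUCCESS;
}
```

Assuming the file is saved as `vector_add.cu`, it should build with `nvcc vector_add.cu -o vector_add`. One thread per element with a rounded-up grid size is the usual starting point for element-wise kernels; the entries later in the table (shared memory, warp shuffles, streams) build on this same launch pattern.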

Rules

  1. Code CUDA kernels consistently for 100 days without any gaps
  2. Document what I did each day
  3. Every 10 days, I can claim a badge from the challenge
  4. No code, no badge, no challenge
