⚡Software Engineer | Systems Optimization Enthusiast | Performance Tuner🚀
- 💻 Software engineer focused on optimizing runtime performance
- ⚙️ Specialize in systems optimization, memory management, concurrency, and parallel programming
- 🚀 Expertise in cache management, multi-threading, and performance enhancement in distributed systems
- 🎓 Master of Engineering in Computer Science from Cornell University
- 🛠 Operating Systems, Concurrency Programming, Thread Programming
- ⚡ Parallel Programming and Distributed Systems
- Languages: C++11/17/20, C, Python, Java, Shell Script
- Systems and Databases: Linux, FreeBSD, UNIX, macOS, Windows, MySQL, PostgreSQL, SQL
- Tools and APIs: Git, GitHub, Vim, GDB, Valgrind, gprof, GCP, STL, OpenMP, OpenMPI, PyTorch, Jenkins, Android NDK, NumPy, Matplotlib, Pandas, React
- Skills: Algorithms, Data Structures, Thread Programming, Concurrency Programming, Parallel Programming, Multi-Threading, System Optimization, Memory Management
My work on projects like cache replacement policies, thread management systems, and optimizing data structures for operating system components has been recognized for its innovation and impact on system efficiency. I'm also proud of my contributions to open-source projects, where I've applied my expertise to tackle complex challenges in system performance and reliability.
- egos-2000: A minimal operating system (2K LOC) on QEMU and a RISC-V board
- Netgraph Epochization for FreeBSD (🚧 Work in Progress): Re-engineers the kernel’s Netgraph packet path to be lock-free with epoch-based reclamation, slashing contention and scaling cleanly across modern multi-core CPUs.
- mini-migration: Mini-Migration is a cross-platform resumable file-transfer tool. C++17 core, Objective-C++ macOS layer; built for Apple Backup & Migration workflows. Status: v0.31 (feature-complete, frozen) → now targeting v0.5
- Mini Malloc - Memory Allocator Implementation: A comprehensive memory allocation library implementation featuring multiple levels of sophistication, from basic first-fit allocation to security-enhanced allocators with extensive debugging capabilities.
- UltraSIMD: Ultra-fast SIMD library delivering 18.7× speedups through AVX-512 optimization, supporting F32/F16/I8/BF16 data types with 100% accuracy across 175 test cases.
- Cache Replacement Policies: This repository contains a comprehensive implementation of various cache replacement algorithms written in C.
- DSAlib
- HazardLFQ / EBRLFQ: Header-only lock-free queues for C++20, one protected by hazard pointers (HazardLFQ) and one by epoch-based reclamation (EBRLFQ).
- RedLockTree: Header-only C++17 red-black tree with per-node locks—parallel look-ups, serialized writers, and a built-in stress test for heavy-load correctness.
- Edge-Buffer micro-DFS (Distributed File System): A minimalist, log-structured distributed file system designed for educational purposes and edge computing scenarios. Built as a reference implementation, it demonstrates core distributed systems concepts with a focus on simplicity, performance, and reliability.
- Gossip Protocol: Gossip protocol implementation in C++
- Concurrent Webserver: An implementation of a concurrent web server in C.
- Distributed Raft-based Chat Server: This project implements a distributed chat server using the Raft consensus algorithm for leader election and log replication. It features a simple key-value state machine, handles client commands, and maintains consistency across multiple nodes. The server is built with C++ and utilizes socket programming for network communication.
- Distributed Word Count System: Distributed word-count on a client–server model: the client leverages an LRU cache, thread-pooled BFS and parallel file processing to gather word counts, then sends file lists over TCP to a server that runs an in-process MapReduce—OpenMP-accelerated mappers, two-stage mutex-protected reduction and a final aggregation.
- Harmony: A Python-like programming language for testing and experimenting with concurrent programs.
- CUDA Renderer: The project involves creating a parallel renderer in CUDA for drawing colored circles.
- Parallel Wandering Salesperson (🚧 Work in Progress): Branch-and-Bound Wandering Salesman solver in C with OpenMP, leveraging a shared-memory parallel model for fast, multi-core search.
- MPI Travelling-Salesman Solver: Blazing-fast branch-and-bound TSP solver (≤ 18 cities) in single-file C using the MPI message-passing model.
- Trainium-MLAccel (🚧 Work in Progress): High-performance ML kernels for AWS Trainium: optimized vector ops, fused conv+maxpool, and data-streaming tiling to maximize throughput and hardware utilization.
- PicoGPT (🚧 Work in Progress): A hands-on fork of NanoGPT with FlashAttention-2 CUDA kernels, INT8/INT4 GPTQ quantization, paged KV-cache reuse, and continuous batching, turning a tiny Shakespeare model into a full-speed GPU LLM inference demo.
- MiniProject: A GraphQL-based full-stack application integrating Go (backend), Vue.js (frontend), and MongoDB (database) using Docker for containerized deployment.
- Game 2048/8192: A browser-based 2048/8192 puzzle with an Auto-Run AI that uses iterative-deepening Minimax and α–β pruning to quickly evaluate moves under a time budget, then executes the best move repeatedly until you stop or win.
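
To give a flavor of the cache-replacement work listed above, here is a minimal LRU (least recently used) sketch in C++17. It is an illustration only: the `LruCache` name and its `get`/`put` interface are hypothetical and not taken from any of the repositories, and the Cache Replacement Policies project itself is written in C.

```cpp
#include <cstddef>
#include <list>
#include <optional>
#include <unordered_map>
#include <utility>

// Hypothetical illustration: LruCache is not a class from the repositories above.
template <typename K, typename V>
class LruCache {
public:
    explicit LruCache(std::size_t capacity) : capacity_(capacity) {}

    // Look up a key; a hit moves the entry to the front (most recently used).
    std::optional<V> get(const K& key) {
        auto it = index_.find(key);
        if (it == index_.end()) return std::nullopt;
        order_.splice(order_.begin(), order_, it->second);
        return it->second->second;
    }

    // Insert or update a key; evicts the least recently used entry when full.
    void put(const K& key, V value) {
        if (capacity_ == 0) return;
        auto it = index_.find(key);
        if (it != index_.end()) {
            it->second->second = std::move(value);
            order_.splice(order_.begin(), order_, it->second);
            return;
        }
        if (order_.size() == capacity_) {
            index_.erase(order_.back().first);  // back = least recently used
            order_.pop_back();
        }
        order_.emplace_front(key, std::move(value));
        index_[key] = order_.begin();
    }

    std::size_t size() const { return order_.size(); }

private:
    using Entry = std::pair<K, V>;
    std::size_t capacity_;
    std::list<Entry> order_;  // front = most recent, back = least recent
    std::unordered_map<K, typename std::list<Entry>::iterator> index_;
};
```

Pairing a doubly linked list with a hash map keeps both `get` and `put` at O(1) on average; `std::list::splice` relinks a node without invalidating the iterators stored in the map.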
I'm always open to collaborating on projects and sharing knowledge with fellow developers. Feel free to reach out to me:
- Email: ethan.huang.ih@gmail.com
- LinkedIn: ethanhuang-ih
- GitHub: EthanCornell
Let's connect and build something amazing together!
- 🏍️ I'm a passionate motorcyclist.
- 🏍️ I've owned more than 20 motorcycles.
- 🌍 I've ridden across more than 20 cities!