8000 GitHub - TechHara/gunzip: Pure Rust implementation of Gunzip from scratch in 1000 lines
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

TechHara/gunzip

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

43 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Gunzip

This repository features a pure Rust implementation of gunzip (decompression only), created from scratch with approximately 1000 lines of code for educational purposes. Explore branches 1 through 25 for a step-by-step understanding of the code. The main branch reflects the latest version.

The following roughly summarizes each stage

  • branch 1: main() function and skeletal structure
  • branch 2: bitread module for reading bits from a byte stream
  • branch 3: gzip header & footer for parsing the metadata and checksum
  • branch 4: inflate for block type 0 (uncompressed) data
  • branch 5: codebook and huffman_decoder modules for decoding Huffman codes
  • branch 6: lz77 and sliding_window modules for decompressing LZ77-encoded data
  • branch 7: inflate for block type 1 and 2 (compressed) data using fixed or dynamic Huffman codes
  • branch 8: checksum_write module for verifying the decompressed data
  • branch 9: performance optimization
  • branch 10: multithread support
  • branch 11: streaming support
  • branch 12: memory optimization
  • branch 13: bug fix to reject non-compliant .gz file
  • branch 14: minor optimization for block type 0
  • branch 15: performance optimization
  • branch 16: performance optimization by replacing dynamic dispatch with static dispatch
  • branch 17: performance optimization
  • branch 18: performance optimization and remove dead code
  • branch 19: performance optimization
  • branch 20: reduce buffer size and hence cache misses
  • branch 21: improve implementation for decode and HuffmanDecoder
  • branch 22: remove unnecessary heap allocation
  • branch 23: recycle heap allocation
  • branch 24: optimize write to stdout
  • branch 25: fix bug when codelength + extrabits > 24

Features

  • written from scratch in pure Rust without unsafe code
  • supports streaming, i.e., the decompressor implements Read trait
  • supports multithreading (two threads)
  • competitive in performance with other popular implementations (see Benchmark below)

Usage

Library

use gunzip::{SingleThreadDecompressor, MultiThreadDecompressor};

fn main() -> std::io::Result<()> {
    let reader = std::fs::File::open("data.bin.gz")?;
    let mut writer = std::fs::File::create("data.bin")?;
    let mut decompressor = SingleThreadDecompressor::new(reader);
    // or MultiThreadDecompressor::new(reader);
    std::io::copy(&mut decompressor, &mut writer)?;
    Ok(())
}

See gunzip.rs as an example.

Executable

# Usage: target/release/gunzip [-t]
# 	Decompresses .gz file read from stdin and outputs to stdout
# 	-t: employ two threads

# single thread
$ target/release/gunzip < compressed.gz > decompressed

# two threads
$ target/release/gunzip -t < compressed.gz > decompressed

Benchmark

The following result is obtained by running this benchmark script. To isolate IO bottleneck, the input/output files are placed in a ramdisk.

# create a ramdisk
mkdir /ramdisk
mount -o size=4G -t tmpfs none /ramdisk
cd /ramdisk

# run benchmark script
curl https://gist.githubusercontent.com/TechHara/500a1f7b8cc08af86856f707c04b0023/raw/b668d419373bb138f71cfe467f0f9e573aaaf89f/gunzip_benchmark.sh | bash

Contributing

You are welcome to contribute by submitting a PR for bug fixes or enhancements. See here for detailed documentations.

Ports

You can also contribute by porting this code to a different language so that more people can learn from it.

C++

Go

About

Pure Rust implementation of Gunzip from scratch in 1000 lines

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

0