GitHub - afeinberg/alz

Branches Tags

Name		Name	Last commit message	Last commit date
Latest commit History 42 Commits
sample		sample
src		src
CMakeLists.txt		CMakeLists.txt
README		README
clean_cmake.sh		clean_cmake.sh

Repository files navigation

Quick explanation:

* I used C++ with some C++0x/C++11 features. While I've worked
  primarily in Java for the last 2.5 years, I felt using C++ for this
  project would be more challenging and also more relevant for the
  position at hand.

* The algorithm itself is modified from one used by zlib:
  http://www.gzip.org/algorithm.txt 

* I only had a chance to test this under Linux. This will not work
  under Windows, as I am making use of read()/write() directly.

  Only external requirements are cmake and Boost. From boost, I am
  only using the memory pool class (it's a header, no shared libraries
  required):

  http://www.boost.org/doc/libs/1_47_0/libs/pool/doc/index.html

  (I did try plain malloc first and it had noticeable performance
  overhead.  Unfortunately, I did not have time to use jemalloc or
  tcmalloc, which possibly would have made the use of memory pool --
  for the hash table chains -- un-needed).
  
* To compile, run: 

  cmake . && make

  Output will be: alz_encode and alz_decode in ./src subdirectory

  To compile unit tests, download and compile google-test (by running
  cmake . && make in the google-test directory), and then run:

  GTEST_ROOT=/path/to/gtest/ cmake .
  make

  To run the unit tests:
  make test ;
  # (from the root directory of the project)      
  ./src/file_io_test # doesn't run by default as part of make_test, as it's
                     #  more of an integration test

* Note: compilation speed could be improved by creating a separate
  library and then (statically) linking both the executable and the
  unit tests against that library. This is what I've done with
  previous personal projects where I've used CMAke, e.g.,
  https://github.com/afeinberg/af-concurrent-cc
  
  Another substantial improvement in compilation speed would be to use
  less inlining, but it would come at the cost of performance (especially
  in the string matching code and bit extraction code, which gets called
  many times in a loop) as some tests I've done showed.
  
  Initially I thought just -O3 would suffice to unroll the loops, but
  I found very substantial (20-30%) improvement when I passed the
  -funroll-loops flag explicitly (together with -O3).

* I first wanted to make sure I implemented the algorithm correctly,
  so I used Source/Sink abstractions (inspired by briefly reading
  through Google snappy source code earlier) -- that allowed me to
  first test everything using in-memory byte arrays

* file_io.h / file_io.cc implements Source/Sink for files

* bit_stream.h implements input / output bit streams which respect 
  network byte order (by writing character by character) over the
  Sink / Source abstractions.

  Note: bit_stream is purely inline, as it's called many times inside
  loops. I am also using templates to specify types and bit counts
  once in both InBitStream and OutBitStream, as to allow the compiler
  to unroll the loops within those methods.

  As you may note, OutBitStream does modifications on a buffer that is
  then (via the Sink abstraction) memcpy'd out to yet another buffer
  which is then written to disk via write(). My first hunch was to
  optimize this further by having the OutBitStream operate directly on
  the buffer (like InBitStream does), however, I found that this
  yielded little performance difference (especially since this is far
  from the bottleneck in the encoder).
  
* encoder.cc / encoder.h and decoder.cc / decoder.h should be self
  explanatory 

* Anything ending with _test is a unit test.