# simmer

Avi Bryant
Simmer is a streaming aggregation tool. It can be used in several contexts to incrementally and efficiently summarize large volumes of data using a fixed amount of memory. Some of the ways it can be used include:
- As a filter in a unix pipeline processing logs or other text files (see the example after this list)
- As a combiner and reducer in Hadoop streaming jobs
- As a statsd-style metrics service over UDP, optionally backed by Redis
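For example, a pipeline might tag each log line with an aggregation key and feed it to simmer. This is a minimal sketch: the `unq:` key prefix and the tab-separated key/value line format are illustrative assumptions, not simmer's documented input syntax.

    # Approximate the number of distinct client IPs per requested path.
    # The "unq:" prefix and the field positions are assumptions; check
    # simmer's documentation for the real aggregation prefixes.
    awk '{ print "unq:" $7 "\t" $1 }' access.log | bin/simmer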
Some of the aggregations it supports include:
- counts of unique values
- exponentially decaying values
- top k most frequent values
- percentiles
- min-hash signatures
Simmer's aggregations are commutative and associative, which is to say that you can always use simmer to combine simmer's own output.
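For instance (a sketch assuming the input is already split into shard files), you can summarize each shard independently and then run simmer once more over the concatenated partial summaries, which gives the same result as a single pass over all of the data:

    # Hypothetical shard layout: summarize each shard, then combine the
    # partial summaries with a second pass through simmer.
    for f in logs/part-*.tsv; do
      bin/simmer < "$f" > "summaries/$(basename "$f")"
    done
    cat summaries/* | bin/simmer > combined.tsv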
It was inspired in part by Hadoop Streaming's Aggregate package, but uses the probabilistic aggregation algorithms from Twitter's Algebird.
### To build:

    rake
### To run:

    bin/simmer < /path/to/data.tsv
### To run listening on UDP, writing to Redis after every 10 updates to a key:

    target/simmer -u 8000 -r localhost:6379 -f 10
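To try the UDP listener started above, you can push lines to it with netcat. This is an illustrative sketch: the metric line format is an assumption, and `nc` flags differ between netcat variants.

    # Send one update to the UDP listener on port 8000 (line format assumed).
    printf 'unq:users\tbob\n' | nc -u -w1 localhost 8000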