8000 [libzstd] Optimize ZSTD_insertBt1() for repetitive data by terrelln · Pull Request #1635 · facebook/zstd · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

[libzstd] Optimize ZSTD_insertBt1() for repetitive data #1635

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jun 6, 2019

Conversation

terrelln
Copy link
Contributor
@terrelln terrelln commented Jun 6, 2019

We would only skip at most 192 bytes at a time before this diff.
This was added to optimize long matches and skip the middle of the
match. However, it doesn't handle the case of repetitive data.

This patch keeps the optimization, but also handles repetitive data
by taking the max of the two return values.

Before:

> for n in $(seq 9); do echo strategy=$n; dd status=none if=/dev/zero bs=1024k count=1000 | command time -f %U zstd --zstd=strategy=$n >/dev/null; done
strategy=1
0.34
strategy=2
0.36
strategy=3
0.41
strategy=4
0.93
strategy=5
1.33
strategy=6
8.41
strategy=7
31.03
strategy=8
31.13
strategy=9
123.05

After:

> for n in $(seq 9); do echo strategy=$n; dd status=none if=/dev/zero bs=1024k count=1000 | command time -f %U ./zstd --zstd=strategy=$n >/dev/null; done
strategy=1
0.27
strategy=2
0.23
strategy=3
0.27
strategy=4
0.43
strategy=5
0.56
strategy=6
0.43
strategy=7
0.34
strategy=8
0.34
strategy=9
0.35

At level 19 with multithreading the compressed size of silesia.tar regresses 300 bytes, and enwik8 regresses 100 bytes. In single threaded mode enwik8 is also within 100 bytes, and I didn't test silesia.tar. The compression speed is not significantly effected.

The regression tests show changes in the order of single digit number of bytes.

Fixes Issue #1634.

terrelln and others added 2 commits June 5, 2019 20:34
We would only skip at most 192 bytes at a time before this diff.
This was added to optimize long matches and skip the middle of the
match. However, it doesn't handle the case of repetitive data.

This patch keeps the optimization, but also handles repetitive data
by taking the max of the two return values.

```
> for n in $(seq 9); do echo strategy=$n; dd status=none if=/dev/zero bs=1024k count=1000 | command time -f %U ./zstd --zstd=strategy=$n >/dev/null; done
strategy=1
0.27
strategy=2
0.23
strategy=3
0.27
strategy=4
0.43
strategy=5
0.56
strategy=6
0.43
strategy=7
0.34
strategy=8
0.34
strategy=9
0.35
```

At level 19 with multithreading the compressed size of `silesia.tar` regresses 300 bytes, and `enwik8` regresses 100 bytes.
In single threaded mode `enwik8` is also within 100 bytes, and I didn't test `silesia.tar`.

Fixes Issue facebook#1634.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
0