8000 decodecorpus by ephiepark · Pull Request #1664 · facebook/zstd · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

decodecorpus #1664

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Jul 1, 2019
Merged

decodecorpus #1664

merged 7 commits into from
Jul 1, 2019

Conversation

ephiepark
Copy link
Contributor

Decoder was enforcing a constraint that is more strict than the spec. Sequence section for a compressed block can be less than the constraint that was enforced by decoder. When a block with rle mode for all (literal length, match length, and offset) is followed by another block with repeat mode for all (literal length, match length, and offset), the second block's sequence section can be 2 bytes (1 byte for number of sequences and 1 byte for compression mode for each field).

This pull request includes 2 changes.

  1. fixing the decoder code to remove the constraint.
  2. Updating decodecorpus code to generate repeat after rle mode.

Run ./decodecorpus -ptestfiles -otestfiles -n1 -s1236947365 -v to generate an input that has an issue before the fix.

Copy link
Contributor
@terrelln terrelln left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks reasonable to me, but I'll let Yann take a look.

How probable is it that decodecorpus generates a block that triggers this bug? Starting on a random seed, approximately how long does it take to generate an offending block?

@ephiepark
Copy link
Contributor Author

@terrelln
It takes a few minutes. I would say less than 10 minutes. They are cases where sequences section is 4 bytes, which does cause issues. But I don't think I have ever seen sequences section being <= 3 so far. I added a unittest to cover that though.

@Cyan4973
Copy link
Contributor
Cyan4973 commented Jun 28, 2019

A targeted test case for an extreme outlier,
and a generic test case generator which reaches the target issue in a reasonable time frame,
it's the right mix I believe.

CHECK_Z( ZSTD_decompressStream(zds, &outBuff, &inBuff) );
}

{ XXH64_state_t xxhStateIn, xxhStateOut;
8000
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:
if all you want to do is compare decompressed and decodedBuffer content and ensure they are equal, you could simply memcmp() them.

outBuff.pos = 0;

while (inBuff.pos < inBuff.size) {
CHECK_Z( ZSTD_decompressStream(zds, &outBuff, &inBuff) );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: it's slightly stronger than CHECK_Z() :
presuming the frame is complete (is it ?), you want a return value 0, to indicate that the decoder has reached the end of frame. You also want to ensure it has consumed the entire input (inBuff.pos == inBuff.compressedSize).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's a complete frame.

@Cyan4973
Copy link
Contributor
Cyan4973 commented Jun 28, 2019

Looks good !
Just some minor comments on the unit test case.

@Cyan4973 Cyan4973 merged commit 4d611ca into facebook:dev Jul 1, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants
0