tr: calculate complement set early #6340

jalil-salame · 2024-05-04T14:26:28Z

Fixes #6163 and adds a test to verify that a regression is not caused.

Instead of inverting the conditions to check (e.g. delete characters not present in set1) invert set1 when passed the complement flag (-c, -C, --complement). This is done by calculating set1 then "inverting" it by subtracting from the "full" (universe) set (0..=u8::MAX).

This fixes issue 6163 because it was caused by a combination of the -c and -t flag. -c is the abovementioned complement flag and -t/--truncate-set1 truncates set1 to the length of set2. What happened in issue 6163 is that set1={b'Y'} and set2={b'Z'}, when truncated set1 stays the same and we proceed. The problem is GNU utils does not consider set1 to be {b'Y'}, but the complement of {b'Y'}, that is U \ {b'Y'}={0, 1, ..., b'X', b'Z', ...}, thus it is truncated to {0}.

We can verify this by doing: printf '\0' | tr -c -t Y Z, which prints Z to stdout as expected.

Additionally, by calculating the complement of set1 we no longer need to consider the complement flag when doing the translate operation, this allows us to delete a lot of code.

github-actions · 2024-05-04T14:55:16Z

GNU testsuite comparison:

Skipping an intermittent issue tests/tail/inotify-dir-recreate (passes in this run but fails in the 'main' branch)

jalil-salame · 2024-05-04T14:56:20Z

The MacOS failures look unrelated to the tr changes, and appear to be the same as the ones present in the last commit.

BenWiederhake

This looks really good! Just two nitpicks, and a worry about a potentially new bug: tr -dts asdf qwe, see below.

src/uu/tr/src/operation.rs

tests/by-util/test_tr.rs

BenWiederhake · 2024-05-04T15:45:31Z

And you're right, the CI failure on Mac is a known issue: #6275 and #6333. I don't have a mac, and don't know why those fail, but something seems to be off there.
If you know how to fix it that would be fantastic, but in either case it's out-of-scope for this PR.

jalil-salame · 2024-05-04T15:52:17Z

And you're right, the CI failure on Mac is a known issue: #6275 and #6333. I don't have a mac, and don't know why those fail, but something seems to be off there. If you know how to fix it that would be fantastic, but in either case it's out-of-scope for this PR.

I also don't have a Mac so my hands are tied. My comment was to make it easier to review c:

An additional issue was found while reviewing uutils#6340, check [this thread][1]. A summary is: - `tr` ignores the `-t`/`--truncate-set1` flag when not translating - Not translating is defined as `-d` was passed, or one set was passed. [1]: uutils#6340 (comment)

Fixes uutils#6163 and adds a test to verify that a regression is not caused. Instead of inverting the conditions to check (e.g. delete characters **not** present in set1) invert set1 when passed the complement flag (`-c`, `-C`, `--complement`). This is done by calculating set1 then "inverting" it by subtracting from the "full" (universe) set (0..=u8::MAX). This fixes issue 6163 because it was caused by a combination of the `-c` and `-t` flag. `-c` is the abovementioned complement flag and `-t`/`--truncate-set1` truncates set1 to the length of set2. What happened in issue 6163 is that `set1={b'Y'}` and `set2={b'Z'}`, when truncated set1 stays the same and we proceed. The problem is GNU utils does not consider set1 to be `{b'Y'}`, but the complement of `{b'Y'}`, that is `U \ {b'Y'}={0, 1, ..., b'X', b'Z', ...}`, thus it is truncated to `{0}`. We can verify this by doing: `printf '\0' | tr -c -t Y Z`, which prints `Z` to stdout as expected. Additionally, by calculating the complement of set1 we no longer need to consider the complement flag when doing the translate operation, this allows us to delete a lot of code.

An additional issue was found while reviewing uutils#6340, check [this thread][1]. A summary is: - `tr` ignores the `-t`/`--truncate-set1` flag when not translating - Not translating is defined as `-d` was passed, or one set was passed. [1]: uutils#6340 (comment)

BenWiederhake

LGTM!

github-actions · 2024-05-04T16:59:44Z

GNU testsuite comparison:

Skipping an intermittent issue tests/tail/inotify-dir-recreate (passes in this run but fails in the 'main' branch)

BenWiederhake · 2024-05-04T17:13:52Z

Ah, I missed the "spelling mistakes". Personally I believe the spellchecker is configured too strictly. For this PR, please somehow make the spellchecker stop complaining about these:

./tests/by-util/test_tr.rs:1342:19 - Unknown word (asdfqqwweerr)
./tests/by-util/test_tr.rs:1345:23 - Unknown word (qwerr)
./tests/by-util/test_tr.rs:1352:19 - Unknown word (asdfqwer)
./tests/by-util/test_tr.rs:1355:23 - Unknown word (qwer)
./tests/by-util/test_tr.rs:1362:19 - Unknown word (aassddffqwer)
./tests/by-util/test_tr.rs:1365:23 - Unknown word (asdfqwer)

You'll find in the same file examples how to use spell-checker:disable-line and spell-checker:disable-next-line.

jalil-salame · 2024-05-04T17:15:14Z

Ah, I missed the "spelling mistakes". Personally I believe the spellchecker is configured too strictly. For this PR, please somehow make the spellchecker stop complaining about these:
./tests/by-util/test_tr.rs:1342:19 - Unknown word (asdfqqwweerr)
./tests/by-util/test_tr.rs:1345:23 - Unknown word (qwerr)
./tests/by-util/test_tr.rs:1352:19 - Unknown word (asdfqwer)
./tests/by-util/test_tr.rs:1355:23 - Unknown word (qwer)
./tests/by-util/test_tr.rs:1362:19 - Unknown word (aassddffqwer)
./tests/by-util/test_tr.rs:1365:23 - Unknown word (asdfqwer)
You'll find in the same file examples how to use spell-checker:disable-line and spell-checker:disable-next-line.

Those are new, you didn't miss them, I did T-T

An additional issue was found while reviewing uutils#6340, check [this thread][1]. A summary is: - `tr` ignores the `-t`/`--truncate-set1` flag when not translating - Not translating is defined as `-d` was passed, or one set was passed. [1]: uutils#6340 (comment)

BenWiederhake · 2024-05-04T17:18:17Z

No worries, all is good ^^

jalil-salame · 2024-05-04T17:26:07Z

I gave it a try to improving perf by rewriting the set from a Vec<u8> to a custom BiSetU8 ([u32; 8]), but then I realized that TranslateOperation is order sensitive: "abc" != "cba", so it can't be done as early as possible. The complexity of the code increases, so I'd rather keep it as a Vec<u8> while I give it some more thought.

I think delaying the translation from args to sets might help, but it doesn't seem necessary to me.

Is there a benchmark framework used to test uutils? I don't think the current implementation of tr is much slower than GNU's.

BenWiederhake · 2024-05-04T17:41:18Z

Yes, the TranslateOperation is order-sensitive. I was mostly thinking about the process of computing the complement itself. And I totally agree, let's keep it as-is for now, and worry about performance later. (And when you do, MEASURE first! I feel like this would make a difference, but feelings aren't reliable enough.)

I'm not aware of any consistent benchmarking scheme. However, there are a few BENCHMARKING.md files for some of the tools. (But again, that's out-of-scope for this PR, please open a separate PR for that.)

github-actions · 2024-05-04T17:47:30Z

GNU testsuite comparison:

Skipping an intermittent issue tests/tail/inotify-dir-recreate (passes in this run but fails in the 'main' branch)

BenWiederhake

Jup, still LGTM

jalil-salame mentioned this pull request May 4, 2024

tr: different behavior than GNU with echo "BB%t"|tr -c -d -s Bt "B" #6163

Closed

BenWiederhake requested changes May 4, 2024

View reviewed changes

src/uu/tr/src/operation.rs Show resolved Hide resolved

src/uu/tr/src/operation.rs Outdated Show resolved Hide resolved

tests/by-util/test_tr.rs Show resolved Hide resolved

jalil-salame force-pushed the tr-fix-6163 branch from 7fdbcc1 to 4daf482 Compare May 4, 2024 16:20

jalil-salame force-pushed the tr-fix-6163 branch from 4daf482 to 36c6a03 Compare May 4, 2024 16:21

jalil-salame requested a review from BenWiederhake May 4, 2024 16:22

BenWiederhake approved these changes May 4, 2024

View reviewed changes

BenWiederhake mentioned this pull request May 4, 2024

tr: Need to rewrite solve_set_characters and TranslateOperationComplement #6133

Closed

jalil-salame force-pushed the tr-fix-6163 branch from 36c6a03 to 9d4b517 Compare May 4, 2024 17:18

BenWiederhake approved these changes May 4, 2024

View reviewed changes

BenWiederhake merged commit ff1a03c into uutils:main May 4, 2024

jalil-salame deleted the tr-fix-6163 branch May 4, 2024 17:56

BrewTestBot mentioned this pull request Jun 23, 2024

uutils-coreutils 0.0.27 Homebrew/homebrew-core#175400

Merged

moonfruit mentioned this pull request Jun 24, 2024

uutils-selected 0.0.27 moonfruit/homebrew-tap#114

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

tr: calculate complement set early #6340

tr: calculate complement set early #6340

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

tr: calculate complement set early #6340

tr: calculate complement set early #6340

Uh oh!

Conversation

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!