-
-
Notifications
You must be signed in to change notification settings - Fork 1.5k
tr: calculate complement set early #6340
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
GNU testsuite comparison:
|
All reactions
Sorry, something went wrong.
The MacOS failures look unrelated to the |
All reactions
Sorry, something went wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks really good! Just two nitpicks, and a worry about a potentially new bug: tr -dts asdf qwe
, see below.
Sorry, something went wrong.
All reactions
And you're right, the CI failure on Mac is a known issue: #6275 and #6333. I don't have a mac, and don't know why those fail, but something seems to be off there. |
All reactions
Sorry, something went wrong.
I also don't have a Mac so my hands are tied. My comment was to make it easier to review c: |
All reactions
-
👍 1 reaction
Sorry, something went wrong.
An additional issue was found while reviewing uutils#6340, check [this thread][1]. A summary is: - `tr` ignores the `-t`/`--truncate-set1` flag when not translating - Not translating is defined as `-d` was passed, or one set was passed. [1]: uutils#6340 (comment)
Fixes uutils#6163 and adds a test to verify that a regression is not caused. Instead of inverting the conditions to check (e.g. delete characters **not** present in set1) invert set1 when passed the complement flag (`-c`, `-C`, `--complement`). This is done by calculating set1 then "inverting" it by subtracting from the "full" (universe) set (0..=u8::MAX). This fixes issue 6163 because it was caused by a combination of the `-c` and `-t` flag. `-c` is the abovementioned complement flag and `-t`/`--truncate-set1` truncates set1 to the length of set2. What happened in issue 6163 is that `set1={b'Y'}` and `set2={b'Z'}`, when truncated set1 stays the same and we proceed. The problem is GNU utils does not consider set1 to be `{b'Y'}`, but the complement of `{b'Y'}`, that is `U \ {b'Y'}={0, 1, ..., b'X', b'Z', ...}`, thus it is truncated to `{0}`. We can verify this by doing: `printf '\0' | tr -c -t Y Z`, which prints `Z` to stdout as expected. Additionally, by calculating the complement of set1 we no longer need to consider the complement flag when doing the translate operation, this allows us to delete a lot of code.
An additional issue was found while reviewing uutils#6340, check [this thread][1]. A summary is: - `tr` ignores the `-t`/`--truncate-set1` flag when not translating - Not translating is defined as `-d` was passed, or one set was passed. [1]: uutils#6340 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Sorry, something went wrong.
All reactions
GNU testsuite comparison:
|
All reactions
Sorry, something went wrong.
Ah, I missed the "spelling mistakes". Personally I believe the spellchecker is configured too strictly. For this PR, please somehow make the spellchecker stop complaining about these:
You'll find in the same file examples how to use |
All reactions
Sorry, something went wrong.
Those are new, you didn't miss them, I did T-T |
All reactions
Sorry, something went wrong.
An additional issue was found while reviewing uutils#6340, check [this thread][1]. A summary is: - `tr` ignores the `-t`/`--truncate-set1` flag when not translating - Not translating is defined as `-d` was passed, or one set was passed. [1]: uutils#6340 (comment)
No worries, all is good ^^ |
All reactions
Sorry, something went wrong.
I gave it a try to improving perf by rewriting the set from a I think delaying the translation from args to sets might help, but it doesn't seem necessary to me. Is there a benchmark framework used to test |
All reactions
Sorry, something went wrong.
Yes, the TranslateOperation is order-sensitive. I was mostly thinking about the process of computing the complement itself. And I totally agree, let's keep it as-is for now, and worry about performance later. (And when you do, MEASURE first! I feel like this would make a difference, but feelings aren't reliable enough.) I'm not aware of any consistent benchmarking scheme. However, there are a few |
All reactions
Sorry, something went wrong.
GNU testsuite comparison:
|
All reactions
Sorry, something went wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Jup, still LGTM
Sorry, something went wrong.
All reactions
BenWiederhake
Successfully merging this pull request may close these issues.
tr: different behavior than GNU withecho "BB%t"|tr -c -d -s Bt "B"
Fixes #6163 and adds a test to verify that a regression is not caused.
Instead of inverting the conditions to check (e.g. delete characters not present in set1) invert set1 when passed the complement flag (
-c
,-C
,--complement
). This is done by calculating set1 then "inverting" it by subtracting from the "full" (universe) set (0..=u8::MAX).This fixes issue 6163 because it was caused by a combination of the
-c
and-t
flag.-c
is the abovementioned complement flag and-t
/--truncate-set1
truncates set1 to the length of set2. What happened in issue 6163 is thatset1={b'Y'}
andset2={b'Z'}
, when truncated set1 stays the same and we proceed. The problem is GNU utils does not consider set1 to be{b'Y'}
, but the complement of{b'Y'}
, that isU \ {b'Y'}={0, 1, ..., b'X', b'Z', ...}
, thus it is truncated to{0}
.We can verify this by doing:
printf '\0' | tr -c -t Y Z
, which printsZ
to stdout as expected.Additionally, by calculating the complement of set1 we no longer need to consider the complement flag when doing the translate operation, this allows us to delete a lot of code.