-
-
Notifications
You must be signed in to change notification settings - Fork 1.5k
cut: add whitespace option for separating fields #4232
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not
could you please add a comment in
docs/src/extensions.md
about this difference
btw, do you know where are the cut tests in freebsd? |
I think they're here: https://github.com/freebsd/freebsd-src/tree/373ffc62c158e52cde86a5b934ab4a51307f9f2e/usr.bin/cut/tests Though this is not very extensive. |
and some here too, but not much |
GNU testsuite comparison:
|
Hi @sylvestre, is there anything else that needs to be addressed in the PR? |
GNU testsuite comparison:
|
This is not related to this PR and seems to be failing in other PRs as well |
The CICD style/lint failures come from the code that I didn't modify. Not sure why they are failing in this PR but not in other PRs. As for spelling error, should I rename |
for multispace, add it to the top of the file in the ignore spell list :) |
done! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool feature! I left a few comments for a bit cleaner code.
GNU testsuite comparison:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few more small improvements and then I think this is ready!
GNU testsuite comparison:
|
Hi @tertsdiepraam just wanted to check in to see if anything else needs to be addressed with this PR to be merged. |
Yeah I think it's good enough now as far as adding the feature goes. I do think |
Oops forgot to squash again. @sylvestre I don't have permissions to remove the merge commit. Could you try to remove it from |
TLDR: "-w" flag in cut is an extension option implemented in FreeBSD. This option proves to be useful when cutting fields separated by one or more whitespaces. GNU coreutils does not currently implement this option. This PR implements -w option and adds test cases. The performance difference compared to FreeBSD's cut implementation is about 9x.
Example use case: "ls" or "ps" returns columns separated by multiple inconsistent whitespaces. We can use -w option to select columns, e.g. "ps | cut -w -f1" to print only pids.
Implementation: For best performance, I created a new function (cut_fields_whitespace) dedicated for -w option. Performance of "cut -f" before and after this PR is essentially the same, as show below:
Before
After
The performance of "cut -w -f" compared to FreeBSD is about 9x faster: