Fix matching regex word-boundary (\b) in strings replace by davidwendt · Pull Request #9997 · rapidsai/cudf · GitHub

Fix matching regex word-boundary (\b) in strings replace #9997

Merged: 8 commits, Jan 20, 2022

Conversation

davidwendt
Contributor

Closes #9950

Fixes matching a single word-boundary (BOW) regex pattern. This pattern matches word boundaries but no actual characters, so the (begin, end) position values of a match are equal. The replace code always expected a non-empty character range (begin < end) to replace. The logic has been updated to allow for this zero-width case.

Additional gtests have been added that include a single \b pattern character.
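
For readers unfamiliar with zero-width matches, here is a minimal sketch of the behavior described above. It uses Python's `re` module rather than the libcudf code, and the helper names `replace_all` and `replace_nonempty_only` are hypothetical, chosen only for this illustration. The first helper accepts matches where begin == end; the second mimics a replace loop that insists on begin < end and therefore never replaces a lone `\b`.

```python
import re


def replace_all(pattern, repl, text):
    """Replace every match of `pattern`, including zero-width matches."""
    out = []
    last = 0  # end position of the previous match
    for m in re.finditer(pattern, text):
        begin, end = m.span()
        # No `begin < end` check: a zero-width match has an empty span,
        # so the replacement is inserted at that position.
        out.append(text[last:begin])
        out.append(repl)
        last = end
    out.append(text[last:])
    return "".join(out)


def replace_nonempty_only(pattern, repl, text):
    """Same loop, but skipping matches whose span is empty (begin == end)."""
    out = []
    last = 0
    for m in re.finditer(pattern, text):
        begin, end = m.span()
        if begin == end:
            continue  # a lone \b match is dropped here
        out.append(text[last:begin])
        out.append(repl)
        last = end
    out.append(text[last:])
    return "".join(out)


print(replace_all(r"\b", "X", "aba bcd"))            # XabaX XbcdX
print(replace_nonempty_only(r"\b", "X", "aba bcd"))  # aba bcd
```

With the input `aba bcd`, the first helper agrees with `re.sub(r"\b", "X", "aba bcd")`, while the second leaves the string unchanged.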

@davidwendt davidwendt added the labels bug, 3 - Ready for Review, libcudf, strings, non-breaking on Jan 7, 2022
@davidwendt davidwendt requested a review from a team as a code owner January 7, 2022 18:16
@davidwendt davidwendt self-assigned this Jan 7, 2022
@davidwendt davidwendt requested review from bdice and mythrocks January 7, 2022 18:16
@codecov
codecov bot commented Jan 7, 2022

Codecov Report

Merging #9997 (8585e56) into branch-22.02 (967a333) will decrease coverage by 0.08%.
The diff coverage is n/a.

❗ Current head 8585e56 differs from pull request most recent head 00ffb8d. Consider uploading reports for the commit 00ffb8d to get more accurate results

@@               Coverage Diff                @@
##           branch-22.02    #9997      +/-   ##
================================================
- Coverage         10.49%   10.40%   -0.09%     
================================================
  Files               119      119              
  Lines             20305    20556     +251     
================================================
+ Hits               2130     2139       +9     
- Misses            18175    18417     +242     
| Impacted Files | Coverage Δ |
|---|---|
| python/custreamz/custreamz/kafka.py | 29.16% <0.00%> (-0.63%) ⬇️ |
| python/dask_cudf/dask_cudf/sorting.py | 92.66% <0.00%> (-0.25%) ⬇️ |
| python/dask_cudf/dask_cudf/core.py | 70.85% <0.00%> (-0.17%) ⬇️ |
| python/cudf/cudf/__init__.py | 0.00% <0.00%> (ø) |
| python/cudf/cudf/api/types.py | 0.00% <0.00%> (ø) |
| python/cudf/cudf/core/frame.py | 0.00% <0.00%> (ø) |
| python/cudf/cudf/core/index.py | 0.00% <0.00%> (ø) |
| python/cudf/cudf/io/parquet.py | 0.00% <0.00%> (ø) |
| python/cudf/cudf/core/dtypes.py | 0.00% <0.00%> (ø) |
| python/cudf/cudf/core/scalar.py | 0.00% <0.00%> (ø) |

... and 31 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 7ff5f12...00ffb8d.

@davidwendt davidwendt changed the title Fix matching regex word-boundary (BOW) in strings replace Fix matching regex word-boundary (\b) in strings replace Jan 7, 2022
Contributor
@bdice bdice left a comment

LGTM - some minor improvements to constness might be possible, but those are non-blocking.

@davidwendt
Contributor Author

rerun tests

Contributor
@mythrocks mythrocks left a comment

LGTM.

@galipremsagar
Contributor

rerun tests

1 similar comment
@vyasr
Contributor
vyasr commented Jan 20, 2022

rerun tests

@davidwendt
Contributor Author

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 13429ff into rapidsai:branch-22.02 Jan 20, 2022
@davidwendt davidwendt deleted the bug-replace-regex-bow branch January 20, 2022 13:25
Labels: 3 - Ready for Review, bug, libcudf, non-breaking, strings
Development

Successfully merging this pull request may close these issues.

[FEA] Support replacing word boundaries in regexp replace in way that is compatible with Python/Java
6 participants