stripping punctuation from OCR results before reference matching by EmilyFitzgerald · Pull Request #4 · rbturnbull/hespi · GitHub

stripping punctuation from OCR results before reference matching #4


Merged 6 commits on Sep 6, 2024


EmilyFitzgerald
Collaborator

Update to Hespi to remove punctuation from the OCR results for the Family, Genus, and Species fields.

  • Specific punctuation marks being stripped are: !"#$%&'*+,-./:;<=>?@\^_`{|}~
  • Leading and trailing whitespace is also stripped from the text.
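
For reviewers, the behaviour described above can be sketched roughly as follows. This is a minimal illustration and not the code from this branch: the constant name, the function signature, and the choice to strip punctuation and whitespace in a single pass are all assumptions. The character set is copied from the list above, which is Python's `string.punctuation` without the parentheses and square brackets.

```python
import string

# Characters listed in the PR description. This equals string.punctuation
# with "(", ")", "[" and "]" left out, so brackets survive stripping.
# (Hypothetical constant name, for illustration only.)
PUNCTUATION_TO_STRIP = "!\"#$%&'*+,-./:;<=>?@\\^_`{|}~"


def strip_punctuation(text: str) -> str:
    """Strip the listed punctuation and any whitespace from both ends.

    Assumption: punctuation and whitespace are removed together, so
    interleaved runs such as ". ," at either end are fully removed.
    Characters in the middle of the text are left untouched.
    """
    return text.strip(PUNCTUATION_TO_STRIP + string.whitespace)


if __name__ == "__main__":
    for raw in [" Acacia. ", "-dealbata,", "(Myrtaceae)", "sub-species"]:
        print(repr(raw), "->", repr(strip_punctuation(raw)))
```

Only the ends of the string are affected, which matches the commit note about stripping from both sides; a hyphen inside a name is kept.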

The branch also changes the CSV column-ordering code so that columns of the same type are grouped together and divider markers sit in the correct place, and updates the tests so they still pass.

Have confirmed tests pass, coverage at 100%

Commits:

| commit name | commit description |
| --- | --- |
| stripping punctuation | code to remove punctuation |
| saved file | no apparent change - reference list file required saving though |
| update strip_punctuation to left side | code initially right strip, changed to strip from both sides |
| fixing csv ordering | updated label_sort_key so matches label fields |
| updated util file | moved image files to end of csv report |
| updated column order test | changed test to match the new column order |

@rbturnbull self-assigned this Aug 23, 2024
@rbturnbull self-requested a review August 23, 2024 05:28
@rbturnbull merged commit fdb78a6 into main Sep 6, 2024
4 checks passed
@rbturnbull deleted the july_24 branch September 6, 2024 01:27