fix to find Korean Stopwords #138

galaxytemple · 2022-05-06T04:40:18Z

Reference Issues/PRs

None

What does this implement/fix? Explain your changes.

In English, all words, including stopwords, are separated by spaces, so we can find stopwords with a simple hash-set lookup on each token.
Korean stopwords, however, are attached to a word without a space, so they require different logic.

For example,
English: Nice to meet you ('to' and 'you' are stopwords)
Korean: 만나서 반가워요 ('요' is a stopword)
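
To make the difference concrete, here is a minimal sketch of why per-token set lookup misses the Korean case. The stopword sets and function names are hypothetical and only cover the example above; this is not the code in this PR.

```python
# Hypothetical stopword sets, limited to the example sentences above.
ENGLISH_STOPWORDS = {"to", "you"}
KOREAN_STOPWORDS = {"요"}


def count_by_token_lookup(text, stopwords):
    """Count whitespace-separated tokens that are exactly a stopword."""
    return sum(1 for token in text.lower().split() if token in stopwords)


def count_by_substring(text, stopwords):
    """Count stopword occurrences anywhere in the text, ignoring token boundaries."""
    return sum(text.count(word) for word in stopwords)


english_hits = count_by_token_lookup("Nice to meet you", ENGLISH_STOPWORDS)
korean_token_hits = count_by_token_lookup("만나서 반가워요", KOREAN_STOPWORDS)
korean_substring_hits = count_by_substring("만나서 반가워요", KOREAN_STOPWORDS)

print(english_hits, korean_token_hits, korean_substring_hits)
```

The token lookup finds both English stopwords but none of the Korean ones, because '요' never appears as a standalone token. The substring scan finds it, but running one scan per stopword gets slow as the list grows.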

The Aho-Corasick algorithm is widely used for string matching because it finds every occurrence of multiple patterns in a single pass over the text,
so it is a good fit for locating stopwords embedded inside Korean words.
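
For readers unfamiliar with the algorithm, below is a small pure-Python sketch of Aho-Corasick matching. It is a stand-in for illustration, not the implementation in this PR; `build_automaton` and `find_all` are hypothetical names, and the single pattern '요' comes from the example above.

```python
from collections import deque


def build_automaton(patterns):
    """Build trie transitions, failure links, and output lists for the patterns."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # fail[state] -> state for the longest proper suffix in the trie
    out = [[]]    # out[state] -> patterns that end at this state
    for pattern in patterns:
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pattern)

    # Breadth-first traversal so shallower states are finished first.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit patterns that end at the failure state.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_all(text, automaton):
    """Return (start_index, pattern) for every match, in one pass over text."""
    goto, fail, out = automaton
    state = 0
    matches = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pattern in out[state]:
            matches.append((i - len(pattern) + 1, pattern))
    return matches


korean_matches = find_all("만나서 반가워요", build_automaton(["요"]))
print(korean_matches)  # [(7, '요')]
```

The automaton is built once from the stopword list and then reused for every text, so the matching cost depends on the text length and number of matches rather than on how many stopwords there are.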

Any other comments?

lababidi · 2022-05-09T20:43:09Z

@galaxytemple you'll need to add ahocorasick and write a test please

galaxytemple · 2022-05-10T04:56:34Z

@lababidi Thank you for your feedback. I added pyahocorasick and tests/test_stopwords.py

barrust · 2022-09-02T17:32:27Z

This PR looks good to me (all tests passed)! If there are no concerns I can merge and push a new version

fix to find Korean Stopwords

61b460a

add pyahocorasick and test code

fcb25d7

Sanghoon Kim added 2 commits May 10, 2022 14:59

fix tc

6db5af0

add comments on TestStopWords

525b94c

barrust merged commit 79ff10d into goose3:master Sep 14, 2022
