A lemmatizer for English text, inspired by morphy in Python's nltk.corpus.reader.wordnet.
Based on code posted by mtbr in his blog entry "WordNet-based lemmatizer".
sudo gem install lemmatizer
require "lemmatizer"
lem = Lemmatizer.new
p lem.lemma("dogs", :noun ) # => "dog"
p lem.lemma("hired", :verb ) # => "hire"
p lem.lemma("hotter", :adj ) # => "hot"
# When no part-of-speech symbol is given as the second argument,
# the lemmatizer tries :verb, :noun, :adj, and :adv, in that order.
p lem.lemma("fired") # => "fire"
p lem.lemma("slow") # => "slow"
# Lemmatizer leaves alone words that its dictionary does not contain.
# This keeps proper names such as "James" intact.
p lem.lemma("MacBooks", :noun) # => "MacBooks"
# If an inflected form is itself listed as a lemma in the word index,
# the lemmatizer may not return the result you expect.
p lem.lemma("higher", :adj) # => "higher" not "high"!
# This happens because "higher" is itself an entry word listed in dict/index.adj.
# Modify dict/index.{noun|verb|adj|adv} if necessary.
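If editing the dict/index.* files is not convenient, one alternative is to wrap the lemmatizer and consult a small override table before the dictionary lookup. The following is only a sketch under stated assumptions: OverrideLemmatizer and its overrides hash are hypothetical names that are not part of the gem, and the only gem call relied on is lemma(word, pos) as used in the examples above.

```ruby
# Hypothetical wrapper (NOT part of the lemmatizer gem): checks a
# hand-written override table first, then delegates to the wrapped object.
class OverrideLemmatizer
  # base      - any object that responds to lemma(word, pos),
  #             for example Lemmatizer.new
  # overrides - hash keyed by [word, pos] pairs, mapping to the desired lemma
  def initialize(base, overrides = {})
    @base = base
    @overrides = overrides
  end

  def lemma(word, pos = nil)
    # An exact [word, pos] override wins over the dictionary lookup.
    return @overrides[[word, pos]] if @overrides.key?([word, pos])

    # Otherwise defer to the wrapped lemmatizer, keeping its own behaviour
    # for calls that do not specify a part of speech.
    pos.nil? ? @base.lemma(word) : @base.lemma(word, pos)
  end
end

# Usage with the gem (assumes it is installed):
#   require "lemmatizer"
#   lem = OverrideLemmatizer.new(Lemmatizer.new, { ["higher", :adj] => "high" })
#   p lem.lemma("higher", :adj) # => "high" (taken from the override table)
#   p lem.lemma("dogs", :noun)  # delegated to the gem unchanged
```

Keying the table on both the word and the part of speech keeps an override for "higher" as an adjective from affecting lookups for other parts of speech, and it leaves the bundled dictionary files untouched.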
- Yoichiro Hasebe mailto:yohasebe@gmail.com
Thanks for assistance and contributions:
- Vladimir Ivic http://vladimirivic.com
Licensed under the MIT license.