DOI: 10.3115/1118647.1118653

Unsupervised discovery of morphologically related words based on orthographic and semantic similarity

Published: 11 July 2002

Abstract

We present an algorithm that takes an unannotated corpus as input and returns a ranked list of probable morphologically related word pairs as output. The algorithm discovers such pairs by looking for words that are both orthographically and semantically similar, where orthographic similarity is measured in terms of minimum edit distance and semantic similarity is measured in terms of mutual information. The procedure relies neither on a morpheme concatenation model nor on distributional properties of word substrings (such as affix frequency). Experiments with German and English input give encouraging results, both in terms of precision (the proportion of good pairs found at various cutoff points of the ranked list) and in terms of a qualitative analysis of the types of morphological patterns the algorithm discovers.
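
The general idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how mutual information is estimated or how the two similarity measures are combined. The sentence-level co-occurrence window, the `max_distance` threshold, and the `mi / d` score below are all assumptions made for illustration, and the function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Minimum edit distance (Levenshtein) with unit costs."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def pointwise_mi(sentences):
    """Pointwise mutual information of word pairs.

    Assumption: two words co-occur if they appear in the same sentence.
    """
    n = len(sentences)
    word_df = Counter()
    pair_df = Counter()
    for sent in sentences:
        types = sorted(set(sent))
        word_df.update(types)
        pair_df.update(combinations(types, 2))
    return {
        pair: math.log2(count * n / (word_df[pair[0]] * word_df[pair[1]]))
        for pair, count in pair_df.items()
    }


def rank_pairs(sentences, max_distance=2):
    """Rank word pairs that are orthographically close and have positive PMI.

    Assumption: the score mi / d is an illustrative combination only.
    """
    scored = []
    for (w1, w2), mi in pointwise_mi(sentences).items():
        d = edit_distance(w1, w2)  # d >= 1 because w1 and w2 are distinct types
        if d <= max_distance and mi > 0:
            scored.append((mi / d, w1, w2))
    scored.sort(reverse=True)
    return [(w1, w2) for _, w1, w2 in scored]
```

Calling `rank_pairs` on a tokenized corpus (a list of token lists) returns candidate pairs, best first. On a toy corpus this will also surface unrelated short words that happen to co-occur, which is why evaluating precision at several cutoff points of the ranked list, as the abstract describes, is a natural way to assess such a ranking.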




Published In

MPL '02: Proceedings of the ACL-02 Workshop on Morphological and Phonological Learning - Volume 6
July 2002, 82 pages
Program Chair: Mike Maxwell

Publisher: Association for Computational Linguistics, United States


Cited By

  • (2013) Effective and Robust Query-Based Stemming. ACM Transactions on Information Systems 31:4 (1-29). DOI: 10.1145/2536736.2536738. Online publication date: 1-Nov-2013
  • (2013) Orthogonality and Orthography. Selected Papers of the 7th International Conference on Quantum Interaction - Volume 8369 (34-46). DOI: 10.1007/978-3-642-54943-4_4. Online publication date: 25-Jul-2013
  • (2013) Unsupervised segmentation for different types of morphological processes using multiple sequence alignment. Proceedings of the First International Conference on Statistical Language and Speech Processing (152-163). DOI: 10.1007/978-3-642-39593-2_14. Online publication date: 29-Jul-2013
  • (2012) Arabic retrieval revisited. Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers - Volume 2 (218-222). DOI: 10.5555/2390665.2390719. Online publication date: 8-Jul-2012
  • (2011) Discovering morphological paradigms from plain text using a Dirichlet process mixture model. Proceedings of the Conference on Empirical Methods in Natural Language Processing (616-627). DOI: 10.5555/2145432.2145504. Online publication date: 27-Jul-2011
  • (2010) Predicting the semantic compositionality of prefix verbs. Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing (293-303). DOI: 10.5555/1870658.1870687. Online publication date: 9-Oct-2010
  • (2009) Unsupervised morphological segmentation and clustering with document boundaries. Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 2 (668-677). DOI: 10.5555/1699571.1699600. Online publication date: 6-Aug-2009
  • (2008) Allomorfessor. Proceedings of the 9th Cross-Language Evaluation Forum Conference on Evaluating Systems for Multilingual and Multimodal Information Access (975-982). DOI: 10.5555/1813809.1813959. Online publication date: 17-Sep-2008
  • (2008) Acquisition of the morphological structure of the lexicon based on lexical similarity and formal analogy. Proceedings of the 3rd Textgraphs Workshop on Graph-Based Algorithms for Natural Language Processing (1-8). DOI: 10.5555/1627328.1627329. Online publication date: 24-Aug-2008
  • (2008) Division of Spanish Words into Morphemes with a Genetic Algorithm. Proceedings of the 13th International Conference on Natural Language and Information Systems: Applications of Natural Language to Information Systems (19-26). DOI: 10.1007/978-3-540-69858-6_4. Online publication date: 24-Jun-2008
