DOI: 10.5555/2390948.2391051 (research article, free access)

Learning syntactic categories using paradigmatic representations of word context

Published: 12 July 2012

Abstract

We investigate paradigmatic representations of word context in the domain of unsupervised syntactic category acquisition. Paradigmatic representations of word context are based on potential substitutes of a word, in contrast to syntagmatic representations based on properties of neighboring words. We compare a bigram-based baseline model with several paradigmatic models and demonstrate significant gains in accuracy. Our best model, based on Euclidean co-occurrence embedding, combines the paradigmatic context representation with morphological and orthographic features and achieves 80% many-to-one accuracy on a 45-tag 1M-word corpus.
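The many-to-one accuracy reported above is a standard metric for unsupervised tagging: each induced cluster is mapped to the gold tag it most frequently co-occurs with, and token-level accuracy is computed under that mapping. A minimal sketch in Python (function name and toy data are illustrative, not taken from the paper):

```python
from collections import Counter, defaultdict

def many_to_one_accuracy(gold_tags, induced_clusters):
    # Count how often each gold tag appears in each induced cluster.
    counts = defaultdict(Counter)
    for gold, cluster in zip(gold_tags, induced_clusters):
        counts[cluster][gold] += 1
    # Map each cluster to its majority gold tag; several clusters
    # may map to the same tag, hence "many-to-one".
    mapping = {c: tags.most_common(1)[0][0] for c, tags in counts.items()}
    correct = sum(mapping[c] == g for g, c in zip(gold_tags, induced_clusters))
    return correct / len(gold_tags)

# Toy example: cluster 0 is mostly NN, so its lone JJ token is scored wrong.
gold = ["NN", "VB", "NN", "DT", "JJ", "VB"]
clusters = [0, 1, 0, 2, 0, 1]
print(many_to_one_accuracy(gold, clusters))  # 5 of 6 tokens correct
```

Because the mapping is unconstrained, many-to-one accuracy rises as the number of induced clusters grows, which is why it is usually reported alongside the cluster count (45 tags here) or a complementary measure such as V-measure.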



Published In

EMNLP-CoNLL '12: Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, July 2012, 1573 pages

Publisher

Association for Computational Linguistics, United States


Acceptance Rates

Overall acceptance rate: 73 of 234 submissions (31%)

