DOI: 10.5555/2390948.2391051 (research article, free access)

Learning syntactic categories using paradigmatic representations of word context

Published: 12 July 2012

Abstract

We investigate paradigmatic representations of word context in the domain of unsupervised syntactic category acquisition. Paradigmatic representations of word context are based on potential substitutes of a word, in contrast to syntagmatic representations based on properties of neighboring words. We compare a bigram-based baseline model with several paradigmatic models and demonstrate significant gains in accuracy. Our best model, based on Euclidean co-occurrence embedding, combines the paradigmatic context representation with morphological and orthographic features and achieves 80% many-to-one accuracy on a 45-tag 1M-word corpus.
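The many-to-one accuracy reported above is a standard metric for unsupervised tagging: each induced cluster is mapped to the gold tag it most frequently co-occurs with, and token-level accuracy is computed under that mapping. A minimal sketch in Python (function name and toy data are illustrative, not taken from the paper):

```python
from collections import Counter, defaultdict

def many_to_one_accuracy(gold_tags, induced_clusters):
    # Count how often each gold tag appears in each induced cluster.
    counts = defaultdict(Counter)
    for gold, cluster in zip(gold_tags, induced_clusters):
        counts[cluster][gold] += 1
    # Map each cluster to its majority gold tag; several clusters
    # may map to the same tag, hence "many-to-one".
    mapping = {c: tags.most_common(1)[0][0] for c, tags in counts.items()}
    correct = sum(mapping[c] == g for g, c in zip(gold_tags, induced_clusters))
    return correct / len(gold_tags)

# Toy example: cluster 0 is mostly NN, so its lone JJ token is scored wrong.
gold = ["NN", "VB", "NN", "DT", "JJ", "VB"]
clusters = [0, 1, 0, 2, 0, 1]
print(many_to_one_accuracy(gold, clusters))  # 5 of 6 tokens correct
```

Because the mapping is unconstrained, many-to-one accuracy rises as the number of induced clusters grows, which is why it is usually reported alongside the cluster count (45 tags here) or a complementary measure such as V-measure.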



Published In

EMNLP-CoNLL '12: Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, July 2012, 1573 pages

Publisher

Association for Computational Linguistics, United States


Acceptance Rates

Overall acceptance rate: 73 of 234 submissions (31%)

