Learning Subjective Language

Published: 01 September 2004

Abstract

Subjectivity in natural language refers to aspects of language used to express opinions, evaluations, and speculations. Subjectivity analysis is relevant to numerous natural language processing applications, including information extraction and text categorization. The goal of this work is to learn subjective language from corpora. Clues of subjectivity are generated and tested, including low-frequency words, collocations, and adjectives and verbs identified using distributional similarity. The features are also examined working in concert. Although the features are generated from different data sets using different procedures, they perform consistently: they all do better, and all do worse, on the same data sets. In addition, this article shows that the density of subjectivity clues in the surrounding context strongly affects how likely it is that a word is subjective, and it provides the results of an annotation study assessing the subjectivity of sentences with high-density features. Finally, the clues are used to perform opinion piece recognition (a type of text categorization and genre detection) to demonstrate the utility of the knowledge acquired in this article.
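
The idea of clue density can be illustrated with a minimal sketch. It is not the article's implementation: the symmetric token window, the threshold, the clue set, and the function names `clue_density` and `high_density_clues` are all assumptions chosen for illustration. The sketch counts how many other clue words fall near a given clue word and keeps only the clue instances whose neighbourhood is dense enough.

```python
from typing import List, Set


def clue_density(tokens: List[str], clues: Set[str], index: int, window: int) -> int:
    """Count clue tokens within `window` positions on either side of `index`.

    The token at `index` itself is not counted. Matching is case-insensitive,
    so `clues` is assumed to hold lowercase strings.
    """
    start = max(0, index - window)
    end = min(len(tokens), index + window + 1)
    return sum(
        1
        for i in range(start, end)
        if i != index and tokens[i].lower() in clues
    )


def high_density_clues(
    tokens: List[str], clues: Set[str], window: int = 5, threshold: int = 2
) -> List[int]:
    """Return indices of clue tokens with at least `threshold` other clues nearby."""
    return [
        i
        for i, tok in enumerate(tokens)
        if tok.lower() in clues
        and clue_density(tokens, clues, i, window) >= threshold
    ]


if __name__ == "__main__":
    # Hypothetical clue set and sentence, for illustration only.
    example_clues = {"outrageous", "frankly", "absurd"}
    example_tokens = (
        "the outrageous plan is frankly absurd and the committee met on tuesday"
    ).split()
    print(high_density_clues(example_tokens, example_clues, window=3, threshold=2))
```

Raising `threshold` or shrinking `window` in this sketch keeps fewer clue instances; which settings work well is an empirical question that the sketch does not answer.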

Published In

Computational Linguistics, Volume 30, Issue 3
September 2004
145 pages
ISSN: 0891-2017
EISSN: 1530-9312

Publisher

MIT Press

Cambridge, MA, United States

Qualifiers

  • Article
