Genre Classification and Domain Transfer for Information Filtering

Published: 25 March 2002

Abstract

The World Wide Web is a vast repository of information, but the sheer volume makes it difficult to identify useful documents. We identify document genre as an important factor in retrieving useful documents and focus on the novel document genre dimension of subjectivity. We investigate three approaches to automatically classifying documents by genre: traditional bag-of-words techniques, part-of-speech statistics, and hand-crafted shallow linguistic features. We are particularly interested in domain transfer: how well the learned classifiers generalize from the training corpus to a new document corpus. Our experiments demonstrate that the part-of-speech approach is better than traditional bag-of-words techniques, particularly in the domain transfer conditions.
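
The domain transfer condition described in the abstract can be illustrated with a minimal sketch: train a subjectivity classifier on one topic domain, then compare its accuracy on held-out documents from the same domain with its accuracy on a different domain. Everything below is an assumption made for illustration only. The abstract does not say which learning algorithm was used, so a small bag-of-words naive Bayes classifier stands in for it; the toy documents, labels, and function names (`train_nb`, `predict`, `accuracy`) are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

def tokenize(text):
    # Bag-of-words representation: lowercase whitespace tokens, order ignored.
    return text.lower().split()

def train_nb(docs):
    """Fit a multinomial naive Bayes model on a list of (text, label) pairs."""
    word_counts = {}          # label -> Counter of token frequencies
    label_counts = Counter()  # label -> number of training documents
    vocab = set()
    for text, label in docs:
        label_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for tok in tokenize(text):
            counts[tok] += 1
            vocab.add(tok)
    return word_counts, label_counts, vocab

def predict(model, text):
    """Return the label with the highest log posterior for the given text."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        counts = word_counts[label]
        denom = sum(counts.values()) + len(vocab)  # add-one smoothing
        score = math.log(label_counts[label] / total_docs)
        for tok in tokenize(text):
            if tok in vocab:  # tokens never seen in training are skipped
                score += math.log((counts[tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def accuracy(model, docs):
    return sum(predict(model, text) == label for text, label in docs) / len(docs)

# Invented toy corpora (NOT the paper's data): one training domain, two test domains.
MOVIES_TRAIN = [
    ("i think the film is wonderful and moving", "subjective"),
    ("in my opinion the plot is awful and boring", "subjective"),
    ("the film was released in march and runs two hours", "objective"),
    ("the director previously made three films", "objective"),
]
MOVIES_TEST = [
    ("i think the acting is awful", "subjective"),
    ("the film runs ninety minutes", "objective"),
]
RESTAURANTS_TEST = [
    ("in my opinion the soup is wonderful", "subjective"),
    ("the restaurant opened in june and seats forty", "objective"),
]

model = train_nb(MOVIES_TRAIN)
in_domain = accuracy(model, MOVIES_TEST)        # same domain as training
transfer = accuracy(model, RESTAURANTS_TEST)    # domain transfer condition
print(f"in-domain accuracy: {in_domain:.2f}")
print(f"transfer accuracy:  {transfer:.2f}")
```

In this sketch the classifier can only use words it saw during training, so topic-specific vocabulary from the new domain contributes nothing to the decision. That is one plausible reason a topic-independent representation such as part-of-speech statistics might transfer better, although the toy data here is far too small to show it.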

Published In

Proceedings of the 24th BCS-IRSG European Colloquium on IR Research: Advances in Information Retrieval
March 2002
362 pages

Publisher

Springer-Verlag

Berlin, Heidelberg

Qualifiers

  • Article

Cited By

  • (2016) Hierarchical Label Propagation and Discovery for Machine Generated Email. Proceedings of the Ninth ACM International Conference on Web Search and Data Mining, pages 317-326. DOI: 10.1145/2835776.2835780. Online publication date: 8-Feb-2016.
  • (2015) Going In-Depth. Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 2109-2118. DOI: 10.1145/2783258.2788599. Online publication date: 10-Aug-2015.
  • (2012) Locational relativity and domain constraints in spatial questions. Proceedings of the 20th International Conference on Advances in Geographic Information Systems, pages 219-228. DOI: 10.1145/2424321.2424350. Online publication date: 6-Nov-2012.
  • (2009) Improving product review search experiences on general search engines. Proceedings of the 11th International Conference on Electronic Commerce, pages 107-116. DOI: 10.1145/1593254.1593269. Online publication date: 12-Aug-2009.
  • (2009) A survey on sentiment detection of reviews. Expert Systems with Applications: An International Journal, 36:7, pages 10760-10773. DOI: 10.1016/j.eswa.2009.02.063. Online publication date: 1-Sep-2009.
  • (2008) Opinion Mining and Sentiment Analysis. Foundations and Trends in Information Retrieval, 2:1-2, pages 1-135. DOI: 10.1561/1500000011. Online publication date: 1-Jan-2008.
  • (2007) Automatic classification of web search results. Proceedings of the 10th international conference on Asian digital libraries: looking back 10 years and forging new frontiers, pages 65-74. DOI: 10.5555/1780653.1780671. Online publication date: 10-Dec-2007.
  • (2007) Filtering product reviews from web search results. Proceedings of the 2007 ACM symposium on Document engineering, pages 196-198. DOI: 10.1145/1284420.1284467. Online publication date: 28-Aug-2007.
  • (2004) Mining and summarizing customer reviews. Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 168-177. DOI: 10.1145/1014052.1014073. Online publication date: 22-Aug-2004.
  • (2004) Classifying racist texts using a support vector machine. Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval, pages 468-469. DOI: 10.1145/1008992.1009074. Online publication date: 25-Jul-2004.
