DOI: 10.5555/1610230.1610237
research-article
Free access

Feature-based segmentation of narrative documents

Published: 29 June 2005

Abstract

In this paper, we examine topic segmentation of narrative documents, which are characterized by long passages of text with few headings. We first present results suggesting that previous topic segmentation approaches are not appropriate for narrative text. We then present a feature-based method that combines features drawn from diverse sources with learned features. Applied to narrative books and encyclopedia articles, our method performs significantly better than previous segmentation approaches. We also analyze the contribution of individual features and show the benefit of generalizing with outside resources.
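The abstract does not spell out the feature set, so the following is only a minimal sketch of the general shape of feature-based boundary detection, not the authors' method. It assumes paragraphs are the candidate units and uses two illustrative (hypothetical) features: the cosine similarity of word counts on either side of a candidate boundary, and the fraction of word types that are new after it. These are combined with a hand-set linear score; in a real system the weights would be learned from annotated boundaries (for example with a support vector machine). The names `boundary_features` and `segment`, the default weights, and the window size are all assumptions made for illustration.

```python
from collections import Counter
import math
import re


def tokenize(text):
    # Lowercase word tokens; a deliberately simple tokenizer for illustration.
    return re.findall(r"[a-z']+", text.lower())


def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters (0.0 if either is empty).
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def boundary_features(paragraphs, i, window=2):
    # Hypothetical features for the candidate boundary just before paragraph i.
    before = Counter(w for p in paragraphs[max(0, i - window):i] for w in tokenize(p))
    after = Counter(w for p in paragraphs[i:i + window] for w in tokenize(p))
    # Feature 1: lexical similarity across the boundary (low suggests a topic shift).
    sim = cosine(before, after)
    # Feature 2: share of word types after the boundary not seen in the window before it.
    new_types = set(after) - set(before)
    novelty = len(new_types) / len(set(after)) if after else 0.0
    return [sim, novelty]


def segment(paragraphs, weights=(-2.0, 1.0), bias=0.0, window=2):
    # Placeholder linear scorer: weights are hand-set here, not learned.
    # Returns the indices i at which a boundary is predicted before paragraph i.
    bounds = []
    for i in range(1, len(paragraphs)):
        features = boundary_features(paragraphs, i, window)
        score = bias + sum(w * x for w, x in zip(weights, features))
        if score > 0:
            bounds.append(i)
    return bounds
```

For example, with `window=1` on a four-paragraph toy text whose first two paragraphs describe a ship at sea and whose last two describe a village bakery, `segment` returns `[2]`: a single boundary before the third paragraph, where lexical overlap drops and new vocabulary appears.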

Information

Published In

FeatureEng '05: Proceedings of the ACL Workshop on Feature Engineering for Machine Learning in Natural Language Processing
June 2005
82 pages

Publisher

Association for Computational Linguistics

United States

Qualifiers

  • Research-article

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 45
  • Downloads (last 6 weeks): 4
Reflects downloads up to 13 Dec 2024.

Cited By

  • (2014) Experiments with Segmentation Strategies for Passage Retrieval in Audio-Visual Documents. Proceedings of International Conference on Multimedia Retrieval, pp. 217-224. DOI: 10.1145/2578726.2578753. Online publication date: 1-Apr-2014.
  • (2013) Segmentation strategies for passage retrieval in audio-visual documents. Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval, p. 1143. DOI: 10.1145/2484028.2484237. Online publication date: 28-Jul-2013.
  • (2010) Evaluating hierarchical discourse segmentation. Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pp. 993-1001. DOI: 10.5555/1857999.1858141. Online publication date: 2-Jun-2010.
  • (2010) Linear text segmentation using classification techniques. Proceedings of the 1st Amrita ACM-W Celebration on Women in Computing in India, pp. 1-4. DOI: 10.1145/1858378.1858436. Online publication date: 16-Sep-2010.
  • (2009) An analysis of quantitative aspects in the evaluation of thematic segmentation algorithms. Proceedings of the 7th SIGdial Workshop on Discourse and Dialogue, pp. 144-151. DOI: 10.5555/1654595.1654622. Online publication date: 15-Jul-2009.
  • (2006) Word distributions for thematic segmentation in a support vector machine approach. Proceedings of the Tenth Conference on Computational Natural Language Learning, pp. 101-108. DOI: 10.5555/1596276.1596296. Online publication date: 8-Jun-2006.
