DOI: 10.3115/980691.980714

Text segmentation with multiple surface linguistic cues

Published: 10 August 1998

Abstract

A range of consecutive sentences in a text is widely assumed to form a coherent unit, called a discourse segment. Identifying segment boundaries is a first step toward recognizing the structure of a text. In this paper, we describe a method for identifying segment boundaries in Japanese text with the aid of multiple surface linguistic cues, though our experiments may be small in scale. We also present a method for automatically training the weights of the multiple linguistic cues while avoiding the overfitting problem.
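
As a rough illustration of what combining multiple weighted cues can look like, the sketch below scores each gap between adjacent sentences as a weighted sum of cue values and marks a boundary where the score exceeds a threshold. The cue names, weights, threshold, and function names are all hypothetical; the abstract does not specify them, and this is not presented as the paper's actual method. Weight training is not covered here.

```python
from typing import Dict, List

# Hypothetical cue weights. The abstract says the weights are trained
# automatically but gives no values, so these numbers are made up.
EXAMPLE_WEIGHTS: Dict[str, float] = {
    "topic_marker": 1.0,              # assumed cue: a topic-marking expression follows the gap
    "conjunction": 0.5,               # assumed cue: the next sentence opens with a conjunction
    "lexical_chain_continues": -1.5,  # assumed cue: shared vocabulary spans the gap
}

# One dict of cue values per gap between adjacent sentences (1 = cue present).
EXAMPLE_GAPS: List[Dict[str, float]] = [
    {"topic_marker": 1, "conjunction": 0, "lexical_chain_continues": 1},
    {"topic_marker": 1, "conjunction": 1, "lexical_chain_continues": 0},
    {"topic_marker": 0, "conjunction": 1, "lexical_chain_continues": 0},
]


def boundary_score(cues: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of cue values; cues without a weight contribute nothing."""
    return sum(weights.get(name, 0.0) * value for name, value in cues.items())


def segment_boundaries(
    gaps: List[Dict[str, float]],
    weights: Dict[str, float],
    threshold: float,
) -> List[int]:
    """Return the indices of gaps whose score is strictly above the threshold."""
    return [
        index
        for index, cues in enumerate(gaps)
        if boundary_score(cues, weights) > threshold
    ]


if __name__ == "__main__":
    print(segment_boundaries(EXAMPLE_GAPS, EXAMPLE_WEIGHTS, threshold=0.75))
```

In this toy setup, a cue with a negative weight (here, a continuing lexical chain) counts as evidence against a boundary, which is one simple way to let cohesion cues and boundary cues share a single score.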

Published In

ACL '98/COLING '98: Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics - Volume 2
August 1998
768 pages

Sponsors

  • Government of Canada
  • Université de Montréal

Publisher

Association for Computational Linguistics

United States

Acceptance Rates

Overall Acceptance Rate 85 of 443 submissions, 19%

Cited By

  • (2008) Text segmentation with LDA-based Fisher kernel. Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Short Papers, pages 269-272. DOI: 10.5555/1557690.1557768. Online publication date: 16-Jun-2008
  • (2007) Topic segmentation with shared topic detection and alignment of multiple documents. Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval, pages 199-206. DOI: 10.1145/1277741.1277778. Online publication date: 23-Jul-2007
  • (2005) Feature-based segmentation of narrative documents. Proceedings of the ACL Workshop on Feature Engineering for Machine Learning in Natural Language Processing, pages 32-39. DOI: 10.5555/1610230.1610237. Online publication date: 29-Jun-2005
  • (2003) Discourse segmentation of multi-party conversation. Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1, pages 562-569. DOI: 10.3115/1075096.1075167. Online publication date: 7-Jul-2003
  • (2003) Domain-independent text segmentation using anisotropic diffusion and dynamic programming. Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval, pages 322-329. DOI: 10.1145/860435.860494. Online publication date: 28-Jul-2003
  • (2000) Advances in domain independent linear text segmentation. Proceedings of the 1st North American chapter of the Association for Computational Linguistics conference, pages 26-33. DOI: 10.5555/974305.974309. Online publication date: 29-Apr-2000
  • (2000) Passage-level document retrieval using lexical chains. Content-Based Multimedia Information Access - Volume 1, pages 491-506. DOI: 10.5555/2835865.2835917. Online publication date: 12-Apr-2000
