DOI: 10.1145/1645953.1646170
poster

Text segmentation via topic modeling: an analytical study

Published: 02 November 2009

Abstract

In this paper, the task of text segmentation is approached from a topic modeling perspective. We investigate the use of the latent Dirichlet allocation (LDA) topic model to divide a text into semantically coherent segments. A major benefit of the proposed approach is that, along with the segment boundaries, it outputs the topic distribution associated with each segment. This information is of potential use in applications such as segment retrieval and discourse analysis. The new approach outperforms a standard baseline method and yields significantly better performance than most of the available unsupervised methods on a benchmark dataset.
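
To make the general idea concrete, the following is a minimal sketch of topic-driven segmentation by dynamic programming. It is not the paper's algorithm: it assumes the per-topic word probabilities (`phi`) are already available from a trained topic model, and it scores each candidate segment by its single best-fitting topic instead of inferring a full LDA topic mixture. The names `segment_log_likelihood`, `segment`, `penalty`, and `floor` are hypothetical and chosen only for this illustration.

```python
import math


def segment_log_likelihood(words, phi, floor=1e-6):
    """Log-likelihood of a bag of words under its best single topic.

    phi is a list of dicts (one per topic) mapping word -> probability.
    It is assumed to come from an already-trained topic model.
    floor is an assumed probability for words a topic has never seen.
    """
    return max(
        sum(math.log(topic.get(w, floor)) for w in words)
        for topic in phi
    )


def segment(sentences, phi, penalty=2.0):
    """Return the exclusive end index of each segment.

    sentences is a list of tokenized sentences. Every split point is
    considered, and dynamic programming keeps the best-scoring split.
    penalty is an assumed per-segment cost that discourages
    over-segmentation.
    """
    n = len(sentences)
    best = [0.0] + [-math.inf] * n   # best[i]: best score for sentences[:i]
    back = [0] * (n + 1)             # back[i]: start of the last segment
    for end in range(1, n + 1):
        for start in range(end):
            words = [w for s in sentences[start:end] for w in s]
            score = (best[start]
                     + segment_log_likelihood(words, phi)
                     - penalty)
            if score > best[end]:
                best[end], back[end] = score, start
    # Walk the back-pointers from the end to recover the boundaries.
    bounds, end = [], n
    while end > 0:
        bounds.append(end)
        end = back[end]
    return bounds[::-1]
```

With two toy topics, one about sports and one about finance, and four sentences that switch subject halfway through, `segment` places a boundary after the second sentence. In a fuller treatment the per-segment score would come from LDA inference, which would also yield the topic distribution for each segment mentioned above.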


    Information & Contributors

    Information

    Published In

    CIKM '09: Proceedings of the 18th ACM conference on Information and knowledge management
    November 2009
    2162 pages
    ISBN:9781605585123
    DOI:10.1145/1645953
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. dynamic programming
    2. latent dirichlet allocation
    3. text segmentation
    4. unsupervised topic modeling

    Qualifiers

    • Poster

    Conference

    CIKM '09

    Acceptance Rates

    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months): 9
    • Downloads (Last 6 weeks): 1
    Reflects downloads up to 21 Dec 2024

    Cited By

    • (2024) Methods for Solving the Problem of Topic Segmentation of Texts Based on Knowledge Graphs. Journal of Computer and Systems Sciences International, 63:4 (642-662). DOI: 10.1134/S1064230724700473. Online publication date: 24-Nov-2024
    • (2024) Enhanced Document Segmentation at Paragraph Level. 2024 6th International Conference on Electronics and Communication, Network and Computer Technology (ECNCT) (528-535). DOI: 10.1109/ECNCT63103.2024.10704470. Online publication date: 19-Jul-2024
    • (2024) Coherence Graphs: Bridging the Gap in Text Segmentation with Unsupervised Learning. Natural Language Processing and Information Systems (139-149). DOI: 10.1007/978-3-031-70242-6_14. Online publication date: 20-Sep-2024
    • (2023) Algorithmic Segmentation of Job Ads Using Textual Analysis. Computational Linguistics and Intelligent Text Processing (287-300). DOI: 10.1007/978-3-031-23804-8_23. Online publication date: 26-Feb-2023
    • (2023) DRIP: Segmenting individual requirements from software requirement documents. Software: Practice and Experience, 54:5 (842-874). DOI: 10.1002/spe.3303. Online publication date: 19-Dec-2023
    • (2022) Neural Text Segmentation and its Application to Sentiment Analysis. IEEE Transactions on Knowledge and Data Engineering, 34:2 (828-842). DOI: 10.1109/TKDE.2020.2983360. Online publication date: 1-Feb-2022
    • (2022) The Influnce Stemmer Truncating and Statistical in Mapping Students Research Trends with Latent Dirichlet Allocation (LDA). 2022 6th International Conference on Information Technology, Information Systems and Electrical Engineering (ICITISEE) (57-62). DOI: 10.1109/ICITISEE57756.2022.10057809. Online publication date: 13-Dec-2022
    • (2022) A Probabilistic Topic Model based on Short Distance Co-occurrences. Expert Systems with Applications (116518). DOI: 10.1016/j.eswa.2022.116518. Online publication date: Jan-2022
    • (2022) EDU-Capsule: aspect-based sentiment analysis at clause level. Knowledge and Information Systems, 65:2 (517-541). DOI: 10.1007/s10115-022-01797-z. Online publication date: 6-Dec-2022
    • (2022) An Analysis of Various Text Segmentation Approaches. Proceedings of International Conference on Intelligent Cyber-Physical Systems (285-302). DOI: 10.1007/978-981-16-7136-4_22. Online publication date: 24-Jan-2022
