Exploring the Structure of Broadcast News for Topic Segmentation

  • Conference paper
Human Language Technology. Challenges of the Information Society (LTC 2007)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5603)

Abstract

This paper describes our ongoing work toward improving our Broadcast News story segmentation module. We have tried to improve our baseline algorithm by further exploring the typical structure of a broadcast news show, first by training a CART (classification and regression tree) and then by integrating it into a 2-stage algorithm that can handle shows with double anchors. To handle shows with a thematic anchor, we adopt a more complex approach that includes a topic classification stage. The automatic segmentation is currently being compared with the manual segmentation produced by a professional media watch company. The results so far are very promising, especially considering that no video information is used.

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Amaral, R., Trancoso, I. (2009). Exploring the Structure of Broadcast News for Topic Segmentation. In: Vetulani, Z., Uszkoreit, H. (eds) Human Language Technology. Challenges of the Information Society. LTC 2007. Lecture Notes in Computer Science, vol 5603. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04235-5_1

  • DOI: https://doi.org/10.1007/978-3-642-04235-5_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-04234-8

  • Online ISBN: 978-3-642-04235-5

  • eBook Packages: Computer Science, Computer Science (R0)
