Abstract
This paper describes our ongoing work on improving our Broadcast News story segmentation module. We have sought to improve our baseline algorithm by further exploiting the typical structure of a broadcast news show: first by training a CART (Classification and Regression Tree), and then by integrating it into a two-stage algorithm that can handle shows with double anchors. To handle shows with a thematic anchor, we adopt a more complex approach that includes a topic classification stage. The automatic segmentation is currently being compared with the manual segmentation produced by a professional media watch company. The results so far are very promising, especially considering that no video information is used.
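The abstract does not spell out the algorithm. As a rough illustration of the general idea of using show structure (anchor turns) as story-boundary cues, here is a minimal Python sketch under stated assumptions. The input is assumed to be a list of speaker-diarized turns, and the `Turn` type is hypothetical. Stage 1 picks up to two anchors with a simple turn-count heuristic that merely stands in for the trained CART. Stage 2 opens a new story whenever an anchor takes the floor from another speaker. None of the names, parameters, or thresholds come from the paper, and the topic classification stage for thematic anchors is not modelled.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Turn:
    """Hypothetical diarization output: one speaker turn."""
    speaker: str   # speaker label
    start: float   # start time in seconds
    end: float     # end time in seconds


def detect_anchors(turns, max_anchors=2, min_turns=2):
    """Stage 1 (stand-in for a trained classifier): anchors are assumed to be
    the speakers who take the floor most often. Up to max_anchors speakers with
    at least min_turns turns are returned; ties keep first-seen order."""
    counts = Counter(t.speaker for t in turns)
    return {s for s, n in counts.most_common(max_anchors) if n >= min_turns}


def segment(turns):
    """Stage 2: return the start times of story boundaries. A boundary is
    placed whenever an anchor turn follows a turn by a different speaker
    (including the other anchor). The show start is an implicit boundary
    and is not listed."""
    anchors = detect_anchors(turns)
    boundaries = []
    for prev, cur in zip(turns, turns[1:]):
        if cur.speaker in anchors and cur.speaker != prev.speaker:
            boundaries.append(cur.start)
    return boundaries


if __name__ == "__main__":
    # Toy double-anchor show: anchors A and B, reporters R1-R3, interviewee I1.
    show = [
        Turn("A", 0.0, 10.0), Turn("R1", 10.0, 50.0),
        Turn("B", 50.0, 60.0), Turn("R2", 60.0, 100.0),
        Turn("I1", 100.0, 110.0), Turn("A", 110.0, 120.0),
        Turn("R3", 120.0, 150.0), Turn("B", 150.0, 160.0),
    ]
    print(segment(show))  # [50.0, 110.0, 150.0]
```

Allowing two anchors lets a hand-over from one anchor to the other count as a boundary, which is one simple way a double-anchor show could be accommodated; a real system would replace the turn-count rule with a classifier trained on structural features.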
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Amaral, R., Trancoso, I. (2009). Exploring the Structure of Broadcast News for Topic Segmentation. In: Vetulani, Z., Uszkoreit, H. (eds) Human Language Technology. Challenges of the Information Society. LTC 2007. Lecture Notes in Computer Science, vol 5603. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04235-5_1
DOI: https://doi.org/10.1007/978-3-642-04235-5_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04234-8
Online ISBN: 978-3-642-04235-5
eBook Packages: Computer Science, Computer Science (R0)