Abstract
Segmentation of television news videos into programs and stories (after removing advertisements) is a necessary first step for news broadcast analysis. Existing methods have used manually defined presentation styles as an important feature for such segmentation. Manually defined presentation styles make algorithms channel-specific and hamper scalability to a large number of channels. In this work, we advocate the usability of overlay text for automatic characterization of broadcast presentation styles. This automatic characterization will minimize the manual intervention required to develop scalable solutions for television news broadcast segmentation. To this end, we introduce three novel features derived solely from the position and content of overlay text bands. These are Bag of Bands (BoB), BoB Templates (BoBT) and Text-based Semantic Similarity (TSS). The BoB features characterize the on-screen distribution of text bands and are used with classifiers for advertisement detection. The BoBT features characterize the co-occurrence of text bands, thereby modeling the presentation styles of video shots. Sequences of BoBT features are modeled using Conditional Random Fields (CRFs) for identifying program boundaries. Sequences of features derived from the semantic similarity (TSS) between consecutive shots, together with BoBT features, are used with CRFs for story segmentation. The performance of the proposed features is validated on 360 hours of broadcast data recorded from three Indian English news channels. Benchmarking against baseline methods shows that the proposed features perform better.
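To make the idea of a text-based similarity between consecutive shots concrete, the following is a minimal sketch only. It assumes that each shot is already represented by its recognized overlay text and that similarity is measured as the cosine between term-count vectors; the abstract does not specify how TSS is actually computed, so this is a stand-in and not the authors' method. The function names (`term_vector`, `cosine_similarity`, `consecutive_shot_similarity`) are hypothetical.

```python
import math
from collections import Counter


def term_vector(text):
    """Lowercased bag-of-words term counts for one shot's overlay text."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a shot with no overlay text is treated as dissimilar
    return dot / (norm_a * norm_b)


def consecutive_shot_similarity(shot_texts):
    """One similarity score per pair of adjacent shots (assumed input:
    a list of overlay-text strings, one per shot, in broadcast order)."""
    vectors = [term_vector(t) for t in shot_texts]
    return [
        cosine_similarity(vectors[i], vectors[i + 1])
        for i in range(len(vectors) - 1)
    ]


if __name__ == "__main__":
    shots = [
        "budget session begins",
        "budget session continues",
        "cricket final today",
    ]
    print(consecutive_shot_similarity(shots))
```

Under these assumptions, a sharp drop in the score between two adjacent shots is a candidate story boundary. In the proposed approach such scores are not thresholded directly but combined with BoBT features and passed as sequences to a CRF.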
Notes
A manual analysis of our dataset reveals text IUs to be about 87% of total IUs
Supplementary material can be accessed using http://tiny.cc/boTB
Cite this article
Kannao, R., Guha, P. & Chaudhuri, B. Only overlay text: novel features for TV news broadcast video segmentation. Multimed Tools Appl 81, 30493–30517 (2022). https://doi.org/10.1007/s11042-022-12917-w
DOI: https://doi.org/10.1007/s11042-022-12917-w