Only overlay text: novel features for TV news broadcast video segmentation

Multimedia Tools and Applications

Abstract

Segmentation of television news videos into programs and stories (after removing advertisements) is a necessary first step for news broadcast analysis. Existing methods have used manually defined presentation styles as an important feature for such segmentation. Manually defined presentation styles make algorithms channel-specific and hamper scalability to a large number of channels. In this work, we advocate the use of overlay text for automatic characterization of broadcast presentation styles. This automatic characterization minimizes the manual intervention required to develop scalable solutions for television news broadcast segmentation. To this end, we introduce three novel features derived solely from the position and content of overlay text bands: Bag of Bands (BoB), BoB Templates (BoBT) and Text-based Semantic Similarity (TSS). The BoB features characterize the on-screen distribution of text bands and are used with classifiers for advertisement detection. The BoBT features characterize the co-occurrence of text bands, thereby modeling the presentation styles of video shots. Sequences of BoBT features are modeled using Conditional Random Fields (CRFs) to identify program boundaries. Sequences of features derived from the semantic similarity (TSS) between consecutive shots, together with BoBT features, are used with CRFs for story segmentation. The proposed features are validated on 360 hours of broadcast data recorded from three Indian English news channels. Benchmarking against baseline methods shows better performance for our proposal.
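The abstract names the features but does not define them, so the following Python sketch is only an illustration of the general idea, not the authors' formulation. It assumes that overlay text bands are given as (top, bottom) pixel rows per shot and that shot text is a plain string. The function names `band_histogram` and `text_similarity`, the fixed vertical binning, and the use of cosine similarity over term counts are all assumptions introduced here.

```python
import math
from collections import Counter


def band_histogram(bands, frame_height, n_bins=8):
    """Normalized histogram of text-band vertical centres for one shot.

    `bands` is a list of (top, bottom) pixel rows of detected overlay
    text bands. Fixed equal-height bins are an assumption of this sketch;
    the paper's Bag of Bands feature may be built differently.
    """
    hist = [0.0] * n_bins
    for top, bottom in bands:
        centre = (top + bottom) / 2.0
        # Clamp so a band touching the bottom edge falls in the last bin.
        idx = min(int(centre / frame_height * n_bins), n_bins - 1)
        hist[idx] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def text_similarity(text_a, text_b):
    """Cosine similarity between term-count vectors of two overlay texts.

    A simple lexical stand-in (assumption) for the semantic similarity
    between consecutive shots mentioned in the abstract.
    """
    tf_a = Counter(text_a.lower().split())
    tf_b = Counter(text_b.lower().split())
    dot = sum(tf_a[w] * tf_b[w] for w in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # One band near the top and one near the bottom of a 360-row frame.
    print(band_histogram([(0, 20), (340, 360)], frame_height=360, n_bins=4))
    # Overlay text of two consecutive shots sharing some words.
    print(text_similarity("budget vote today", "budget debate continues"))
```

In a pipeline of the kind the abstract describes, per-shot vectors like these would be fed to a classifier (for advertisement detection) or to a sequence model such as a CRF (for boundary detection); those stages are omitted here.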

Notes

  1. A manual analysis of our dataset shows that text IUs make up about 87% of all IUs.

  2. Supplementary material can be accessed using http://tiny.cc/boTB

Author information

Corresponding author

Correspondence to Raghvendra Kannao.

About this article

Cite this article

Kannao, R., Guha, P. & Chaudhuri, B. Only overlay text: novel features for TV news broadcast video segmentation. Multimed Tools Appl 81, 30493–30517 (2022). https://doi.org/10.1007/s11042-022-12917-w

  • DOI: https://doi.org/10.1007/s11042-022-12917-w
