WO2013054348A2 - A method and system for differentiating textual information embedded in streaming news video - Google Patents

Info

Publication number
WO2013054348A2
Authority
WO
WIPO (PCT)
Prior art keywords
characters
news video
streaming
information embedded
textual information
Prior art date
Application number
PCT/IN2012/000504
Other languages
French (fr)
Other versions
WO2013054348A3 (en)
Inventor
Tanushyam Chattopadhyay
Original Assignee
Tata Consultancy Services Limited
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tata Consultancy Services Limited
Priority to EP12840798.8A (EP2734956A4)
Priority to US14/233,727 (US20140163969A1)
Publication of WO2013054348A2
Publication of WO2013054348A3

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/1444 Selective acquisition, locating or processing of specific regions, e.g. highlighted text, fiducial marks or predetermined fields
    • G06V30/1448 Selective acquisition, locating or processing of specific regions, e.g. highlighted text, fiducial marks or predetermined fields based on markings or identifiers characterising the document or the area
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7844 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using original textual content or text extracted from visual content or transcript of audio data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G06V20/635 Overlay text, e.g. embedded captions in a TV program
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/26 Techniques for post-processing, e.g. correcting the recognition result
    • G06V30/262 Techniques for post-processing, e.g. correcting the recognition result using context analysis, e.g. lexical, syntactic or semantic context
    • G06V30/268 Lexical context
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00 Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/10 Indexing; Addressing; Timing or synchronising; Measuring tape travel
    • G11B27/19 Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier
    • G11B27/28 Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier by using information signals recorded by the same method as the main recording
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/23418 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/27 Server based end-user applications
    • H04N21/278 Content descriptor database or directory service for end-user access
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/4402 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N21/440236 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display by media transcoding, e.g. video is transformed into a slideshow of still pictures, audio is converted into text
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/488 Data services, e.g. news ticker
    • H04N21/4886 Data services, e.g. news ticker for displaying a ticker, e.g. scrolling banner for news, stock exchange, weather data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/09 Recognition of logos
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Definitions

  • the present application relates to broadcasting and telecommunications. Particularly, the application relates to a statistical approach for differentiating textual information embedded in a streaming news video. More particularly the application enables a method and system for differentiating textual information embedded in a streaming news video for simplified indexing and annotation of the said news video.
  • one major challenge of the day is to extract the context from the video.
  • One method of extracting the context is to recognize the text embedded on the video.
  • Video optical character recognition is a method to recognize the text from the video.
  • a typical streaming news video may contain a combination of textual region, video of the news reader or the regions showing videos and images of the event the anchor is speaking about.
  • the textual regions may be further classified in various groups, such as breaking news, ticker news or the details about the breaking news, channel name, date and time of the program, stock updates/ticker, etc.
  • the existing methods and systems are not capable of providing a light weight approach for differentiating the textual information embedded in a streaming news video.
  • the existing methods and systems particularly are not capable of providing a light weight approach for classifying the texts of streaming news video without any language model or natural language processing (NLP) based approach.
  • US2009100454A by Weber et al. teaches about the summarization of text, audio, and audiovisual presentations, such as movies, into less lengthy forms, based on natural language processing (NLP) approach.
  • Weber et al. describes a method for news video summarization. The patent does not teach about a statistical approach for extracting and differentiating textual information embedded in a streaming news video.
  • US2002152245A by McCaskey et al. teaches about an apparatus and method for receiving daily data feeds of news article text and news images, particularly web publications of news paper content.
  • the patent does not teach about a statistical approach for extracting and differentiating textual information embedded in a streaming news video.
  • Luo et al. in "Semantic Entity-Relationship Model for Large-Scale Multimedia News Exploration and Recommendation" teaches about a novel framework for multimedia news exploration and analysis, particularly web publishing of news. Luo et al. does not teach about a statistical approach for extracting and differentiating textual information embedded in a streaming news video.
  • Kankanhalli et al. in "Video modeling using strata-based annotation" aims to achieve efficient browsing and retrieval.
  • Kankanhalli et al. focuses on segmenting the contextual information into chunks rather than dividing physically contiguous frames into shots, as is traditionally done.
  • Kankanhalli et al. does not teach about a statistical approach for extracting and differentiating textual information embedded in a streaming news video.
  • Bouaziz et al. in "A New Video Images Text Localization Approach Based on a Fast Hough Transform" teaches about a fast Hough transformation based approach for automatic video frames text localization.
  • Bouaziz et al. does not teach about a statistical approach for extracting and differentiating textual information embedded in a streaming news video.
  • the above mentioned prior art fails to disclose an efficient method and system for differentiating textual information embedded in a streaming news video.
  • the prior art also fails to disclose a method and system for differentiating textual information embedded in a streaming news video which could simplify the indexing and facilitate the annotation of the said news video.
  • the primary objective of the present application is to provide a method and system for differentiating textual information embedded in a streaming news video.
  • Another objective of the application is to enable a method and system for differentiating textual information embedded in a streaming news video for simplified indexing and annotation of the said news video.
  • Another objective of the application is to provide a method and system for computing the frequency of occurrence of characters in upper and lower case, special character and numerical character in the textual information embedded in a streaming news video.
  • Another objective of the application is to provide a method and system for computing the ratio of the said characters in upper and lower case, special character and numerical character for threshold based differentiation of the textual information embedded in a news video.
  • the present application provides a method and system for differentiating textual information embedded in a streaming news video.
  • a method and system for differentiating textual information embedded in a streaming news video for simplified indexing and annotation of the said news video.
  • the frequency of occurrence of characters in upper and lower case, special character and numerical character in the textual information embedded in a streaming news video is computed.
  • the ratio of the said characters in upper and lower case, special character and numerical character for threshold based differentiation of the textual information embedded in a news video is computed.
  • the textual information may include breaking news, ticker news or the details about the breaking news, channel name and date and time of the show.
  • the above said method and system are preferably a method and system for differentiating textual information embedded in a streaming news video but also can be used for many other applications, which may be obvious to a person skilled in the art.
  • Figure 1 shows a prior art flow diagram of the preprocessing of textual information embedded in a streaming news video.
  • Figure 2 shows a flow diagram of the process for differentiating textual information embedded in a streaming news video.
  • the present application provides a method for differentiating textual information embedded in at least one streaming news video, characterized by simplified indexing and annotation of the said streaming news video, the method comprising processor implemented steps of: a. computing the frequency of occurrence of at least two characters in the textual information embedded in said streaming news video;
  • the present application provides a system for differentiating textual information embedded in at least one streaming news video, the system comprising of: a. at least one computing engine for computing the frequency of occurrence of at least two characters in the textual information embedded in said streaming news video and the ratio of the frequency of occurrence of the said characters; and
  • At least one statistical engine for defining a set of rules to the thresholds of the computed ratio of the frequency of occurrence of the said characters for differentiating the textual information embedded in the said streaming news video.
  • FIG. 1 is a prior art flow diagram of the preprocessing of textual information embedded in a streaming news video.
  • the process starts at step 102, where the text-containing regions in the streaming video are obtained by preprocessing the streaming news video.
  • the channel identification information is obtained using channel logo detection.
  • the channel logo is segregated from the remaining information embedded in the said streaming news video.
  • the optical character recognition technique is applied to each segregated textual segment of the said streaming news video.
  • FIG. 2 is a flow diagram of the process for differentiating textual information embedded in a streaming news video.
  • the process starts at step 202, where the frequency of occurrence of at least two characters in the textual information embedded in said streaming news video is computed.
  • the ratio of the frequency of occurrence of the said characters is computed.
  • the process ends at step 206, where a set of rules on the thresholds of the computed ratio of the frequency of occurrence of the said characters is defined for differentiating the textual information embedded in the said streaming news video.
  • a method and system for differentiating textual information embedded in a streaming news video.
  • the method is characterized by simplified indexing and annotation of the said streaming news video.
  • the identification information of the channel streaming the news video is obtained by channel logo detection techniques available in prior art.
  • the text containing regions are also to be identified.
  • the text containing regions in the streaming video are obtained using preprocessing of the said streaming news video, wherein the detected channel logo is segregated from the remaining information embedded in the said streaming news video.
  • the remaining information embedded in the said streaming news video may contain breaking news, news text, stock update or date and time of the said streaming news video.
  • the frequency of occurrence of optically recognized characters in the textual information is computed.
  • the said characters embedded in said streaming news video are selected from the group comprising of upper case characters, lower case characters, special character or numerical characters.
  • the textual information is selected from the group comprising of breaking news, ticker news or the details about the breaking news, channel name and date and time of the show.
  • the ratio of the frequency of occurrence of the said characters is computed and a set of rules is defined to the thresholds of the computed ratio of the frequency of occurrence of the said characters for differentiating the textual information embedded in the said streaming news video.
  • the set of rules is defined by adding at least one tolerance factor to the said thresholds, and the said tolerance factor is obtained from the standard deviation of the observed statistics.
  • the threshold based approach is defined to differentiate the type of texts based on the statistical analysis of the news video corpus in Table 1. According to Table 1, the textual information embedded in the said streaming news video is differentiated as breaking news if the frequency of occurrence of the upper case characters is greater than 90%.
  • Textual information embedded in the said streaming news video is differentiated as date and time information if the frequency of occurrence of the numerical characters is greater than 50% and the ratio of numerical characters to upper case characters is greater than 3.
  • Textual information embedded in the said streaming news video is differentiated as stock update if the frequency of occurrence of the upper case and lower case characters is greater than 40% and the ratio of numerical characters to upper case characters lies within 0.2 of 1.
  • Textual information embedded in the said streaming news video is differentiated as news details if the frequency of occurrence of the lower case characters is greater than 60%.
  • Table 1: A threshold based approach is defined to differentiate the type of texts based on the statistical analysis of the news video corpus.
  • the date, time and channel identification information is further used as a time stamp for indexing of the said streaming news video and is also used to fetch additional related information from the internet for indexing of the said streaming news video.
  • the system for differentiating textual information embedded in at least one streaming news video comprising of at least one computing engine for computing the frequency of occurrence of at least two characters in the textual information embedded in said streaming news video and the ratio of the frequency of occurrence of the said characters, and at least one statistical engine for defining a set of rules to the thresholds of the computed ratio of the frequency of occurrence of the said characters for differentiating the textual information embedded in the said streaming news video.
  • the methodology and techniques described with respect to the exemplary embodiments can be performed using a machine or other computing device within which a set of instructions, when executed, may cause the machine to perform any one or more of the methodologies discussed above.
  • the machine operates as a standalone device.
  • the machine may be connected (e.g., using a network) to other machines.
  • the machine may operate in the capacity of a server or a client user machine in a server-client user network environment, or as a peer machine in a peer-to-peer (or distributed) network environment.
  • the machine may comprise a server computer, a client user computer, a personal computer (PC), a tablet PC, a laptop computer, a desktop computer, a control system, a network router, switch or bridge, or any machine capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that machine.
  • the term “machine” shall also be taken to include any collection of machines that individually or jointly execute a set (or multiple sets) of instructions to perform any one or more of the methodologies discussed herein.
  • the machine may include a processor (e.g., a central processing unit (CPU), a graphics processing unit (GPU), or both), a main memory and a static memory, which communicate with each other via a bus.
  • the machine may further include a video display unit (e.g., a liquid crystal display (LCD), a flat panel, a solid state display, or a cathode ray tube (CRT)).
  • the machine may include an input device (e.g., a keyboard) or touch-sensitive screen, a cursor control device (e.g., a mouse), a disk drive unit, a signal generation device (e.g., a speaker or remote control) and a network interface device.
  • the disk drive unit may include a machine-readable medium on which is stored one or more sets of instructions (e.g., software) embodying any one or more of the methodologies or functions described herein, including those methods illustrated above.
  • the instructions may also reside, completely or at least partially, within the main memory, the static memory, and/or within the processor during execution thereof by the machine.
  • the main memory and the processor also may constitute machine-readable media.
  • Dedicated hardware implementations including, but not limited to, application specific integrated circuits, programmable logic arrays and other hardware devices can likewise be constructed to implement the methods described herein.
  • Applications that may include the apparatus and systems of various embodiments broadly include a variety of electronic and computer systems. Some embodiments implement functions in two or more specific interconnected hardware modules or devices with related control and data signals communicated between and through the modules, or as portions of an application-specific integrated circuit.
  • the example system is applicable to software, firmware, and hardware implementations.
  • the methods described herein are intended for operation as software programs running on a computer processor.
  • software implementations including, but not limited to, distributed processing or component/object distributed processing, parallel processing, or virtual machine processing can also be constructed to implement the methods described herein.
  • the present disclosure contemplates a machine readable medium containing instructions, or that which receives and executes instructions from a propagated signal so that a device connected to a network environment can send or receive voice, video or data, and to communicate over the network using the instructions.
  • the instructions may further be transmitted or received over a network via the network interface device.
  • while the machine-readable medium can be a single medium, the term “machine-readable medium” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more sets of instructions.
  • the term “machine-readable medium” shall also be taken to include any medium that is capable of storing, encoding or carrying a set of instructions for execution by the machine and that cause the machine to perform any one or more of the methodologies of the present disclosure.
  • the term “machine-readable medium” shall accordingly be taken to include, but not be limited to: tangible media; solid-state memories such as a memory card or other package that houses one or more read-only (non-volatile) memories, random access memories, or other re-writable (volatile) memories; magneto-optical or optical medium such as a disk or tape; non-transitory mediums or other self-contained information archive or set of archives is considered a distribution medium equivalent to a tangible storage medium. Accordingly, the disclosure is considered to include any one or more of a machine-readable medium or a distribution medium, as listed herein and including art-recognized equivalents and successor media, in which the software implementations herein are stored.
  • the method provided by the present invention is robust as the threshold is computed statistically.
  • the tolerance factor is computed using standard deviation and thus the scope for false classification is also very small.
  • the method is light weight for classifying the texts of news video without any language model or natural language processing (NLP) based approach.
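
The character-frequency rules described above in connection with Table 1 can be illustrated with a minimal Python sketch. It is not the applicant's implementation. The function names (char_frequencies, classify_text), the order in which the rules are tried, the exclusion of whitespace from the counts, and the reading of the 40% stock-update threshold as applying to upper and lower case characters combined are all assumptions made for illustration. The tolerance factor derived from the standard deviation is not modelled.

```python
from collections import Counter

CATEGORIES = ("upper", "lower", "digit", "special")


def char_frequencies(text):
    """Return the share of each character class among non-space characters."""
    counts = Counter()
    for ch in text:
        if ch.isspace():
            continue  # assumption: whitespace is not counted
        if ch.isupper():
            counts["upper"] += 1
        elif ch.islower():
            counts["lower"] += 1
        elif ch.isdigit():
            counts["digit"] += 1
        else:
            counts["special"] += 1
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in CATEGORIES}
    return {name: counts[name] / total for name in CATEGORIES}


def classify_text(text):
    """Apply the threshold rules in a fixed order (the order is an assumption)."""
    freq = char_frequencies(text)
    # Ratio of numerical to upper case characters; infinite if no upper case.
    if freq["upper"] > 0:
        ratio = freq["digit"] / freq["upper"]
    else:
        ratio = float("inf")

    if freq["upper"] > 0.90:
        return "breaking news"
    if freq["digit"] > 0.50 and ratio > 3:
        return "date and time"
    # Assumption: the 40% threshold covers upper and lower case combined.
    if freq["upper"] + freq["lower"] > 0.40 and abs(ratio - 1) <= 0.2:
        return "stock update"
    if freq["lower"] > 0.60:
        return "news details"
    return "unclassified"


if __name__ == "__main__":
    samples = (
        "PRIME MINISTER RESIGNS",
        "12:30 25/08/2011",
        "SENSEX 17500 NIFTY 5300",
        "The minister announced his resignation on Monday.",
    )
    for sample in samples:
        print(sample, "->", classify_text(sample))
```

The sample strings are invented for illustration and do not come from the news video corpus mentioned in the application. In practice the input would be the text returned by optical character recognition for each segregated text region.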

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Library & Information Science (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Television Systems (AREA)

Abstract

The application provides a method and system for differentiating textual information embedded in a streaming news video. The application enables a method and system for differentiating textual information embedded in a streaming news video for simplified indexing and annotation of the said news video.

Description

A METHOD AND SYSTEM FOR DIFFERENTIATING TEXTUAL INFORMATION EMBEDDED IN STREAMING NEWS VIDEO
FIELD OF THE APPLICATION
The present application relates to broadcasting and telecommunications. Particularly, the application relates to a statistical approach for differentiating textual information embedded in a streaming news video. More particularly the application enables a method and system for differentiating textual information embedded in a streaming news video for simplified indexing and annotation of the said news video.
BACKGROUND OF THE APPLICATION
In the broadcasting and telecommunication technology domain, one major challenge of the day is to extract the context from the video. One method of extracting the context is to recognize the text embedded on the video. Video optical character recognition is a method to recognize the text from the video.
In the current scenario, considerable effort has been made to develop various approaches to solve the said problem of context recognition. It also has a huge application in the problem of automatic video indexing. For automatic video indexing or annotation, one required step is to classify the texts embedded within the video. This problem is bigger in the case of news video. Existing video text classification methods have addressed the problem using a natural language processing (NLP) based approach to differentiate the different segments of a news video.
Extracting the contextual information is still a challenging task because of the variety of content embedded in a video, including video, images, text, etc. A typical streaming news video may contain a combination of textual region, video of the news reader or the regions showing videos and images of the event the anchor is speaking about. The textual regions may be further classified in various groups, such as breaking news, ticker news or the details about the breaking news, channel name, date and time of the program, stock updates/ticker, etc.
In order to achieve an accurate differentiation of textual information embedded in streaming news video, a light weight method and system is required which could simplify the indexing and facilitate the annotation of the said news video with light resource (memory and CPU) requirement.
However, the existing methods and systems are not capable of providing a light weight approach for differentiating the textual information embedded in a streaming news video. The existing methods and systems particularly are not capable of providing a light weight approach for classifying the texts of streaming news video without any language model or natural language processing (NLP) based approach.
The existing methods and systems particularly are not capable of differentiating textual information embedded in a streaming news video which could simplify the indexing and facilitate the annotation of the said news video. Some of the above mentioned methods known to us are as follows:
US5950196A to Pyreddy et al. teaches about extracting the information from printed news papers/online version of the news paper. The patent does not teach about a statistical approach for extracting and differentiating textual information embedded in a streaming news video.
US2009100454A by Weber et al. teaches about the summarization of text, audio, and audiovisual presentations, such as movies, into less lengthy forms, based on natural language processing (NLP) approach. Weber et al. describes a method for news video summarization. The patent does not teach about a statistical approach for extracting and differentiating textual information embedded in a streaming news video.
US2008077708A by Scott et al. teaches about techniques that enable automated processing of news content according to the user preference. The patent does not teach about a statistical approach for extracting and differentiating textual information embedded in a streaming news video.
US2002152245A by McCaskey et al. teaches about an apparatus and method for receiving daily data feeds of news article text and news images, particularly web publications of news paper content. The patent does not teach about a statistical approach for extracting and differentiating textual information embedded in a streaming news video.
Luo et al. in "Semantic Entity-Relationship Model for Large-Scale Multimedia News Exploration and Recommendation" teaches about a novel framework for multimedia news exploration and analysis, particularly web publishing of news. Luo et al. does not teach about a statistical approach for extracting and differentiating textual information embedded in a streaming news video.
Kankanhalli et al. in "Video modeling using strata-based annotation" aims to achieve efficient browsing and retrieval. Kankanhalli et al. focuses on segmenting the contextual information into chunks rather than dividing physically contiguous frames into shots, as is traditionally done. Kankanhalli et al. does not teach about a statistical approach for extracting and differentiating textual information embedded in a streaming news video.
Bouaziz et al. in "A New Video Images Text Localization Approach Based on a Fast Hough Transform" teaches about a fast Hough transformation based approach for automatic video frames text localization. Bouaziz et al. does not teach about a statistical approach for extracting and differentiating textual information embedded in a streaming news video.
Ziegler et al. in "Content Extraction from News Pages Using Particle Swarm Optimization on Linguistic and Structural Features" teaches about a novel approach that extracts real content from news Web pages in an unsupervised fashion, using particle swarm optimization on linguistic and structural features.
The above mentioned prior art fails to disclose an efficient method and system for differentiating textual information embedded in a streaming news video. The prior art also fails to disclose a method and system for differentiating textual information embedded in a streaming news video which could simplify the indexing and facilitate the annotation of the said news video.
Thus, in the light of the above mentioned background art, it is evident that there is a long felt need for a solution that can provide an effective method and system for differentiating textual information embedded in a streaming news video. There is also a need for a cost effective method and system which could simplify the indexing and facilitate the annotation of the said news video.
OBJECTIVES OF THE APPLICATION
The primary objective of the present application is to provide a method and system for differentiating textual information embedded in a streaming news video.
Another objective of the application is to enable a method and system for differentiating textual information embedded in a streaming news video for simplified indexing and annotation of the said news video.
Another objective of the application is to provide a method and system for computing the frequency of occurrence of characters in upper and lower case, special character and numerical character in the textual information embedded in a streaming news video.
Another objective of the application is to provide a method and system for computing the ratio of the said characters in upper and lower case, special character and numerical character for threshold based differentiation of the textual information embedded in a news video.
SUMMARY OF THE APPLICATION
Before the present methods, systems, and hardware enablement are described, it is to be understood that this application is not limited to the particular systems and methodologies described, as there can be multiple possible embodiments of the present application which are not expressly illustrated in the present disclosure. It is also to be understood that the terminology used in the description is for the purpose of describing the particular versions or embodiments only, and is not intended to limit the scope of the present application, which will be limited only by the appended claims.
The present application provides a method and system for differentiating textual information embedded in a streaming news video.
In one aspect of the application a method and system is provided for differentiating textual information embedded in a streaming news video for simplified indexing and annotation of the said news video. The frequency of occurrence of characters in upper and lower case, special character and numerical character in the textual information embedded in a streaming news video is computed. Further, the ratio of the said characters in upper and lower case, special character and numerical character for threshold based differentiation of the textual information embedded in a news video is computed. Thus the statistical approach differentiates textual information embedded in a streaming news video. The textual information may include breaking news, ticker news or the details about the breaking news, channel name and date and time of the show.
The above said method and system are preferably a method and system for differentiating textual information embedded in a streaming news video but also can be used for many other applications, which may be obvious to a person skilled in the art.
BRIEF DESCRIPTION OF DRAWINGS
The foregoing summary, as well as the following detailed description of preferred embodiments, are better understood when read in conjunction with the appended drawings. For the purpose of illustrating the application, there is shown in the drawings exemplary constructions of the application; however, the application is not limited to the specific methods and system disclosed. In the drawings:
Figure 1 shows a prior art flow diagram of the preprocessing of textual information embedded in a streaming news video.
Figure 2 shows a flow diagram of the process for differentiating textual information embedded in a streaming news video.
DETAILED DESCRIPTION OF THE APPLICATION
Some embodiments of this application, illustrating all its features, will now be discussed in detail.
The words "comprising," "having," "containing," and "including," and other forms thereof, are intended to be equivalent in meaning and be open ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items.
It must also be noted that as used herein and in the appended claims, the singular forms "a," "an," and "the" include plural references unless the context clearly dictates otherwise. Although any systems and methods similar or equivalent to those described herein can be used in the practice or testing of embodiments of the present application, the preferred systems and methods are now described.
The disclosed embodiments are merely exemplary of the application, which may be embodied in various forms.
The present application provides a method for differentiating textual information embedded in at least one streaming news video, characterized by simplified indexing and annotation of the said streaming news video, the method comprising processor implemented steps of:
a. computing the frequency of occurrence of at least two characters in the textual information embedded in said streaming news video;
b. computing the ratio of the frequency of occurrence of the said characters; and
c. defining a set of rules to the thresholds of the computed ratio of the frequency of occurrence of the said characters for differentiating the textual information embedded in the said streaming news video.
The present application provides a system for differentiating textual information embedded in at least one streaming news video, the system comprising of:
a. at least one computing engine for computing the frequency of occurrence of at least two characters in the textual information embedded in said streaming news video and the ratio of the frequency of occurrence of the said characters; and
b. at least one statistical engine for defining a set of rules to the thresholds of the computed ratio of the frequency of occurrence of the said characters for differentiating the textual information embedded in the said streaming news video.
Figure 1 is a prior art flow diagram of the preprocessing of textual information embedded in a streaming news video.
The process starts at step 102, where the text containing regions in the streaming video are obtained by preprocessing the streaming news video. At step 104, the channel identification information is obtained using channel logo detection. At step 106, the channel logo is segregated from the remaining information embedded in the said streaming news video. The process ends at step 108, where the optical character recognition technique is applied to each segregated textual segment of the said streaming news video.
Figure 2 is a flow diagram of the process for differentiating textual information embedded in a streaming news video.
The process starts at step 202, where the frequency of occurrence of at least two characters in the textual information embedded in said streaming news video is computed. At step 204, the ratio of the frequency of occurrence of the said characters is computed. The process ends at step 206, where a set of rules to the thresholds of the computed ratio of the frequency of occurrence of the said characters is defined for differentiating the textual information embedded in the said streaming news video.
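By way of a non-limiting illustration, steps 202 and 204 may be sketched in Python as follows. The function names `class_percentages` and `class_ratio`, the four class labels, and the choice to ignore whitespace are assumptions made for this sketch and are not taken from the application.

```python
from typing import Dict

# The four character classes named in the description.
CLASSES = ("upper", "lower", "special", "numeric")


def class_percentages(text: str) -> Dict[str, float]:
    """Step 202: percentage of recognized characters falling in each class."""
    counts = dict.fromkeys(CLASSES, 0)
    for ch in text:
        if ch.isspace():
            continue  # assumption: whitespace is not counted as a character
        if ch.isdigit():
            counts["numeric"] += 1
        elif ch.isupper():
            counts["upper"] += 1
        elif ch.islower():
            counts["lower"] += 1
        else:
            counts["special"] += 1
    total = sum(counts.values())
    return {name: (100.0 * n / total if total else 0.0)
            for name, n in counts.items()}


def class_ratio(pct: Dict[str, float], numerator: str, denominator: str) -> float:
    """Step 204: ratio between the frequencies of two character classes."""
    if pct[denominator] == 0:
        # assumption: an absent denominator class yields infinity (or 0 if both absent)
        return float("inf") if pct[numerator] > 0 else 0.0
    return pct[numerator] / pct[denominator]


if __name__ == "__main__":
    pct = class_percentages("INFY 2840")  # hypothetical OCR output
    print(pct)
    print(class_ratio(pct, "numeric", "upper"))
```

The percentages and ratio returned here are the quantities to which the rules of step 206 would then be applied.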
In one embodiment of the present application, a method and system is provided for differentiating textual information embedded in a streaming news video. The method is characterized by simplified indexing and annotation of the said streaming news video. The identification information of the channel streaming the news video is obtained by channel logo detection techniques available in the prior art. The text containing regions are also to be identified. The text containing regions in the streaming video are obtained using preprocessing of the said streaming news video, wherein the detected channel logo is segregated from the remaining information embedded in the said streaming news video. The remaining information embedded in the said streaming news video may contain breaking news, news text, stock update or date and time of the said streaming news video. After obtaining the text containing regions in the streaming video and the channel identification information, and segregating the said information from the remaining information, the optical character recognition technique is applied to each segregated textual segment of the said streaming news video.
In one embodiment of the present application, the frequency of occurrence of optically recognized characters in the textual information is computed. The said characters embedded in said streaming news video are selected from the group comprising of upper case characters, lower case characters, special characters or numerical characters. The textual information is selected from the group comprising of breaking news, ticker news or the details about the breaking news, channel name and date and time of the show. Further, the ratio of the frequency of occurrence of the said characters is computed and a set of rules is defined to the thresholds of the computed ratio of the frequency of occurrence of the said characters for differentiating the textual information embedded in the said streaming news video. The set of rules is defined by adding at least one tolerance factor to the said thresholds, and the said tolerance factor is obtained from the standard deviation of the observed statistics. The threshold based approach is defined to differentiate the type of texts based on the statistical analysis on the news video corpus in Table 1.
According to Table 1, the textual information embedded in the said streaming news video is differentiated as breaking news if the frequency of occurrence of the upper case characters is greater than 90%.
Textual information embedded in the said streaming news video is differentiated as date and time information if the frequency of occurrence of the numerical characters is greater than 50% and the ratio of numerical characters to upper case characters is greater than 3.
Textual information embedded in the said streaming news video is differentiated as stock update if the frequency of occurrence of the upper case and lower case characters is greater than 40% and the ratio of numerical characters to upper case characters lies near 1, within a variation of 0.2.
Textual information embedded in the said streaming news video is differentiated as news details if the frequency of occurrence of the lower case characters is greater than 60%.
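By way of a non-limiting illustration, the four rules above may be sketched in Python as follows. The function names, the order in which the rules are tested, the "unclassified" fallback, and the reading of the stock update rule as the combined share of upper case and lower case characters are all assumptions of this sketch; the application does not fix these details.

```python
from typing import Dict


def class_percentages(text: str) -> Dict[str, float]:
    """Percentage of non-whitespace characters in each character class."""
    counts = {"upper": 0, "lower": 0, "special": 0, "numeric": 0}
    for ch in text:
        if ch.isspace():
            continue  # assumption: whitespace is ignored
        if ch.isdigit():
            counts["numeric"] += 1
        elif ch.isupper():
            counts["upper"] += 1
        elif ch.islower():
            counts["lower"] += 1
        else:
            counts["special"] += 1
    total = sum(counts.values())
    return {k: (100.0 * v / total if total else 0.0) for k, v in counts.items()}


def classify(pct: Dict[str, float]) -> str:
    """Apply the threshold rules stated in the description, in document order."""
    upper, lower, numeric = pct["upper"], pct["lower"], pct["numeric"]
    if upper:
        num_to_upper = numeric / upper
    else:
        # assumption: no upper case characters gives an unbounded ratio
        num_to_upper = float("inf") if numeric else 0.0

    if upper > 90:
        return "breaking news"
    if numeric > 50 and num_to_upper > 3:
        return "date and time"
    # assumption: "upper case and lower case ... greater than 40%" read as a combined share
    if upper + lower > 40 and abs(num_to_upper - 1.0) <= 0.2:
        return "stock update"
    if lower > 60:
        return "news details"
    return "unclassified"  # hypothetical fallback, not named in the application


if __name__ == "__main__":
    # Hypothetical OCR outputs, for illustration only.
    for segment in ("MARKETS CRASH WORLDWIDE", "12:45 18-07-2012",
                    "INFY 2840", "The minister said talks will resume next week."):
        print(segment, "->", classify(class_percentages(segment)))
```

Because only character counts and comparisons are involved, the sketch needs no language model, which is consistent with the lightweight approach described.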
Table 1: A threshold based approach is defined to differentiate the type of texts based on the statistical analysis on the news video corpus.
Figure imgf000011_0001
Stock 45 0 10 45 1
News Text 8 84 3 5 0.6
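By way of a non-limiting illustration, a tolerance factor obtained from the standard deviation of observed statistics may be sketched in Python as follows. The function names, the multiplier `k`, the use of the sample standard deviation, and the list `HYPOTHETICAL_RATIOS` are assumptions of this sketch; the values in that list are invented for illustration and are not measurements from the news video corpus.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def tolerance_band(observed: Sequence[float], k: float = 1.0) -> Tuple[float, float]:
    """Return (centre, tolerance) for a statistic observed over a corpus.

    The centre is the mean of the observations; the tolerance is k times
    their sample standard deviation (at least two observations are needed).
    """
    centre = mean(observed)
    tolerance = k * stdev(observed)
    return centre, tolerance


def within_band(value: float, centre: float, tolerance: float) -> bool:
    """True if a newly computed statistic lies inside centre +/- tolerance."""
    return abs(value - centre) <= tolerance


# Invented numeric-to-upper-case ratios for hypothetical stock update segments.
HYPOTHETICAL_RATIOS = [0.9, 1.1, 1.0, 0.8, 1.2]

if __name__ == "__main__":
    centre, tolerance = tolerance_band(HYPOTHETICAL_RATIOS)
    print(centre, tolerance)
    print(within_band(1.1, centre, tolerance))
```

A band of this kind corresponds to a rule such as a ratio lying near 1 within a stated variation, with the variation taken from the spread of the observed statistics rather than chosen by hand.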
The date, time and channel identification information is further used as a time stamp for indexing of the said streaming news video, and is furthermore used to fetch additional related information from the internet for indexing of the said streaming news video.
In an embodiment of the application, the system for differentiating textual information embedded in at least one streaming news video comprises at least one computing engine for computing the frequency of occurrence of at least two characters in the textual information embedded in said streaming news video and the ratio of the frequency of occurrence of the said characters, and at least one statistical engine for defining a set of rules to the thresholds of the computed ratio of the frequency of occurrence of the said characters for differentiating the textual information embedded in the said streaming news video.
The methodology and techniques described with respect to the exemplary embodiments can be performed using a machine or other computing device within which a set of instructions, when executed, may cause the machine to perform any one or more of the methodologies discussed above. In some embodiments, the machine operates as a standalone device. In some embodiments, the machine may be connected (e.g., using a network) to other machines. In a networked deployment, the machine may operate in the capacity of a server or a client user machine in a server-client user network environment, or as a peer machine in a peer-to-peer (or distributed) network environment. The machine may comprise a server computer, a client user computer, a personal computer (PC), a tablet PC, a laptop computer, a desktop computer, a control system, a network router, switch or bridge, or any machine capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that machine. Further, while a single machine is illustrated, the term "machine" shall also be taken to include any collection of machines that individually or jointly execute a set (or multiple sets) of instructions to perform any one or more of the methodologies discussed herein.
The machine may include a processor (e.g., a central processing unit (CPU), a graphics processing unit (GPU), or both), a main memory and a static memory, which communicate with each other via a bus. The machine may further include a video display unit (e.g., a liquid crystal display (LCD), a flat panel, a solid state display, or a cathode ray tube (CRT)). The machine may include an input device (e.g., a keyboard) or touch-sensitive screen, a cursor control device (e.g., a mouse), a disk drive unit, a signal generation device (e.g., a speaker or remote control) and a network interface device.
The disk drive unit may include a machine-readable medium on which is stored one or more sets of instructions (e.g., software) embodying any one or more of the methodologies or functions described herein, including those methods illustrated above. The instructions may also reside, completely or at least partially, within the main memory, the static memory, and/or within the processor during execution thereof by the machine. The main memory and the processor also may constitute machine-readable media.
Dedicated hardware implementations including, but not limited to, application specific integrated circuits, programmable logic arrays and other hardware devices can likewise be constructed to implement the methods described herein. Applications that may include the apparatus and systems of various embodiments broadly include a variety of electronic and computer systems. Some embodiments implement functions in two or more specific interconnected hardware modules or devices with related control and data signals communicated between and through the modules, or as portions of an application-specific integrated circuit. Thus, the example system is applicable to software, firmware, and hardware implementations.
In accordance with various embodiments of the present disclosure, the methods described herein are intended for operation as software programs running on a computer processor. Furthermore, software implementations including, but not limited to, distributed processing or component/object distributed processing, parallel processing, or virtual machine processing can also be constructed to implement the methods described herein.
The present disclosure contemplates a machine readable medium containing instructions, or that which receives and executes instructions from a propagated signal so that a device connected to a network environment can send or receive voice, video or data, and to communicate over the network using the instructions. The instructions may further be transmitted or received over a network via the network interface device.
While the machine-readable medium can be a single medium, the term "machine-readable medium" should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more sets of instructions. The term "machine-readable medium" shall also be taken to include any medium that is capable of storing, encoding or carrying a set of instructions for execution by the machine and that cause the machine to perform any one or more of the methodologies of the present disclosure.
The term "machine-readable medium" shall accordingly be taken to include, but not be limited to: tangible media; solid-state memories such as a memory card or other package that houses one or more read-only (non-volatile) memories, random access memories, or other re-writable (volatile) memories; magneto-optical or optical media such as a disk or tape; and non-transitory media or any other self-contained information archive or set of archives considered a distribution medium equivalent to a tangible storage medium. Accordingly, the disclosure is considered to include any one or more of a machine-readable medium or a distribution medium, as listed herein and including art-recognized equivalents and successor media, in which the software implementations herein are stored.
The illustrations of arrangements described herein are intended to provide a general understanding of the structure of various embodiments, and they are not intended to serve as a complete description of all the elements and features of apparatus and systems that might make use of the structures described herein. Many other arrangements will be apparent to those of skill in the art upon reviewing the above description. Other arrangements may be utilized and derived therefrom, such that structural and logical substitutions and changes may be made without departing from the scope of this disclosure. Figures are also merely representational and may not be drawn to scale. Certain proportions thereof may be exaggerated, while others may be minimized. Accordingly, the specification and drawings are to be regarded in an illustrative rather than a restrictive sense.
The preceding description has been presented with reference to various embodiments. Persons skilled in the art and technology to which this application pertains will appreciate that alterations and changes in the described structures and methods of operation can be practiced without meaningfully departing from the principle, spirit and scope.
ADVANTAGES OF THE INVENTION:
The method provided by the present invention is robust as the threshold is computed statistically.
The tolerance factor is computed using the standard deviation, and thus the scope for false classification is also very small.
The method is light weight for classifying the texts of news video without any language model or natural language processing (NLP) based approach.
The approach given in the application is based on the statistical analysis of the corpus.

Claims

1. A method for differentiating textual information embedded in at least one streaming news video, characterized by simplified indexing and annotation of the said streaming news video, the method comprising processor implemented steps of:
a. computing the frequency of occurrence of at least two characters in the textual information embedded in said streaming news video;
b. computing the ratio of the frequency of occurrence of the said characters; and
c. defining a set of rules to the thresholds of the computed ratio of the frequency of occurrence of the said characters for differentiating the textual information embedded in the said streaming news video.
2. The method as claimed in claim 1, wherein the said characters are selected from the group comprising of upper case characters, lower case characters, special characters or numerical characters.
3. The method as claimed in claim 1, wherein the textual information is selected from the group comprising of breaking news, ticker news, details about the breaking news, channel name and date or time of the show.
4. The method as claimed in claim 1, wherein the said set of rules are defined by adding at least one tolerance factor to the said thresholds for classifying the textual information embedded in the said streaming news video.
5. The method as claimed in claim 4, wherein the said tolerance factor is obtained from the standard deviation of the observed statistics.
6. The method as claimed in claim 1, wherein the said textual information embedded in the said streaming news video is differentiated as breaking news if the frequency of occurrence of the upper case characters is greater than 90%.
7. The method as claimed in claim 1, wherein the said textual information embedded in the said streaming news video is differentiated as date and time information if the frequency of occurrence of the numerical characters is greater than 50% but the ratio of numerical characters and upper case characters is greater than 3 times.
8. The method as claimed in claim 1, wherein the said textual information embedded in the said streaming news video is differentiated as Stock update if the frequency of occurrence of the upper case and lower case characters are greater than 40% and the ratio of numerical characters and upper case characters is lying near 1 with a range of 0.2 variation.
9. The method as claimed in claim 1, wherein the said textual information embedded in the said streaming news video is differentiated as news details if the frequency of occurrence of the lower case characters is greater than 60%.
10. The method as claimed in claim 1, wherein the date, time and channel identification information is used as a time stamp for indexing of the said streaming news video.
11. The method as claimed in claim 10, wherein the said date, time and channel identification information is used to fetch additional related information from the internet for indexing of the said streaming news video.
12. The method as claimed in claim 10, wherein the channel identification information is obtained by channel logo detection.
13. The method as claimed in claim 1, wherein the text containing regions in the streaming video are obtained using preprocessing of the said streaming news video, wherein the channel logo is segregated from the remaining information embedded in the said streaming news video.
14. The method as claimed in claim 13, wherein the remaining information embedded in the said streaming news video is selected from the group comprising of breaking news, news text, stock update or date and time of the said streaming news video.
15. A system for differentiating textual information embedded in at least one streaming news video, the system comprising of:
a. at least one computing engine for computing the frequency of occurrence of at least two characters in the textual information embedded in said streaming news video and the ratio of the frequency of occurrence of the said characters; and
b. at least one statistical engine for defining a set of rules to the thresholds of the computed ratio of the frequency of occurrence of the said characters for differentiating the textual information embedded in the said streaming news video.
16. The system of claim 10, wherein the differentiating textual information embedded in at least one streaming news video comprises utilizing the processor to:
a. compute the frequency of occurrence of at least two characters in the textual information embedded in said streaming news video;
b. compute the ratio of the frequency of occurrence of the said characters; and
c. define a set of rules to the thresholds of the computed ratio of the frequency of occurrence of the said characters for differentiating the textual information embedded in the said streaming news video.
PCT/IN2012/000504 2011-07-20 2012-07-18 A method and system for differentiating textual information embedded in streaming news video WO2013054348A2 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP12840798.8A EP2734956A4 (en) 2011-07-20 2012-07-18 A method and system for differentiating textual information embedded in streaming news video
US14/233,727 US20140163969A1 (en) 2011-07-20 2012-07-18 Method and system for differentiating textual information embedded in streaming news video

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
IN2067MU2011 2011-07-20
IN2067/MUM/2011 2011-07-20

Publications (2)

Publication Number Publication Date
WO2013054348A2 true WO2013054348A2 (en) 2013-04-18
WO2013054348A3 WO2013054348A3 (en) 2013-07-04

Family

ID=48082619

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IN2012/000504 WO2013054348A2 (en) 2011-07-20 2012-07-18 A method and system for differentiating textual information embedded in streaming news video

Country Status (3)

Country Link
US (1) US20140163969A1 (en)
EP (1) EP2734956A4 (en)
WO (1) WO2013054348A2 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9384242B1 (en) * 2013-03-14 2016-07-05 Google Inc. Discovery of news-related content
CN106951137A (en) 2017-03-02 2017-07-14 合网络技术(北京)有限公司 The sorting technique and device of multimedia resource

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100246965A1 (en) 2009-03-31 2010-09-30 Microsoft Corporation Tagging video using character recognition and propagation

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4610025A (en) * 1984-06-22 1986-09-02 Champollion Incorporated Cryptographic analysis system
US6157905A (en) * 1997-12-11 2000-12-05 Microsoft Corporation Identifying language and character set of data representing text
US20050108630A1 (en) * 2003-11-19 2005-05-19 Wasson Mark D. Extraction of facts from text
US20080313172A1 (en) * 2004-12-03 2008-12-18 King Martin T Determining actions involving captured information and electronic content associated with rendered documents
US20080091713A1 (en) * 2006-10-16 2008-04-17 Candelore Brant L Capture of television metadata via OCR
CN105045777A (en) * 2007-08-01 2015-11-11 金格软件有限公司 Automatic context sensitive language correction and enhancement using an internet corpus
EP2332039A4 (en) * 2008-08-11 2012-12-05 Collective Inc Method and system for classifying text
US8320674B2 (en) * 2008-09-03 2012-11-27 Sony Corporation Text localization for image and video OCR
DE102009006857A1 (en) * 2009-01-30 2010-08-19 Living-E Ag A method for automatically classifying a text by a computer system
EP2471025B1 (en) * 2009-12-31 2019-06-05 Tata Consultancy Services Limited A method and system for preprocessing the region of video containing text

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
ARPAN PAL ET AL.: "Characters from Streaming Videos", CHARACTER RECOGNITION, INTECH, 1 August 2010 (2010-08-01), pages 21 - 42, XP007918711
BOUAZIZ ET AL., A NEW VIDEO IMAGES TEXT LOCALIZATION APPROACH BASED ON A FAST HOUGH TRANSFORM
See also references of EP2734956A4
ZIEGLER ET AL., CONTENT EXTRACTION FROM NEWS PAGES USING PARTICLE SWARM OPTIMIZATION ON LINGUISTIC AND STRUCTURAL FEATURES

Also Published As

Publication number Publication date
EP2734956A4 (en) 2014-12-31
WO2013054348A3 (en) 2013-07-04
EP2734956A2 (en) 2014-05-28
US20140163969A1 (en) 2014-06-12

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12840798

Country of ref document: EP

Kind code of ref document: A2

WWE Wipo information: entry into national phase

Ref document number: 14233727

Country of ref document: US

Ref document number: 2012840798

Country of ref document: EP