WO2013054348A2 - A method and system for differentiating textual information embedded in streaming news video
- Publication number: WO2013054348A2
- Authority: WIPO (PCT)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/14—Image acquisition
- G06V30/1444—Selective acquisition, locating or processing of specific regions, e.g. highlighted text, fiducial marks or predetermined fields
- G06V30/1448—Selective acquisition, locating or processing of specific regions, e.g. highlighted text, fiducial marks or predetermined fields based on markings or identifiers characterising the document or the area
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/78—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/783—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
- G06F16/7844—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using original textual content or text extracted from visual content or transcript of audio data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
- G06V20/62—Text, e.g. of license plates, overlay texts or captions on TV images
- G06V20/635—Overlay text, e.g. embedded captions in a TV program
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/26—Techniques for post-processing, e.g. correcting the recognition result
- G06V30/262—Techniques for post-processing, e.g. correcting the recognition result using context analysis, e.g. lexical, syntactic or semantic context
- G06V30/268—Lexical context
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
- G11B27/10—Indexing; Addressing; Timing or synchronising; Measuring tape travel
- G11B27/19—Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier
- G11B27/28—Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier by using information signals recorded by the same method as the main recording
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/234—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
- H04N21/23418—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/27—Server based end-user applications
- H04N21/278—Content descriptor database or directory service for end-user access
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
- H04N21/4402—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
- H04N21/440236—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display by media transcoding, e.g. video is transformed into a slideshow of still pictures, audio is converted into text
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/488—Data services, e.g. news ticker
- H04N21/4886—Data services, e.g. news ticker for displaying a ticker, e.g. scrolling banner for news, stock exchange, weather data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/09—Recognition of logos
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
Abstract
The application provides a method and system for differentiating textual information embedded in a streaming news video. The application enables a method and system for differentiating textual information embedded in a streaming news video for simplified indexing and annotation of the said news video.
Description
A METHOD AND SYSTEM FOR DIFFERENTIATING TEXTUAL INFORMATION EMBEDDED IN STREAMING NEWS VIDEO
FIELD OF THE APPLICATION
The present application relates to broadcasting and telecommunications. Particularly, the application relates to a statistical approach for differentiating textual information embedded in a streaming news video. More particularly, the application enables a method and system for differentiating textual information embedded in a streaming news video for simplified indexing and annotation of the said news video.
BACKGROUND OF THE APPLICATION
In the broadcasting and telecommunication technology domain, one major challenge of the day is to extract the context from the video. One method of extracting the context is to recognize the text embedded on the video. Video optical character recognition is a method to recognize the text from the video.
In the current scenario, many efforts have been made to develop various approaches to solve the said problem of context recognition. It has a huge application in the problem of automatic video indexing, too. For automatic video indexing or annotation, one required step is to classify the texts embedded within the video. This problem is bigger in the case of news video. Existing video text classification methods have addressed the problem using a natural language processing (NLP) based approach to differentiate the different segments of a news video.
Extracting the contextual information is still a challenging task because of the variety of content embedded in a video, including video, images, text, etc. A typical streaming news video may contain a combination of textual regions, video of the news reader, or regions showing videos and images of the event the anchor is speaking about. The textual regions may be further classified into various groups, such as breaking news, ticker news or the details about the breaking news, channel name, date and time of the program, stock updates/ticker, etc.
In order to achieve an accurate differentiation of textual information embedded in streaming news video, a light weight method and system is required which could simplify the indexing and facilitate the annotation of the said news video with light resource (memory and CPU) requirement.
However, the existing methods and systems are not capable of providing a light weight approach for differentiating the textual information embedded in a streaming news video. The existing methods and systems particularly are not capable of providing a light weight approach for classifying the texts of streaming news video without any language model or natural language processing (NLP) based approach.
The existing methods and systems particularly are not capable of differentiating textual information embedded in a streaming news video in a way which could simplify the indexing and facilitate the annotation of the said news video. Some of the above mentioned methods known to us are as follows:
US5950196A to Pyreddy et al. teaches about extracting information from printed newspapers or the online version of the newspaper. The patent does not teach about a statistical approach for extracting and differentiating textual information embedded in a streaming news video.
US2009100454A by Weber et al. teaches about the summarization of text, audio, and audiovisual presentations, such as movies, into less lengthy forms, based on natural language processing (NLP) approach. Weber et al. describes a method for news video summarization. The patent does not teach about a statistical approach
for extracting and differentiating textual information embedded in a streaming news video.
US2008077708A by Scott et al. teaches about techniques that enable automated processing of news content according to the user preference. The patent does not teach about a statistical approach for extracting and differentiating textual information embedded in a streaming news video.
US2002152245A by McCaskey et al. teaches about an apparatus and method for receiving daily data feeds of news article text and news images, particularly web publications of newspaper content. The patent does not teach about a statistical approach for extracting and differentiating textual information embedded in a streaming news video.
Luo et al. in "Semantic Entity-Relationship Model for Large-Scale Multimedia News Exploration and Recommendation" teaches about a novel framework for multimedia news exploration and analysis, particularly web publishing of news. Luo et al. does not teach about a statistical approach for extracting and differentiating textual information embedded in a streaming news video.
Kankanhalli et al. in "Video modeling using strata-based annotation" aims to achieve efficient browsing and retrieval. Kankanhalli et al. focuses on segmenting the contextual information into chunks rather than dividing physically contiguous frames into shots, as is traditionally done. Kankanhalli et al. does not teach about a statistical approach for extracting and differentiating textual information embedded in a streaming news video.
Bouaziz et al. in "A New Video Images Text Localization Approach Based on a Fast Hough Transform" teaches about a fast Hough transformation based approach for automatic video frames text localization. Bouaziz et al. does not
teach about a statistical approach for extracting and differentiating textual information embedded in a streaming news video.
Ziegler et al. in "Content Extraction from News Pages Using Particle Swarm Optimization on Linguistic and Structural Features" teaches about a novel approach that extracts real content from news Web pages in an unsupervised fashion, using particle swarm optimization on linguistic and structural features.
The above mentioned prior art references fail to disclose an efficient method and system for differentiating textual information embedded in a streaming news video. The prior art also fails to disclose a method and system for differentiating textual information embedded in a streaming news video which could simplify the indexing and facilitate the annotation of the said news video.
Thus, in the light of the above mentioned background art, it is evident that there is a long felt need for such a solution that can provide an effective method and system for differentiating textual information embedded in a streaming news video. There is also a need for such a solution that enables a cost effective method and system which could simplify the indexing and facilitate the annotation of the said news video.
OBJECTIVES OF THE APPLICATION
The primary objective of the present application is to provide a method and system for differentiating textual information embedded in a streaming news video.
Another objective of the application is to enable a method and system for differentiating textual information embedded in a streaming news video for simplified indexing and annotation of the said news video.
Another objective of the application is to provide a method and system for computing the frequency of occurrence of characters in upper and lower case, special characters and numerical characters in the textual information embedded in a streaming news video.
Another objective of the application is to provide a method and system for computing the ratio of the said upper case, lower case, special and numerical characters for threshold based differentiation of the textual information embedded in a news video.
SUMMARY OF THE APPLICATION
Before the present methods, systems, and hardware enablement are described, it is to be understood that this application is not limited to the particular systems and methodologies described, as there can be multiple possible embodiments of the present application which are not expressly illustrated in the present disclosure. It is also to be understood that the terminology used in the description is for the purpose of describing the particular versions or embodiments only, and is not intended to limit the scope of the present application which will be limited only by the appended claims.
The present application provides a method and system for differentiating textual information embedded in a streaming news video.
In one aspect of the application, a method and system is provided for differentiating textual information embedded in a streaming news video for simplified indexing and annotation of the said news video. The frequency of occurrence of characters in upper and lower case, special characters and numerical characters in the textual information embedded in a streaming news video is computed. Further, the ratio of the said upper case, lower case, special and numerical characters is computed for threshold based differentiation of the textual information embedded in a news video. Thus the statistical approach differentiates textual information embedded in a streaming news video. The textual information may include breaking news, ticker news or the details about the breaking news, channel name, and date and time of the show.
The above said method and system are preferably a method and system for differentiating textual information embedded in a streaming news video but also can be used for many other applications, which may be obvious to a person skilled in the art.
BRIEF DESCRIPTION OF DRAWINGS
The foregoing summary, as well as the following detailed description of preferred embodiments, are better understood when read in conjunction with the appended drawings. For the purpose of illustrating the application, there is shown in the drawings exemplary constructions of the application; however, the application is not limited to the specific methods and system disclosed. In the drawings:
Figure 1 shows prior art flow diagram of the preprocessing of textual information embedded in a streaming news video.
Figure 2 shows flow diagram of the process for differentiating textual information embedded in a streaming news video.
DETAILED DESCRIPTION OF THE APPLICATION
Some embodiments of this application, illustrating all its features, will now be discussed in detail.
The words "comprising," "having," "containing," and "including," and other forms thereof, are intended to be equivalent in meaning and be open ended in that
an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items.
It must also be noted that as used herein and in the appended claims, the singular forms "a," "an," and "the" include plural references unless the context clearly dictates otherwise. Although any systems and methods similar or equivalent to those described herein can be used in the practice or testing of embodiments of the present application, the preferred systems and methods are now described.
The disclosed embodiments are merely exemplary of the application, which may be embodied in various forms.
The present application provides a method for differentiating textual information embedded in at least one streaming news video, characterized by simplified indexing and annotation of the said streaming news video, the method comprising processor implemented steps of: a. computing the frequency of occurrence of at least two characters in the textual information embedded in said streaming news video;
b. computing the ratio of the frequency of occurrence of the said characters; and
c. defining a set of rules to the thresholds of the computed ratio of the frequency of occurrence of the said characters for differentiating the textual information embedded in the said streaming news video.
The present application provides a system for differentiating textual information embedded in at least one streaming news video, the system comprising of: a. at least one computing engine for computing the frequency of occurrence of at least two characters in the textual information embedded in said
streaming news video and the ratio of the frequency of occurrence of the said characters; and
b. at least one statistical engine for defining a set of rules to the thresholds of the computed ratio of the frequency of occurrence of the said characters for differentiating the textual information embedded in the said streaming news video.
Referring to Figure 1, a prior art flow diagram of the preprocessing of textual information embedded in a streaming news video is shown.
The process starts at the step 102, where the text containing regions in the streaming video are obtained using preprocessing of the streaming news video. At the step 104, the channel identification information is obtained using channel logo detection. At the step 106, the channel logo is segregated from the remaining information embedded in the said streaming news video. The process ends at the step 108, where the optical character recognition technique is applied on each segregated textual segment of the said streaming news video.
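By way of illustration only, the preprocessing flow of Figure 1 may be sketched in Python as follows. The sketch assumes the open-source Tesseract OCR engine through its pytesseract bindings, and assumes that the text-region and channel-logo bounding boxes have already been produced by a text localization and logo detection step; the function names, variable names and coordinates are illustrative assumptions and not part of the disclosure.

```python
# Minimal sketch of the Figure 1 preprocessing flow (steps 102-108); all names are illustrative.
from PIL import Image
import pytesseract

def ocr_text_segments(frame_path, text_regions, logo_region):
    """Apply OCR to every detected text containing region except the channel logo."""
    frame = Image.open(frame_path)
    recognized = []
    for box in text_regions:              # (left, upper, right, lower) boxes from text localization (step 102)
        if box == logo_region:            # steps 104-106: the detected channel logo is segregated and skipped
            continue
        segment = frame.crop(box)
        recognized.append(pytesseract.image_to_string(segment))  # step 108: OCR on the segregated segment
    return recognized
```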
Referring to Figure 2, a flow diagram of the process for differentiating textual information embedded in a streaming news video is shown.
The process starts at the step 202, where the frequency of occurrence of at least two characters in the textual information embedded in said streaming news video is computed. At the step 204, the ratio of the frequency of occurrence of the said characters is computed. The process ends at the step 206, where a set of rules to the thresholds of the computed ratio of the frequency of occurrence of the said characters is defined for differentiating the textual information embedded in the said streaming news video.
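A minimal sketch of steps 202 and 204, assuming the optically recognized text of one segregated segment is available as a plain string; the function names, the percentage representation and the treatment of punctuation as special characters are illustrative assumptions.

```python
# Minimal sketch of steps 202-204: per-segment character-class frequencies and their ratio.
def character_frequencies(text):
    """Return the percentage share of upper case, lower case, numerical and special characters."""
    counted = [c for c in text if not c.isspace()]
    total = len(counted) or 1                       # guard against empty OCR output
    upper = sum(c.isupper() for c in counted)
    lower = sum(c.islower() for c in counted)
    digit = sum(c.isdigit() for c in counted)
    special = len(counted) - upper - lower - digit  # assumption: everything else counts as a special character
    return {name: 100.0 * count / total
            for name, count in (("upper", upper), ("lower", lower), ("digit", digit), ("special", special))}

def ratio(freqs, numerator, denominator):
    """Ratio of two character-class frequencies (step 204)."""
    return freqs[numerator] / freqs[denominator] if freqs[denominator] else float("inf")
```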
In one of the embodiments of the present application, a method and system is provided for differentiating textual information embedded in a streaming news video. The method is characterized by simplified indexing and annotation of the said streaming news video. The identification information of the channel streaming the news video is obtained by channel logo detection techniques available in the prior art. The text containing regions are also to be identified. The text containing regions in the streaming video are obtained using preprocessing of the said streaming news video, wherein the detected channel logo is segregated from the remaining information embedded in the said streaming news video. The remaining information embedded in the said streaming news video may contain breaking news, news text, stock update or date and time of the said streaming news video. After obtaining the text containing regions in the streaming video and the channel identification information, and after segregating the said information from the remaining information, the optical character recognition technique is applied on each segregated textual segment of the said streaming news video.
In one of the embodiments of the present application, the frequency of occurrence of optically recognized characters in the textual information is computed. The said characters embedded in said streaming news video are selected from the group comprising of upper case characters, lower case characters, special characters or numerical characters. The textual information is selected from the group comprising of breaking news, ticker news or the details about the breaking news, channel name, and date and time of the show. Further, the ratio of the frequency of occurrence of the said characters is computed and a set of rules is defined to the thresholds of the computed ratio of the frequency of occurrence of the said characters for differentiating the textual information embedded in the said streaming news video. The set of rules is defined by adding at least one tolerance factor to the said thresholds, and the said tolerance factor is obtained from the standard deviation of the observed statistics. The threshold based approach is defined to differentiate the type of texts based on the statistical analysis on the news video corpus, as shown in Table 1.
According to Table 1, the textual information embedded in the said streaming news video is differentiated as breaking news if the frequency of occurrence of the upper case characters is greater than 90%.
Textual information embedded in the said streaming news video is differentiated as date and time information if the frequency of occurrence of the numerical characters is greater than 50% but the ratio of numerical characters and upper case characters is greater than 3 times.
Textual information embedded in the said streaming news video is differentiated as stock update if the frequency of occurrence of the upper case and lower case characters is greater than 40% and the ratio of numerical characters and upper case characters lies near 1 within a range of 0.2 variation.
Textual information embedded in the said streaming news video is differentiated as news details if the frequency of occurrence of the lower case characters is greater than 60%.
Table 1: A threshold based approach is defined to differentiate the type of texts based on the statistical analysis on the news video corpus.

Text type | Upper case (%) | Lower case (%) | Special characters (%) | Numerical characters (%) | Numerical/upper case ratio |
---|---|---|---|---|---|
News Text | 8 | 84 | 3 | 5 | 0.6 |
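Read together with the frequency and ratio sketch above, the thresholds of Table 1 can be expressed as a small rule chain, and the tolerance factor described earlier can be taken as the standard deviation of each observed corpus statistic. The following Python sketch is only one illustrative reading of the stated rules: the rule order, the combined-share reading of the stock update condition, the fallback label and the tolerance helper are assumptions and not part of the disclosure.

```python
import statistics

def tolerance(observed_values):
    """Tolerance factor for a threshold, taken as the standard deviation of the observed corpus statistic."""
    return statistics.pstdev(observed_values)

def classify_segment(freqs):
    """Label one textual segment with the Table 1 thresholds; the rule order is an assumption."""
    num_to_upper = ratio(freqs, "digit", "upper")
    if freqs["upper"] > 90:
        return "breaking news"
    if freqs["digit"] > 50 and num_to_upper > 3:
        return "date and time"
    # "upper case and lower case characters greater than 40%" is read as their combined share (an assumption)
    if freqs["upper"] + freqs["lower"] > 40 and abs(num_to_upper - 1.0) <= 0.2:
        return "stock update"
    if freqs["lower"] > 60:
        return "news details"
    return "unclassified"  # assumption: segments matching no rule are left unlabelled
```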
The date, time and channel identification information is further used as a time stamp for indexing of the said streaming news video, and furthermore it is used to fetch additional related information from the internet for indexing of the said streaming news video.
In an embodiment of the application, the system for differentiating textual information embedded in at least one streaming news video comprises at least one computing engine for computing the frequency of occurrence of at least two characters in the textual information embedded in said streaming news video and the ratio of the frequency of occurrence of the said characters, and at least one statistical engine for defining a set of rules to the thresholds of the computed ratio of the frequency of occurrence of the said characters for differentiating the textual information embedded in the said streaming news video.
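Tying the sketches above together, the roles of the computing engine and the statistical engine may be composed as follows; the function name, the list-of-strings input and the example strings are illustrative assumptions only.

```python
def differentiate(segments):
    """Computing engine and statistical engine in sequence: frequencies and ratios, then the rule chain."""
    return {text: classify_segment(character_frequencies(text)) for text in segments}

# Hypothetical usage with illustrative OCR output:
# differentiate(["PM ANNOUNCES NEW POLICY",
#                "Parliament to debate the bill later this week",
#                "12:45  18 JUL 2012"])
```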
The methodology and techniques described with respect to the exemplary embodiments can be performed using a machine or other computing device within which a set of instructions, when executed, may cause the machine to perform any one or more of the methodologies discussed above. In some embodiments, the machine operates as a standalone device. In some embodiments, the machine may be connected (e.g., using a network) to other machines. In a networked deployment, the machine may operate in the capacity of a server or a client user machine in a server-client user network environment, or as a peer machine in a peer-to-peer (or distributed) network environment.
The machine may comprise a server computer, a client user computer, a personal computer (PC), a tablet PC, a laptop computer, a desktop computer, a control system, a network router, switch or bridge, or any machine capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that machine. Further, while a single machine is illustrated, the term "machine" shall also be taken to include any collection of machines that individually or jointly execute a set (or multiple sets) of instructions to perform any one or more of the methodologies discussed herein.
The machine may include a processor (e.g., a central processing unit (CPU), a graphics processing unit (GPU), or both), a main memory and a static memory, which communicate with each other via a bus. The machine may further include a video display unit (e.g., a liquid crystal display (LCD), a flat panel, a solid state display, or a cathode ray tube (CRT)). The machine may include an input device (e.g., a keyboard) or touch-sensitive screen, a cursor control device (e.g., a mouse), a disk drive unit, a signal generation device (e.g., a speaker or remote control) and a network interface device.
The disk drive unit may include a machine-readable medium on which is stored one or more sets of instructions (e.g., software) embodying any one or more of the methodologies or functions described herein, including those methods illustrated above. The instructions may also reside, completely or at least partially, within the main memory, the static memory, and/or within the processor during execution thereof by the machine. The main memory and the processor also may constitute machine-readable media.
Dedicated hardware implementations including, but not limited to, application specific integrated circuits, programmable logic arrays and other hardware devices can likewise be constructed to implement the methods described herein. Applications that may include the apparatus and systems of various embodiments broadly include a variety of electronic and computer systems. Some
embodiments implement functions in two or more specific interconnected hardware modules or devices with related control and data signals communicated between and through the modules, or as portions of an application-specific integrated circuit. Thus, the example system is applicable to software, firmware, and hardware implementations.
In accordance with various embodiments of the present disclosure, the methods described herein are intended for operation as software programs running on a computer processor. Furthermore, software implementations, including but not limited to distributed processing or component/object distributed processing, parallel processing, or virtual machine processing, can also be constructed to implement the methods described herein.
The present disclosure contemplates a machine readable medium containing instructions, or that which receives and executes instructions from a propagated signal so that a device connected to a network environment can send or receive voice, video or data, and to communicate over the network using the instructions. The instructions may further be transmitted or received over a network via the network interface device.
While the machine-readable medium can be a single medium, the term "machine- readable medium" should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more sets of instructions. The term "machine-readable medium" shall also be taken to include any medium that is capable of storing, encoding or carrying a set of instructions for execution by the machine and that cause the machine to perform any one or more of the methodologies of the present disclosure.
The term "machine-readable medium" shall accordingly be taken to include, but not be limited to: tangible media; solid-state memories such as a memory card or
other package that houses one or more read-only (non-volatile) memories, random access memories, or other re-writable (volatile) memories; magneto-optical or optical media such as a disk or tape; and non-transitory media or other self-contained information archives or sets of archives, which are considered a distribution medium equivalent to a tangible storage medium. Accordingly, the disclosure is considered to include any one or more of a machine-readable medium or a distribution medium, as listed herein and including art-recognized equivalents and successor media, in which the software implementations herein are stored.
The illustrations of arrangements described herein are intended to provide a general understanding of the structure of various embodiments, and they are not intended to serve as a complete description of all the elements and features of apparatus and systems that might make use of the structures described herein. Many other arrangements will be apparent to those of skill in the art upon reviewing the above description. Other arrangements may be utilized and derived therefrom, such that structural and logical substitutions and changes may be made without departing from the scope of this disclosure. Figures are also merely representational and may not be drawn to scale. Certain proportions thereof may be exaggerated, while others may be minimized. Accordingly, the specification and drawings are to be regarded in an illustrative rather than a restrictive sense.
The preceding description has been presented with reference to various embodiments. Persons skilled in the art and technology to which this application pertains will appreciate that alterations and changes in the described structures and methods of operation can be practiced without meaningfully departing from the principle, spirit and scope.
ADVANTAGES OF THE INVENTION:
The method provided by the present invention is robust as the threshold is computed statistically.
The tolerance factor is computed using the standard deviation and thus the scope of false classification is also very low.
The method is light weight for classifying the texts of news video without any language model or natural language processing (NLP) based approach.
The approach given in the application is based on the statistical analysis of the corpus.
Claims
1. A method for differentiating textual information embedded in at least one streaming news video, characterized by simplified indexing and annotation of the said streaming news video, the method comprising processor implemented steps of: a. computing the frequency of occurrence of at least two characters in the textual information embedded in said streaming news video; b. computing the ratio of the frequency of occurrence of the said characters; and
c. defining a set of rules to the thresholds of the computed ratio of the frequency of occurrence of the said characters for differentiating the textual information embedded in the said streaming news video.
2. The method as claimed in claim 1, wherein the said characters are selected from the group comprising of upper case characters, lower case characters, special characters or numerical characters.
3. The method as claimed in claim 1, wherein the textual information is selected from the group comprising of breaking news, ticker news, details about the breaking news, channel name and date or time of the show.
4. The method as claimed in claim 1, wherein the said set of rules are defined by adding at least one tolerance factor to the said thresholds for classifying the textual information embedded in the said streaming news video.
5. The method as claimed in claim 4, wherein the said tolerance factor is obtained from the standard deviation of the observed statistics.
6. The method as claimed in claim 1, wherein the said textual information embedded in the said streaming news video is differentiated as breaking news if the frequency of occurrence of the upper case characters is greater than 90%.
7. The method as claimed in claim 1, wherein the said textual information embedded in the said streaming news video is differentiated as date and time information if the frequency of occurrence of the numerical characters is greater than 50% but the ratio of numerical characters and upper case characters is greater than 3 times.
8. The method as claimed in claim 1, wherein the said textual information embedded in the said streaming news video is differentiated as Stock update if the frequency of occurrence of the upper case and lower case characters are greater than 40% and the ratio of numerical characters and upper case characters is lying near 1 with a range of 0.2 variation.
9. The method as claimed in claim 1, wherein the said textual information embedded in the said streaming news video is differentiated as news details if the frequency of occurrence of the lower case characters is greater than 60%.
10. The method as claimed in claim 1, wherein the date, time and channel identification information is used as a time stamp for indexing of the said streaming news video.
11. The method as claimed in claim 10, wherein the said date, time and channel identification information is used to fetch additional related information from the internet for indexing of the said streaming news video.
12. The method as claimed in claim 10, wherein the channel identification information is obtained by channel logo detection.
13. The method as claimed in claim 1, wherein the text containing regions in the streaming video are obtained using preprocessing of the said streaming news video, wherein the channel logo is segregated from the remaining information embedded in the said streaming news video.
14. The method as claimed in claim 13, wherein the remaining information embedded in the said streaming news video is selected from the group comprising of breaking news, news text, stock update or date and time of the said streaming news video.
15. A system for differentiating textual information embedded in at least one streaming news video, the system comprising: a. at least one computing engine for computing the frequency of occurrence of at least two characters in the textual information embedded in the said streaming news video and the ratio of the frequencies of occurrence of the said characters; and
b. at least one statistical engine for defining a set of rules over thresholds on the computed ratio of the frequencies of occurrence of the said characters for differentiating the textual information embedded in the said streaming news video.
16. The system of claim 15, wherein differentiating the textual information embedded in the at least one streaming news video comprises utilizing the processor to: a. compute the frequency of occurrence of at least two characters in the textual information embedded in the said streaming news video; b. compute the ratio of the frequencies of occurrence of the said characters; and
c. define a set of rules over thresholds on the computed ratio of the frequencies of occurrence of the said characters for differentiating the textual information embedded in the said streaming news video.
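The character-statistics rules recited in claims 1 and 6-9 can be summarized in a short sketch. The Python below is a minimal illustration only: it assumes OCR has already produced one string per detected text region, the helper names and sample strings are hypothetical, the direction in which the claim-4 tolerance is applied and the reading of claim 8 as a combined upper- plus lower-case fraction are assumptions (the claim wording is ambiguous), while the numeric thresholds are the ones recited in the claims.

```python
# Minimal sketch of the character-statistics rules in claims 1 and 6-9.
# Assumptions: OCR already yields one string per text region; helper names
# and sample strings are illustrative, not part of the specification.

def char_class_stats(text):
    """Fractions of upper-case, lower-case and numerical characters in a region."""
    chars = [c for c in text if not c.isspace()]
    total = len(chars) or 1  # guard against empty regions
    return {
        "upper": sum(c.isupper() for c in chars) / total,
        "lower": sum(c.islower() for c in chars) / total,
        "digit": sum(c.isdigit() for c in chars) / total,
    }


def classify_text_region(text, tolerance=0.0):
    """Apply the threshold rules of claims 6-9; `tolerance` is the claim-4 factor,
    subtracted here to loosen the thresholds (the claim leaves the direction open)."""
    s = char_class_stats(text)
    # Ratio of numerical to upper-case characters, used by claims 7 and 8.
    num_to_upper = s["digit"] / s["upper"] if s["upper"] else float("inf")

    if s["upper"] > 0.90 - tolerance:                        # claim 6: breaking news
        return "breaking news"
    if s["digit"] > 0.50 - tolerance and num_to_upper > 3:   # claim 7: date and time
        return "date and time"
    # Claim 8 read here as a combined upper+lower fraction above 40% (assumption),
    # with the numerical/upper-case ratio lying near 1 within a 0.2 variation.
    if (s["upper"] + s["lower"] > 0.40 - tolerance
            and abs(num_to_upper - 1.0) <= 0.2):             # claim 8: stock update
        return "stock update"
    if s["lower"] > 0.60 - tolerance:                        # claim 9: news details
        return "news details"
    return "unclassified"


if __name__ == "__main__":
    print(classify_text_region("PM ADDRESSES PARLIAMENT ON ECONOMY"))  # breaking news
    print(classify_text_region("18 JUL 2012 14:32"))                   # date and time
    print(classify_text_region("SENSEX 17234 NIFTY 5210"))             # stock update
```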
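Claims 4 and 5 describe deriving the tolerance factor from the standard deviation of the observed statistics. The snippet below sketches one way such a calibration could look; the labelled sample fractions, the scaling constant k and the helper name are hypothetical and not taken from the specification.

```python
# Illustrative calibration of a threshold and tolerance factor (claims 4 and 5).
# The labelled sample values below are made up for demonstration only.
from statistics import mean, stdev

def calibrate_threshold(observed_fractions, k=1.0):
    """Return (mean, tolerance) for one character class of one text category.

    observed_fractions: character-class fractions measured over manually
    labelled text regions of the same category; the tolerance factor is
    k standard deviations of the observed statistic (claim 5).
    """
    return mean(observed_fractions), k * stdev(observed_fractions)

# Upper-case fractions measured over hypothetical "breaking news" samples.
upper_fractions = [0.97, 0.95, 1.00, 0.92, 0.98, 0.94]
center, tol = calibrate_threshold(upper_fractions)
print(f"rule: breaking news if upper-case fraction > {center - tol:.2f}")
```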
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP12840798.8A EP2734956A4 (en) | 2011-07-20 | 2012-07-18 | A method and system for differentiating textual information embedded in streaming news video |
US14/233,727 US20140163969A1 (en) | 2011-07-20 | 2012-07-18 | Method and system for differentiating textual information embedded in streaming news video |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
IN2067MU2011 | 2011-07-20 | | |
IN2067/MUM/2011 | 2011-07-20 | | |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2013054348A2 (en) | 2013-04-18 |
WO2013054348A3 WO2013054348A3 (en) | 2013-07-04 |
Family
ID=48082619
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/IN2012/000504 WO2013054348A2 (en) | 2011-07-20 | 2012-07-18 | A method and system for differentiating textual information embedded in streaming news video |
Country Status (3)
Country | Link |
---|---|
US (1) | US20140163969A1 (en) |
EP (1) | EP2734956A4 (en) |
WO (1) | WO2013054348A2 (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9384242B1 (en) * | 2013-03-14 | 2016-07-05 | Google Inc. | Discovery of news-related content |
CN106951137A (en) | 2017-03-02 | 2017-07-14 | 合网络技术(北京)有限公司 | The sorting technique and device of multimedia resource |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100246965A1 (en) | 2009-03-31 | 2010-09-30 | Microsoft Corporation | Tagging video using character recognition and propagation |
Family Cites Families (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4610025A (en) * | 1984-06-22 | 1986-09-02 | Champollion Incorporated | Cryptographic analysis system |
US6157905A (en) * | 1997-12-11 | 2000-12-05 | Microsoft Corporation | Identifying language and character set of data representing text |
US20050108630A1 (en) * | 2003-11-19 | 2005-05-19 | Wasson Mark D. | Extraction of facts from text |
US20080313172A1 (en) * | 2004-12-03 | 2008-12-18 | King Martin T | Determining actions involving captured information and electronic content associated with rendered documents |
US20080091713A1 (en) * | 2006-10-16 | 2008-04-17 | Candelore Brant L | Capture of television metadata via OCR |
CN105045777A (en) * | 2007-08-01 | 2015-11-11 | 金格软件有限公司 | Automatic context sensitive language correction and enhancement using an internet corpus |
EP2332039A4 (en) * | 2008-08-11 | 2012-12-05 | Collective Inc | Method and system for classifying text |
US8320674B2 (en) * | 2008-09-03 | 2012-11-27 | Sony Corporation | Text localization for image and video OCR |
DE102009006857A1 (en) * | 2009-01-30 | 2010-08-19 | Living-E Ag | A method for automatically classifying a text by a computer system |
EP2471025B1 (en) * | 2009-12-31 | 2019-06-05 | Tata Consultancy Services Limited | A method and system for preprocessing the region of video containing text |
- 2012
  - 2012-07-18 US US14/233,727 patent/US20140163969A1/en not_active Abandoned
  - 2012-07-18 WO PCT/IN2012/000504 patent/WO2013054348A2/en active Application Filing
  - 2012-07-18 EP EP12840798.8A patent/EP2734956A4/en not_active Ceased
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100246965A1 (en) | 2009-03-31 | 2010-09-30 | Microsoft Corporation | Tagging video using character recognition and propagation |
Non-Patent Citations (4)
Title |
---|
ARPAN PAL ET AL.: "Characters from Streaming Videos", CHARACTER RECOGNITION, INTECH, 1 August 2010 (2010-08-01), pages 21 - 42, XP007918711 |
BOUAZIZ ET AL., A NEW VIDEO IMAGES TEXT LOCALIZATION APPROACH BASED ON A FAST HOUGH TRANSFORM |
See also references of EP2734956A4 |
ZIEGLER ET AL., CONTENT EXTRACTION FROM NEWS PAGES USING PARTICLE SWARM OPTIMIZATION ON LINGUISTIC AND STRUCTURAL FEATURES |
Also Published As
Publication number | Publication date |
---|---|
EP2734956A4 (en) | 2014-12-31 |
WO2013054348A3 (en) | 2013-07-04 |
EP2734956A2 (en) | 2014-05-28 |
US20140163969A1 (en) | 2014-06-12 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
AU2011326430B2 (en) | Learning tags for video annotation using latent subtags | |
US10303768B2 (en) | Exploiting multi-modal affect and semantics to assess the persuasiveness of a video | |
US12001474B2 (en) | Information determining method and apparatus, computer device, and storage medium | |
US9875222B2 (en) | Capturing and storing elements from a video presentation for later retrieval in response to queries | |
US20160147739A1 (en) | Apparatus and method for updating language analysis result | |
Siddiquie et al. | Exploiting multimodal affect and semantics to identify politically persuasive web videos | |
CN111274442B (en) | Method for determining video tag, server and storage medium | |
CN111314732A (en) | Method for determining video label, server and storage medium | |
Zhang et al. | Incorporating conditional random fields and active learning to improve sentiment identification | |
KR20190063352A (en) | Apparatus and method for clip connection of image contents by similarity analysis between clips | |
US9460231B2 (en) | System of generating new schema based on selective HTML elements | |
Ara et al. | Understanding customer sentiment: Lexical analysis of restaurant reviews | |
EP3340069A1 (en) | Automated characterization of scripted narratives | |
US9355099B2 (en) | System and method for detecting explicit multimedia content | |
Seker et al. | Author attribution on streaming data | |
US20140163969A1 (en) | Method and system for differentiating textual information embedded in streaming news video | |
Poornima et al. | Text preprocessing on extracted text from audio/video using R | |
Li et al. | Event detection on online videos using crowdsourced time-sync comment | |
CN111488450A (en) | Method and device for generating keyword library and electronic equipment | |
Kannao et al. | Only overlay text: novel features for TV news broadcast video segmentation | |
Tapu et al. | TV news retrieval based on story segmentation and concept association | |
CN113468377A (en) | Video and literature association and integration method | |
CN105335522B (en) | Resource aggregation method and device | |
CN111597386A (en) | Video acquisition method | |
Nagrale et al. | Document theme extraction using named-entity recognition |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 12840798; Country of ref document: EP; Kind code of ref document: A2 |
| WWE | Wipo information: entry into national phase | Ref document number: 14233727; Country of ref document: US; Ref document number: 2012840798; Country of ref document: EP |