US20120039515A1 - Method and system for classifying scene for each person in video
- Publication number
- US20120039515A1 (application US 13/317,509)
- Authority
- US
- United States
- Prior art keywords
- person
- representation frame
- scene
- frame
- clustering
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/83—Generation or processing of protective or descriptive data associated with content; Content structuring
- H04N21/845—Structuring of content, e.g. decomposing content into time segments
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/78—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/783—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
- G06F16/7837—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using objects detected or recognised in the video content
- G06F16/784—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using objects detected or recognised in the video content the detected or recognised objects being people
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/172—Classification, e.g. identification
- G06V40/173—Classification, e.g. identification face re-identification, e.g. recognising unknown faces across different face tracks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/49—Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes
Definitions
- the present invention relates to a method and system for classifying a scene for each person in a video, and more particularly, to a method and system for classifying a scene for each person in a video based on person information and background information in video data.
- a scene is a unit of video delimited by changes in the video contents.
- scenes are classified by using low level information such as color information or edge information.
- shots are clustered using low level information such as color information extracted in all frames, and a scene segmentation is detected in a conventional automatic scene segmentation algorithm.
- when a person in a video moves or a camera moves, the low level information changes, and the degree of accuracy decreases accordingly.
- An aspect of the present invention provides a method and system for classifying a scene for each person in a video which may provide a story overview for each person by classifying a person by a scene unit by using temporal information in video data.
- An aspect of the present invention also provides a method and system for classifying a scene for each person in a video which may improve an accuracy of a scene segmentation detection by separating a person portion and a background in video data and using information about the person portion and the background together.
- a method of classifying a scene for each person in a video including: detecting a face within input video frames; detecting a shot change of the input video frames; extracting a person representation frame in the shot; performing a person clustering in the extracted person representation frame based on time information; detecting a scene change by separating a person portion from a background based on face extraction information, and comparing the person portion and the background; and merging similar clusters from the extracted person representation frame and performing a scene clustering for each person.
- a system for classifying a scene for each person in a video including: a face detection unit detecting a face within input video frames; a shot change detection unit detecting a shot change of the input video frames; a person representation frame extraction unit extracting a person representation frame in the shot; a person clustering unit performing a person clustering in the extracted person representation frame based on time information; a scene change detection unit detecting a scene change by separating a person portion from a background based on face extraction information and comparing the person portion and the background; and a scene clustering unit merging similar clusters from the extracted person representation frame and performing a scene clustering for each person.
- FIG. 1 is a block diagram illustrating a configuration of a system for classifying a scene for each person in a video according to an embodiment of the present invention.
- FIG. 2 is a diagram illustrating an example of clothes information and face information detected in a same time window according to an embodiment of the present invention.
- FIG. 3 is a diagram illustrating an example of performing a clustering for each person according to an embodiment of the present invention.
- FIG. 4 is a flowchart illustrating a method of classifying a scene for each person in a video according to another embodiment of the present invention.
- FIG. 5 is a flowchart illustrating an operation of a time information-based person clustering illustrated in FIG. 4 according to another embodiment of the present invention.
- FIG. 6 is a flowchart illustrating an operation of a scene change detection illustrated in FIG. 4 according to another embodiment of the present invention.
- FIG. 7 is a flowchart illustrating an operation of a scene clustering for each person according to another embodiment of the present invention.
- FIG. 1 is a block diagram illustrating a configuration of a system for classifying a scene for each person in a video according to an embodiment of the present invention.
- the system 100 for classifying a scene for each person in a video includes a face detection unit 110, a shot change detection unit 120, a person representation frame extraction unit 130, a person clustering unit 140, a scene change detection unit 150, and a scene clustering unit 160.
- the face detection unit 110 detects a face within input video frames. Specifically, the face detection unit 110 analyzes the input video frames and detects any face that they contain.
- the shot change detection unit 120 detects a shot change within the input video frames. Specifically, the shot change detection unit 120 detects the shot change of the input video frames to segment the input video frames into shots, a shot being a basic unit of the video.
- the person representation frame extraction unit 130 extracts a person representation frame in the shot. Using all person frames for a person clustering is inefficient. Accordingly, the person representation frame extraction unit 130 first performs a clustering of the frames including a face in the shot, and then extracts, from each cluster, the frame which is closest to a center frame having a greatest similarity as the person representation frame. Specifically, the person representation frame extraction unit 130 extracts one frame from each of the clusters and may set each extracted frame as a person representation frame in the shot, since at least one person may be included in the shot.
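The selection of one representation frame per cluster can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes each face frame has already been reduced to a numeric feature vector and assigned to a cluster, and it reads "closest to a center" as closest to the cluster centroid in Euclidean distance. The names `representative_index` and `representation_frames` are hypothetical.

```python
from math import dist
from typing import Dict, List, Sequence, Tuple


def representative_index(features: Sequence[Sequence[float]]) -> int:
    """Return the index of the feature vector closest to the cluster centroid."""
    if not features:
        raise ValueError("cluster must contain at least one frame")
    dims = len(features[0])
    # Centroid = per-dimension mean of all feature vectors (an assumption).
    centroid = [sum(f[d] for f in features) / len(features) for d in range(dims)]
    return min(range(len(features)), key=lambda i: dist(features[i], centroid))


def representation_frames(
    clusters: Dict[str, List[Tuple[int, List[float]]]]
) -> Dict[str, int]:
    """Pick one frame number per face cluster of a shot.

    `clusters` maps a cluster label to (frame_number, feature_vector) pairs.
    """
    result = {}
    for label, members in clusters.items():
        idx = representative_index([feat for _, feat in members])
        result[label] = members[idx][0]
    return result
```

Because one frame is returned per cluster, a shot containing several people yields several representation frames.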
- the person clustering unit 140 performs the person clustering in the extracted person representation frame based on time information.
- an algorithm for various poses or lightings may not be strict. Accordingly, the person clustering unit 140 performs the person clustering by using the time information to start clustering based on various forms of each person.
- a single person generally wears the same clothes within a similar time period in the same video data, and such clothes information shows a clearer difference between persons than face information. Accordingly, the person clustering unit 140 obtains various forms of the single person by using the clothes information.
- FIG. 2 is a diagram illustrating an example of clothes information and face information detected in a same time window according to an embodiment of the present invention.
- the size of the clothes region is determined in proportion to the size of a key person in each of the person representation frames 210, 220, 230, 240, and 250 in the shot.
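One way to derive a clothes region from a detected face is sketched below. The source only states that the clothes size is proportional to the size of the person and that the face location and size are used; the placement directly below the face and the scale factors `width_scale` and `height_scale` are illustrative assumptions, as are the names `Box` and `clothes_region`.

```python
from typing import NamedTuple


class Box(NamedTuple):
    x: int  # left edge in pixels
    y: int  # top edge in pixels
    w: int  # width in pixels
    h: int  # height in pixels


def clothes_region(face: Box, frame_w: int, frame_h: int,
                   width_scale: float = 2.0, height_scale: float = 1.5) -> Box:
    """Estimate a clothes box below the face, scaled from the face size.

    The scale factors are hypothetical defaults, not values from the source.
    The result is clipped to the frame, so it may have zero height when the
    face touches the bottom edge.
    """
    cw = int(face.w * width_scale)
    ch = int(face.h * height_scale)
    cx = face.x + face.w // 2 - cw // 2   # horizontally centered on the face
    cy = face.y + face.h                  # starts right under the face
    x0 = min(max(0, cx), frame_w)
    y0 = min(max(0, cy), frame_h)
    x1 = min(frame_w, max(0, cx + cw))
    y1 = min(frame_h, max(0, cy + ch))
    return Box(x0, y0, max(0, x1 - x0), max(0, y1 - y0))
```

Deriving the box from the face box avoids a separate search for the clothes, which is consistent with the stated goal of reducing extraction time.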
- the person clustering unit 140 extracts clothes information from current cluster information, a current person representation frame, and a comparison person representation frame, i.e. a person representation frame to be compared.
- the person clustering unit 140 compares the current person representation frame and the comparison person representation frame, and determines whether the current person representation frame is similar to the comparison person representation frame as a result of the comparing.
- the person clustering unit 140 extends a time window when the current person representation frame is similar to the comparison person representation frame, and adds the person representation frame that has just been compared to the current cluster information.
- the person clustering unit 140 sets a subsequent person representation frame in the time window as another comparison person representation frame.
- the person clustering unit 140 determines whether the current person representation frame and the comparison person representation frame are at an end of the time window, when the current person representation frame is different from the comparison person representation frame. The person clustering unit 140 sets the subsequent person representation frame in the time window as the other comparison person representation frame, when the current person representation frame and the comparison person representation frame are not at the end of the time window.
- a scene change detection unit 150 detects a scene change by separating a person portion from a background based on face extraction information and comparing the person portion and the background. Specifically, the scene change detection unit 150 may approximately extract a person by using the face extraction information, and thus may detect the scene change by the separating and the comparing after the person is approximately extracted.
- the scene change detection unit 150 receives current scene information, a current shot representation frame, and a comparison shot representation frame, and extracts background information from the current shot representation frame and the comparison shot representation frame.
- the scene change detection unit 150 compares the current shot representation frame and the comparison shot representation frame, and determines whether the current shot representation frame is similar to the comparison shot representation frame.
- the scene change detection unit 150 extends the time window when the current shot representation frame is similar to the comparison shot representation frame, and marks that the comparing of the current shot representation frame is completed.
- the scene change detection unit 150 assigns the comparison shot representation frame to the current shot representation frame, and assigns a subsequent shot representation frame in the time window to the comparison shot representation frame.
- the scene change detection unit 150 marks that the comparing of the current shot representation frame is completed, when the current shot representation frame is different from the comparison shot representation frame, and determines whether comparing all frames in the time window is completed.
- the scene change detection unit 150 assigns a subsequent shot representation frame where the comparing is incomplete to the current shot representation frame, and assigns the subsequent shot representation frame to the comparison shot representation frame, when the comparing is incomplete.
- the scene clustering unit 160 merges similar clusters from the extracted person representation frame and performs a scene clustering for each person. Specifically, the scene clustering unit 160 may perform the scene clustering for each person by comparing the person representation frames in the shot and merging the similar clusters according to the comparison, as illustrated in FIG. 3.
- the scene clustering unit 160 receives time information-based clusters, and selects two clusters having a minimum difference value.
- the scene clustering unit 160 compares the minimum difference value and a threshold value, and merges the two clusters when the minimum difference value is less than the threshold value.
- the scene clustering unit 160 connects scenes including a person frame in the same cluster, when the minimum difference value is equal to or greater than the threshold value. A scene clustering method for each person is described in greater detail with reference to FIG. 3.
- FIG. 3 is a diagram illustrating an example of performing a clustering for each person according to an embodiment of the present invention.
- the scene clustering unit 160 compares a first person representation frame 310 and a second person representation frame 320, and performs a first merge of similar clusters based on a result of the comparison.
- the scene clustering unit 160 compares a fifth person representation frame 350 and a sixth person representation frame 360, and performs a second merge of similar clusters based on a result of the comparison.
- the scene clustering unit 160 compares a third person representation frame 330 and a seventh person representation frame 370, and performs a third merge of similar clusters based on a result of the comparison.
- the scene clustering unit 160 compares the first merge and the second merge, and performs a fourth merge of similar clusters based on a result of the comparison.
- FIG. 4 is a flowchart illustrating a method of classifying a scene for each person in a video according to another embodiment of the present invention.
- a system for classifying a scene for each person in a video detects a face within input video frames. Specifically, the system for classifying a scene for each person in a video analyzes the input video frames via a face detector and thereby may detect the face within the input video frames.
- the system for classifying a scene for each person in a video detects a shot change within the input video frames. Specifically, the system for classifying a scene for each person in a video detects the shot change within the input video frames to segment the input video frames into a shot which is a basic unit of the video.
- the system for classifying a scene for each person in a video extracts a person representation frame in the shot. Since using all person frames for a person clustering is inefficient, the system for classifying a scene for each person in a video performs a clustering of the frames including a face in the shot, and then extracts the frame which is closest to a center in each cluster as the person representation frame. Specifically, the system for classifying a scene for each person in a video extracts one frame from each of the clusters and may set each extracted frame as a person representation frame in the shot, since at least one person may be included in the shot.
- the system for classifying a scene for each person in a video performs the person clustering in the extracted person representation frame based on time information.
- an algorithm for various poses or lightings may not be strict.
- the system for classifying a scene for each person in a video performs the person clustering by using the time information to start clustering based on various forms of each person.
- a single person generally wears the same clothes within a similar time period in the same video data, and such clothes information has a clearer difference than face information.
- the system for classifying a scene for each person in a video obtains various forms of the single person by using the clothes information. An operation of a time information-based person clustering is described in greater detail with reference to FIG. 5.
- FIG. 5 is a flowchart illustrating an operation of a time information-based person clustering illustrated in FIG. 4 according to another embodiment of the present invention.
- the system for classifying a scene for each person in a video receives current cluster information, a current person representation frame, and a comparison person representation frame.
- the comparison person representation frame is a person representation frame to be compared.
- the system for classifying a scene for each person in a video extracts clothes information of each of the current person representation frame and the comparison person representation frame. Specifically, the system for classifying a scene for each person in a video may extract the clothes information by referring to the location and size of the face from the face information, as illustrated in FIG. 2, to reduce the time needed to extract the clothes information.
- the system for classifying a scene for each person in a video compares the current person representation frame and the comparison person representation frame. Specifically, the system for classifying a scene for each person in a video adds a comparison value of color information corresponding to the clothes information and a weight of a comparison value corresponding to the face information, when comparing.
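The comparison step can be read as a weighted combination of a clothes color comparison value and a face comparison value. The sketch below follows that reading under explicit assumptions: both comparison values are treated as differences in the range 0 to 1 (smaller means more alike), the weight is applied to the face term only, and the default weight and threshold are placeholders rather than values from the source. The names `combined_difference` and `is_similar` are hypothetical.

```python
def combined_difference(clothes_diff: float, face_diff: float,
                        face_weight: float = 0.5) -> float:
    """Add the clothes color difference to a weighted face difference.

    `face_weight` is a hypothetical tuning parameter. A value below 1 lets
    the clothes term dominate, reflecting the observation that clothes
    differ more clearly between persons than faces do.
    """
    return clothes_diff + face_weight * face_diff


def is_similar(clothes_diff: float, face_diff: float,
               threshold: float = 0.6, face_weight: float = 0.5) -> bool:
    """Decide whether two person representation frames show the same person."""
    return combined_difference(clothes_diff, face_diff, face_weight) < threshold
```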
- the system for classifying a scene for each person in a video determines whether the current person representation frame is similar to the comparison person representation frame, as a result of the comparing.
- the system for classifying a scene for each person in a video adds the comparison person representation frame, which has just been compared with the current person representation frame, to the current cluster information.
- the system for classifying a scene for each person in a video sets a subsequent person representation frame in the time window T_fw as another comparison person representation frame, and performs operation S502. Specifically, the system for classifying a scene for each person in a video continues the comparison using the subsequent person representation frame in the time window T_fw.
- the system for classifying a scene for each person in a video determines whether the current person representation frame and the comparison person representation frame are at an end of the time window T_fw. Specifically, when the current person representation frame is different from the comparison person representation frame, the system for classifying a scene for each person in a video determines whether all frames in the time window T_fw have been compared, based on whether the current person representation frame and the comparison person representation frame are at the end of the time window T_fw.
- in operation S510, when the current person representation frame and the comparison person representation frame are not at the end of the time window T_fw, the system for classifying a scene for each person in a video sets the subsequent person representation frame as the comparison person representation frame, and performs operation S502, since not all person representation frames corresponding to the current cluster have been detected.
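The time information-based person clustering loop described above can be sketched as follows. This is one possible reading, not the patent's flowchart verbatim: frames are assumed to be sorted by timestamp, the seed frame of a cluster stays the reference for all comparisons, and "extending the time window" is taken to mean that the window is restarted from the timestamp of each newly matched frame. The `similar` callable and the name `cluster_by_time` are hypothetical.

```python
from typing import Any, Callable, List, Sequence, Tuple

Frame = Tuple[float, Any]  # (timestamp in seconds, feature payload)


def cluster_by_time(frames: Sequence[Frame], window: float,
                    similar: Callable[[Any, Any], bool]) -> List[List[int]]:
    """Group person representation frames into clusters of frame indices."""
    assigned = set()
    clusters: List[List[int]] = []
    for start in range(len(frames)):
        if start in assigned:
            continue
        cluster = [start]
        assigned.add(start)
        reference = frames[start][1]
        window_end = frames[start][0] + window
        for i in range(start + 1, len(frames)):
            timestamp, feature = frames[i]
            if timestamp > window_end:
                break  # end of the time window reached: the cluster is complete
            if i not in assigned and similar(reference, feature):
                cluster.append(i)
                assigned.add(i)
                window_end = timestamp + window  # extend the window past the match
        clusters.append(cluster)
    return clusters
```

A dissimilar frame inside the window does not end the cluster; the scan simply moves on to the next frame until the window is exhausted.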
- the system for classifying a scene for each person in a video detects a scene change by separating a person portion from a background based on face extraction information and comparing the person portion and the background.
- the system for classifying a scene for each person in a video may approximately extract a person by using the face extraction information, and thus may detect the scene change by the separating and the comparing after the person is approximately extracted.
- a scene change detection operation is described in greater detail with reference to FIG. 6.
- FIG. 6 is a flowchart illustrating an operation of a scene change detection illustrated in FIG. 4 according to another embodiment of the present invention.
- the system for classifying a scene for each person in a video receives current scene information, a current shot representation frame P_f, and a comparison shot representation frame C_f.
- the system for classifying a scene for each person in a video extracts background information of the current shot representation frame P_f and the comparison shot representation frame C_f.
- the background information is information about a pixel of another location excluding a face location and a clothes location.
- the system for classifying a scene for each person in a video compares the current shot representation frame P_f and the comparison shot representation frame C_f. Specifically, the system for classifying a scene for each person in a video adds the comparison value of the color information corresponding to the clothes information and the weight of the comparison value corresponding to the face information, when comparing. Also, when comparing the background information, a normalized color histogram and hue, saturation, value (HSV) information are used.
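A background comparison based on a normalized HSV color histogram might look like the sketch below. Several details are assumptions that the source does not specify: the bin counts, the use of histogram intersection as the comparison measure, and the representation of the face and clothes locations as rectangles to be excluded. The function names are hypothetical; only the standard-library `colorsys` module is used.

```python
import colorsys
from typing import Iterable, Iterator, List, Sequence, Tuple

RGB = Tuple[int, int, int]
Rect = Tuple[int, int, int, int]  # (x, y, width, height)


def background_pixels(frame: Sequence[Sequence[RGB]],
                      excluded: Sequence[Rect]) -> Iterator[RGB]:
    """Yield the pixels that lie outside every excluded (face/clothes) box."""
    for y, row in enumerate(frame):
        for x, pixel in enumerate(row):
            if not any(bx <= x < bx + bw and by <= y < by + bh
                       for bx, by, bw, bh in excluded):
                yield pixel


def hsv_histogram(pixels: Iterable[RGB],
                  bins: Tuple[int, int, int] = (8, 4, 4)) -> List[float]:
    """Build a histogram over HSV bins, normalized to sum to 1."""
    hb, sb, vb = bins
    hist = [0.0] * (hb * sb * vb)
    count = 0
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        hi = min(int(h * hb), hb - 1)
        si = min(int(s * sb), sb - 1)
        vi = min(int(v * vb), vb - 1)
        hist[(hi * sb + si) * vb + vi] += 1.0
        count += 1
    return [value / count for value in hist] if count else hist


def histogram_difference(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 minus the histogram intersection: 0 identical, 1 disjoint."""
    return 1.0 - sum(min(x, y) for x, y in zip(a, b))
```

Normalizing by the pixel count keeps the measure comparable between frames whose excluded person regions differ in size.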
- the system for classifying a scene for each person in a video determines whether the current shot representation frame P_f is similar to the comparison shot representation frame C_f, as a result of the comparing.
- the system for classifying a scene for each person in a video extends a time window T_sw. Specifically, the system for classifying a scene for each person in a video resets the time window T_sw to extend the scene further, since the same scene continues up to the point in time at which the current shot representation frame P_f is similar to the comparison shot representation frame C_f.
- the system for classifying a scene for each person in a video marks that the comparing of the current shot representation frame P_f is completed, and sets the comparison shot representation frame C_f as the current shot representation frame P_f.
- the system for classifying a scene for each person in a video sets a subsequent shot representation frame in the time window T_sw as the comparison shot representation frame C_f, and performs operation S602. Specifically, the system for classifying a scene for each person in a video continues the comparison using the subsequent shot representation frame in the time window T_sw.
- in operation S609, the system for classifying a scene for each person in a video determines whether comparing all frames in the time window T_sw is completed.
- the system for classifying a scene for each person in a video determines the shot which was examined last and determined to be a similar shot as the last shot of the current scene, since all shots corresponding to the current scene have been detected. Also, the system for classifying a scene for each person in a video performs a detection operation of a subsequent scene.
- the system for classifying a scene for each person in a video sets a subsequent shot representation frame where the comparing is incomplete as the current shot representation frame P_f, and sets the subsequent shot representation frame as the comparison shot representation frame C_f. Also, the system for classifying a scene for each person in a video performs operation S602.
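The scene change detection flow can be approximated by the sketch below. It is a simplified reading under stated assumptions, not a transcription of the flowchart: shot representation frames are sorted by timestamp, a candidate shot joins the current scene when it is similar to any shot already inside that scene, and each match restarts the time window from the matched shot. The `similar` callable and the name `segment_scenes` are hypothetical.

```python
from typing import Any, Callable, List, Sequence, Tuple

Shot = Tuple[float, Any]  # (timestamp in seconds, feature payload)


def segment_scenes(shots: Sequence[Shot], window: float,
                   similar: Callable[[Any, Any], bool]) -> List[Tuple[int, int]]:
    """Return (first_shot_index, last_shot_index) pairs, one per scene."""
    scenes: List[Tuple[int, int]] = []
    start = 0
    while start < len(shots):
        last = start  # last shot known to belong to the current scene
        window_end = shots[start][0] + window
        i = start + 1
        while i < len(shots) and shots[i][0] <= window_end:
            # Compare the candidate with every shot already inside the scene.
            if any(similar(shots[j][1], shots[i][1]) for j in range(start, last + 1)):
                last = i
                window_end = shots[i][0] + window  # extend the time window
            i += 1
        # The last similar shot found closes the scene; the next one opens a new scene.
        scenes.append((start, last))
        start = last + 1
    return scenes
```

With this reading, a dissimilar shot that is sandwiched between two similar shots is absorbed into the scene, since scenes are contiguous ranges of shots.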
- the system for classifying a scene for each person in a video merges similar clusters from the extracted person representation frame and performs the scene clustering for each person.
- the system for classifying a scene for each person in a video may perform the scene clustering by comparing and merging as illustrated in FIG. 3. An operation of a scene clustering for each person is described in greater detail with reference to FIG. 7.
- FIG. 7 is a flowchart illustrating an operation of a scene clustering for each person according to another embodiment of the present invention.
- the system for classifying a scene for each person in a video receives time information-based clusters.
- the system for classifying a scene for each person in a video selects, from among all clusters, the two clusters having the minimum difference value. Specifically, the difference values of all clusters may be compared using an average value of each cluster. Alternatively, the minimum difference value obtained by comparing all objects of a corresponding cluster with all objects of a comparison cluster may be used.
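The two ways of measuring the difference between clusters mentioned here, comparing cluster averages or taking the minimum over all object pairs, can be sketched as follows. The sketch assumes that each object is a numeric feature vector and that Euclidean distance is the underlying difference; neither is specified by the source, and the function names are hypothetical.

```python
from math import dist
from typing import List, Sequence

Vector = Sequence[float]


def centroid(cluster: Sequence[Vector]) -> List[float]:
    """Per-dimension average of the feature vectors in a cluster."""
    dims = len(cluster[0])
    return [sum(v[d] for v in cluster) / len(cluster) for d in range(dims)]


def average_difference(a: Sequence[Vector], b: Sequence[Vector]) -> float:
    """Difference between clusters measured between their average values."""
    return dist(centroid(a), centroid(b))


def minimum_pair_difference(a: Sequence[Vector], b: Sequence[Vector]) -> float:
    """Smallest difference over all pairs of objects from the two clusters."""
    return min(dist(x, y) for x in a for y in b)
```

The average-based measure costs one distance per cluster pair, while the pairwise minimum compares every object with every other and is more sensitive to a single close match.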
- the system for classifying a scene for each person in a video compares the minimum difference value and a threshold value and determines whether the minimum difference value is less than the threshold value.
- in operation S704, when the minimum difference value is less than the threshold value, the system for classifying a scene for each person in a video merges the two clusters, as illustrated in FIG. 3, since the two clusters include a similar person. Also, the system for classifying a scene for each person in a video performs operation S702.
- the system for classifying a scene for each person in a video connects scenes including a person frame in the same cluster. Specifically, the system for classifying a scene for each person in a video determines that all clustering is completed when the minimum difference value is equal to or greater than the threshold value. Also, when the scenes including a same person have been connected, the operation of a scene clustering for each person is completed. Each scene may be included in many clusters since various persons may exist in a single scene.
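The merge loop and the final connection of scenes can be sketched as follows. This is an illustration under assumptions: clusters are lists of frame identifiers, the caller supplies the `difference` function between two clusters, and a mapping from frame identifier to scene identifier is already available from the scene change detection. The names `merge_clusters` and `scenes_per_cluster` are hypothetical.

```python
from itertools import combinations
from typing import Callable, Dict, Hashable, List, Sequence


def merge_clusters(clusters: Sequence[Sequence[Hashable]],
                   difference: Callable[[List, List], float],
                   threshold: float) -> List[List[Hashable]]:
    """Repeatedly merge the two closest clusters while they are below the threshold."""
    merged = [list(c) for c in clusters]
    while len(merged) > 1:
        # Select the two clusters with the minimum difference value.
        i, j = min(combinations(range(len(merged)), 2),
                   key=lambda p: difference(merged[p[0]], merged[p[1]]))
        if difference(merged[i], merged[j]) >= threshold:
            break  # clustering is complete
        merged[i].extend(merged[j])
        del merged[j]  # j > i always holds, so index i stays valid
    return merged


def scenes_per_cluster(clusters: Sequence[Sequence[Hashable]],
                       frame_scene: Dict[Hashable, int]) -> List[List[int]]:
    """Connect, for each person cluster, the scenes that contain its frames."""
    return [sorted({frame_scene[f] for f in cluster}) for cluster in clusters]
```

Because the scene lists are built per cluster, the same scene identifier can appear under several persons, which matches the remark that a single scene may belong to many clusters.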
- the method and system for classifying a scene for each person in a video may be recorded in computer-readable media including program instructions to implement various operations embodied by a computer.
- the media may also include, alone or in combination with the program instructions, data files, data structures, and the like.
- Examples of computer-readable media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD ROM disks and DVD; magneto-optical media such as optical disks; and hardware devices that are specially configured to store and perform program instructions, such as read-only memory (ROM), random access memory (RAM), flash memory, and the like.
- the media may also be a transmission medium such as optical or metallic lines, wave guides, etc.
- program instructions include both machine code, such as produced by a compiler, and files containing higher level code that may be executed by the computer using an interpreter.
- the described hardware devices may be configured to act as one or more software modules in order to perform the operations of the above-described exemplary embodiments of the present invention.
- a method and system for classifying a scene for each person in a video may provide a story overview for each person by classifying a person by a scene unit by using temporal information in video data.
- a method and system for classifying a scene for each person in a video may improve an accuracy of a scene segmentation detection by separating a person portion and a background in video data and using information about the person portion and the background together.
- a method and system for classifying a scene for each person in a video may replay video data for each person, and thereby may enable a user to selectively view scenes including a person that the user likes.
- a method and system for classifying a scene for each person in a video may classify a person by a scene unit, which is a story unit in video data, and thereby may improve a scene classification accuracy and enable a scene-based navigation.
- a method and system for classifying a scene for each person in a video may perform a video data analysis more easily by improving a scene classification accuracy in video data.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Multimedia (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Library & Information Science (AREA)
- Software Systems (AREA)
- Data Mining & Analysis (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- General Health & Medical Sciences (AREA)
- Human Computer Interaction (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- Signal Processing (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Image Analysis (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Television Signal Processing For Recording (AREA)
Abstract
Described is a method of classifying a scene for each person in a video, the method including: detecting a face within input video frames; detecting a shot change of the input video frames; extracting a person representation frame in the shot; performing a person clustering in the extracted person representation frame based on time information; detecting a scene change by separating a person portion from a background based on face extraction information, and comparing the person portion and the background; and merging similar clusters from the extracted person representation frame and performing a scene clustering for each person.
Description
- This application is a continuation of application Ser. No. 11/882,733 filed on Aug. 3, 2007, which claims the priority of Korean Patent Application No. 10-2007-0000957, filed on Jan. 4, 2007, in the Korean Intellectual Property Office, the disclosure of which is incorporated herein by reference.
- 1. Field
- The present invention relates to a method and system for classifying a scene for each person in a video, and more particularly, to a method and system for classifying a scene for each person in a video based on person information and background information in video data.
- 2. Description of the Related Art
- Generally, a scene is a unit delimited by changes in video content. In the conventional art, scenes are classified by using low level information such as color information or edge information.
- Specifically, a conventional automatic scene segmentation algorithm clusters shots using low level information, such as color information extracted from all frames, and detects a scene segmentation from the clustered shots. However, when a person in the video moves or the camera moves, the low level information changes. Accordingly, accuracy decreases.
- Also, a conventional person classification method clusters persons in a video using face information, and classifies the persons accordingly. However, face information changes depending on pose, lighting, and the like, which lowers accuracy.
- Accordingly, a method and system for classifying a scene for each person in a video is required.
- An aspect of the present invention provides a method and system for classifying a scene for each person in a video which may provide a story overview for each person by classifying a person by a scene unit by using temporal information in video data.
- An aspect of the present invention also provides a method and system for classifying a scene for each person in a video which may improve an accuracy of a scene segmentation detection by separating a person portion and a background in video data and using information about the person portion and the background together.
- According to an aspect of the present invention, there is provided a method of classifying a scene for each person in a video, the method including: detecting a face within input video frames; detecting a shot change of the input video frames; extracting a person representation frame in the shot; performing a person clustering in the extracted person representation frame based on time information; detecting a scene change by separating a person portion from a background based on face extraction information, and comparing the person portion and the background; and merging similar clusters from the extracted person representation frame and performing a scene clustering for each person.
- According to another aspect of the present invention, there is provided a system for classifying a scene for each person in a video, the system including: a face detection unit detecting a face within input video frames; a shot change detection unit detecting a shot change of the input video frames; a person representation frame extraction unit extracting a person representation frame in the shot; a person clustering unit performing a person clustering in the extracted person representation frame based on time information; a scene change detection unit detecting a scene change by separating a person portion from a background based on face extraction information and comparing the person portion and the background; and a scene clustering unit merging similar clusters from the extracted person representation frame and performing a scene clustering for each person.
- Additional aspects and/or advantages of the invention will be set forth in part in the description which follows and, in part, will be apparent from the description, or may be learned by practice of the invention.
- The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.
- These and/or other aspects, features, and advantages of the invention will become apparent and more readily appreciated from the following description of exemplary embodiments, taken in conjunction with the accompanying drawings of which:
- FIG. 1 is a block diagram illustrating a configuration of a system for classifying a scene for each person in a video according to an embodiment of the present invention;
- FIG. 2 is a diagram illustrating an example of clothes information and face information detected in a same time window according to an embodiment of the present invention;
- FIG. 3 is a diagram illustrating an example of performing a clustering for each person according to an embodiment of the present invention;
- FIG. 4 is a flowchart illustrating a method of classifying a scene for each person in a video according to another embodiment of the present invention;
- FIG. 5 is a flowchart illustrating an operation of a time information-based person clustering illustrated in FIG. 4 according to another embodiment of the present invention;
- FIG. 6 is a flowchart illustrating an operation of a scene change detection illustrated in FIG. 4 according to another embodiment of the present invention; and
- FIG. 7 is a flowchart illustrating an operation of a scene clustering for each person according to another embodiment of the present invention.
- Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the like elements throughout. The embodiments are described below in order to explain the present invention by referring to the figures.
- FIG. 1 is a block diagram illustrating a configuration of a system for classifying a scene for each person in a video according to an embodiment of the present invention.
- Referring to FIG. 1, the system for classifying a scene for each person in a video 100 includes a face detection unit 110, a shot change detection unit 120, a person representation frame extraction unit 130, a person clustering unit 140, a scene change detection unit 150, and a scene clustering unit 160.
- The face detection unit 110 detects a face within input video frames. Specifically, the face detection unit 110 analyzes the input video frames, and detects the face within the input video frames.
- The shot change detection unit 120 detects a shot change within the input video frames. Specifically, the shot change detection unit 120 detects the shot change of the input video frames to segment the input video frames into a shot which is a basic unit of the video.
- The person representation frame extraction unit 130 extracts a person representation frame in the shot. Using all person frames for a person clustering is inefficient. Accordingly, the person representation frame extraction unit 130 extracts a frame which is closest to a center frame having a greatest similarity in each cluster as the person representation frame, after performing a clustering of frames including a face in the shot. Specifically, the person representation frame extraction unit 130 extracts the frame one by one in all clusters and may set the frame as the person representation frame in the shot, since at least one person may be included in the shot.
- The person clustering unit 140 performs the person clustering in the extracted person representation frame based on time information. When simply performing a clustering based on all person representation frames, the algorithm may not be robust to various poses or lighting conditions. Accordingly, the person clustering unit 140 performs the person clustering by using the time information to start clustering based on various forms of each person. Specifically, as illustrated in FIG. 2, a single person generally wears the same clothes within a similar time period in the same video data, and such clothes information has a clearer difference than face information. Accordingly, the person clustering unit 140 obtains various forms of the single person by using the clothes information.
- FIG. 2 is a diagram illustrating an example of clothes information and face information detected in a same time window according to an embodiment of the present invention.
- A location and size of a face in a person representation frame are detected, and clothes in the person representation frame are extracted by referring to the location and size of the face, as illustrated in FIG. 2. The size of clothes is determined in proportion to a size of a key person in the person representation frame.
- The person clustering unit 140 extracts clothes information from current cluster information, a current person representation frame, and a comparison person representation frame, i.e. a person representation frame to be compared. The person clustering unit 140 compares the current person representation frame and the comparison person representation frame, and determines whether the current person representation frame is similar to the comparison person representation frame as a result of the comparing. The person clustering unit 140 extends a time window when the current person representation frame is similar to the comparison person representation frame, and includes the person representation frame which has been currently compared in the current cluster information. The person clustering unit 140 sets a subsequent person representation frame as another comparison person representation frame on the time window. Also, the person clustering unit 140 determines whether the current person representation frame and the comparison person representation frame are at an end of the time window, when the current person representation frame is different from the comparison person representation frame. The person clustering unit 140 sets the subsequent person representation frame in the time window as the other comparison person representation frame, when the current person representation frame and the comparison person representation frame are not at the end of the time window.
- The scene change detection unit 150 detects a scene change by separating a person portion from a background based on face extraction information and comparing the person portion and the background. Specifically, the scene change detection unit 150 may approximately extract a person by using the face extraction information, and thus may detect the scene change by the separating and the comparing after the person is approximately extracted.
- The scene change detection unit 150 receives current scene information, a current shot representation frame, and a comparison shot representation frame, and extracts background information from the current shot representation frame and the comparison shot representation frame. The scene change detection unit 150 compares the current shot representation frame and the comparison shot representation frame, and determines whether the current shot representation frame is similar to the comparison shot representation frame. The scene change detection unit 150 extends the time window when the current shot representation frame is similar to the comparison shot representation frame, and marks that the comparing of the current shot representation frame is completed. The scene change detection unit 150 assigns the comparison shot representation frame to the current shot representation frame, and assigns a subsequent shot representation frame in the time window to the comparison shot representation frame. The scene change detection unit 150 marks that the comparing of the current shot representation frame is completed, when the current shot representation frame is different from the comparison shot representation frame, and determines whether comparing all frames in the time window is completed. The scene change detection unit 150 assigns a subsequent shot representation frame where the comparing is incomplete to the current shot representation frame, and assigns the subsequent shot representation frame to the comparison shot representation frame, when the comparing is incomplete.
- The scene clustering unit 160 merges similar clusters from the extracted person representation frame and performs a scene clustering for each person. Specifically, the scene clustering unit 160 may perform the scene clustering for each person by comparing the person representation frame in the shot and merging the similar clusters according to the comparison, as illustrated in FIG. 3.
- The scene clustering unit 160 receives time information-based clusters, and selects two clusters having a minimum difference value. The scene clustering unit 160 compares the minimum difference value and a threshold value, and merges the two clusters when the minimum difference value is less than the threshold value. The scene clustering unit 160 connects scenes including a person frame in a same cluster, when the minimum difference value is equal to or greater than the threshold value. A scene clustering method for each person is described in greater detail with reference to FIG. 3.
- FIG. 3 is a diagram illustrating an example of performing a clustering for each person according to an embodiment of the present invention.
- In operation S1, the scene clustering unit 160 compares a first person representation frame 310 and a second person representation frame 320, and performs a first merge of similar clusters based on a result of the comparison. In operation S2, the scene clustering unit 160 compares a fifth person representation frame 350 and a sixth person representation frame 360, and performs a second merge of similar clusters based on a result of the comparison. In operation S3, the scene clustering unit 160 compares a third person representation frame 330 and a seventh person representation frame 370, and performs a third merge of similar clusters based on a result of the comparison. In operation S4, the scene clustering unit 160 compares the first merge and the second merge, and performs a fourth merge of similar clusters based on a result of the comparison.
- FIG. 4 is a flowchart illustrating a method of classifying a scene for each person in a video according to another embodiment of the present invention.
- Referring to FIG. 4, in operation S410, a system for classifying a scene for each person in a video detects a face within input video frames. Specifically, the system for classifying a scene for each person in a video analyzes the input video frames via a face detector and thereby may detect the face within the input video frames.
- In operation S420, the system for classifying a scene for each person in a video detects a shot change within the input video frames. Specifically, the system for classifying a scene for each person in a video detects the shot change within the input video frames to segment the input video frames into a shot which is a basic unit of the video.
- In operation S430, the system for classifying a scene for each person in a video extracts a person representation frame in the shot. Since using all person frames for a person clustering is inefficient, the system for classifying a scene for each person in a video extracts a frame which is closest to a center in each cluster as the person representation frame, after performing a clustering of frames including a face in the shot. Specifically, the system for classifying a scene for each person in a video extracts the frame one by one in all frames and may set the frame as the person representation frame in the shot, since at least one person may be included in the shot.
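The selection of a person representation frame in operation S430 can be illustrated with a minimal sketch. It assumes that the face frames of a shot have already been grouped into clusters and that each frame is described by a numeric feature vector; the description does not specify the features or the distance measure, so the Euclidean distance to the cluster mean used here is an assumption, and the function names are hypothetical.

```python
from math import dist


def representation_frame(cluster):
    """Return the index of the frame closest to the cluster center.

    `cluster` is a list of (frame_index, feature_vector) pairs. The feature
    vector and the Euclidean distance are illustrative assumptions.
    """
    dims = len(cluster[0][1])
    # Center of the cluster: per-dimension mean of the feature vectors.
    center = [sum(vec[d] for _, vec in cluster) / len(cluster) for d in range(dims)]
    # The representation frame is the member nearest to that center.
    return min(cluster, key=lambda item: dist(item[1], center))[0]


def shot_representation_frames(clusters):
    """One representation frame per cluster, since a shot may hold several persons."""
    return [representation_frame(cluster) for cluster in clusters]
```

Taking one frame per cluster keeps the later person clustering small while still covering every person detected in the shot.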
- In operation S440, the system for classifying a scene for each person in a video performs the person clustering in the extracted person representation frame based on time information. When simply clustering based on all person representation frames, the algorithm may not be robust to various poses or lighting conditions. Accordingly, the system for classifying a scene for each person in a video performs the person clustering by using the time information to start clustering based on various forms of each person. Specifically, a single person generally wears the same clothes within a similar time period in the same video data, and such clothes information has a clearer difference than face information. Accordingly, the system for classifying a scene for each person in a video obtains various forms of the single person by using the clothes information. An operation of a time information-based person clustering is described in greater detail with reference to FIG. 5.
- FIG. 5 is a flowchart illustrating an operation of a time information-based person clustering illustrated in FIG. 4 according to another embodiment of the present invention.
- Referring to FIG. 5, in operation S501, the system for classifying a scene for each person in a video receives current cluster information, a current person representation frame, and a comparison person representation frame. The comparison person representation frame is a person representation frame to be compared.
- In operation S502, the system for classifying a scene for each person in a video extracts clothes information of each of the current person representation frame and the comparison person representation frame. Specifically, the system for classifying a scene for each person in a video may extract the clothes information by referring to the location and size of the face from the face information as illustrated in FIG. 2 to reduce a time to extract clothes information.
- In operation S503, the system for classifying a scene for each person in a video compares the current person representation frame and the comparison person representation frame. Specifically, the system for classifying a scene for each person in a video adds a comparison value of color information corresponding to the clothes information and a weight of a comparison value corresponding to the face information, when comparing.
- In operation S504, the system for classifying a scene for each person in a video determines whether the current person representation frame is similar to the comparison person representation frame, as a result of the comparing.
- In operation S505, when the current person representation frame is similar to the comparison person representation frame, the system for classifying a scene for each person in a video extends a time window Tfw. Specifically, when the current person representation frame is similar to the comparison person representation frame, the system for classifying a scene for each person in a video resets the time window Tfw from a present point in time, since a same person exists up to the present point in time.
- In operation S506, the system for classifying a scene for each person in a video includes the comparison person representation frame which has been currently compared in the current cluster information. Specifically, the system for classifying a scene for each person in a video includes the comparison person representation frame, which has been compared with the current person representation frame, in the current cluster information.
- In operation S507, the system for classifying a scene for each person in a video sets a subsequent person representation frame in the time window Tfw as another comparison person representation frame, and performs operation S502. Specifically, the system for classifying a scene for each person in a video continues to compare using the subsequent person representation frame in the time window Tfw.
- In operation S508, when the current person representation frame is different from the comparison person representation frame, the system for classifying a scene for each person in a video determines whether the current person representation frame and the comparison person representation frame are at an end of the time window Tfw. Specifically, the system for classifying a scene for each person in a video uses the result of this determination to decide whether all frames in the time window Tfw have been compared.
- In operation S509, when the current person representation frame and the comparison person representation frame are at the end of the time window Tfw, the system for classifying a scene for each person in a video moves to a subsequent cluster and performs a time information-based person clustering for the subsequent cluster, since all person representation frames corresponding to a current cluster are extracted.
- In operation S510, when the current person representation frame and the comparison person representation frame are not at the end of the time window Tfw, the system for classifying a scene for each person in a video sets the subsequent person representation frame as the comparison person representation frame, and performs operation S502, since not all person representation frames corresponding to the current cluster have been detected.
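Operations S501 to S510 can be sketched as follows. This is a simplified reading under stated assumptions, not the claimed implementation: the clothes region is taken as a box below the face scaled by a hypothetical `scale` factor, clothes and face features are assumed to be precomputed numeric vectors, the comparison of operation S503 is read as a clothes difference plus a weighted face difference, and the names `clothes_box`, `frame_difference`, `face_weight`, `window`, and `threshold` are all illustrative.

```python
def clothes_box(face_box, scale=2.0):
    """Estimate a clothes region from a face box (x, y, w, h), as in S502.

    Assumes image coordinates with y growing downward; `scale` is hypothetical.
    """
    x, y, w, h = face_box
    return (x - w * (scale - 1) / 2, y + h, w * scale, h * scale)


def l1_distance(a, b):
    """Sum of absolute differences between two equal-length vectors."""
    return sum(abs(p - q) for p, q in zip(a, b))


def frame_difference(current, comparison, face_weight=0.5):
    """S503 read as: clothes color difference plus weighted face difference."""
    clothes = l1_distance(current["clothes"], comparison["clothes"])
    face = l1_distance(current["face"], comparison["face"])
    return clothes + face_weight * face


def cluster_by_time_window(frames, window, threshold, face_weight=0.5):
    """Group time-sorted person representation frames into clusters of indices."""
    assigned = [False] * len(frames)
    clusters = []
    for seed in range(len(frames)):
        if assigned[seed]:
            continue
        assigned[seed] = True
        cluster = [seed]                               # current cluster information
        window_end = frames[seed]["time"] + window     # time window Tfw
        for j in range(seed + 1, len(frames)):
            if frames[j]["time"] > window_end:
                break                                  # end of window: next cluster (S509)
            if assigned[j]:
                continue
            if frame_difference(frames[seed], frames[j], face_weight) < threshold:
                cluster.append(j)                      # include compared frame (S506)
                assigned[j] = True
                window_end = frames[j]["time"] + window  # reset window from here (S505)
        clusters.append(cluster)
    return clusters
```

Resetting the window on every match lets a cluster keep growing for as long as the same person keeps reappearing, while a long gap without a match closes the cluster.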
- In operation S450, the system for classifying a scene for each person in a video detects a scene change by separating a person portion from a background based on face extraction information and comparing the person portion and the background. Specifically, the system for classifying a scene for each person in a video may approximately extract a person by using the face extraction information, and thus may detect the scene change by the separating and the comparing after the person is approximately extracted. A scene change detection operation is described in greater detail with reference to FIG. 6.
- FIG. 6 is a flowchart illustrating an operation of a scene change detection illustrated in FIG. 4 according to another embodiment of the present invention.
- Referring to FIG. 6, in operation S601, the system for classifying a scene for each person in a video receives current scene information, a current shot representation frame Pf, and a comparison shot representation frame Cf.
- In operation S602, the system for classifying a scene for each person in a video extracts background information of the current shot representation frame Pf and the comparison shot representation frame Cf. The background information is information about a pixel of another location excluding a face location and a clothes location.
- In operation S603, the system for classifying a scene for each person in a video compares the current shot representation frame Pf and the comparison shot representation frame Cf. Specifically, the system for classifying a scene for each person in a video adds the comparison value of the color information corresponding to the clothes information and the weight of the comparison value corresponding to the face information, when comparing. Also, when comparing the background information, a normalized color histogram and hue, saturation, value (HSV) information are used.
- In operation S604, the system for classifying a scene for each person in a video determines whether the current shot representation frame Pf is similar to the comparison shot representation frame Cf, as a result of the comparing.
- In operation S605, when the current shot representation frame Pf is similar to the comparison shot representation frame Cf, the system for classifying a scene for each person in a video extends a time window Tsw. Specifically, the system for classifying a scene for each person in a video resets the time window Tsw to extend a scene again, since a same scene is continued up to a point in time when the current shot representation frame Pf is similar to the comparison shot representation frame Cf.
- In operation S606, the system for classifying a scene for each person in a video marks that the comparing of the current shot representation frame Pf is completed, and sets the comparison shot representation frame Cf as the current shot representation frame Pf.
- In operation S607, the system for classifying a scene for each person in a video sets a subsequent shot representation frame in the time window Tsw as the comparison shot representation frame Cf, and performs operation S602. Specifically, the system for classifying a scene for each person in a video continues to compare using the subsequent shot representation frame in the time window Tsw.
- In operation S608, when the current shot representation frame Pf is different from the comparison shot representation frame Cf, the system for classifying a scene for each person in a video marks that the comparing of the current shot representation frame Pf is completed.
- In operation S609, the system for classifying a scene for each person in a video determines whether comparing all frames in the time window Tsw is completed.
- In operation S610, when the comparing all frames in the time window Tsw is completed, the system for classifying a scene for each person in a video determines a shot, which is examined last and determined to be a similar shot, as a last shot of a current scene, since all shots corresponding to the current scene are detected. Also, the system for classifying a scene for each person in a video performs a detection operation of a subsequent scene.
- In operation S611, when the comparing is incomplete, the system for classifying a scene for each person in a video sets a subsequent shot representation frame where the comparing is incomplete as the current shot representation frame Pf, and sets the subsequent shot representation frame as the comparison shot representation frame Cf. Also, the system for classifying a scene for each person in a video performs operation S602.
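The scene change detection of operations S601 to S611 can be sketched in simplified form. The sketch assumes that background pixels (those outside the face and clothes locations) have already been collected for each shot representation frame, that the background is summarized by a normalized hue histogram, and that a candidate shot joins the current scene when it is similar to any shot already in that scene; the description leaves these details open, so the bin count, the distance measure, and the names `hue_histogram`, `segment_scenes`, `window`, and `threshold` are assumptions.

```python
import colorsys


def hue_histogram(pixels, bins=8):
    """Normalized hue histogram of RGB pixels given as (r, g, b) in 0..255."""
    counts = [0] * bins
    for r, g, b in pixels:
        hue, _, _ = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        counts[min(int(hue * bins), bins - 1)] += 1
    total = len(pixels) or 1          # avoid division by zero for empty input
    return [count / total for count in counts]


def histogram_difference(h1, h2):
    """L1 distance between two normalized histograms (0 means identical)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def segment_scenes(shots, window, threshold):
    """Split time-sorted shots into scenes, returned as (first, last) index pairs."""
    scenes = []
    start = 0
    while start < len(shots):
        last = start                                   # last shot known to be in the scene
        window_end = shots[start]["time"] + window     # time window Tsw
        j = start + 1
        while j < len(shots) and shots[j]["time"] <= window_end:
            similar = any(
                histogram_difference(shots[k]["background"], shots[j]["background"]) < threshold
                for k in range(start, last + 1)
            )
            if similar:
                last = j                               # scene continues up to this shot
                window_end = shots[j]["time"] + window # extend the window (S605)
            j += 1
        scenes.append((start, last))                   # last similar shot closes the scene (S610)
        start = last + 1
    return scenes
```

A dissimilar shot inside the window does not end the scene by itself; the scene ends only when the window expires without a similar shot, which matches the role of the time window Tsw described above.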
- In operation S460, the system for classifying a scene for each person in a video merges similar clusters from the extracted person representation frame and performs the scene clustering for each person. Specifically, the system for classifying a scene for each person in a video may perform the scene clustering by comparing and merging as illustrated in FIG. 3. An operation of a scene clustering for each person is described in greater detail with reference to FIG. 7.
- FIG. 7 is a flowchart illustrating an operation of a scene clustering for each person according to another embodiment of the present invention.
- Referring to FIG. 7, in operation S701, the system for classifying a scene for each person in a video receives time information-based clusters.
- In operation S702, the system for classifying a scene for each person in a video selects two clusters having a minimum difference value from difference values from among all clusters. Specifically, the difference values of all clusters may be compared using an average value of each cluster. Also, the minimum difference value may be used after comparing all objects of a corresponding cluster and all objects of a comparison cluster.
- In operation S703, the system for classifying a scene for each person in a video compares the minimum difference value and a threshold value and determines whether the minimum difference value is less than the threshold value.
- In operation S704, when the minimum difference value is less than the threshold value, the system for classifying a scene for each person in a video merges the two clusters, as illustrated in FIG. 3, since the two clusters include a similar person. Also, the system for classifying a scene for each person in a video performs operation S702.
- In operation S705, when the minimum difference value is equal to or greater than the threshold value, the system for classifying a scene for each person in a video connects scenes including a person frame in a same cluster. Specifically, the system for classifying a scene for each person in a video determines that all clustering is completed when the minimum difference value is equal to or greater than the threshold value. Also, when connecting the scenes including a same person, the operation of a scene clustering for each person is completed. Each scene may be included in many clusters since various persons may exist in a single scene.
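The merging loop of operations S701 to S705 can be sketched as a threshold-stopped agglomerative clustering. The sketch assumes that each cluster member is a (scene identifier, feature vector) pair and that the difference between two clusters is the Euclidean distance between their average vectors, which is one of the two options mentioned for operation S702; the data layout and the names `merge_person_clusters` and `scenes_per_person` are hypothetical.

```python
from itertools import combinations
from math import dist


def cluster_mean(cluster):
    """Average feature vector of a cluster of (scene_id, vector) pairs."""
    dims = len(cluster[0][1])
    return [sum(vec[d] for _, vec in cluster) / len(cluster) for d in range(dims)]


def cluster_difference(a, b):
    """Difference value between two clusters, using their average values."""
    return dist(cluster_mean(a), cluster_mean(b))


def merge_person_clusters(clusters, threshold):
    """Repeatedly merge the closest pair until the minimum difference reaches the threshold."""
    clusters = [list(cluster) for cluster in clusters]
    while len(clusters) > 1:
        # S702: select the two clusters having the minimum difference value.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: cluster_difference(clusters[pair[0]], clusters[pair[1]]),
        )
        # S703/S705: stop when the minimum difference is not below the threshold.
        if cluster_difference(clusters[i], clusters[j]) >= threshold:
            break
        # S704: merge the two clusters (i < j, so deleting j keeps i valid).
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters


def scenes_per_person(clusters):
    """S705: connect the scenes that contain a person frame of the same cluster."""
    return [sorted({scene for scene, _ in cluster}) for cluster in clusters]
```

Because the scene list is built per cluster, the same scene can appear under several persons, which reflects the remark that a single scene may be included in many clusters.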
- The method and system for classifying a scene for each person in a video according to the above-described exemplary embodiments of the present invention may be recorded in computer-readable media including program instructions to implement various operations embodied by a computer. The media may also include, alone or in combination with the program instructions, data files, data structures, and the like. Examples of computer-readable media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD ROM disks and DVD; magneto-optical media such as optical disks; and hardware devices that are specially configured to store and perform program instructions, such as read-only memory (ROM), random access memory (RAM), flash memory, and the like. The media may also be a transmission medium such as optical or metallic lines, wave guides, etc. including a carrier wave transmitting signals specifying the program instructions, data structures, etc. Examples of program instructions include both machine code, such as produced by a compiler, and files containing higher level code that may be executed by the computer using an interpreter. The described hardware devices may be configured to act as one or more software modules in order to perform the operations of the above-described exemplary embodiments of the present invention.
- A method and system for classifying a scene for each person in a video according to the above-described embodiments of the present invention may provide a story overview for each person by classifying a person by a scene unit by using temporal information in video data.
- Also, a method and system for classifying a scene for each person in a video according to the above-described embodiments of the present invention may improve an accuracy of a scene segmentation detection by separating a person portion and a background in video data and using information about the person portion and the background together.
- Also, a method and system for classifying a scene for each person in a video according to the above-described embodiments of the present invention may enable replay for each person in video data, and thereby may enable a user to selectively view a scene including a person that the user likes.
- Also, a method and system for classifying a scene for each person in a video according to the above-described embodiments of the present invention may classify a person by a scene unit, which is a story unit in video data, and thereby may improve a scene classification accuracy and enable a scene-based navigation.
- Also, a method and system for classifying a scene for each person in a video according to the above-described embodiments of the present invention may perform a video data analysis more easily by improving a scene classification accuracy in video data.
- Although a few embodiments of the present invention have been shown and described, the present invention is not limited to the described embodiments. Instead, it would be appreciated by those skilled in the art that changes may be made to these embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.
Claims (6)
1. A method of classifying a scene for each person in a video, the method comprising:
extracting a person representation frame in a shot;
comparing a first person representation frame and a second person representation frame;
performing a person clustering by extending a time window when the first person representation frame is similar to the second person representation frame;
merging similar clusters using a person cluster extracted from a representation frame and performing a scene clustering for each person based on a scene change,
wherein the scene change is determined using a person portion and a background portion.
2. The method of claim 1 , wherein the performing of the person clustering further comprises:
receiving cluster information, the first person representation frame, and the second person representation frame to be compared;
including the second person representation frame which has been currently compared in the current cluster information when the first person representation frame is similar to the second person representation frame; and
setting a subsequent person representation frame as a third person representation frame to be compared on the time window.
3. The method of claim 2 , further comprising:
moving to a subsequent cluster when the first person representation frame and the second person representation frame are at the end of the time window; or
setting the subsequent person representation frame in the time window as the other person representation frame to be compared on the time window, when the first person representation frame and the second person representation frame to be compared are not at the end of the time window.
4. The method of claim 1 , wherein the performing of the scene clustering comprises:
receiving time information-based clusters;
selecting two clusters having a minimum difference value;
comparing the minimum difference value and a threshold value; and
merging the two clusters when the minimum difference value is less than the threshold value.
5. A non-transitory computer-readable recording medium storing a program for implementing a method of classifying a scene for each person in a video, the method comprising:
extracting a person representation frame in a shot;
comparing a first person representation frame and a second person representation frame;
performing a person clustering by extending a time window when the first person representation frame is similar to the second person representation frame;
merging similar clusters using a person cluster extracted from a representation frame and performing a scene clustering for each person based on a scene change,
wherein the scene change is determined using a person portion and a background portion.
6. A system for classifying a scene for each person in a video, the system comprising:
a person representation frame extracting unit to extract a person representation frame in a shot;
a person clustering unit to compare a first person representation frame and a second person representation frame and to perform a person clustering by extending a time window when the first person representation frame is similar to the second person representation frame;
a scene clustering unit to merge similar clusters using a person cluster extracted from a representation frame and to perform a scene clustering for each person based on a scene change,
wherein the scene change is determined using a person portion and a background portion.
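The clustering steps recited in claims 2 to 4 can be illustrated with a minimal Python sketch. It is not the implementation disclosed in the patent, and several points are assumptions made only for illustration: each person representation frame is reduced to a hypothetical feature tuple, "similar" is taken to mean a Euclidean distance at or below `sim_threshold`, every candidate is compared against the first frame of the current cluster, the "difference value" between two clusters is the distance between their mean features, and the merge step is repeated until no pair falls below the threshold. The names `RepFrame`, `person_clustering`, `scene_clustering`, `window`, `sim_threshold`, and `merge_threshold` are all hypothetical. The determination of a scene change from a person portion and a background portion is not modelled here.

```python
from dataclasses import dataclass
from math import dist


@dataclass
class RepFrame:
    time: float      # position of the person representation frame in the video (seconds)
    feature: tuple   # hypothetical person feature vector (an assumption, not from the claims)


def person_clustering(frames, window, sim_threshold):
    """Time-window person clustering (sketch of claims 2 and 3).

    `frames` must be sorted by time. Returns clusters as lists of frame indices.
    """
    clusters = []
    assigned = [False] * len(frames)
    for i, seed in enumerate(frames):
        if assigned[i]:
            continue
        cluster = [i]
        assigned[i] = True
        window_end = seed.time + window
        for j in range(i + 1, len(frames)):
            candidate = frames[j]
            if candidate.time > window_end:
                break  # end of the time window: move to the subsequent cluster
            if assigned[j]:
                continue
            if dist(seed.feature, candidate.feature) <= sim_threshold:
                cluster.append(j)                      # include the compared frame
                assigned[j] = True
                window_end = candidate.time + window   # extend the time window
        clusters.append(cluster)
    return clusters


def mean_feature(frames, cluster):
    """Component-wise mean of the features of the frames in one cluster."""
    dims = len(frames[cluster[0]].feature)
    return tuple(sum(frames[i].feature[d] for i in cluster) / len(cluster)
                 for d in range(dims))


def scene_clustering(frames, clusters, merge_threshold):
    """Threshold-based merging of time-based clusters (sketch of claim 4)."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > 1:
        # Select the two clusters having the minimum difference value.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = dist(mean_feature(frames, clusters[a]),
                         mean_feature(frames, clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d >= merge_threshold:
            break  # the closest pair is not similar enough: stop merging
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return clusters


# Toy data: two recurring "persons" with features near (0, 0) and (5, 5).
demo = [
    RepFrame(0, (0.0, 0.0)), RepFrame(5, (0.1, 0.0)), RepFrame(10, (5.0, 5.0)),
    RepFrame(12, (0.0, 0.1)), RepFrame(40, (0.05, 0.05)), RepFrame(42, (5.1, 5.0)),
]
time_clusters = person_clustering(demo, window=10, sim_threshold=0.5)
print(time_clusters)                                               # [[0, 1, 3], [2], [4], [5]]
print(scene_clustering(demo, time_clusters, merge_threshold=0.5))  # [[0, 1, 3, 4], [2, 5]]
```

In this toy run the time window keeps the frame at 40 seconds out of the first cluster even though it shows the same person, and the later merge step rejoins it, which is the division of labour between the two clustering stages that the claims describe.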
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/317,509 US20120039515A1 (en) | 2007-01-04 | 2011-10-20 | Method and system for classifying scene for each person in video |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR10-2007-0000957 | 2007-01-04 | ||
KR1020070000957A KR100804678B1 (en) | 2007-01-04 | 2007-01-04 | Method for classifying scene by personal of video and system thereof |
US11/882,733 US8073208B2 (en) | 2007-01-04 | 2007-08-03 | Method and system for classifying scene for each person in video |
US13/317,509 US20120039515A1 (en) | 2007-01-04 | 2011-10-20 | Method and system for classifying scene for each person in video |
Related Parent Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/882,733 Continuation US8073208B2 (en) | 2007-01-04 | 2007-08-03 | Method and system for classifying scene for each person in video |
Publications (1)
Publication Number | Publication Date |
---|---|
US20120039515A1 (en) | 2012-02-16 |
Family
ID=39382421
Family Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/882,733 Expired - Fee Related US8073208B2 (en) | 2007-01-04 | 2007-08-03 | Method and system for classifying scene for each person in video |
US13/317,509 Abandoned US20120039515A1 (en) | 2007-01-04 | 2011-10-20 | Method and system for classifying scene for each person in video |
Family Applications Before (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/882,733 Expired - Fee Related US8073208B2 (en) | 2007-01-04 | 2007-08-03 | Method and system for classifying scene for each person in video |
Country Status (2)
Country | Link |
---|---|
US (2) | US8073208B2 (en) |
KR (1) | KR100804678B1 (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104394422A (en) * | 2014-11-12 | 2015-03-04 | 华为软件技术有限公司 | Video segmentation point acquisition method and device |
WO2015135277A1 (en) * | 2014-03-14 | 2015-09-17 | 小米科技有限责任公司 | Clustering method and related device |
US9449216B1 (en) * | 2013-04-10 | 2016-09-20 | Amazon Technologies, Inc. | Detection of cast members in video content |
CN108446390A (en) * | 2018-03-22 | 2018-08-24 | 百度在线网络技术(北京)有限公司 | Method and apparatus for pushed information |
CN110807368A (en) * | 2019-10-08 | 2020-02-18 | 支付宝(杭州)信息技术有限公司 | Injection attack identification method, device and equipment |
Families Citing this family (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2009076982A (en) * | 2007-09-18 | 2009-04-09 | Toshiba Corp | Electronic apparatus, and face image display method |
JP2009081699A (en) * | 2007-09-26 | 2009-04-16 | Toshiba Corp | Electronic apparatus and method of controlling face image extraction |
JP2009089065A (en) | 2007-09-28 | 2009-04-23 | Toshiba Corp | Electronic device and facial image display apparatus |
US8121358B2 (en) * | 2009-03-06 | 2012-02-21 | Cyberlink Corp. | Method of grouping images by face |
US8531478B2 (en) * | 2009-03-19 | 2013-09-10 | Cyberlink Corp. | Method of browsing photos based on people |
JP2012039524A (en) * | 2010-08-10 | 2012-02-23 | Sony Corp | Moving image processing apparatus, moving image processing method and program |
US8726161B2 (en) | 2010-10-19 | 2014-05-13 | Apple Inc. | Visual presentation composition |
US20120155717A1 (en) * | 2010-12-16 | 2012-06-21 | Microsoft Corporation | Image search including facial image |
CN102682281A (en) * | 2011-03-04 | 2012-09-19 | 微软公司 | Aggregated facial tracking in video |
US9179201B2 (en) * | 2011-08-26 | 2015-11-03 | Cyberlink Corp. | Systems and methods of detecting significant faces in video streams |
US9417756B2 (en) | 2012-10-19 | 2016-08-16 | Apple Inc. | Viewing and editing media content |
US20140181668A1 (en) | 2012-12-20 | 2014-06-26 | International Business Machines Corporation | Visual summarization of video for quick understanding |
KR20160011532A (en) * | 2014-07-22 | 2016-02-01 | 삼성전자주식회사 | Method and apparatus for displaying videos |
EP3570207B1 (en) | 2018-05-15 | 2023-08-16 | IDEMIA Identity & Security Germany AG | Video cookies |
US11127221B1 (en) * | 2020-03-18 | 2021-09-21 | Facebook Technologies, Llc | Adaptive rate control for artificial reality |
CN115103223B (en) * | 2022-06-02 | 2023-11-10 | 咪咕视讯科技有限公司 | Video content detection method, device, equipment and storage medium |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040062520A1 (en) * | 2002-09-27 | 2004-04-01 | Koninklijke Philips Electronics N.V. | Enhanced commercial detection through fusion of video and audio signatures |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7742641B2 (en) * | 2004-12-06 | 2010-06-22 | Honda Motor Co., Ltd. | Confidence weighted classifier combination for multi-modal identification |
KR101195613B1 (en) * | 2005-08-04 | 2012-10-29 | 삼성전자주식회사 | Apparatus and method for partitioning moving image according to topic |
US7555149B2 (en) * | 2005-10-25 | 2009-06-30 | Mitsubishi Electric Research Laboratories, Inc. | Method and system for segmenting videos using face detection |
2007
- 2007-01-04 KR KR1020070000957A (KR100804678B1): IP Right Cessation
- 2007-08-03 US US11/882,733 (US8073208B2): Expired - Fee Related
2011
- 2011-10-20 US US13/317,509 (US20120039515A1): Abandoned
Non-Patent Citations (1)
Title |
---|
Chaisorn et al, "A multi-modal approach to story segmentation for news video", IWI system, 2003. * |
Also Published As
Publication number | Publication date |
---|---|
KR100804678B1 (en) | 2008-02-20 |
US20080166027A1 (en) | 2008-07-10 |
US8073208B2 (en) | 2011-12-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US8073208B2 (en) | Method and system for classifying scene for each person in video | |
US8358837B2 (en) | Apparatus and methods for detecting adult videos | |
US8316301B2 (en) | Apparatus, medium, and method segmenting video sequences based on topic | |
CN106937114B (en) | Method and device for detecting video scene switching | |
US7555149B2 (en) | Method and system for segmenting videos using face detection | |
US6195458B1 (en) | Method for content-based temporal segmentation of video | |
US20100188580A1 (en) | Detection of similar video segments | |
US8467611B2 (en) | Video key-frame extraction using bi-level sparsity | |
US9773322B2 (en) | Image processing apparatus and image processing method which learn dictionary | |
US20130156303A1 (en) | Image processing apparatus and image processing method | |
JP2015536094A (en) | Video scene detection | |
KR102221792B1 (en) | Apparatus and method for extracting story-based scene of video contents | |
JP6557592B2 (en) | Video scene division apparatus and video scene division program | |
US7813552B2 (en) | Methods of representing and analysing images | |
CN1909670B (en) | Image representation and analysis method | |
KR100717402B1 (en) | Apparatus and method for determining genre of multimedia data | |
JP2009123095A (en) | Image analysis device and image analysis method | |
US20070101354A1 (en) | Method and device for discriminating obscene video using time-based feature value | |
US8666175B2 (en) | Method and apparatus for detecting objects | |
JP2010186307A (en) | Moving image content identification apparatus and moving image content identification method | |
WO2006076760A1 (en) | Sequential data segmentation | |
Yilmaz et al. | Shot detection using principal coordinate system | |
WO2007004477A1 (en) | Image discrimination system and method | |
Bailer et al. | Detecting and clustering multiple takes of one scene | |
KR100656373B1 (en) | Method for discriminating obscene video using priority and classification-policy in time interval and apparatus thereof |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |