
WO2012093339A2 - Method and apparatus for comparing videos - Google Patents

Method and apparatus for comparing videos

Info

Publication number
WO2012093339A2
WO2012093339A2 (PCT application PCT/IB2012/000269)
Authority
WO
WIPO (PCT)
Prior art keywords
video
major
query
time series
declines
Prior art date
Application number
PCT/IB2012/000269
Other languages
French (fr)
Other versions
WO2012093339A3 (en)
Inventor
Yansong Ren
Fangzhe Chang
Thomas L. Wood
Original Assignee
Alcatel Lucent
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US12/986,728 external-priority patent/US8731292B2/en
Priority claimed from US13/012,516 external-priority patent/US8849044B2/en
Application filed by Alcatel Lucent filed Critical Alcatel Lucent
Priority to JP2013547935A priority Critical patent/JP5685324B2/en
Priority to EP12711444.5A priority patent/EP2661710A2/en
Priority to CN201280011854.9A priority patent/CN103430175B/en
Priority to KR1020137017739A priority patent/KR101556513B1/en
Publication of WO2012093339A2 publication Critical patent/WO2012093339A2/en
Publication of WO2012093339A3 publication Critical patent/WO2012093339A3/en


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7847Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using low-level visual features of the video content


Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Library & Information Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

A method for comparing a query video and a target video includes partitioning frames of the query video and frames of the target video into blocks and calculating the mean intensity value for each block. A plurality of query time series is produced for the query video, each query time series representing temporal variation in mean intensity value for blocks from the same location in different frames of the query video. A plurality of target time series is produced for the target video, each target time series representing temporal variation in mean intensity value for blocks from the same location in different frames of the target video. The query time series and the target time series are used in determining if alignment exists between the query video and the target video.

Description

METHOD AND APPARATUS FOR COMPARING VIDEOS
FIELD OF THE INVENTION
The present invention relates to a method and apparatus for comparing videos.
BACKGROUND
In a video hosting website, such as, for example, YouTube, Google Video and Yahoo! Video, video content may be uploaded by users to the site and made available to others via search engines. It is believed that current web video search engines provide a list of search results ranked according to their relevance scores based on a particular text query entered by a user. The user must then consider the results to find the video or videos of interest.
Since it is easy for users to upload videos to a host, obtain videos and distribute them again with some modifications, there is potentially a great deal of duplicate, or near-duplicate, content in the video search results. Such duplicates would be considered by a user to be "essentially the same", based on their overall content and subjective impression. For example, duplicate video content may include video sequences with identical or approximately identical content but which are in different file formats, have different encoding parameters, and/or are of different lengths. Other differences may be photometric variations, such as color and/or lighting changes, and/or minor editing operations in the spatial and/or temporal domains, such as the addition or alteration of captions, logos and/or borders. These examples are not intended to be an exhaustive list and other types of difference may also occur in duplicate videos.
The proliferation of duplicate videos can make it difficult or inconvenient for a user to find the content he or she actually wants. As an example, based on sample queries from YouTube, Google Video and Yahoo! Video, it was found that, on average, more than 27% of the videos listed in search results are near-duplicates, with popular videos being those that are most duplicated in the results. Given such a high percentage of duplicate videos in search results, users must spend significant time sifting through them to find the videos they need and must repeatedly watch similar copies of videos which have already been viewed. The duplicate results degrade users' experience of video search, retrieval and browsing. In addition, such duplicated video content increases network overhead by storing and transferring duplicated video data across networks.
One type of video copy detection technique is sequence matching. In sequence matching, an interval of time with multiple frames provides a basis for comparing the similarity of a query video and a target video. Typically, this involves extracting a sequence of features, which may be, for example, ordinal, motion, color and centroid-based features, from both the query video frames and the target video frames. The extracted feature sequences are then compared to determine the similarity distance between the videos. For example, where ordinal signatures are used, each video frame is first partitioned into N1 x N2 blocks and the average intensity of each block is calculated. Then, for each frame, the blocks are ranked according to their average intensities. The ranking order is considered to be that frame's ordinal measure. The sequence of ordinal measures for one video is compared with that of the other to assess their similarity.
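As an illustration of the ordinal measure described above, the following Python sketch ranks the blocks of a single grayscale frame by average intensity. It is a minimal sketch only: the grayscale NumPy input, the 2 x 2 default grid and the function name ordinal_measure are illustrative assumptions rather than details taken from this document.

```python
import numpy as np

def ordinal_measure(frame: np.ndarray, n1: int = 2, n2: int = 2) -> np.ndarray:
    """Rank the n1 x n2 blocks of a grayscale frame by average intensity.

    Returns an array of length n1 * n2 holding the rank of each block's mean
    intensity, which serves as the frame's ordinal measure.
    """
    h, w = frame.shape
    bh, bw = h // n1, w // n2
    means = np.array([
        frame[i * bh:(i + 1) * bh, j * bw:(j + 1) * bw].mean()
        for i in range(n1) for j in range(n2)
    ])
    # argsort of argsort yields the rank of each block's mean intensity
    return np.argsort(np.argsort(means))
```

Two frame sequences can then be compared by accumulating a per-frame distance between their ordinal measures.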
Sequence matching enables the start of the overlapping position between duplicate videos to be determined. Sequence matching approaches are suitable for identifying almost identical videos and copies of videos with format modifications, such as coding and frame resolution changes, and those with minor editing in the spatial and temporal domains. In particular, using spatial and temporal ordinal signatures allows detection of video distortions introduced by video digitalization/encoding processes (for example, changes in color, brightness and histogram equalization, or encoding parameters), display format conversions (for example, converting to letter-box or pillar-box) and modification of partial content (for example, cropping and zooming in).
Sequence matching techniques involve a relatively simple calculation and provide a compact representation of a frame, particularly when using ordinal measures. Sequence matching tends to be computationally efficient and real-time computations may be carried out for processing live video. For example, an ordinal measure with 2 x 2 partitions of a frame needs only 4 dimensions to represent each frame, so fewer points need to be compared between two frames.
However, existing sequence matching based techniques are unable to detect duplicate video clips where there are changes in frame sequences, such as insertions, deletions or substitutions of frames. Changes of frame sequences are introduced by user editing, or by video hosting websites inserting commercials into a video, for example. Since it is not feasible to assume the type of user modification beforehand, the lack of ability to detect frame sequence changes limits the applicability of sequence matching techniques to real-life problems.
Existing solutions for detecting duplicate videos with frame sequence alterations such as insertions, deletions or substitutions of frames, are based on keyframe matching techniques.
Keyframe matching techniques usually segment videos into a series of keyframes to represent the videos. Each keyframe is then partitioned into regions and features are extracted from salient local regions. The features may be, for example, color, texture, corner, or shape features for each region. Keyframe matching is capable of detecting approximate copies that have undergone a substantial degree of editing, such as changes in temporal order or insertion/deletion of frames. However, since there are simply too many local features in a keyframe, it is computationally expensive to identify keyframes, extract local features from each keyframe and conduct metric distance comparisons between them to match a video clip against a large number of videos in a database.
Recent research has been aimed at improving the speed of keyframe matching methods by fast indexing of the feature vectors or by using statistical information to reduce the dimension of the feature vectors. However, for online analysis, both the cost of segmenting videos into keyframes and the cost of extracting local features from a query video are still unavoidable. It becomes a real challenge to provide online real-time video duplication detection in a Web 2.0 video hosting environment. Keyframe matching approaches are more suitable for offline video redundancy detection with fine-grain analysis to aggregate and classify database videos.
BRIEF SUMMARY
According to a first aspect of the invention, a method for comparing a query video and a target video includes partitioning frames of the query video and frames of the target video into blocks and calculating the mean intensity value for each block. A plurality of query time series is produced for the query video, each query time series representing temporal variation in mean intensity value for blocks from the same location in different frames of the query video. A plurality of target time series is produced for the target video, each target time series representing temporal variation in mean intensity value for blocks from the same location in different frames of the target video. The query time series and the target time series are used in determining if alignment exists between the query video and the target video. By using the invention, time series may be produced which can be compared for similarities.
Duplicate videos show similarities in their respective time series, which may be used to identify that they are related. A method in accordance with the invention offers efficient video duplication detection by reducing the comparison space between two videos.
An embodiment includes segmenting the query time series and the target time series into a respective set of discrete linear segments and performing local sequence alignment of those linear segments. Linear segmentation enables mean video intensities to be compressed into a discrete list of linear inclines/declines which may then be compared for alignment.
In duplicate videos, the overlapping video regions usually do not span the entire length of the video sequences and similar regions could be isolated. Therefore, local alignment of linear segments is needed. In bioinformatics, the Smith-Waterman algorithm is well-known for determining similar regions between two nucleotide or protein sequences. The Smith-Waterman algorithm compares string segments of all possible lengths and optimizes the similarity measure. The present inventors have realized that the Smith-Waterman algorithm may be extended to perform local alignment for video intensity segments. Instead of comparing strings, intensity linear segments are compared to find local optimal alignment between videos.
The Smith-Waterman algorithm is a dynamic programming algorithm that provides an optimized search. It is fairly demanding of time and memory resources: the computational complexity is O(MN) and the storage requirement is O(min(M, N)), where M and N are the lengths of the sequences under comparison.
To accelerate the search process, instead of aligning all intensity segments, in an embodiment, a sequence of major inclines/declines is selected as a representation of the key signatures of the compared videos. A heuristic method is applied to provide fast alignment of those major inclines/major declines by excising alignments that are unlikely to result in a successful alignment before performing the more time-consuming Smith-Waterman algorithm. This reduces computational cost. The heuristic method expedites the execution of the matching algorithm by filtering out very dissimilar videos and by narrowing down the potential matched regions for similar videos.
An embodiment in accordance with the invention may be advantageous where it is not feasible to know the types of user modifications in advance of applying video duplication detection techniques, while still allowing sequence matching techniques to be used. In addition, it retains the main advantage of sequence matching approaches, namely efficient detection.
Detection of duplicate videos with frame changes using an embodiment in accordance with the invention may be used by video hosting websites as a user feature; or used by video content providers to keep track of royalty payments and to detect possible copyright infringements; or used by communication "pipes" (e.g. Internet Service Providers (ISPs), peer-to-peer (P2P) system providers and content distribution networks (CDNs)) to reduce network traffic and to manage the storage of video content. It could assist video hosting websites in removing or aggregating near-duplicate videos to provide a better service for users to search, retrieve and browse. It could also facilitate video content-based searching by finding similar videos, for example, higher-quality (HD) or 3D versions.
A pre-existing video duplication system may be modified to include an embodiment in accordance with the invention, to enhance the ability to handle user modifications, such as frame insertions, deletions, or substitutions.
According to a second aspect of the invention, a device is programmed or configured to perform a method in accordance with the first aspect. According to a third aspect of the invention, a data storage medium is provided for storing a machine-executable program for performing a method in accordance with the first aspect.
BRIEF DESCRIPTION OF THE DRAWINGS
Some embodiments of the present invention will now be described by way of example only, and with reference to the accompanying drawings, in which:
Figure 1 schematically illustrates videos to be compared and a stage in a comparison process;
Figure 2 schematically illustrates a method in accordance with the invention;
Figure 3 schematically illustrates changes in intensity with time for one block;
Figure 4 schematically illustrates linear segmentation;
Figure 5 schematically illustrates changes in intensity for compared videos;
Figure 6 schematically illustrates a matrix used in the method of Figure 2;
Figure 7 schematically illustrates steps in matching used in the method of Figure 2;
Figure 8 schematically illustrates steps in matching used in the method of Figure 2; and
Figure 9 schematically illustrates an apparatus in accordance with the invention.
DETAILED DESCRIPTION
With reference to Figure 1, a query video 1 comprising a plurality of frames is to be compared with one or more target videos to determine if they are duplicates.
With reference to Figure 2, at 2, each frame in the query video 1 is partitioned into N1 x N2 blocks. The mean intensity value for each block is calculated at 3. By partitioning each frame, variations in intensity changes in the partitioned sub-regions are retained. For each block, the calculated mean intensity value is plotted against frame number to produce a query time series at 4. In this embodiment, all blocks are processed to provide N1 x N2 time series associated with the video 1. In another embodiment, selected blocks only are involved, thus resulting in fewer than N1 x N2 time series being produced.
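The partitioning and time series construction described above (steps 2 to 4 of Figure 2) could be sketched as follows. This is a hypothetical illustration assuming the frames are supplied as grayscale NumPy arrays; the helper name block_time_series and the default grid size are assumptions.

```python
import numpy as np

def block_time_series(frames, n1: int = 4, n2: int = 4) -> np.ndarray:
    """Return an (n1 * n2, num_frames) array: one mean-intensity time
    series per block position, indexed by frame number."""
    series = []
    for frame in frames:                      # frames: iterable of 2-D arrays
        h, w = frame.shape
        bh, bw = h // n1, w // n2
        series.append([
            frame[i * bh:(i + 1) * bh, j * bw:(j + 1) * bw].mean()
            for i in range(n1) for j in range(n2)
        ])
    return np.asarray(series).T               # rows: blocks, columns: frames
```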
For comparison, a target video 5 shown in Figure 1 is based on the query video 1 but has been modified by histogram equalization, increased brightness, an added border and frame deletion. When the target video 5 is processed in the same manner as described above, target time series shown at 6 are obtained. It can be seen that the changes in intensity for blocks from the target video 5 are generally similar in form to those of the video 1. For example, at frame number 806 for the query time series at 4, the mean intensity for one block increases while that of another decreases so that they cross over. A similar cross over can be seen at frame 739 for the target time series at 6.
The next step at 7 in Figure 2 is to capture information provided by temporal changes in the query and target time series by using piecewise linear segmentation techniques. By segmenting the time series, the video is compressed and most of the essential information in the temporal changes of video intensities is captured. Due to user modification, video distortion and format conversions, one would not expect to find exact matches in video duplicate detection, and ignoring minor changes of temporal intensities makes the video duplicate detection process relatively insensitive to noise.
Figure 3a illustrates variation in mean intensity for part of one time series such as that shown at 4 or 6 in Figure 1. Figure 3b illustrates the part of the time series shown in Figure 3a after linear segmentation has been applied.
A Bottom-Up algorithm is used to segment the time series. The Bottom-Up approach is a well-known time series approximation algorithm. It starts from the finest possible approximation and iteratively merges segments until a stopping criterion is met. In this case, linear interpolation is used rather than linear regression to find the approximating line, since linear interpolation can be obtained in constant time with low computational complexity. The quality of fit for a potential segment is evaluated using residual error. The residual error is calculated by taking all the vertical differences between the best-fit line and the actual data points, squaring them and then summing them together.
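A minimal sketch of the Bottom-Up segmentation with linear interpolation and a squared residual error might look as follows. The stopping threshold max_error and all names are illustrative assumptions, not parameters specified in this document.

```python
import numpy as np

def interp_error(ts: np.ndarray, start: int, end: int) -> float:
    """Squared residual of linearly interpolating ts between its endpoints."""
    x = np.arange(start, end + 1)
    line = np.interp(x, [start, end], [ts[start], ts[end]])
    return float(np.sum((ts[start:end + 1] - line) ** 2))

def bottom_up_segment(ts: np.ndarray, max_error: float = 50.0):
    """Merge the finest segmentation until every merge would exceed max_error.

    Returns a list of (start_index, end_index) segments approximating ts.
    """
    segs = [(i, i + 1) for i in range(len(ts) - 1)]
    while len(segs) > 1:
        # cost of merging each pair of adjacent segments
        costs = [interp_error(ts, segs[i][0], segs[i + 1][1])
                 for i in range(len(segs) - 1)]
        i = int(np.argmin(costs))
        if costs[i] > max_error:
            break
        segs[i] = (segs[i][0], segs[i + 1][1])   # merge the cheapest pair
        del segs[i + 1]
    return segs
```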
In another embodiment, the fast linear segmentation of the time series is achieved by an interpolation method using extraction of major maxima and major minima points as extrema points. Figure 4a shows a linear approximation using maxima and minima points. However, the inventors have realized that relying only on these points excludes jump points, such as that shown at 8. Jump points correspond to rapid changes in values, such as, for example, a jump up or down, within a short time distance. For intensity curves of video block series, these jumps typically indicate shot boundaries, caused by hard cuts or fades in/out. Thus, in this embodiment, the linear segmentation technique is extended to also include jump points so that the extrema points used in the linear segmentation method are maxima points, minima points and jump points, as illustrated in Figure 4b.
Following linear segmentation of the time series, major inclines/declines in the time series are selected at 9 as providing significant video signatures. This enables the search space for aligning linear segments to be reduced.
The linear segments with longer distance and deeper height usually represent conspicuous changes in a scene. They are therefore chosen as major inclines.
Matching consecutive major inclines indicates video copies following similar behavior with the same sequence of major scene changes. In contrast, linear segments of deep heights but of very short lengths are typically associated with shot boundaries, such as hard cuts or fades. Such linear segments often contain less information than those representing changes within a scene. A shot boundary can be determined if the linear segments from all partitioned blocks have deep heights within a same short distance occurring at a same time (i.e. the same starting frame IDs). Those linear segments representing shot boundaries are ignored in the process of selecting major inclines.
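Selecting major inclines/declines according to the length and height criteria above, while skipping shot-boundary segments, might be sketched as follows; the thresholds and the representation of segments are illustrative assumptions.

```python
def select_major_inclines(segments, ts, min_length=10, min_height=15.0,
                          shot_boundary_starts=frozenset()):
    """Keep segments long and tall enough to act as major inclines/declines.

    segments: list of (start_frame, end_frame) tuples for one block's series ts.
    shot_boundary_starts: starting frame IDs judged to be shot boundaries
    because all blocks show a deep, short segment starting there.
    """
    major = []
    for start, end in segments:
        if start in shot_boundary_starts:
            continue                      # ignore shot-boundary segments
        length = end - start
        height = abs(ts[end] - ts[start])
        if length >= min_length and height >= min_height:
            major.append((start, end))
    return major
```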
At 12, the major inclines/declines of a query video and a target video are compared, as illustrated in Figure 5, to find approximate alignments with consecutive matched inclines/declines that are likely to lead to a successful alignment. With reference to Figure 6, an M1 by M2 matrix is generated, where M1 and M2 are the lengths of the major inclines/declines sequences under comparison. If two major inclines/declines at i and j match, the value "1" is put in matrix cell (i, j). To check the similarity between a linear segment S1[i1, ..., j1] and a segment S2[i2, ..., j2], we consider not only the heights and lengths of the segments, but also the similarity of the video frames contained in the two segments. More precisely, these two segments are similar if:
• ratio_L ≤ L1 / L2 ≤ 1 / ratio_L, where L1 and L2 are the lengths of the two segments. That is, the two segments are of similar lengths. In this implementation, ratio_L = 0.9.
• ratio_H ≤ H1 / H2 ≤ 1 / ratio_H, where H1 and H2 are the heights of the two segments. That is, the two segments are of similar heights. In this implementation, ratio_H = 0.75.
• min_p D(p) ≤ dist. In other words, the minimal distance between the two corresponding frame sequences is at most the threshold constant dist when "sliding" the shorter sequence along the longer sequence, where p ranges over the beginning of the sliding frame position in the longer video. In this embodiment, we choose the spatial and temporal ordinal signature algorithm for calculating the video similarity distance due to its efficiency and accuracy.
Given two frame sequences F1 and F2, the ordinal signature measurement calculates the distance between them as
D(p) = (1/L) * Σ_{k=1..L} d(F1(p + k), F2(k)),
where d(·, ·) is the ordinal signature distance between two frames, p is the offset of the shorter sequence within the longer one, and L is the length of the shorter sequence.
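A rough illustration of the three similarity conditions, including the sliding distance min_p D(p), is given below. The per-frame function frame_distance stands in for the ordinal signature distance; the default dist value and the segment representation are assumptions, while ratio_L = 0.9 and ratio_H = 0.75 follow the values quoted above.

```python
def min_sliding_distance(sig1, sig2, frame_distance):
    """min over p of D(p): slide the shorter signature sequence along the
    longer one and average the per-frame distances."""
    short, long_ = (sig1, sig2) if len(sig1) <= len(sig2) else (sig2, sig1)
    L = len(short)
    return min(
        sum(frame_distance(long_[p + k], short[k]) for k in range(L)) / L
        for p in range(len(long_) - L + 1)
    )

def segments_similar(seg1, seg2, sigs1, sigs2, frame_distance,
                     ratio_l=0.9, ratio_h=0.75, dist=0.4):
    """seg = (start, end, height); sigs = per-frame signatures of that video."""
    l1, l2 = seg1[1] - seg1[0], seg2[1] - seg2[0]
    h1, h2 = abs(seg1[2]), abs(seg2[2])
    if not (ratio_l <= l1 / l2 <= 1 / ratio_l):
        return False                       # lengths too different
    if not (ratio_h <= h1 / h2 <= 1 / ratio_h):
        return False                       # heights too different
    d = min_sliding_distance(sigs1[seg1[0]:seg1[1] + 1],
                             sigs2[seg2[0]:seg2[1] + 1], frame_distance)
    return d <= dist                       # frame content must also agree
```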
Since user modification and video processing techniques, such as histogram equalization, frame resizing or cropping, changes in brightness/color/hue, or other added noise, could cause differences in video intensity values, the heights of similar intensity linear segments could be different. The distances of similar linear segments could also be different due to linear segment approximation error or other user-introduced noise. The use of the parameters ratio_H and ratio_L allows such noise to be tolerated to a certain degree. Even though here the ordinal signature based measurement D(p) is used to calculate the distance between two frame sequences, matching of video frames can be based on other global descriptors or even local descriptors, using sequence matching or keyframe based matching algorithms. After aligning major inclines, the potential major incline alignments are extended to neighbouring non-major inclines to find more aligned linear segments, as shown in Figure 7. This step filters out unnecessary alignments to reduce the number of comparisons needed for applying the Smith-Waterman algorithm in the next stage.
At the next step, to find the key approximate alignments, the inventors have realized that alignment can be carried out using an approach similar to that provided by FASTA, which is a fast search algorithm used in finding similar DNA and protein sequences. All diagonal lines of consecutive value "1"s in the matrix are identified, as shown in Figure 8 (a). Next, those diagonal lines whose length is longer than a predefined threshold are retained and single matches and short aligned segments are ignored, as illustrated in Figure 8 (b). Then, the top K longest diagonal lines are selected, as shown in Figure 8 (c). To extend the overall length of an alignment, an attempt is made to join those segments of the top K diagonal lines that are close to each other to form a longer segment. Gaps are allowed in the joined longer segments to take account of frame insertions, deletions and substitutions.
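The FASTA-style step of finding diagonal runs of consecutive matches in the M1 x M2 matrix and keeping the longest ones could be sketched as follows; the thresholds min_len and top_k are illustrative assumptions.

```python
import numpy as np

def diagonal_runs(match: np.ndarray, min_len: int = 3, top_k: int = 5):
    """Return the top_k longest runs of consecutive 1s along the diagonals of
    a 0/1 match matrix, each run reported as (i_start, j_start, length)."""
    m1, m2 = match.shape
    runs = []
    for offset in range(-(m1 - 1), m2):
        diag = np.diagonal(match, offset=offset)
        start = None
        for idx, v in enumerate(np.append(diag, 0)):   # trailing 0 closes runs
            if v == 1 and start is None:
                start = idx
            elif v != 1 and start is not None:
                length = idx - start
                if length >= min_len:
                    i0 = start if offset >= 0 else start - offset
                    j0 = start + offset if offset >= 0 else start
                    runs.append((i0, j0, length))
                start = None
    return sorted(runs, key=lambda r: r[2], reverse=True)[:top_k]
```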
Reward scores are assigned to diagonal matched lines and penalty scores to gaps, that is, mismatches, when connecting neighboring diagonal lines. A score is obtained by adding the reward scores of each of the connected diagonals and subtracting the gap penalties. If the score of a linked approximate alignment exceeds a given threshold, a check is made to determine if the previously ignored initial short aligned segments around the linked segments can be joined to form an approximate alignment with gaps, as shown in Figure 8 (d). Finally, the local approximate alignments having final scores exceeding a threshold are selected for further examination.
The next stage at 15 is to conduct fine-grain alignment of all intensity linear segments of the compared videos by applying the Smith-Waterman algorithm. Based on the approximate alignments of major inclines/declines found previously, lists of linear intensity segments that could lead to successful alignment can be determined. The Smith-Waterman algorithm only needs to examine a restricted range of linear segments.
The Smith-Waterman algorithm uses edit distance to find the optimal alignment. It constructs a scoring matrix H as follows:
H(i, 0) = 0 for 0 ≤ i ≤ M,
H(0, j) = 0 for 0 ≤ j ≤ N,
H(i, j) = max{ 0, H(i-1, j-1) + ω(x_i, y_j), H(i-1, j) + ω(x_i, -), H(i, j-1) + ω(-, y_j) } for 1 ≤ i ≤ M and 1 ≤ j ≤ N,
where x and y are the lists of linear segments that are potentially aligned, M and N are the lengths of the x and y sequences, and ω(x_i, y_j) is a scoring scheme. If x_i and y_j match, ω(x_i, y_j) is positive and if they do not match, it is negative. For insertion and deletion, ω(x_i, -) and ω(-, y_j) are negative.
The Smith-Waterman algorithm finds the local alignment by searching for the maximal score in matrix H and then tracing back the optimal path depending on the direction of movement used to construct the matrix. This process continues until a score of 0 is reached. Once the local optimal alignment is obtained, the video similarity distance is calculated at 16 by applying existing sequence matching techniques to the matched linear segments. In this embodiment, ordinal measurement with 2 x 2 partitions is used to determine the video similarity distance. If the distance is found to be less than a threshold at 17, the two compared videos are considered to be duplicates.
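A generic Smith-Waterman local alignment over two lists of linear segments, following the recurrence above, is sketched below. The reward, mismatch and gap scores and the segments_match predicate are placeholders, not the scoring scheme of this document.

```python
import numpy as np

def smith_waterman(x, y, segments_match, match=2.0, mismatch=-1.0, gap=-1.0):
    """Local alignment of two lists of linear segments.

    Returns the best score and the aligned index pairs (i, j), where None on
    one side denotes an insertion/deletion gap.
    """
    M, N = len(x), len(y)
    H = np.zeros((M + 1, N + 1))
    for i in range(1, M + 1):
        for j in range(1, N + 1):
            s = match if segments_match(x[i - 1], y[j - 1]) else mismatch
            H[i, j] = max(0, H[i - 1, j - 1] + s,
                          H[i - 1, j] + gap, H[i, j - 1] + gap)
    # trace back from the maximal score until a cell with score 0 is reached
    i, j = np.unravel_index(np.argmax(H), H.shape)
    best, pairs = H[i, j], []
    while i > 0 and j > 0 and H[i, j] > 0:
        s = match if segments_match(x[i - 1], y[j - 1]) else mismatch
        if H[i, j] == H[i - 1, j - 1] + s:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif H[i, j] == H[i - 1, j] + gap:
            pairs.append((i - 1, None))
            i -= 1
        else:
            pairs.append((None, j - 1))
            j -= 1
    return best, list(reversed(pairs))
```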
Next, at 18, alignment is examined at the video frame level, instead of at the linear segment level, for the unmatched linear segments. Since the optimal local alignment is based on intensity linear segments, if frame changes occur inside a segment, the entire segment is considered as not being a match using the Smith-Waterman algorithm, as discussed above. To find potential matching positions inside the unmatched segments, a frame-to-frame comparison is conducted to calculate the frame-level similarity distance. If a frame similarity distance is less than the video similarity distance obtained using the Smith-Waterman algorithm, those frames are considered to be matched. This ensures that the similarity distance of the matched frames inside those unmatched segments will not exceed the average video similarity distance obtained from the rest of the matched segments. Frame comparisons are initiated from both the beginning and the end of the unmatched segments, towards the middle of the segments. Matching is continued until a frame similarity distance is larger than the video similarity distance. The video overlapping positions are then updated.
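The frame-level refinement described above, which works inwards from both ends of an unmatched region, might be sketched as follows; frame_distance and the argument layout are assumptions.

```python
def extend_into_unmatched(frames_q, frames_t, video_dist, frame_distance):
    """Match frames from both ends of two unmatched segments toward the middle
    while the frame distance stays below the video similarity distance.

    Returns (matched_from_start, matched_from_end): counts of newly matched
    frame positions at the head and tail of the unmatched region.
    """
    n = min(len(frames_q), len(frames_t))
    head = 0
    while head < n and frame_distance(frames_q[head], frames_t[head]) < video_dist:
        head += 1
    tail = 0
    while tail < n - head and frame_distance(frames_q[-1 - tail],
                                             frames_t[-1 - tail]) < video_dist:
        tail += 1
    return head, tail
```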
Thus, in this embodiment, the changes of intensity values of partitioned blocks are first considered as time series. Then the time series are segmented into a list of discrete linear representations. Local sequence alignment of those linear segments is performed to find the optimal matching position. The video similarity distance is then calculated based on the potential alignment position. If the best matching similarity distance is less than a given threshold, the two videos are considered to be duplicates. To handle changes of frames, gaps, which result from frame insertions, deletions and substitutions, are permitted to exist when comparing linear sequence segments.
With reference to Figure 9, a video management apparatus includes a database or store 19 which holds video files. The database 19 may be one which is generally accessible to users via the Internet or may, for example, be a library or other depository with restricted access. Other types of store or database may be used instead or in addition to these possibilities.
A user transmits a video Q that he or she wants to add to the database 19 by submitting the video Q via a user interface 20. The video Q is sent to the video database 19 and also to a partitioner 21. At Stage 1 of the operation, the partitioner 21 partitions each frame of the video Q into N1 x N2 blocks. A calculator 22 calculates the mean intensity values for each of the blocks.
At Stage 2, mean intensity value data is received by a segmenter 23 from the calculator 22. The segmenter 23 segments the changes of mean intensities of each block. A sorter 24 then sorts the linear segments from all blocks based on the segment starting frame IDs into a sorted list. A selector 25 receives the sorted list and selects major inclines/major declines from the sorted list.
In the next stage, Stage 3, an aligner 26 attempts to find an approximate match between the selected major inclines and major declines of the query video and those of one or more target videos that have undergone similar processing. The results are tested by a first comparator 27. If there is no similarity, judged against a given threshold parameter, then the query video and target video or videos are deemed to not be duplicates and the duplication detection process stops at 28.
If the comparator 27 detects approximate alignment, at Stage 4, a banded Smith-Waterman algorithm is applied by processor 29 and the results applied to a similarity distance calculator 30. The output of the similarity distance calculator 30 is checked against a given threshold by a second comparator 31. If there is insufficient similarity, the compared videos are deemed not to be duplicates and the process stops at 32.
If there is sufficient similarity, at Stage 5, a frame matcher 33 checks unmatched frame positions for video insertions, deletions or substitutions.
The results of the duplicate detection process are sent to the video database 19 to be used in managing the stored videos. If the query video is not found to be a duplicate, the video database 19 accepts it for storage. If the query video is found to be a duplicate, then in one embodiment, the video database 19 rejects it with or without a message to the user to inform them.
In an alternative embodiment, or mode, if the query video is found to be a duplicate, it is accepted into the video database 19 but it is denoted as a duplicate, preferably with a reference to the target video that it matches. Duplicate videos may be collected together in a group. When a search performed on the database calls up one of the group, other group members may be suppressed from the search results or are given a lower ranking in the search results than they would otherwise merit, so that any duplicates tend to be presented after other non-duplicates.
The video management apparatus of Figure 9 may be modified so that videos held in the video database 19 are partitioned and processed at 21 and 22 prior to the query video being submitted. For example, in one embodiment, data obtained when a video is submitted to be examined for duplicates may be retained and sent to be stored at the video database 19. If that video is subsequently not accepted into the database 19, the data is deleted. When the video is accepted into the database, the data associated with it is retained and is available for use in the aligner 26. In another embodiment, videos in the video database 19 may be partitioned and processed in Stage 1 and Stage 2 without necessarily having been used in testing for duplicates. For example, the data processing may be carried out as part of a preparation phase before opening the database to receive new videos.
The functions of the various elements shown in the figures, including any functional blocks labeled as "processors", may be provided through the use of dedicated hardware as well as hardware capable of executing software in association with appropriate software. When provided by a processor, the functions may be provided by a single dedicated processor, by a single shared processor, or by a plurality of individual processors, some of which may be shared. Moreover, explicit use of the term "processor" should not be construed to refer exclusively to hardware capable of executing software, and may implicitly include, without limitation, digital signal processor (DSP) hardware, network processor, application specific integrated circuit (ASIC), field programmable gate array (FPGA), read only memory (ROM) for storing software, random access memory (RAM), and non volatile storage. Other hardware, conventional and/or custom, may also be included.
The present invention may be embodied in other specific forms without departing from its spirit or essential characteristics. The described embodiments are to be considered in all respects only as illustrative and not restrictive. The scope of the invention is, therefore, indicated by the appended claims rather than by the foregoing description. All changes that come within the meaning and range of equivalency of the claims are to be embraced within their scope.

Claims

WE CLAIM:
1. A method for comparing a query video and a target video including:
partitioning frames of the query video and frames of the target video into blocks;
calculating the mean intensity value for each block;
producing a plurality of query time series for the query video, each query time series representing temporal variation in mean intensity value for blocks from the same location in different frames of the query video;
producing a plurality of target time series for the target video, each target time series representing temporal variation in mean intensity value for blocks from the same location in different frames of the target video; and
determining if alignment exists between the query video and the target video using the query time series and the target time series.
2. The method as claimed in claim 1 and including segmenting the query time series and the target time series into a respective set of discrete linear segments; and performing local sequence alignment of those linear segments.
3. The method as claimed in claim 2 and including selecting major inclines and major declines from the segmented time series and using the major inclines and major declines in performing alignment.
4. The method as claimed in claim 3 wherein the selected major inclines and the major declines exclude jump inclines and jump declines.
5. The method as claimed in claim 2 and including comparing major inclines and major declines of the query video with major inclines and major declines of the target video to obtain approximate alignments with consecutive matched inclines and declines.
6. The method as claimed in claim 5 and including matching a query video sequence of major inclines/major declines against a target video sequence of major inclines/major declines.
7. The method as claimed in claim 6 wherein the matching is carried out by creating a matrix having cells with the query video sequence of major inclines/major declines plotted against a target video sequence of major inclines/major declines and, where a match exists, adding a marker in an appropriate cell of the matrix.
8. The method as claimed in claim 7 and after aligning major inclines/major declines, extending the major inclines/major declines to neighbouring non-major inclines/non-major declines.
9. The method as claimed in claim 8 and including identifying diagonal lines of consecutive cells having markers and retaining those diagonal lines having a length that is greater than a given threshold for additional alignment processing.
10. The method as claimed in claim 9 and including selecting the K longest diagonal lines and attempting to join closely located segments included in the top K diagonal lines to form a longer segment.
11. The method as claimed in claim 10 and including awarding reward scores to diagonal matched lines and penalty scores to gaps in longer lines and, when the combined score of a linked approximate alignment exceeds a given score threshold, checking if previously ignored initial short aligned segments around the linked segments can be joined to form an approximate alignment, and selecting the local approximate alignments having final scores exceeding a final score threshold for further examination.
12. The method as claimed in claim 3 and including obtaining an approximate alignment of segments to select a set of possible successful alignments and then applying a Smith-Waterman algorithm to the selected set.
13. The method as claimed in claim 12 and including performing alignment at frame level for approximate aligned segments not included in the selected set.
14. The method as claimed in claim 1 and including storing the query video in a video database holding the target video when the query video is determined not to be a duplicate of the target video.
15. A device programmed or configured to perform a method comprising the steps of: partitioning frames of the query video and frames of the target video into blocks;
calculating the mean intensity value for each block;
producing a plurality of query time series for the query video, each query time series representing temporal variation in mean intensity value for blocks from the same location in different frames of the query video;
producing a plurality of target time series for the target video, each target time series representing temporal variation in mean intensity value for blocks from the same location in different frames of the target video; and
determining if alignment exists between the query video and the target video using the query time series and the target time series.
16. The device as claimed in claim 15 and programmed or configured to perform a method including segmenting the query time series and the target time series into a respective set of discrete linear segments; and performing local sequence alignment of those linear segments.
17. The device as claimed in claim 16 and programmed or configured to select major inclines and major declines from the segmented time series and use the major inclines and major declines in performing alignment.
18. The device as claimed in claim 17 wherein the selected major inclines and the major declines exclude jump segments.
19. The device as claimed in claim 15 and programmed or configured to compare major inclines and declines of the query video with major inclines and major declines of the target video to obtain approximate alignments with consecutive matched inclines and declines.
20. The device as claimed in claim 19 and programmed or configured to match a query video sequence of major inclines/major declines against a target video sequence of major inclines/major declines.
21. The device as claimed in claim 20 and programmed or configured to perform matching by creating a matrix having cells with the query video sequence of major inclines/major declines plotted against a target video sequence of major inclines/major declines and, where a match exists, adding a marker in an appropriate cell of the matrix.
22. The device as claimed in claim 21 and programmed or configured to extend the major inclines/major declines to neighbouring non-major inclines/non-major declines after aligning major inclines/major declines.
23. The device as claimed in claim 22 and programmed or configured to identify diagonal lines of consecutive cells having markers and retain those diagonal lines having a length that is greater than a given threshold for additional alignment processing.
24. The device as claimed in claim 23 and programmed or configured to select the K longest diagonal lines and attempt to join closely located segments included in the top K diagonal lines to form a longer segment.
25. The device as claimed in claim 24 and programmed or configured to award reward scores to diagonal matched lines and penalty scores to gaps in longer lines and, when the combined score of a linked approximate alignment exceeds a given score threshold, to check if previously ignored initial short aligned segments around the linked segments can be joined to form an approximate alignment, and to select the local approximate alignments having final scores exceeding a final score threshold for further examination.
26. The device as claimed in claim 16 and programmed or configured to obtain an approximate alignment of segments to select a set of possible successful alignments and then apply a Smith-Waterman algorithm to the selected set.
27. The device as claimed in claim 26 and programmed or configured to perform alignment at frame level for approximate aligned segments not included in the selected set.
28. The device as claimed in claim 15 and programmed or configured to store the query video in a video database holding the target video when the query video is determined not to be a duplicate of the target video.
29. A data storage medium storing a machine-executable program for performing a method of managing video content including the steps of: partitioning frames of the query video and frames of the target video into blocks;
calculating the mean intensity value for each block;
producing a plurality of query time series for the query video, each query time series representing temporal variation in mean intensity value for blocks from the same location in different frames of the query video;
producing a plurality of target time series for the target video, each target time series representing temporal variation in mean intensity value for blocks from the same location in different frames of the target video; and
determining if alignment exists between the query video and the target video using the query time series and the target time series.
PCT/IB2012/000269 2011-01-07 2012-01-04 Method and apparatus for comparing videos WO2012093339A2 (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
JP2013547935A JP5685324B2 (en) 2011-01-07 2012-01-04 Method and apparatus for comparing pictures
EP12711444.5A EP2661710A2 (en) 2011-01-07 2012-01-04 Method and apparatus for comparing videos
CN201280011854.9A CN103430175B (en) 2011-01-07 2012-01-04 For the method and apparatus that video is compared
KR1020137017739A KR101556513B1 (en) 2011-01-07 2012-01-04 Method and apparatus for comparing videos

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US12/986,728 US8731292B2 (en) 2011-01-07 2011-01-07 Method and apparatus for comparing videos
US12/986,728 2011-01-07
US13/012,516 US8849044B2 (en) 2011-01-24 2011-01-24 Method and apparatus for comparing videos
US13/012,516 2011-01-24

Publications (2)

Publication Number Publication Date
WO2012093339A2 true WO2012093339A2 (en) 2012-07-12
WO2012093339A3 WO2012093339A3 (en) 2012-08-30

Family

ID=45922716

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2012/000269 WO2012093339A2 (en) 2011-01-07 2012-01-04 Method and apparatus for comparing videos

Country Status (5)

Country Link
EP (1) EP2661710A2 (en)
JP (1) JP5685324B2 (en)
KR (1) KR101556513B1 (en)
CN (1) CN103430175B (en)
WO (1) WO2012093339A2 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103686345B (en) * 2013-12-18 2017-01-11 北京航天测控技术有限公司 Video content comparing method based on digital signal processor
CN104079924B (en) * 2014-03-05 2016-05-18 北京捷成世纪科技股份有限公司 Detection method and device that a kind of video mistake is broadcast
KR101709085B1 (en) * 2015-12-16 2017-02-23 서강대학교산학협력단 Shot Boundary Detection method and apparatus using Convolutional Neural Networks
JP6495219B2 (en) * 2016-10-19 2019-04-03 日本電信電話株式会社 Video detection apparatus, method, and program
CN110324549B (en) * 2018-03-28 2022-05-13 沈阳美行科技股份有限公司 Video recording method, device and equipment

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5819286A (en) * 1995-12-11 1998-10-06 Industrial Technology Research Institute Video database indexing and query method and system
US7532804B2 (en) * 2003-06-23 2009-05-12 Seiko Epson Corporation Method and apparatus for video copy detection
KR100811835B1 (en) 2006-10-25 2008-03-10 주식회사 에스원 Method for extracting moving image features and content-based moving image searching method using the extracting method
JP4916950B2 (en) * 2007-05-14 2012-04-18 ヤフー株式会社 Moving image comparison apparatus, moving image comparison method, and moving image comparison program
US9177209B2 (en) * 2007-12-17 2015-11-03 Sinoeast Concept Limited Temporal segment based extraction and robust matching of video fingerprints
WO2010078629A1 (en) * 2009-01-12 2010-07-15 The University Of Queensland A system for real time near-duplicate video detection
GB0901262D0 (en) * 2009-01-26 2009-03-11 Mitsubishi Elec R&D Ct Europe Video identification

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
None
See also references of EP2661710A2

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11593582B2 (en) 2018-03-29 2023-02-28 Beijing Bytedance Network Technology Co., Ltd. Method and device for comparing media features
CN111738173A (en) * 2020-06-24 2020-10-02 北京奇艺世纪科技有限公司 Video clip detection method and device, electronic equipment and storage medium
CN114972809A (en) * 2021-02-19 2022-08-30 株式会社理光 Method, apparatus, and computer-readable storage medium for video processing
CN116939267A (en) * 2023-09-14 2023-10-24 腾讯科技(深圳)有限公司 Frame alignment method, device, computer equipment and storage medium
CN116939267B (en) * 2023-09-14 2023-12-05 腾讯科技(深圳)有限公司 Frame alignment method, device, computer equipment and storage medium

Also Published As

Publication number Publication date
KR101556513B1 (en) 2015-10-02
EP2661710A2 (en) 2013-11-13
JP5685324B2 (en) 2015-03-18
KR20130108427A (en) 2013-10-02
CN103430175A (en) 2013-12-04
CN103430175B (en) 2016-12-28
WO2012093339A3 (en) 2012-08-30
JP2014506366A (en) 2014-03-13

Similar Documents

Publication Publication Date Title
US8849044B2 (en) Method and apparatus for comparing videos
US8731292B2 (en) Method and apparatus for comparing videos
WO2012093339A2 (en) Method and apparatus for comparing videos
US8335786B2 (en) Multi-media content identification using multi-level content signature correlation and fast similarity search
KR101517750B1 (en) Methods and apparatus for comparing videos
Barrios et al. Competitive content-based video copy detection using global descriptors
JP5479340B2 (en) Detect and classify matches between time-based media
JP4990383B2 (en) Image group expression method, image group search method, apparatus, computer-readable storage medium, and computer system
CN106557545B (en) Video retrieval method and device
CN102682024A (en) Method for recombining incomplete JPEG file fragmentation
KR101634395B1 (en) Video identification
KR101111046B1 (en) A Similar Video Search System through Object Detection Information and A Method thereof
US9135509B2 (en) Determining representative images for a video
Chiu et al. Efficient video segment matching for detecting temporal-based video copies
Benini et al. Identifying video content consistency by vector quantization
Chou et al. Near-duplicate video retrieval by using pattern-based prefix tree and temporal relation forest
Dang et al. A video copy detection algorithm based on two-level features measure
Li et al. Segment Oriented Search (SOS) Method for TV Repeats Detection
Chaisorn et al. A Hierarchical Filtering Approach for Copy Detection in Video Sharing Network

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
Ref document number: 12711444; Country of ref document: EP; Kind code of ref document: A2
WWE Wipo information: entry into national phase
Ref document number: 2012711444; Country of ref document: EP
ENP Entry into the national phase
Ref document number: 2013547935; Country of ref document: JP; Kind code of ref document: A
Ref document number: 20137017739; Country of ref document: KR; Kind code of ref document: A
NENP Non-entry into the national phase
Ref country code: DE