
CN112291616A - Video advertisement identification method, device, storage medium and equipment

Info

Publication number
CN112291616A
Authority
CN
China
Prior art keywords
advertisement
column
matrix
arithmetic progression
row
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010902794.6A
Other languages
Chinese (zh)
Other versions
CN112291616B (en)
Inventor
朱永亮
尹海沧
马文闯
刘利
刘殿龙
曹明阔
熊浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Potevio Peacetech Co ltd
Original Assignee
Potevio Peacetech Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Potevio Peacetech Co ltd
Priority to CN202010902794.6A
Publication of CN112291616A
Application granted
Publication of CN112291616B
Legal status: Active

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81Monomedia components thereof
    • H04N21/812Monomedia components thereof involving advertisement data
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85Assembly of content; Generation of multimedia applications
    • H04N21/854Content authoring
    • H04N21/8547Content authoring involving timestamps for synchronizing content

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Computer Security & Cryptography (AREA)
  • Image Analysis (AREA)
  • Collating Specific Patterns (AREA)

Abstract

The scheme discloses a video advertisement identification method, device, storage medium and equipment. The method comprises the following steps: performing feature extraction on an advertisement template image subset constructed from an advertisement video template file and an image subset to be detected constructed from a video file to be detected, to obtain a first fingerprint feature vector matrix A and a second fingerprint feature vector matrix B; performing cosine similarity calculation on the transposed matrix Aᵀ of the first fingerprint feature vector matrix A and the second fingerprint feature vector matrix B to obtain a comparison matrix C; forming, from the element bits of the comparison matrix C that are larger than a first threshold value, a data group whose column numbers form an arithmetic progression; and, when the arithmetic progression of column numbers in that data group is not broken, determining the video advertisement in the video file to be detected based on the data group. The scheme reduces the fluctuation interference of engineering application scenarios, makes the search for the arithmetic progression more flexible, makes the recognition rate higher and more stable, and positions the advertisement playing time more accurately.

Description

Video advertisement identification method, device, storage medium and equipment
Technical Field
The invention relates to the technical field of video processing, and more particularly to a video advertisement identification method, device, storage medium and equipment.
Background
With the development of television video, video advertisements are continuously inserted into the videos played every day. Traditional television stations intermittently insert many advertisements into the television videos they broadcast; the playing of these advertisements seriously affects the audio-visual experience of users, and users waste time on advertisements they are not interested in.
In the traditional video advertisement identification method, the audio/video MP4 file recorded over the 24 hours of a day is decoded frame by frame, and one frame is taken at a fixed interval of M frames to form the image subset to be detected. A manually cut and verified advertisement video file in the advertisement material library is decoded frame by frame, and one frame is taken at a fixed interval of N frames to form the template image subset. Then, based on image quality evaluation criteria such as SAD, PSNR and SSIM, the correspondence between the most similar images is obtained. In an ideal situation, as the image frame numbers of the image subset to be detected increase (the first column of data in FIG. 1), the corresponding image frame numbers of the template image subset (the fourth column of data in FIG. 1) form an arithmetic progression, the correspondence between the images (the third column of data in FIG. 1) being established according to the principle of highest similarity between images. If the arithmetic progression can be continuously extended, the start playing time and the end playing time of the video advertisement can be accurately acquired: the start playing time is the playing time, in milliseconds, of the to-be-detected image frame in the first entry of the arithmetic progression, and the end playing time is the playing time, in milliseconds, of the to-be-detected image frame in the last entry of the arithmetic progression. Based on the accurately acquired start and end playing times, it can be determined whether the advertisement exists in the video MP4 file to be detected, together with the start time and end time (accurate to the millisecond) of each playing of the advertisement.
However, the advertisement identification result obtained by this method is not accurate enough, the rate of missed identification is high, and the workload of manual review cannot be effectively reduced.
Disclosure of Invention
One object of the present solution is to provide a method, an apparatus, a storage medium, and a device for fast identification of a video advertisement segment that is repeatedly played.
Another object of the present solution is to provide a device and an apparatus for performing the above recognition method.
In order to achieve the purpose, the scheme is as follows:
in a first aspect, the present disclosure provides a video advertisement recognition method, including:
performing feature extraction on an advertisement template image subset constructed according to an advertisement video template file and an image subset to be detected constructed according to a video file to be detected to obtain a first fingerprint feature vector matrix A and a second fingerprint feature vector matrix B;
performing cosine similarity calculation on the transposed matrix Aᵀ of the first fingerprint feature vector matrix A and the second fingerprint feature vector matrix B to obtain a comparison matrix C;
forming, from the element bits in the comparison matrix C that are larger than a first threshold value, a data group whose column numbers form an arithmetic progression;
and, when the arithmetic progression of column numbers in the data group is not broken, determining the video advertisement in the video file to be detected based on that data group.
In a preferred embodiment, the step of constructing the image subset to be detected includes:
taking M frames as the extraction interval, acquiring, from the originally recorded video file to be detected, an image subset to be processed with resolution n;
and cropping each image in the image subset to be processed to its central m region, the cropped images forming the image subset to be detected.
In a preferred embodiment, the step of constructing the image subset to be detected includes:
taking 12 frames as the extraction interval, acquiring, from an originally recorded video file to be detected with resolution 720x576, an image subset to be processed with resolution 360x288;
and cropping each image in the image subset to be processed to its central 224x224 region, the cropped images forming the image subset to be detected.
In a preferred embodiment, the constructing of the subset of advertisement template images comprises:
and acquiring an advertisement template image subset from the advertisement video template file by taking the N frames as extraction intervals.
In a preferred embodiment, the step of performing cosine similarity calculation on the transposed matrix Aᵀ of the first fingerprint feature vector matrix A and the second fingerprint feature vector matrix B to obtain the comparison matrix C includes:
the matrix A has a rows and X columns, and the matrix B has b rows and X columns;
multiplying the first column of the transposed matrix Aᵀ of the first fingerprint feature vector matrix A element-wise with the first row of the second fingerprint feature vector matrix B and summing, to obtain a first parameter value p11;
taking the modulus (norm) of the first row of the second fingerprint feature vector matrix B and of the first column of the transposed matrix Aᵀ of the first fingerprint feature vector matrix A, respectively, to obtain a second parameter value q11 and a third parameter value r11;
based on the first parameter value p11, the second parameter value q11 and the third parameter value r11, obtaining a cosine similarity value t11, where t11 = p11/q11/r11, and using the value of t11 as the value of the element bit in the first row and first column of the comparison matrix C;
based on the above steps, calculating in the same way the values t12, t13, …, t1a, up to tba, as the values of the corresponding element bits of the comparison matrix C.
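As a worked illustration of this step (a NumPy sketch, not part of the claimed method), the whole comparison matrix C can be formed at once; each entry reproduces the p/q/r construction above. The shapes used in the example lines are hypothetical.

```python
import numpy as np


def comparison_matrix(A, B):
    """A: template fingerprints (a x X), B: to-be-detected fingerprints (b x X).
    Returns C (b x a) with C[i, j] = cosine similarity of row i of B and row j of A."""
    dots = B @ A.T                                   # p_ij: dot products, shape (b, a)
    q = np.linalg.norm(B, axis=1, keepdims=True)     # |B_i|, shape (b, 1)
    r = np.linalg.norm(A, axis=1, keepdims=True).T   # |A_j|, shape (1, a)
    return dots / (q * r)                            # t_ij = p_ij / q_i / r_j


# hypothetical shapes matching the 512-value fingerprints of the embodiment
A = np.random.rand(300, 512)     # a = 300 template frames
B = np.random.rand(7200, 512)    # b = 7200 frames to be detected
C = comparison_matrix(A, B)      # shape (7200, 300), entries in [-1, 1]
```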
In a preferred example, the step of forming, from the element bits of the comparison matrix C that are larger than the first threshold value, a data group whose column numbers form an arithmetic progression includes:
starting from the element bit in the first row and first column of the comparison matrix C, comparing the value of each element bit with the first threshold value, searching for an element bit whose value is greater than the first threshold value, and taking that value as the first expected value;
if the first expected value is found at the element bit in row i, column r, searching for the next expected value in column s of row j, where j is larger than i and s = r + (j − i) × K; K = M/N, M is the frame interval used to extract images from the originally recorded video file to be detected, and N is the frame interval used to extract images from the advertisement video template file;
and forming, from all the expected values and the column numbers of the rows in which they occur in the comparison matrix C, a data group whose column numbers are an arithmetic progression with common difference K.
In a preferred embodiment, if the first expected value is found at the element bit in row i, column r but the next expected value is not found in column s of row j:
the next expected value is searched for within a preset range before and after column s of row j, based on a local-minimum search algorithm;
if the next expected value is found within the preset range, the search for the following expected value continues, taking the row where that expected value is located as the reference;
and if the next expected value is not found, the search jumps to the next row and continues until the next expected value is found.
In a preferred example, the preset range is the region s ± 20%·K.
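The following sketch is one possible reading of this search (an assumption, not the patent's own implementation): starting from an element above the threshold, the run is extended row by row, the expected column advances by K = M/N per row, a ± 20%·K window absorbs small drifts, and rows where nothing exceeds the threshold are skipped while the extrapolation continues from the starting element.

```python
import numpy as np


def extend_run(C, i, r, threshold, K, slack=0.2):
    """From element (i, r) of the comparison matrix C with C[i, r] > threshold,
    collect (row, column) pairs whose column numbers follow the arithmetic
    progression s = r + (j - i) * K, allowing a drift of +/- slack*K."""
    b, a = C.shape
    run = [(i, r)]
    for j in range(i + 1, b):
        s = r + (j - i) * K
        if s >= a:                                    # template frames exhausted
            break
        lo = max(0, int(round(s - slack * K)))
        hi = min(a, int(round(s + slack * K)) + 1)
        window = C[j, lo:hi]
        if window.size and window.max() > threshold:
            run.append((j, lo + int(window.argmax())))
        # otherwise jump to the next row and keep extrapolating from (i, r)
    return run
```

A run produced this way is then cleaned up by the fragment and break handling described next.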
In a preferred embodiment, the method further comprises the step of: when the arithmetic progression of column numbers in the data group is broken, processing the data fragments at the break, continuing the data group whose column numbers form the arithmetic progression, and determining the video advertisement in the video file to be detected based on the continued data group.
In a preferred example, when the arithmetic progression of column numbers in the data group is broken, the step of processing the data fragments at the break includes:
if the number of data rows whose column numbers form an arithmetic progression is smaller than a third threshold value and is more than K rows away from the next data rows whose column numbers form an arithmetic progression, determining those data rows to be fragment segments and discarding them; and/or,
judging the relation between the time length corresponding to the gap between two adjacent groups of rows whose column numbers form an arithmetic progression and the total time length of the advertisement template: if d ≤ e × T, where d is the time length corresponding to the gap between the two adjacent segments, T is the total time length of the advertisement template, and e is a preset percentage, integrating the two data groups whose column numbers form an arithmetic progression into one data group.
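A sketch of this clean-up, under the assumption that each run is the list of (row, column) pairs produced by a search like the one above and that row_ms() is a hypothetical helper mapping a to-be-detected row index to its playing time in milliseconds:

```python
def clean_runs(runs, min_len, row_ms, e, template_ms):
    """Discard fragments shorter than `min_len` rows, then merge adjacent runs
    whose gap is no longer than e * template_ms (the d <= e * T rule above)."""
    runs = [r for r in runs if len(r) >= min_len]           # discard fragments
    runs.sort(key=lambda r: r[0][0])
    merged = []
    for run in runs:
        if merged:
            gap = row_ms(run[0][0]) - row_ms(merged[-1][-1][0])
            if gap <= e * template_ms:                      # short interruption
                merged[-1] = merged[-1] + run               # continue the progression
                continue
        merged.append(run)
    return merged
```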
In a preferred example, the step of determining the video advertisement in the video file to be detected based on the data group whose column numbers form an arithmetic progression comprises:
determining the image frames in the video file to be detected corresponding to the first expected value and the last expected value in the data group;
looking up the display timestamps corresponding to those image frames, based on the position sequence numbers, in the video file to be detected, of the image frame corresponding to the first expected value and of the image frame corresponding to the last expected value;
and determining, from the timestamps, the start and stop times, in milliseconds, of the images in the video file to be detected corresponding to the data group, i.e. the total duration in milliseconds of the matched advertisement.
In a preferred example, the step of determining the video advertisement in the video file to be detected based on the data group further includes:
comparing the matched total advertisement duration with a second threshold value;
if it is larger than the second threshold value, confirming that the advertisement identification result is valid; and if it is smaller than the second threshold value, determining that the advertisement identification result is invalid.
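Continuing the sketch (helper names are hypothetical, and pts_ms is assumed to hold the millisecond display timestamps of the extracted to-be-detected frames, indexed by the row numbers used in the run), a surviving data group is turned into an identification result and validated against the second threshold:

```python
def run_to_advert(run, pts_ms, second_threshold_ms):
    start_ms = pts_ms[run[0][0]]          # first expected value -> start time
    end_ms = pts_ms[run[-1][0]]           # last expected value -> end time
    duration_ms = end_ms - start_ms       # total matched advertisement length
    if duration_ms < second_threshold_ms:
        return None                       # identification result judged invalid
    return {"start_ms": start_ms, "end_ms": end_ms, "duration_ms": duration_ms}
```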
In a second aspect, the present disclosure provides a video advertisement recognition apparatus, including:
the extraction unit is used for extracting the characteristics of the advertisement template image subset constructed according to the advertisement video template file and the image subset to be detected constructed according to the video file to be detected to obtain a first fingerprint characteristic vector matrix A and a second fingerprint characteristic vector matrix B;
the calculation unit is used for performing cosine similarity calculation on the transposed matrix Aᵀ of the first fingerprint feature vector matrix A and the second fingerprint feature vector matrix B to obtain a comparison matrix C;
the data group construction unit is used for forming, from the element bits in the comparison matrix C that are larger than the first threshold value, a data group whose column numbers form an arithmetic progression;
and the identification unit is used for determining the video advertisement in the video file to be detected based on the data group when the arithmetic progression of column numbers in the data group is not broken.
In a third aspect, the present solution provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor, implements the steps of the above-mentioned advertisement video identification method.
In a fourth aspect, the present solution provides a device comprising: a memory and one or more processors; the memory is connected with the processor through a communication bus; the processor is configured to execute instructions in the memory; and the memory stores instructions for executing the steps of the above video advertisement identification method.
The scheme has the following beneficial effects:
the advertisement identification method based on the image fingerprints reduces fluctuation interference of engineering application scenes, enables searching of an arithmetic progression to be more flexible, is higher and stable in identification rate, and is more accurate in positioning of advertisement playing time.
Drawings
In order to illustrate the implementation of the solution more clearly, the drawings needed in the description of the embodiments are briefly described below. Obviously, the drawings in the following description are only some embodiments of the solution, and other drawings may be derived from them by a person skilled in the art without inventive effort.
FIG. 1 is a schematic diagram showing an example of a data group whose column numbers form an arithmetic progression according to the present scheme;
FIG. 2 is a schematic processing flow diagram of the video advertisement identification method according to the present scheme;
FIG. 3 is a diagram illustrating the comparison of fingerprint frames;
FIG. 4 is a schematic diagram showing an example of a comparison matrix according to the present scheme;
FIG. 5 is a schematic diagram of the video advertisement identification device according to the present scheme;
FIG. 6 is a schematic diagram of a device according to the present scheme;
FIG. 7 is a schematic diagram of the identification accuracy achieved by advertisement identification using the method of the present scheme.
Detailed Description
Embodiments of the present solution will be described in further detail below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present solution, not an exhaustive list of all embodiments. It should be noted that the embodiments of the present solution and the features of the embodiments may be combined with each other provided there is no conflict.
In the traditional video advertisement identification method, the audio/video MP4 file recorded over the 24 hours of a day is decoded frame by frame, and one frame is taken at a fixed interval of M frames to form the image subset to be detected. A manually cut and verified advertisement video file in the advertisement material library is decoded frame by frame, and one frame is taken at a fixed interval of N frames to form the template image subset. Then, based on image quality evaluation criteria such as SAD, PSNR and SSIM, the correspondence between the most similar images is obtained. In an ideal situation, as the image frame numbers of the image subset to be detected increase (the first column of data in FIG. 1), the corresponding image frame numbers of the template image subset (the fourth column of data in FIG. 1) form an arithmetic progression, the correspondence between the images (the third column of data in FIG. 1) being established according to the principle of highest similarity between images. If the arithmetic progression can be continuously extended, the start playing time and the end playing time of the video advertisement can be accurately acquired: the start playing time is the playing time, in milliseconds, of the to-be-detected image frame in the first entry of the arithmetic progression, and the end playing time is the playing time, in milliseconds, of the to-be-detected image frame in the last entry of the arithmetic progression. Based on the accurately acquired start and end playing times, it can be determined whether the advertisement exists in the video MP4 file to be detected, together with the start time and end time (accurate to the millisecond) of each playing of the advertisement.
Research and analysis of the existing television advertisement video identification method show that the following interference factors exist in practical application:
(1) the input signal recorded by the acquisition card is the CVBS analog audio/video signal output by the set-top box; operations such as analog-to-digital conversion and video encoding slightly change the gray values of the image pixels, so that the fingerprint feature vector of the image changes slightly;
(2) the manually cut advertisement template file requires the start/end positions of the advertisement segment to be accurate to the image frame level, so the originally recorded material file has to be decoded, positioned to the specific image frame, and then re-encoded. In the stage of extracting fingerprints from the advertisement template, the encoding types of the frame images (I frames and P frames) therefore differ from those of the originally recorded material, so the gray values of the image pixels change slightly and the fingerprint feature vector of the image changes slightly;
(3) the frame interval value M is not equal to N, so the picture content of the image subset to be detected and that of the template image subset are inconsistent, and the fingerprint feature vectors of the images differ noticeably;
(4) owing to the broadcasting strategy of the television station, after the advertisement template segment has been broadcast for some days, local changes of a few seconds may be made to the advertisement picture content to increase the novelty of the advertisement and attract the attention of consumers. As a result, the advertisement pictures played later change considerably in some image frames, and the fingerprint feature vectors of those images change considerably;
(5) because the resolutions of the originally recorded MP4 material files differ, the difference between the fingerprint vectors of two different advertisements with the same background is not prominent enough;
(6) the start time and the end time of the advertisement playing are required to be accurate to the image frame, i.e. to the millisecond. However, the traditional method of directly dividing the image frame number (for example, the frame count obtained with OpenCV decoding) by the video frame rate has a large error; for the raw MP4 material file recorded over one day, the accumulated error reaches tens of seconds, which cannot meet the demand (a numerical illustration follows this list).
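As a purely hypothetical numerical illustration of factor (6) (the rates below are assumed, not taken from the patent): if the recorder's true average rate is 24.99 fps while the container declares a nominal 25 fps, then

```latex
\[
  86400\,\mathrm{s} \times 24.99\,\mathrm{fps} \approx 2\,159\,136 \text{ frames in one day},\qquad
  \frac{2\,159\,136}{25\,\mathrm{fps}} \approx 86\,365.4\,\mathrm{s},
\]
\[
  \text{accumulated error} \approx 86\,400\,\mathrm{s} - 86\,365.4\,\mathrm{s} \approx 34.6\,\mathrm{s},
\]
```

which is exactly the tens-of-seconds drift described above.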
Due to the existence of interference factors (1) to (6), large errors and fluctuations exist, the arithmetic progression is often forcibly interrupted, and a large amount of fragmentation occurs; the advertisement identification result obtained is therefore not accurate enough, the rate of missed identification is high, and the workload of manual review cannot be effectively reduced.
Therefore, the scheme aims to provide a video advertisement identification method in which the fingerprint feature vector matrix of the image is extracted based on a VGG19 deep learning network model (VGG, Visual Geometry Group), and a comparison matrix C is obtained using cosine similarity as the basis for comparing the fingerprint feature vector matrices, which simplifies the calculation; the element bits in the comparison matrix C that are larger than a preset threshold value form a data group whose column numbers are an arithmetic progression, and the video advertisement in the video file to be detected is finally determined using that data group.
Hereinafter, the video advertisement identification method proposed by the present solution is described in detail with reference to FIGS. 2 to 4. The method may comprise the following steps:
step S1, constructing the image subset to be detected and the advertisement template image subset;
step S2, performing feature extraction on the template image subset constructed from the advertisement video template file and the image subset to be detected constructed from the video file to be detected, to obtain a first fingerprint feature vector matrix A and a second fingerprint feature vector matrix B;
step S3, performing cosine similarity calculation on the transposed matrix Aᵀ of the first fingerprint feature vector matrix A and the second fingerprint feature vector matrix B to obtain a comparison matrix C;
step S4, forming, from the element bits in the comparison matrix C that are larger than a first threshold value, a data group whose column numbers form an arithmetic progression;
step S5, when the arithmetic progression of column numbers in the data group is not broken, determining the video advertisement in the video file to be detected based on the data group;
and step S6, when the arithmetic progression of column numbers in the data group is broken, processing the data fragments at the break, continuing the data group whose column numbers form the arithmetic progression, and determining the video advertisement in the video file to be detected based on the continued data group.
When identifying a video advertisement, the scheme first needs to prepare the image subset to be detected and the advertisement template image subset; that is, the scheme constructs the image subset to be detected and the advertisement template image subset in step S1.
The image subset to be detected may be constructed as follows: taking M frames as the extraction interval, an image subset to be processed with resolution n is acquired from the originally recorded video file to be detected; each image is then cropped to its central m region, and the cropped images form the image subset to be detected. In one embodiment, with 12 frames as the extraction interval, cubic-convolution down-sampling is performed on a video file to be detected with an original recording resolution of 720x576 to obtain an image subset to be processed with resolution 360x288; each image in the image subset to be processed is then cropped to its central 224x224 region, and these images form the image subset to be detected. Using the image subset to be detected formed in this way as input data for extracting the fingerprint feature vectors of the images makes greater use of the original image information: the utilization rate of the image texture content is improved from about 12% (namely 224x224/720/576) to about 48% (namely 224x224/360/288), different advertisement templates that share the same picture background over a large area are effectively distinguished, and the differences between the fingerprint feature vectors of the images become more obvious.
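The two utilization figures quoted above follow directly from the crop and frame sizes:

```latex
\[
  \frac{224 \times 224}{720 \times 576} = \frac{50\,176}{414\,720} \approx 12.1\,\%,
  \qquad
  \frac{224 \times 224}{360 \times 288} = \frac{50\,176}{103\,680} \approx 48.4\,\%.
\]
```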
The advertisement template image subset may be constructed as follows: taking N frames as the extraction interval, the advertisement template image subset is acquired from the advertisement video template file. In one embodiment, the advertisement template image subset is obtained, with an extraction interval of 2 frames, from an advertisement video template file manually cut from the advertisement material.
In step S2, feature extraction may be performed on the advertisement template image subset and the image subset to be detected using a VGG19 deep learning network model, to obtain the first fingerprint feature vector matrix A and the second fingerprint feature vector matrix B. The matrix A has a rows and X columns, and the matrix B has b rows and X columns. In one embodiment, the matrix A may be a matrix with a rows and 512 columns, and the matrix B a matrix with b rows and 512 columns.
In the scheme, a one-to-many mapping is adopted: cosine similarity calculation is performed between the fingerprint feature vector of each row of the image subset to be detected and all the fingerprint feature vectors corresponding to the advertisement template image subset, so that each image to be detected is matched against the advertisement template images and the position sequence number of the matched frame fingerprint is found. The position sequence number is the sequence number corresponding to each image in the image subset, i.e. its sequence index. In the scheme above, the original material image sequence is frame-extracted at equal intervals of M frames, and the images in the resulting image subset carry sequential numbers, for example 0, 1, 2, 3. By analyzing the originally recorded file with the FFProbe software commonly used in the industry, a correspondence table can be obtained, and for the image frames with sequence numbers 0, 1, 2 and 3, the times Time0, Time1, Time2 and Time3, accurate to the millisecond, can be obtained. Since the user requires time positioning accurate to the millisecond, this approach reaches millisecond-level accuracy more accurately and reliably.
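A minimal sketch of this index-to-timestamp mapping, assuming frame_pts_ms has already been filled by parsing FFProbe's per-frame output into a list of millisecond display timestamps indexed by the original frame number (the exact FFProbe field names vary between FFmpeg versions, so that parsing step is not shown):

```python
def subset_index_to_pts_ms(k, M, frame_pts_ms):
    """Map sequence index k of the to-be-detected subset (one frame kept every
    M frames: 0, 1, 2, 3, ...) back to the millisecond display timestamp of
    the original frame it was taken from."""
    return frame_pts_ms[k * M]


# e.g. with M = 12, subset index 3 corresponds to original frame 36,
# whose timestamp frame_pts_ms[36] plays the role of "Time3" above.
```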
In step S3, cosine similarity calculation is performed on the transposed matrix Aᵀ of the first fingerprint feature vector matrix A and the second fingerprint feature vector matrix B to obtain the comparison matrix C. Specifically, the first column of the transposed matrix Aᵀ of the first fingerprint feature vector matrix A is multiplied element-wise with the first row of the second fingerprint feature vector matrix B and summed, giving a first parameter value p11; the modulus of the first row of the second fingerprint feature vector matrix B and the modulus of the first column of the transposed matrix Aᵀ are taken, giving a second parameter value q11 and a third parameter value r11; based on p11, q11 and r11, the cosine similarity value t11 = p11/q11/r11 is calculated and used as the value of the element bit in the first row and first column of the comparison matrix C. Continuing in the same way, t12, t13, …, t1a, up to tba, are calculated and used as the values of the corresponding element bits of the comparison matrix C. In this way, the comparison matrix C is constructed. The element in row i and column j of the comparison matrix C is the cosine similarity between the fingerprint feature vector of the i-th frame in the image subset to be detected and the fingerprint feature vector of the j-th frame in the advertisement template image subset. In the scheme, the cosine similarity is calculated as cos θ = (a · b)/(|a| · |b|), where a is a column of the transposed matrix Aᵀ (i.e. a row of A) and b is a row of the matrix B.
Here t11, t12, t13, …, t1a are entries of the cos θ result, which is a matrix; written out in full, the comparison matrix C is:
t11, t12, t13, …, t1a
t21, t22, t23, …, t2a
t31, t32, t33, …, t3a
……
tb1, tb2, tb3, …, tba
The result of comparing one fingerprint feature vector b of the image subset to be detected with all the template fingerprints, written as a vector, is (t1, t2, t3, …, ta).
Thus cos θ = (a · b)/(|a| · |b|) is the calculation formula, and (t1, t2, t3, …, ta) is the vector of cos θ values that forms one row of the comparison matrix C.
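Restated compactly (same content as above, in conventional notation):

```latex
\[
  C \in \mathbb{R}^{b \times a},\qquad
  t_{ij} = \cos\theta_{ij} = \frac{B_i \cdot A_j}{\lVert B_i\rVert\,\lVert A_j\rVert},
  \qquad 1 \le i \le b,\; 1 \le j \le a,
\]
```

where B_i is the fingerprint feature vector of the i-th frame of the image subset to be detected (row i of B) and A_j is that of the j-th frame of the advertisement template image subset (row j of A).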
In the scheme, the value of each element bit in the comparison matrix C is compared with the first threshold value in order to build the data group whose column numbers form an arithmetic progression. In step S4, the element bits in the comparison matrix C that are larger than the first threshold value are combined into such a data group. Specifically, starting from the element bit in the first row and first column of the comparison matrix C, the value of each element bit is compared with the first threshold value, an element bit whose value is greater than the first threshold value is searched for, and its value is taken as the first expected value; if the first expected value is found at the element bit in row i, column r, the next expected value is searched for in column s of row j, where j is larger than i and s = r + (j − i) × K; K = M/N, M being the frame interval used to extract images from the originally recorded video file to be detected and N the frame interval used to extract images from the advertisement video template file; all the expected values, together with the column numbers of the rows in which they occur in the comparison matrix C, form a data group whose column numbers are an arithmetic progression with common difference K.
Ideally, starting from a certain row of the comparison matrix C, a value greater than the first threshold value can be found in one specific column of each subsequent row, and all such values form a data group whose column numbers are an arithmetic progression with common difference K = M/N, where M is the frame interval used to extract images from the originally recorded video file to be detected and N is the frame interval used to extract images from the advertisement video template file. In reality, however, data fluctuation interrupts the arithmetic progression of column numbers, so the interruption has to be handled by an algorithm so that the column numbers that would form an arithmetic progression in the ideal state are continued.
Specifically, in step S4, if the first expected value is found at the element bit in row i, column r but the next expected value is not found in column s of row j, the next expected value is searched for within a preset range before and after column s of row j based on a local-minimum search algorithm; if the next expected value is found within the preset range, the search continues for the following expected value, taking the row where that expected value is located as the reference; if it is not found, the search jumps to the next row and continues until the next expected value is found. In a preferred embodiment, the preset range may be the region s ± 20%·K.
In the scheme, in order to ensure the accuracy of advertisement identification, the video advertisement must be identified on a data group whose arithmetic progression of column numbers is not broken. Therefore, if the constructed data group whose column numbers form an arithmetic progression is not broken, step S5 can be executed directly, i.e. the video advertisement in the video file to be detected can be determined based on that data group. If the arithmetic progression of column numbers in the data group is broken, step S6 is further executed: the data fragments at the break are processed, the data group whose column numbers form the arithmetic progression is continued, and the video advertisement in the video file to be detected is determined based on the continued data group.
Specifically, whether a run of data rows is a fragment may be judged by its length: for example, if the number of data rows whose column numbers form an arithmetic progression is smaller than the third threshold value and is more than K rows away from the next data rows whose column numbers form an arithmetic progression, those rows are determined to be a fragment segment and are discarded. And if d ≤ e × T, where d is the time length corresponding to the gap between two adjacent segments, T is the total time length of the advertisement template, and e is a preset percentage, the two data groups whose column numbers form an arithmetic progression are integrated into one data group.
In one embodiment, if the data rows whose column numbers form an arithmetic progression continue for only three rows, for example only rows 99840-99842 with no continuation immediately afterwards, and the next data rows whose column numbers form an arithmetic progression appear only much later, for example starting around row 130000, then rows 99840-99842 are three "isolated" data rows forming a very short arithmetic progression of column numbers. Such rows are probably a scene in a television program that happens to resemble a certain advertisement and are not the target of the search; these three isolated rows are therefore regarded as "fragments" and are discarded.
In another embodiment, for a provincial television station, for example, an advertisement template is often extremely long; advertisements for healthcare products, medicines, medical instruments and the like can exceed 40 minutes. During advertisement playing, the television station may, for instance, insert a 5-second time-signal short video on the hour. As another example, to evade AI-based advertisement identification, the advertisement publisher may temporarily insert a 10-second public-service advertisement after every 20 minutes of the advertisement and then continue playing it. In both application scenarios, the 5-second time signal within the 40 minutes and the 10-second public-service advertisement within the 20 minutes correspond to an intermittent or transient break in the long run of data rows whose column numbers form an arithmetic progression. Such an intermittent or transient break can be ignored, and the advertisement is considered to be still playing continuously. Therefore, in this case, the two data groups whose column numbers form an arithmetic progression can be integrated into one data group, so that the break has no effect. In a preferred embodiment, if the ratio of the time length corresponding to the break to the time length of the advertisement itself does not exceed 10%, the advertisement can be considered to be still playing continuously.
In the scheme, after it has been determined that the data group whose column numbers form an arithmetic progression is not broken, the step of identifying the video advertisement can be executed. Specifically, the image frames in the video file to be detected corresponding to the first expected value and the last expected value in the data group are determined; based on the position sequence numbers, in the video file to be detected, of the image frame corresponding to the first expected value and of the image frame corresponding to the last expected value, the display timestamps (PTS) corresponding to those image frames are looked up; and from those timestamps the start and stop times of the images in the video file to be detected corresponding to the data group, i.e. the total duration of the matched advertisement, are determined. The start and stop times of the images can be accurate to the millisecond.
In the scheme, in order to further improve the accuracy of advertisement identification, the obtained total advertisement duration can be verified against a second threshold value. Specifically, the matched total advertisement duration in seconds is compared with the second threshold value; if it is larger than the second threshold value, the advertisement identification result is confirmed to be valid; if it is smaller than the second threshold value, the advertisement identification result is determined to be invalid.
By the method, the interference caused by data fluctuation can be effectively reduced, so that the identification rate and the stability of the advertisement in the video are improved.
As shown in FIG. 5, the present solution further provides a video advertisement identification device 101 that cooperates with the video advertisement identification method, the device comprising: a first image set construction unit 102, a second image set construction unit 103, an extraction unit 104, a calculation unit 105, a data group construction unit, an identification unit 107 and a compensation unit 108.
When the video advertisement identification device 101 operates, the first image set construction unit 102 constructs the image subset to be detected and the second image set construction unit 103 constructs the advertisement template image subset. The extraction unit 104 performs feature extraction on the advertisement template image subset and the image subset to be detected based on the VGG19 deep learning network model, obtaining the first fingerprint feature vector matrix A and the second fingerprint feature vector matrix B. The calculation unit 105 performs cosine similarity calculation on the transposed matrix Aᵀ of the first fingerprint feature vector matrix A and the second fingerprint feature vector matrix B to obtain the comparison matrix C. The data group construction unit then forms, from the element bits in the comparison matrix C that are larger than the first threshold value, a data group whose column numbers form an arithmetic progression. If there is no break in the data group, the identification unit 107 determines the video advertisement in the video file to be detected based on the data group. If the data group is broken, i.e. the arithmetic progression of column numbers is interrupted, the compensation unit further processes the data fragments at the break and continues the data group whose column numbers form the arithmetic progression; after the compensation is completed, the identification unit 107 determines the video advertisement in the video file to be detected based on the continued data group whose column numbers form the arithmetic progression.
On the basis of the above video advertisement identification method embodiments, the scheme further provides a computer-readable storage medium. The computer-readable storage medium is a program product implementing the above method; it may employ a portable compact disc read-only memory (CD-ROM), include program code, and run on a device such as a personal computer. However, the program product of the scheme is not limited thereto; in this document, a readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A readable signal medium may include a propagated data signal with readable program code embodied therein, for example in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable storage medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out the operations of the present solution may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++ as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user computing device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (for example, through the Internet using an Internet service provider).
On the basis of the above video advertisement identification method embodiments, the scheme further provides an electronic device. The electronic device shown in the drawings is only an example and should not impose any limitation on the function and scope of use of the embodiments of the present solution.
As shown in fig. 6, the electronic device 201 is in the form of a general purpose computing device. The components of the electronic device 201 may include, but are not limited to: at least one memory unit 202, at least one processing unit 203, a display unit 204 and a bus 205 for connecting different system components.
The storage unit 202 stores program code which can be executed by the processing unit 203, so that the processing unit 203 performs the steps of the various exemplary embodiments of the above video advertisement identification method. For example, the processing unit 203 may perform the steps shown in FIG. 1.
The memory unit 202 may include volatile memory units such as a random access memory unit (RAM) and/or a cache memory unit, and may further include a read only memory unit (ROM).
The storage unit 202 may also include programs/utilities with program modules including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
The bus 205 may include a data bus, an address bus, and a control bus.
The electronic device 201 may also communicate with one or more external devices 207 (e.g., keyboard, pointing device, bluetooth device, etc.), which may be through an input/output (I/O) interface 206. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 201, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
The present solution is further illustrated by way of example below.
In this embodiment, for the originally recorded MP4 file to be detected, the image subset to be detected is extracted with M = 12, and for the manually cut advertisement MP4 file the template image subset is extracted with N = 2; the video advertisement identification method of this example is described in detail on this basis.
In this example, for an original image of 720x576 resolution, INTER_CUBIC interpolation is used to obtain a down-sampled 360x288 image, and the central 224x224 region is then cropped as input data of the VGG19 deep learning network model, from which the fingerprint feature vector of the image is extracted. In this way, the original image information can be fully utilized: the utilization rate of the image texture content is improved from about 12% (namely 224x224/720/576) to about 48% (namely 224x224/360/288), different advertisement templates that share the same picture background over a large area are effectively distinguished, and the differences between the fingerprint feature vectors of the images become more obvious.
One frame is taken from every 2 frames of the video advertisement template file, giving a frames in total, which form the advertisement template image subset. The advertisement template image subset is used as the input data of the VGG19 deep learning network model, the fingerprint feature vectors of the images are extracted, and the first fingerprint feature vector matrix A is obtained: a fingerprint feature vector is extracted for each frame, each being a data row containing 512 values, and finally an a × 512 matrix, called matrix A, is obtained. One frame is taken from every M frames of the video to be detected, M being an integer multiple of N, for example M = 6N = 12; suppose b frames are taken from the video file in total, forming the image subset to be detected. The image subset to be detected is used as the input data of the VGG19 deep learning network model, the fingerprint feature vectors of the images are extracted, and the second fingerprint feature vector matrix B is obtained: a fingerprint is extracted for each frame, each being a data row containing 512 values, and finally a b × 512 matrix, called matrix B, is obtained.
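A sketch of the fingerprint extraction with a stock Keras VGG19 follows. The patent only states that VGG19 yields a 512-value fingerprint per frame; taking the last convolutional block with global average pooling, which happens to be 512-dimensional, is an assumption about which layer is used, and template_frames/detected_frames are the placeholder subsets built earlier.

```python
import numpy as np
from tensorflow.keras.applications.vgg19 import VGG19, preprocess_input

# include_top=False with global average pooling gives a 512-value vector per image
model = VGG19(weights="imagenet", include_top=False, pooling="avg")


def fingerprint_matrix(frames_bgr):
    """frames_bgr: list of 224x224 BGR images (e.g. the subsets built earlier).
    Returns an (n_frames, 512) fingerprint feature vector matrix."""
    rgb = np.stack([f[:, :, ::-1] for f in frames_bgr]).astype("float32")  # BGR -> RGB
    return model.predict(preprocess_input(rgb), verbose=0)


A = fingerprint_matrix(template_frames)   # a x 512, advertisement template fingerprints
B = fingerprint_matrix(detected_frames)   # b x 512, to-be-detected fingerprints
```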
and matching the image to be detected in the advertisement template image based on the fingerprint characteristic vector, and searching the serial number of the matched frame fingerprint.
Firstly, the matrix A needs to be transposed to obtain the matrix ATThen A isTIs a matrix of 512 rows and a columns, then the matrices B and ATAnd performing cosine similarity operation to obtain a comparison matrix C of b rows and a columns. Specifically, a transposed matrix A of a first fingerprint feature vector matrix A is formedTIs multiplied by each element bit in the first row of the second fingerprint feature vector matrix B and then summed to obtain a first parameter value p11(ii) a Then, the first row of the matrix B is subjected to modulus calculation to obtain a second parameter value q11Transpose matrix ATIs modulo to obtain a third parameter value r11(ii) a Based on the first parameter value p11A second parameter value q11And a value of a third parameter r11To calculate the cosine similarity value t11,t11=p11/q11/r11Will t11The value of (a) is used as the value of the element bit of the first row and the first column of the comparison matrix C; continue to calculate t in the same way12,t13……t1aUp to tbaAnd corresponding to the value of each element bit of the comparison matrix C. Wherein, the cosine similarity formula is: cos ═ a × b)/((| a | b |).
In this example, the value of the element bit in the ith row and the jth column of the comparison matrix C is the cosine similarity between the fingerprint feature vector of the ith frame of the video image to be detected and the fingerprint feature vector of the jth frame in the video image of the advertisement template. Searching a similarity value larger than a first threshold value T from a first row and a first column of a comparison matrix C to a b-th row and an a-th column of the comparison matrix C. In this example, the first threshold T is set to 0.85.
In an ideal case without influence of factors such as data toggling, a value greater than the first threshold T can be found in a specific column of each row, starting from a certain row in the alignment matrix C (where the first element bit greater than the first threshold is located), for example, the rows are listed as [100,15], [101,15+ (101-. However, in the actual recognition process, the data is toggled, which causes the series of the column number arithmetic progression to be interrupted, so that the arithmetic progression is required to be processed to realize the continuity of the arithmetic progression of the column number under the ideal condition. The specific treatment method is as follows:
if the first expected value is found in row i, column r, then the expected value should be found in column s of row j, where s is r + (j-i) K. If the expected position is not found, a search local minimum algorithm is adopted to expand the search range, namely, the desired value is searched within a certain range before and after s, and the search cost is minimum. In this example, it will be found within 20% (i.e., s ± 20% K), and if found, it is still considered to continue the arithmetic progression; if not, jump to the next line to look for, and loop until the appropriate expected value is found.
Due to the interference of data shifting, the row number arithmetic series are broken, and at the moment, the broken part needs to be processed. The specific mode is as follows:
The fragment discarding and break repair comprises:
1) Particularly short fragments, such as fragments of length less than 3, are discarded. Specifically, suppose the rows whose column numbers form the arithmetic progression continue for only three rows, for example rows 99840 to 99842, with nothing following immediately afterwards, and the next run of rows whose column numbers form an arithmetic progression does not begin until much later, for example around row 130000. The three rows 99840 to 99842 then form an 'isolated' run of data rows with a very short column-number arithmetic progression; such rows are likely to be a scene in a television programme that happens to resemble a certain advertisement rather than the target being searched for, so this isolated run is regarded as a 'fragment' and is discarded.
2) Gaps between two adjacent segments are then filled when the time length corresponding to the gap does not exceed a certain proportion, for example 10%, of the advertisement template time length. Specifically, for some provincial television stations an advertisement template can be extremely long in duration; advertisements for health products, medicines, medical instruments and the like may run for more than 40 minutes. During advertisement playback the station may, for example, insert a 5-second time-signal clip on the hour. As another example, in order to evade AI-based advertisement identification, an advertisement publisher may temporarily insert a 10-second public service advertisement after every 20 minutes of the advertisement and then continue playing it. In both of these scenarios, the 5-second time signal within the 40 minutes and the 10-second public service advertisement after 20 minutes appear as an intermittent or transient break in the long run of rows whose column numbers form the arithmetic progression. Such an intermittent or transient break can be ignored, and the advertisement is considered to be still playing continuously; the two sets of data whose column numbers form arithmetic progressions can therefore be merged into one data set, so that the break has no influence. In a preferred embodiment, if the ratio of the time length corresponding to the break to the time length of the advertisement itself does not exceed 10%, the advertisement is considered to be still playing continuously. (An illustrative sketch covering rules 1) and 2) is given after this list.)
3) Because different advertisement templates with similar content may exist, when matched segments in the video to be detected overlap in time, the match with the larger time length can be retained, in accordance with the specific requirements of the user.
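As referenced in the list above, a hedged sketch of rules 1) and 2) follows. It assumes each matched segment is the list of (row, column) hits produced by the progression-tracing sketch, that seconds_per_row is the time between two consecutive sampled frames of the recording (for example M / 25 for a 25 fps recording), and that template_seconds is the advertisement template duration; the "length less than 3" and 10% figures are taken from the examples above, while the function and parameter names are illustrative:

```python
def clean_segments(segments, seconds_per_row: float, template_seconds: float,
                   min_len: int = 3, gap_ratio: float = 0.10):
    """Drop 'fragment' segments and bridge short breaks between adjacent segments."""
    # Rule 1): discard isolated runs shorter than min_len rows.
    kept = [seg for seg in segments if len(seg) >= min_len]
    kept.sort(key=lambda seg: seg[0][0])

    # Rule 2): merge neighbours whose gap is at most gap_ratio of the template length.
    merged = []
    for seg in kept:
        if merged:
            gap_rows = seg[0][0] - merged[-1][-1][0]
            if gap_rows * seconds_per_row <= gap_ratio * template_seconds:
                merged[-1] = merged[-1] + seg   # treat the advertisement as still playing
                continue
        merged.append(seg)
    return merged
```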
Finally, the video advertisement is identified according to the resulting data set whose column numbers form an unbroken arithmetic progression. Specifically, the image frames corresponding to the first expected value and the last expected value in the column-number arithmetic progression are determined; the display time stamps (PTS) corresponding to these image frames are then looked up according to the serial numbers of the two frames, and from the time stamps the start and end seconds of the images in the video file to be detected that correspond to the data set, i.e. the total number of seconds of the matched advertisement, are determined.
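A minimal sketch of mapping a cleaned segment back to start and end seconds. A real implementation would look up the actual display time stamp (PTS) of each frame from the container; reconstructing it as frame index × M / fps assumes a constant frame rate, and the frame rate value and function name are assumptions:

```python
def segment_to_seconds(segment, M: int, fps: float):
    """Map the first and last hits of a segment back to start/end seconds in the recording."""
    first_row, last_row = segment[0][0], segment[-1][0]
    start = first_row * M / fps      # stands in for the PTS of the first matched frame
    end = last_row * M / fps         # stands in for the PTS of the last matched frame
    return start, end, end - start   # start second, end second, total matched seconds

# start, end, total = segment_to_seconds(merged[0], M=12, fps=25.0)
```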
In this example, in order to improve the reliability of advertisement identification, the total number of seconds of the matched advertisement may further be compared with a second threshold: if it is larger than the second threshold, the advertisement identification result is confirmed to be valid; if it is smaller than the second threshold, the advertisement identification result is determined to be invalid. In this example, the second threshold may be established based on accumulated empirical values.
In this example, in order to support the implementation of the video advertisement recognition method, an apparatus for implementing the advertisement recognition method is also provided. The apparatus comprises a memory and one or more processors; the memory is connected with the processor through a communication bus; the processor is configured to execute instructions in the memory; and the memory, serving as the storage medium, stores instructions for carrying out the steps of the above-described method. The processor may be an 8-core, 16-thread CPU chip with a main frequency of 2.1 GHz and an 11 MB L3 cache, and the system memory may be 32 GB. The memory may be an 8 TB mechanical hard disk. The device may also be equipped with a display driven by a GTX 1080Ti discrete graphics card. Experiments show that when the method is implemented on this equipment, the CPU occupancy is only about 15% and the graphics card occupancy is only about 50%, with a maximum occupancy of no more than 85%. The achieved advertisement recognition effect is shown in Table 1.
TABLE 1
It should be understood that the above-mentioned embodiments of the present invention are only examples given to illustrate the present invention clearly and are not intended to limit its embodiments. It will be obvious to those skilled in the art that other variations or modifications may be made on the basis of the above description; not all embodiments can be listed exhaustively here, and all obvious variations or modifications derived therefrom fall within the scope of the present invention.

Claims (10)

1. A method for identifying video advertisements, the method comprising the steps of:
performing feature extraction on an advertisement template image subset constructed according to an advertisement video template file and an image subset to be detected constructed according to a video file to be detected to obtain a first fingerprint feature vector matrix A and a second fingerprint feature vector matrix B;
performing a cosine similarity calculation between the transposed matrix A^T of the first fingerprint feature vector matrix A and the second fingerprint feature vector matrix B to obtain a comparison matrix C;
forming, from the element bits in the comparison matrix C that are larger than a first threshold, a data group whose column numbers form an arithmetic progression;
and, in the case that the data group whose column numbers form the arithmetic progression has no break in the arithmetic progression, determining the video advertisement in the video file to be detected based on the data group whose column numbers form the arithmetic progression.
2. The method of claim 1, wherein the step of constructing the subset of images to be detected comprises:
taking 12 frames as an extraction interval, and acquiring a to-be-processed image subset with a resolution of 360x288 from an originally recorded to-be-detected video file with a resolution of 720x576;
intercepting each image in the image subset to be processed, acquiring an image with the resolution of 224x224 in the middle area of each image, and forming an image subset to be detected;
the construction step of the advertisement template image subset comprises the following steps:
and acquiring an advertisement template image subset from the advertisement video template file by taking the N frames as extraction intervals.
3. The video advertisement recognition method of claim 1, wherein the step of performing a cosine similarity calculation between the transposed matrix A^T of the first fingerprint feature vector matrix A and the second fingerprint feature vector matrix B to obtain a comparison matrix C comprises the following steps:
the matrix A has a rows and x columns, and the matrix B has b rows and x columns;
multiplying the first column of the transposed matrix A^T of the first fingerprint feature vector matrix A element-by-element with the first row of the second fingerprint feature vector matrix B and then summing the products to obtain a first parameter value p11;
taking the modulus of the first row of the second fingerprint feature vector matrix B and the modulus of the first column of the transposed matrix A^T of the first fingerprint feature vector matrix A, respectively, to obtain a second parameter value q11 and a third parameter value r11;
calculating, based on the first parameter value p11, the second parameter value q11 and the third parameter value r11, the cosine similarity value t11 = p11 / q11 / r11, and taking the value of t11 as the value of the element bit in the first row and first column of the comparison matrix C;
and, based on the above steps, calculating t12, t13, ..., t1a up to tba in the same way, corresponding to the values of the remaining element bits of the comparison matrix C.
4. The method of claim 1, wherein the step of forming, from the element bits in the comparison matrix C that are larger than the first threshold, a data group whose column numbers form an arithmetic progression comprises:
starting from the element bit in the first row and first column of the comparison matrix C, comparing the value of each element bit with the first threshold, searching for an element bit whose value is greater than the first threshold, and taking the value of that element bit as a first expected value;
if the first expected value is found at the element bit in the ith row and rth column, searching for the next expected value in the sth column of the jth row, wherein j is larger than i and s = r + (j - i) × K; K = M/N, where M is the frame interval used to extract images from the originally recorded video file to be detected and N is the frame interval used to extract images from the advertisement video template file;
and forming, from all the expected values and the column numbers of their corresponding rows in the comparison matrix C, a data group whose column numbers form an arithmetic progression, wherein the common difference of the arithmetic progression is K.
5. The method of claim 4, wherein, if the first expected value is found at the element bit in the ith row and rth column but the next expected value is not found in the sth column of the jth row:
searching for the next expected value within a preset range before and after the sth column of the jth row based on a local-minimum search algorithm, wherein the preset range is the region of s ± 20% × K;
if the next expected value is found within the preset range, continuing to search for the following expected value with reference to the row in which that expected value is located;
and if the next expected value is not found, jumping to the next row to continue searching until the next expected value is found.
6. The method of claim 1, further comprising the steps of: in the case that the data group whose column numbers form the arithmetic progression is broken, processing the data fragments at the break, continuing the data group whose column numbers form the arithmetic progression, and determining the video advertisement in the video file to be detected based on the continued data group whose column numbers form the arithmetic progression;
wherein, when the data group whose column numbers form the arithmetic progression has a break in the arithmetic progression, the step of processing the data fragments at the break and continuing the data group whose column numbers form the arithmetic progression comprises the following steps:
if the number of data rows whose column numbers form the arithmetic progression is smaller than a third threshold and the distance to the data rows whose column numbers next form the arithmetic progression is larger than K rows, determining the data rows whose column numbers form the arithmetic progression and whose number is smaller than the third threshold to be fragment segments, and discarding the fragment segments; and/or,
and judging the relation between the time length corresponding to the gap between two adjacent data sets whose column numbers form arithmetic progressions and the total time length of the advertisement template; if d ≤ e × T, where d is the time length corresponding to the gap between the two adjacent segments, T is the total time length of the advertisement template, and e is a preset percentage, integrating the two data sets whose column numbers form arithmetic progressions into one data set.
7. The video advertisement identification method according to claim 1 or 6, wherein the step of determining the video advertisement in the video file to be detected based on the data group whose column numbers form the arithmetic progression comprises:
determining image frames in the video file to be detected corresponding to the first expected value and the last expected value in the data group;
searching for the display timestamps corresponding to the image frames based on the position serial numbers, in the video file to be detected, of the image frame corresponding to the first expected value and the image frame corresponding to the last expected value;
determining, according to the timestamps, the start and end times of the images in the video file to be detected that correspond to the data set, namely the total duration of the matched advertisement;
comparing the total duration of the matched advertisement with a second threshold;
and, if the total duration is larger than the second threshold, confirming that the advertisement identification result is valid; if the total duration is smaller than the second threshold, determining that the advertisement identification result is invalid.
8. A video advertisement recognition apparatus, comprising:
an extraction unit, configured to perform feature extraction on an advertisement template image subset constructed according to an advertisement video template file and an image subset to be detected constructed according to a video file to be detected, to obtain a first fingerprint feature vector matrix A and a second fingerprint feature vector matrix B;
a computing unit, configured to perform a cosine similarity calculation between the transposed matrix A^T of the first fingerprint feature vector matrix A and the second fingerprint feature vector matrix B to obtain a comparison matrix C;
a data group building unit, configured to form, from the element bits in the comparison matrix C that are larger than the first threshold, a data group whose column numbers form an arithmetic progression;
and an identification unit, configured to determine the video advertisement in the video file to be detected based on the data group, in the case that the data group whose column numbers form the arithmetic progression has no break in the arithmetic progression.
9. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 8.
10. An apparatus, comprising: a memory and one or more processors; the memory is connected with the processor through a communication bus; the processor is configured to execute instructions in the memory; and the memory has stored therein instructions for carrying out the steps of the method according to any one of claims 1 to 8.
CN202010902794.6A 2020-09-01 2020-09-01 Video advertisement identification method, device, storage medium and equipment Active CN112291616B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010902794.6A CN112291616B (en) 2020-09-01 2020-09-01 Video advertisement identification method, device, storage medium and equipment


Publications (2)

Publication Number Publication Date
CN112291616A true CN112291616A (en) 2021-01-29
CN112291616B CN112291616B (en) 2023-01-06

Family

ID=74419739

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010902794.6A Active CN112291616B (en) 2020-09-01 2020-09-01 Video advertisement identification method, device, storage medium and equipment

Country Status (1)

Country Link
CN (1) CN112291616B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120177296A1 (en) * 2011-01-07 2012-07-12 Alcatel-Lucent Usa Inc. Method and apparatus for comparing videos
CN103235956A (en) * 2013-03-28 2013-08-07 天脉聚源(北京)传媒科技有限公司 Method and device for detecting advertisements
CN107609466A (en) * 2017-07-26 2018-01-19 百度在线网络技术(北京)有限公司 Face cluster method, apparatus, equipment and storage medium
CN110321958A (en) * 2019-07-08 2019-10-11 北京字节跳动网络技术有限公司 Training method, the video similarity of neural network model determine method
CN111339368A (en) * 2020-02-20 2020-06-26 同盾控股有限公司 Video retrieval method and device based on video fingerprints and electronic equipment


Also Published As

Publication number Publication date
CN112291616B (en) 2023-01-06

Similar Documents

Publication Publication Date Title
CN111026915B (en) Video classification method, video classification device, storage medium and electronic equipment
CN109874029B (en) Video description generation method, device, equipment and storage medium
US10133818B2 (en) Estimating social interest in time-based media
CN111464833B (en) Target image generation method, target image generation device, medium and electronic device
CN108184135B (en) Subtitle generating method and device, storage medium and electronic terminal
CN108509611B (en) Method and device for pushing information
CN112291589B (en) Method and device for detecting structure of video file
US9275682B1 (en) Video content alignment
US20180308522A1 (en) Video frame difference engine
CN113837083B (en) Video segment segmentation method based on Transformer
CN104754403A (en) Method and system for video sequential alignment
CN115244939B (en) System and method for data stream synchronization
CN114339360B (en) Video processing method, related device and equipment
CN111263183A (en) Singing state identification method and singing state identification device
CN112291616B (en) Video advertisement identification method, device, storage medium and equipment
CN114694070A (en) Automatic video editing method, system, terminal and storage medium
CN109101964B (en) Method, device and storage medium for determining head and tail areas in multimedia file
CN112651449A (en) Method and device for determining content characteristics of video, electronic equipment and storage medium
US20190279012A1 (en) Methods, systems, apparatuses and devices for facilitating inspection of industrial infrastructure by one or more industry experts
CN111629267A (en) Audio labeling method, device, equipment and computer readable storage medium
CN113039805A (en) Accurately automatically cropping media content by frame using multiple markers
KR20210064587A (en) High speed split device and method for video section
CN113766311B (en) Method and device for determining video segment number in video
Ram et al. Video Analysis and Repackaging for Distance Education
CN106101573A (en) The grappling of a kind of video labeling and matching process

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant