
CN112291616A - Video advertisement identification method, device, storage medium and equipment

Info

Publication number
CN112291616A
Authority
CN
China
Prior art keywords
advertisement
column
matrix
arithmetic progression
row
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010902794.6A
Other languages
Chinese (zh)
Other versions
CN112291616B (en)
Inventor
朱永亮
尹海沧
马文闯
刘利
刘殿龙
曹明阔
熊浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Potevio Peacetech Co ltd
Original Assignee
Potevio Peacetech Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Potevio Peacetech Co ltd
Priority to CN202010902794.6A
Publication of CN112291616A
Application granted
Publication of CN112291616B
Legal status: Active

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81Monomedia components thereof
    • H04N21/812Monomedia components thereof involving advertisement data
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85Assembly of content; Generation of multimedia applications
    • H04N21/854Content authoring
    • H04N21/8547Content authoring involving timestamps for synchronizing content

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Computer Security & Cryptography (AREA)
  • Image Analysis (AREA)
  • Collating Specific Patterns (AREA)

Abstract

The scheme discloses a video advertisement identification method, device, storage medium and equipment. The method comprises the following steps: performing feature extraction on an advertisement template image subset constructed from an advertisement video template file and an image subset to be detected constructed from a video file to be detected, to obtain a first fingerprint feature vector matrix A and a second fingerprint feature vector matrix B; performing cosine similarity calculation on the transposed matrix Aᵀ of the first fingerprint feature vector matrix A and the second fingerprint feature vector matrix B to obtain a comparison matrix C; forming, from the element bits of the comparison matrix C that are larger than a first threshold value, a data group whose column numbers form an arithmetic progression; and, when the arithmetic progression of column numbers in that data group is not broken, determining the video advertisement in the video file to be detected based on the data group. The scheme reduces the fluctuation interference of engineering application scenarios, makes the search for the arithmetic progression more flexible, makes the recognition rate higher and more stable, and positions the advertisement playing time more accurately.

Description

Video advertisement identification method, device, storage medium and equipment
Technical Field
The invention relates to the technical field of video processing, and more particularly to a video advertisement identification method, device, storage medium and equipment.
Background
With the development of television video, video advertisements are continuously inserted into the videos played every day. Traditional television stations intermittently insert many advertisements into the television videos they broadcast; the playing of these advertisements seriously affects the audio-visual experience of users, and users waste time on advertisements they are not interested in.
In the traditional video advertisement identification method, the audio/video MP4 file recorded over the 24 hours of a day is decoded frame by frame, and one frame is taken at a fixed interval of M frames to form the image subset to be detected. A manually cut and verified advertisement video file in the advertisement material library is decoded frame by frame, and one frame is taken at a fixed interval of N frames to form the template image subset. Then, based on image quality evaluation criteria such as SAD, PSNR and SSIM, the correspondence between the most similar images is obtained. In an ideal situation, as the image frame numbers of the image subset to be detected increase (the first column of data in FIG. 1), the corresponding image frame numbers of the template image subset (the fourth column of data in FIG. 1) form an arithmetic progression, the correspondence between the images (the third column of data in FIG. 1) being established according to the principle of highest similarity between images. If the arithmetic progression can be continuously extended, the start playing time and the end playing time of the video advertisement can be accurately acquired: the start playing time is the playing time, in milliseconds, of the to-be-detected image frame in the first entry of the arithmetic progression, and the end playing time is the playing time, in milliseconds, of the to-be-detected image frame in the last entry of the arithmetic progression. Based on the accurately acquired start and end playing times, it can be determined whether the advertisement exists in the video MP4 file to be detected, together with the start time and end time (accurate to the millisecond) of each playing of the advertisement.
However, the advertisement identification result obtained by this method is not accurate enough, the rate of missed identification is high, and the workload of manual review cannot be effectively reduced.
Disclosure of Invention
One object of the present solution is to provide a method, an apparatus, a storage medium, and a device for fast identification of a video advertisement segment that is repeatedly played.
Another object of the present solution is to provide a device and an apparatus for performing the above recognition method.
In order to achieve the purpose, the scheme is as follows:
in a first aspect, the present disclosure provides a video advertisement recognition method, including:
performing feature extraction on an advertisement template image subset constructed according to an advertisement video template file and an image subset to be detected constructed according to a video file to be detected to obtain a first fingerprint feature vector matrix A and a second fingerprint feature vector matrix B;
performing cosine similarity calculation on the transposed matrix Aᵀ of the first fingerprint feature vector matrix A and the second fingerprint feature vector matrix B to obtain a comparison matrix C;
forming, from the element bits in the comparison matrix C that are larger than a first threshold value, a data group whose column numbers form an arithmetic progression;
and, when the arithmetic progression of column numbers in the data group is not broken, determining the video advertisement in the video file to be detected based on that data group.
In a preferred embodiment, the step of constructing the image subset to be detected includes:
taking M frames as the extraction interval, acquiring, from the originally recorded video file to be detected, an image subset to be processed with resolution n;
and cropping each image in the image subset to be processed to its central m region, the cropped images forming the image subset to be detected.
In a preferred embodiment, the step of constructing the image subset to be detected includes:
taking 12 frames as the extraction interval, acquiring, from an originally recorded video file to be detected with resolution 720x576, an image subset to be processed with resolution 360x288;
and cropping each image in the image subset to be processed to its central 224x224 region, the cropped images forming the image subset to be detected.
In a preferred embodiment, the constructing of the subset of advertisement template images comprises:
and acquiring an advertisement template image subset from the advertisement video template file by taking the N frames as extraction intervals.
In a preferred embodiment, the step of performing cosine similarity calculation on the transposed matrix Aᵀ of the first fingerprint feature vector matrix A and the second fingerprint feature vector matrix B to obtain the comparison matrix C includes:
the matrix A has a rows and X columns, and the matrix B has b rows and X columns;
multiplying the first column of the transposed matrix Aᵀ of the first fingerprint feature vector matrix A element-wise with the first row of the second fingerprint feature vector matrix B and summing, to obtain a first parameter value p11;
taking the modulus (norm) of the first row of the second fingerprint feature vector matrix B and of the first column of the transposed matrix Aᵀ of the first fingerprint feature vector matrix A, respectively, to obtain a second parameter value q11 and a third parameter value r11;
based on the first parameter value p11, the second parameter value q11 and the third parameter value r11, obtaining a cosine similarity value t11, where t11 = p11/q11/r11, and using the value of t11 as the value of the element bit in the first row and first column of the comparison matrix C;
based on the above steps, calculating in the same way the values t12, t13, …, t1a, up to tba, as the values of the corresponding element bits of the comparison matrix C.
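As a worked illustration of this step (a NumPy sketch, not part of the claimed method), the whole comparison matrix C can be formed at once; each entry reproduces the p/q/r construction above. The shapes used in the example lines are hypothetical.

```python
import numpy as np


def comparison_matrix(A, B):
    """A: template fingerprints (a x X), B: to-be-detected fingerprints (b x X).
    Returns C (b x a) with C[i, j] = cosine similarity of row i of B and row j of A."""
    dots = B @ A.T                                   # p_ij: dot products, shape (b, a)
    q = np.linalg.norm(B, axis=1, keepdims=True)     # |B_i|, shape (b, 1)
    r = np.linalg.norm(A, axis=1, keepdims=True).T   # |A_j|, shape (1, a)
    return dots / (q * r)                            # t_ij = p_ij / q_i / r_j


# hypothetical shapes matching the 512-value fingerprints of the embodiment
A = np.random.rand(300, 512)     # a = 300 template frames
B = np.random.rand(7200, 512)    # b = 7200 frames to be detected
C = comparison_matrix(A, B)      # shape (7200, 300), entries in [-1, 1]
```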
In a preferred example, the step of forming, from the element bits of the comparison matrix C that are larger than the first threshold value, a data group whose column numbers form an arithmetic progression includes:
starting from the element bit in the first row and first column of the comparison matrix C, comparing the value of each element bit with the first threshold value, searching for an element bit whose value is greater than the first threshold value, and taking that value as the first expected value;
if the first expected value is found at the element bit in row i, column r, searching for the next expected value in column s of row j, where j is larger than i and s = r + (j − i) × K; K = M/N, M is the frame interval used to extract images from the originally recorded video file to be detected, and N is the frame interval used to extract images from the advertisement video template file;
and forming, from all the expected values and the column numbers of the rows in which they occur in the comparison matrix C, a data group whose column numbers are an arithmetic progression with common difference K.
In a preferred embodiment, if the first expected value is found at the element bit in row i, column r but the next expected value is not found in column s of row j:
the next expected value is searched for within a preset range before and after column s of row j, based on a local-minimum search algorithm;
if the next expected value is found within the preset range, the search for the following expected value continues, taking the row where that expected value is located as the reference;
and if the next expected value is not found, the search jumps to the next row and continues until the next expected value is found.
In a preferred example, the preset range is the region s ± 20%·K.
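The following sketch is one possible reading of this search (an assumption, not the patent's own implementation): starting from an element above the threshold, the run is extended row by row, the expected column advances by K = M/N per row, a ± 20%·K window absorbs small drifts, and rows where nothing exceeds the threshold are skipped while the extrapolation continues from the starting element.

```python
import numpy as np


def extend_run(C, i, r, threshold, K, slack=0.2):
    """From element (i, r) of the comparison matrix C with C[i, r] > threshold,
    collect (row, column) pairs whose column numbers follow the arithmetic
    progression s = r + (j - i) * K, allowing a drift of +/- slack*K."""
    b, a = C.shape
    run = [(i, r)]
    for j in range(i + 1, b):
        s = r + (j - i) * K
        if s >= a:                                    # template frames exhausted
            break
        lo = max(0, int(round(s - slack * K)))
        hi = min(a, int(round(s + slack * K)) + 1)
        window = C[j, lo:hi]
        if window.size and window.max() > threshold:
            run.append((j, lo + int(window.argmax())))
        # otherwise jump to the next row and keep extrapolating from (i, r)
    return run
```

A run produced this way is then cleaned up by the fragment and break handling described next.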
In a preferred embodiment, the method further comprises the step of: when the arithmetic progression of column numbers in the data group is broken, processing the data fragments at the break, continuing the data group whose column numbers form the arithmetic progression, and determining the video advertisement in the video file to be detected based on the continued data group.
In a preferred example, when the arithmetic progression of column numbers in the data group is broken, the step of processing the data fragments at the break includes:
if the number of data rows whose column numbers form an arithmetic progression is smaller than a third threshold value and is more than K rows away from the next data rows whose column numbers form an arithmetic progression, determining those data rows to be fragment segments and discarding them; and/or,
judging the relation between the time length corresponding to the gap between two adjacent groups of rows whose column numbers form an arithmetic progression and the total time length of the advertisement template: if d ≤ e × T, where d is the time length corresponding to the gap between the two adjacent segments, T is the total time length of the advertisement template, and e is a preset percentage, integrating the two data groups whose column numbers form an arithmetic progression into one data group.
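A sketch of this clean-up, under the assumption that each run is the list of (row, column) pairs produced by a search like the one above and that row_ms() is a hypothetical helper mapping a to-be-detected row index to its playing time in milliseconds:

```python
def clean_runs(runs, min_len, row_ms, e, template_ms):
    """Discard fragments shorter than `min_len` rows, then merge adjacent runs
    whose gap is no longer than e * template_ms (the d <= e * T rule above)."""
    runs = [r for r in runs if len(r) >= min_len]           # discard fragments
    runs.sort(key=lambda r: r[0][0])
    merged = []
    for run in runs:
        if merged:
            gap = row_ms(run[0][0]) - row_ms(merged[-1][-1][0])
            if gap <= e * template_ms:                      # short interruption
                merged[-1] = merged[-1] + run               # continue the progression
                continue
        merged.append(run)
    return merged
```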
In a preferred example, the step of determining the video advertisement in the video file to be detected based on the data group whose column numbers form an arithmetic progression comprises:
determining the image frames in the video file to be detected corresponding to the first expected value and the last expected value in the data group;
looking up the display timestamps corresponding to those image frames, based on the position sequence numbers, in the video file to be detected, of the image frame corresponding to the first expected value and of the image frame corresponding to the last expected value;
and determining, from the timestamps, the start and stop times, in milliseconds, of the images in the video file to be detected corresponding to the data group, i.e. the total duration in milliseconds of the matched advertisement.
In a preferred example, the step of determining the video advertisement in the video file to be detected based on the data group further includes:
comparing the matched total advertisement duration with a second threshold value;
if it is larger than the second threshold value, confirming that the advertisement identification result is valid; and if it is smaller than the second threshold value, determining that the advertisement identification result is invalid.
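Continuing the sketch (helper names are hypothetical, and pts_ms is assumed to hold the millisecond display timestamps of the extracted to-be-detected frames, indexed by the row numbers used in the run), a surviving data group is turned into an identification result and validated against the second threshold:

```python
def run_to_advert(run, pts_ms, second_threshold_ms):
    start_ms = pts_ms[run[0][0]]          # first expected value -> start time
    end_ms = pts_ms[run[-1][0]]           # last expected value -> end time
    duration_ms = end_ms - start_ms       # total matched advertisement length
    if duration_ms < second_threshold_ms:
        return None                       # identification result judged invalid
    return {"start_ms": start_ms, "end_ms": end_ms, "duration_ms": duration_ms}
```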
In a second aspect, the present disclosure provides a video advertisement recognition apparatus, including:
the extraction unit is used for extracting the characteristics of the advertisement template image subset constructed according to the advertisement video template file and the image subset to be detected constructed according to the video file to be detected to obtain a first fingerprint characteristic vector matrix A and a second fingerprint characteristic vector matrix B;
the calculation unit is used for performing cosine similarity calculation on the transposed matrix Aᵀ of the first fingerprint feature vector matrix A and the second fingerprint feature vector matrix B to obtain a comparison matrix C;
the data group construction unit is used for forming, from the element bits in the comparison matrix C that are larger than the first threshold value, a data group whose column numbers form an arithmetic progression;
and the identification unit is used for determining the video advertisement in the video file to be detected based on the data group when the arithmetic progression of column numbers in the data group is not broken.
In a third aspect, the present solution provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor, implements the steps of the above-mentioned advertisement video identification method.
In a fourth aspect, the present solution provides a device comprising: a memory and one or more processors; the memory is connected with the processor through a communication bus; the processor is configured to execute instructions in the memory; and the memory stores instructions for executing the steps of the above video advertisement identification method.
The scheme has the following beneficial effects:
the advertisement identification method based on the image fingerprints reduces fluctuation interference of engineering application scenes, enables searching of an arithmetic progression to be more flexible, is higher and stable in identification rate, and is more accurate in positioning of advertisement playing time.
Drawings
In order to illustrate the implementation of the solution more clearly, the drawings needed in the description of the embodiments are briefly described below. Obviously, the drawings in the following description are only some embodiments of the solution, and other drawings may be derived from them by a person skilled in the art without inventive effort.
FIG. 1 is a schematic diagram showing an example of a data group whose column numbers form an arithmetic progression according to the present scheme;
FIG. 2 is a schematic processing flow diagram of the video advertisement identification method according to the present scheme;
FIG. 3 is a diagram illustrating the comparison of fingerprint frames;
FIG. 4 is a schematic diagram showing an example of a comparison matrix according to the present scheme;
FIG. 5 is a schematic diagram of the video advertisement identification device according to the present scheme;
FIG. 6 is a schematic diagram of a device according to the present scheme;
FIG. 7 is a schematic diagram of the identification accuracy achieved by advertisement identification using the method of the present scheme.
Detailed Description
Embodiments of the present solution will be described in further detail below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present solution, not an exhaustive list of all embodiments. It should be noted that the embodiments of the present solution and the features of the embodiments may be combined with each other provided there is no conflict.
In the traditional video advertisement identification method, the audio/video MP4 file recorded over the 24 hours of a day is decoded frame by frame, and one frame is taken at a fixed interval of M frames to form the image subset to be detected. A manually cut and verified advertisement video file in the advertisement material library is decoded frame by frame, and one frame is taken at a fixed interval of N frames to form the template image subset. Then, based on image quality evaluation criteria such as SAD, PSNR and SSIM, the correspondence between the most similar images is obtained. In an ideal situation, as the image frame numbers of the image subset to be detected increase (the first column of data in FIG. 1), the corresponding image frame numbers of the template image subset (the fourth column of data in FIG. 1) form an arithmetic progression, the correspondence between the images (the third column of data in FIG. 1) being established according to the principle of highest similarity between images. If the arithmetic progression can be continuously extended, the start playing time and the end playing time of the video advertisement can be accurately acquired: the start playing time is the playing time, in milliseconds, of the to-be-detected image frame in the first entry of the arithmetic progression, and the end playing time is the playing time, in milliseconds, of the to-be-detected image frame in the last entry of the arithmetic progression. Based on the accurately acquired start and end playing times, it can be determined whether the advertisement exists in the video MP4 file to be detected, together with the start time and end time (accurate to the millisecond) of each playing of the advertisement.
Research and analysis of the existing television advertisement video identification method show that the following interference factors exist in practical application:
(1) the input signal recorded by the acquisition card is the CVBS analog audio/video signal output by the set-top box; operations such as analog-to-digital conversion and video encoding slightly change the gray values of the image pixels, so that the fingerprint feature vector of the image changes slightly;
(2) the manually cut advertisement template file requires the start/end positions of the advertisement segment to be accurate to the image frame level, so the originally recorded material file has to be decoded, positioned to the specific image frame, and then re-encoded. In the stage of extracting fingerprints from the advertisement template, the encoding types of the frame images (I frames and P frames) therefore differ from those of the originally recorded material, so the gray values of the image pixels change slightly and the fingerprint feature vector of the image changes slightly;
(3) the frame interval value M is not equal to N, so the picture content of the image subset to be detected and that of the template image subset are inconsistent, and the fingerprint feature vectors of the images differ noticeably;
(4) owing to the broadcasting strategy of the television station, after the advertisement template segment has been broadcast for some days, local changes of a few seconds may be made to the advertisement picture content to increase the novelty of the advertisement and attract the attention of consumers. As a result, the advertisement pictures played later change considerably in some image frames, and the fingerprint feature vectors of those images change considerably;
(5) because the resolutions of the originally recorded MP4 material files differ, the difference between the fingerprint vectors of two different advertisements with the same background is not prominent enough;
(6) the start time and the end time of the advertisement playing are required to be accurate to the image frame, i.e. to the millisecond. However, the traditional method of directly dividing the image frame number (for example, the frame count obtained with OpenCV decoding) by the video frame rate has a large error; for the raw MP4 material file recorded over one day, the accumulated error reaches tens of seconds, which cannot meet the demand (a numerical illustration follows this list).
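As a purely hypothetical numerical illustration of factor (6) (the rates below are assumed, not taken from the patent): if the recorder's true average rate is 24.99 fps while the container declares a nominal 25 fps, then

```latex
\[
  86400\,\mathrm{s} \times 24.99\,\mathrm{fps} \approx 2\,159\,136 \text{ frames in one day},\qquad
  \frac{2\,159\,136}{25\,\mathrm{fps}} \approx 86\,365.4\,\mathrm{s},
\]
\[
  \text{accumulated error} \approx 86\,400\,\mathrm{s} - 86\,365.4\,\mathrm{s} \approx 34.6\,\mathrm{s},
\]
```

which is exactly the tens-of-seconds drift described above.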
Due to the existence of interference factors (1) to (6), large errors and fluctuations exist, the arithmetic progression is often forcibly interrupted, and a large amount of fragmentation occurs; the advertisement identification result obtained is therefore not accurate enough, the rate of missed identification is high, and the workload of manual review cannot be effectively reduced.
Therefore, the scheme aims to provide a video advertisement identification method in which the fingerprint feature vector matrix of the image is extracted based on a VGG19 deep learning network model (VGG, Visual Geometry Group), and a comparison matrix C is obtained using cosine similarity as the basis for comparing the fingerprint feature vector matrices, which simplifies the calculation; the element bits in the comparison matrix C that are larger than a preset threshold value form a data group whose column numbers are an arithmetic progression, and the video advertisement in the video file to be detected is finally determined using that data group.
Hereinafter, the video advertisement identification method proposed by the present solution is described in detail with reference to FIGS. 2 to 4. The method may comprise the following steps:
step S1, constructing the image subset to be detected and the advertisement template image subset;
step S2, performing feature extraction on the template image subset constructed from the advertisement video template file and the image subset to be detected constructed from the video file to be detected, to obtain a first fingerprint feature vector matrix A and a second fingerprint feature vector matrix B;
step S3, performing cosine similarity calculation on the transposed matrix Aᵀ of the first fingerprint feature vector matrix A and the second fingerprint feature vector matrix B to obtain a comparison matrix C;
step S4, forming, from the element bits in the comparison matrix C that are larger than a first threshold value, a data group whose column numbers form an arithmetic progression;
step S5, when the arithmetic progression of column numbers in the data group is not broken, determining the video advertisement in the video file to be detected based on the data group;
and step S6, when the arithmetic progression of column numbers in the data group is broken, processing the data fragments at the break, continuing the data group whose column numbers form the arithmetic progression, and determining the video advertisement in the video file to be detected based on the continued data group.
When identifying a video advertisement, the scheme first needs to prepare the image subset to be detected and the advertisement template image subset; that is, the scheme constructs the image subset to be detected and the advertisement template image subset in step S1.
The image subset to be detected may be constructed as follows: taking M frames as the extraction interval, an image subset to be processed with resolution n is acquired from the originally recorded video file to be detected; each image is then cropped to its central m region, and the cropped images form the image subset to be detected. In one embodiment, with 12 frames as the extraction interval, cubic-convolution down-sampling is performed on a video file to be detected with an original recording resolution of 720x576 to obtain an image subset to be processed with resolution 360x288; each image in the image subset to be processed is then cropped to its central 224x224 region, and these images form the image subset to be detected. Using the image subset to be detected formed in this way as input data for extracting the fingerprint feature vectors of the images makes greater use of the original image information: the utilization rate of the image texture content is improved from about 12% (namely 224x224/720/576) to about 48% (namely 224x224/360/288), different advertisement templates that share the same picture background over a large area are effectively distinguished, and the differences between the fingerprint feature vectors of the images become more obvious.
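The two utilization figures quoted above follow directly from the crop and frame sizes:

```latex
\[
  \frac{224 \times 224}{720 \times 576} = \frac{50\,176}{414\,720} \approx 12.1\,\%,
  \qquad
  \frac{224 \times 224}{360 \times 288} = \frac{50\,176}{103\,680} \approx 48.4\,\%.
\]
```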
The advertisement template image subset may be constructed as follows: taking N frames as the extraction interval, the advertisement template image subset is acquired from the advertisement video template file. In one embodiment, the advertisement template image subset is obtained, with an extraction interval of 2 frames, from an advertisement video template file manually cut from the advertisement material.
In step S2, feature extraction may be performed on the advertisement template image subset and the image subset to be detected using a VGG19 deep learning network model, to obtain the first fingerprint feature vector matrix A and the second fingerprint feature vector matrix B. The matrix A has a rows and X columns, and the matrix B has b rows and X columns. In one embodiment, the matrix A may be a matrix with a rows and 512 columns, and the matrix B a matrix with b rows and 512 columns.
In the scheme, a one-to-many mapping is adopted: cosine similarity calculation is performed between the fingerprint feature vector of each row of the image subset to be detected and all the fingerprint feature vectors corresponding to the advertisement template image subset, so that each image to be detected is matched against the advertisement template images and the position sequence number of the matched frame fingerprint is found. The position sequence number is the sequence number corresponding to each image in the image subset, i.e. its sequence index. In the scheme above, the original material image sequence is frame-extracted at equal intervals of M frames, and the images in the resulting image subset carry sequential numbers, for example 0, 1, 2, 3. By analyzing the originally recorded file with the FFProbe software commonly used in the industry, a correspondence table can be obtained, and for the image frames with sequence numbers 0, 1, 2 and 3, the times Time0, Time1, Time2 and Time3, accurate to the millisecond, can be obtained. Since the user requires time positioning accurate to the millisecond, this approach reaches millisecond-level accuracy more accurately and reliably.
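A minimal sketch of this index-to-timestamp mapping, assuming frame_pts_ms has already been filled by parsing FFProbe's per-frame output into a list of millisecond display timestamps indexed by the original frame number (the exact FFProbe field names vary between FFmpeg versions, so that parsing step is not shown):

```python
def subset_index_to_pts_ms(k, M, frame_pts_ms):
    """Map sequence index k of the to-be-detected subset (one frame kept every
    M frames: 0, 1, 2, 3, ...) back to the millisecond display timestamp of
    the original frame it was taken from."""
    return frame_pts_ms[k * M]


# e.g. with M = 12, subset index 3 corresponds to original frame 36,
# whose timestamp frame_pts_ms[36] plays the role of "Time3" above.
```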
In step S3, cosine similarity calculation is performed on the transposed matrix Aᵀ of the first fingerprint feature vector matrix A and the second fingerprint feature vector matrix B to obtain the comparison matrix C. Specifically, the first column of the transposed matrix Aᵀ of the first fingerprint feature vector matrix A is multiplied element-wise with the first row of the second fingerprint feature vector matrix B and summed, giving a first parameter value p11; the modulus of the first row of the second fingerprint feature vector matrix B and the modulus of the first column of the transposed matrix Aᵀ are taken, giving a second parameter value q11 and a third parameter value r11; based on p11, q11 and r11, the cosine similarity value t11 = p11/q11/r11 is calculated and used as the value of the element bit in the first row and first column of the comparison matrix C. Continuing in the same way, t12, t13, …, t1a, up to tba, are calculated and used as the values of the corresponding element bits of the comparison matrix C. In this way, the comparison matrix C is constructed. The element in row i and column j of the comparison matrix C is the cosine similarity between the fingerprint feature vector of the i-th frame in the image subset to be detected and the fingerprint feature vector of the j-th frame in the advertisement template image subset. In the scheme, the cosine similarity is calculated as cos θ = (a · b)/(|a| · |b|), where a is a column of the transposed matrix Aᵀ (i.e. a row of A) and b is a row of the matrix B.
Here t11, t12, t13, …, t1a are entries of the cos θ result, which is a matrix; written out in full, the comparison matrix C is:
t11, t12, t13, …, t1a
t21, t22, t23, …, t2a
t31, t32, t33, …, t3a
……
tb1, tb2, tb3, …, tba
The result of comparing one fingerprint feature vector b of the image subset to be detected with all the template fingerprints, written as a vector, is (t1, t2, t3, …, ta).
Thus cos θ = (a · b)/(|a| · |b|) is the calculation formula, and (t1, t2, t3, …, ta) is the vector of cos θ values that forms one row of the comparison matrix C.
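Restated compactly (same content as above, in conventional notation):

```latex
\[
  C \in \mathbb{R}^{b \times a},\qquad
  t_{ij} = \cos\theta_{ij} = \frac{B_i \cdot A_j}{\lVert B_i\rVert\,\lVert A_j\rVert},
  \qquad 1 \le i \le b,\; 1 \le j \le a,
\]
```

where B_i is the fingerprint feature vector of the i-th frame of the image subset to be detected (row i of B) and A_j is that of the j-th frame of the advertisement template image subset (row j of A).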
In the scheme, the value of each element bit in the comparison matrix C is compared with the first threshold value in order to build the data group whose column numbers form an arithmetic progression. In step S4, the element bits in the comparison matrix C that are larger than the first threshold value are combined into such a data group. Specifically, starting from the element bit in the first row and first column of the comparison matrix C, the value of each element bit is compared with the first threshold value, an element bit whose value is greater than the first threshold value is searched for, and its value is taken as the first expected value; if the first expected value is found at the element bit in row i, column r, the next expected value is searched for in column s of row j, where j is larger than i and s = r + (j − i) × K; K = M/N, M being the frame interval used to extract images from the originally recorded video file to be detected and N the frame interval used to extract images from the advertisement video template file; all the expected values, together with the column numbers of the rows in which they occur in the comparison matrix C, form a data group whose column numbers are an arithmetic progression with common difference K.
Ideally, starting from a certain row of the comparison matrix C, a value greater than the first threshold value can be found in one specific column of each subsequent row, and all such values form a data group whose column numbers are an arithmetic progression with common difference K = M/N, where M is the frame interval used to extract images from the originally recorded video file to be detected and N is the frame interval used to extract images from the advertisement video template file. In reality, however, data fluctuation interrupts the arithmetic progression of column numbers, so the interruption has to be handled by an algorithm so that the column numbers that would form an arithmetic progression in the ideal state are continued.
Specifically, in step S4, if the first expected value is found at the element bit in row i, column r but the next expected value is not found in column s of row j, the next expected value is searched for within a preset range before and after column s of row j based on a local-minimum search algorithm; if the next expected value is found within the preset range, the search continues for the following expected value, taking the row where that expected value is located as the reference; if it is not found, the search jumps to the next row and continues until the next expected value is found. In a preferred embodiment, the preset range may be the region s ± 20%·K.
In the scheme, in order to ensure the accuracy of advertisement identification, the video advertisement must be identified on a data group whose arithmetic progression of column numbers is not broken. Therefore, if the constructed data group whose column numbers form an arithmetic progression is not broken, step S5 can be executed directly, i.e. the video advertisement in the video file to be detected can be determined based on that data group. If the arithmetic progression of column numbers in the data group is broken, step S6 is further executed: the data fragments at the break are processed, the data group whose column numbers form the arithmetic progression is continued, and the video advertisement in the video file to be detected is determined based on the continued data group.
Specifically, whether a run of data rows is a fragment may be judged by its length: for example, if the number of data rows whose column numbers form an arithmetic progression is smaller than the third threshold value and is more than K rows away from the next data rows whose column numbers form an arithmetic progression, those rows are determined to be a fragment segment and are discarded. And if d ≤ e × T, where d is the time length corresponding to the gap between two adjacent segments, T is the total time length of the advertisement template, and e is a preset percentage, the two data groups whose column numbers form an arithmetic progression are integrated into one data group.
In one embodiment, if the data rows whose column numbers form an arithmetic progression continue for only three rows, for example only rows 99840-99842 with no continuation immediately afterwards, and the next data rows whose column numbers form an arithmetic progression appear only much later, for example starting around row 130000, then rows 99840-99842 are three "isolated" data rows forming a very short arithmetic progression of column numbers. Such rows are probably a scene in a television program that happens to resemble a certain advertisement and are not the target of the search; these three isolated rows are therefore regarded as "fragments" and are discarded.
In another embodiment, for a provincial television station, for example, an advertisement template is often extremely long; advertisements for healthcare products, medicines, medical instruments and the like can exceed 40 minutes. During advertisement playing, the television station may, for instance, insert a 5-second time-signal short video on the hour. As another example, to evade AI-based advertisement identification, the advertisement publisher may temporarily insert a 10-second public-service advertisement after every 20 minutes of the advertisement and then continue playing it. In both application scenarios, the 5-second time signal within the 40 minutes and the 10-second public-service advertisement within the 20 minutes correspond to an intermittent or transient break in the long run of data rows whose column numbers form an arithmetic progression. Such an intermittent or transient break can be ignored, and the advertisement is considered to be still playing continuously. Therefore, in this case, the two data groups whose column numbers form an arithmetic progression can be integrated into one data group, so that the break has no effect. In a preferred embodiment, if the ratio of the time length corresponding to the break to the time length of the advertisement itself does not exceed 10%, the advertisement can be considered to be still playing continuously.
In the scheme, after it has been determined that the data group whose column numbers form an arithmetic progression is not broken, the step of identifying the video advertisement can be executed. Specifically, the image frames in the video file to be detected corresponding to the first expected value and the last expected value in the data group are determined; based on the position sequence numbers, in the video file to be detected, of the image frame corresponding to the first expected value and of the image frame corresponding to the last expected value, the display timestamps (PTS) corresponding to those image frames are looked up; and from those timestamps the start and stop times of the images in the video file to be detected corresponding to the data group, i.e. the total duration of the matched advertisement, are determined. The start and stop times of the images can be accurate to the millisecond.
In the scheme, in order to further improve the accuracy of advertisement identification, the obtained total advertisement duration can be verified against a second threshold value. Specifically, the matched total advertisement duration in seconds is compared with the second threshold value; if it is larger than the second threshold value, the advertisement identification result is confirmed to be valid; if it is smaller than the second threshold value, the advertisement identification result is determined to be invalid.
By the method, the interference caused by data fluctuation can be effectively reduced, so that the identification rate and the stability of the advertisement in the video are improved.
As shown in FIG. 5, the present solution further provides a video advertisement identification device 101 that cooperates with the video advertisement identification method, the device comprising: a first image set construction unit 102, a second image set construction unit 103, an extraction unit 104, a calculation unit 105, a data group construction unit, an identification unit 107 and a compensation unit 108.
When the video advertisement identification device 101 operates, the first image set construction unit 102 constructs the image subset to be detected and the second image set construction unit 103 constructs the advertisement template image subset. The extraction unit 104 performs feature extraction on the advertisement template image subset and the image subset to be detected based on the VGG19 deep learning network model, obtaining the first fingerprint feature vector matrix A and the second fingerprint feature vector matrix B. The calculation unit 105 performs cosine similarity calculation on the transposed matrix Aᵀ of the first fingerprint feature vector matrix A and the second fingerprint feature vector matrix B to obtain the comparison matrix C. The data group construction unit then forms, from the element bits in the comparison matrix C that are larger than the first threshold value, a data group whose column numbers form an arithmetic progression. If there is no break in the data group, the identification unit 107 determines the video advertisement in the video file to be detected based on the data group. If the data group is broken, i.e. the arithmetic progression of column numbers is interrupted, the compensation unit further processes the data fragments at the break and continues the data group whose column numbers form the arithmetic progression; after the compensation is completed, the identification unit 107 determines the video advertisement in the video file to be detected based on the continued data group whose column numbers form the arithmetic progression.
On the basis of the above video advertisement identification method embodiments, the scheme further provides a computer-readable storage medium. The computer-readable storage medium is a program product implementing the above method; it may employ a portable compact disc read-only memory (CD-ROM), include program code, and run on a device such as a personal computer. However, the program product of the scheme is not limited thereto; in this document, a readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A readable signal medium may include a propagated data signal with readable program code embodied therein, for example in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable storage medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out the operations of the present solution may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++ as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user computing device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (for example, through the Internet using an Internet service provider).
On the basis of the above video advertisement identification method embodiments, the scheme further provides an electronic device. The electronic device shown in the drawings is only an example and should not impose any limitation on the function and scope of use of the embodiments of the present solution.
As shown in fig. 6, the electronic device 201 is in the form of a general purpose computing device. The components of the electronic device 201 may include, but are not limited to: at least one memory unit 202, at least one processing unit 203, a display unit 204 and a bus 205 for connecting different system components.
The storage unit 202 stores program code which can be executed by the processing unit 203, so that the processing unit 203 performs the steps of the various exemplary embodiments of the above video advertisement identification method. For example, the processing unit 203 may perform the steps shown in FIG. 1.
The memory unit 202 may include volatile memory units such as a random access memory unit (RAM) and/or a cache memory unit, and may further include a read only memory unit (ROM).
The storage unit 202 may also include programs/utilities with program modules including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
The bus 205 may include a data bus, an address bus, and a control bus.
The electronic device 201 may also communicate with one or more external devices 207 (e.g., keyboard, pointing device, bluetooth device, etc.), which may be through an input/output (I/O) interface 206. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 201, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
The present solution is further illustrated by way of example below.
In this embodiment, for the originally recorded MP4 file to be detected, the image subset to be detected is extracted with M = 12, and for the manually cut advertisement MP4 file the template image subset is extracted with N = 2; the video advertisement identification method of this example is described in detail on this basis.
In this example, for an original image of 720x576 resolution, INTER_CUBIC interpolation is used to obtain a down-sampled 360x288 image, and the central 224x224 region is then cropped as input data of the VGG19 deep learning network model, from which the fingerprint feature vector of the image is extracted. In this way, the original image information can be fully utilized: the utilization rate of the image texture content is improved from about 12% (namely 224x224/720/576) to about 48% (namely 224x224/360/288), different advertisement templates that share the same picture background over a large area are effectively distinguished, and the differences between the fingerprint feature vectors of the images become more obvious.
One frame is taken from every 2 frames of the video advertisement template file, giving a frames in total, which form the advertisement template image subset. The advertisement template image subset is used as the input data of the VGG19 deep learning network model, the fingerprint feature vectors of the images are extracted, and the first fingerprint feature vector matrix A is obtained: a fingerprint feature vector is extracted for each frame, each being a data row containing 512 values, and finally an a × 512 matrix, called matrix A, is obtained. One frame is taken from every M frames of the video to be detected, M being an integer multiple of N, for example M = 6N = 12; suppose b frames are taken from the video file in total, forming the image subset to be detected. The image subset to be detected is used as the input data of the VGG19 deep learning network model, the fingerprint feature vectors of the images are extracted, and the second fingerprint feature vector matrix B is obtained: a fingerprint is extracted for each frame, each being a data row containing 512 values, and finally a b × 512 matrix, called matrix B, is obtained.
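A sketch of the fingerprint extraction with a stock Keras VGG19 follows. The patent only states that VGG19 yields a 512-value fingerprint per frame; taking the last convolutional block with global average pooling, which happens to be 512-dimensional, is an assumption about which layer is used, and template_frames/detected_frames are the placeholder subsets built earlier.

```python
import numpy as np
from tensorflow.keras.applications.vgg19 import VGG19, preprocess_input

# include_top=False with global average pooling gives a 512-value vector per image
model = VGG19(weights="imagenet", include_top=False, pooling="avg")


def fingerprint_matrix(frames_bgr):
    """frames_bgr: list of 224x224 BGR images (e.g. the subsets built earlier).
    Returns an (n_frames, 512) fingerprint feature vector matrix."""
    rgb = np.stack([f[:, :, ::-1] for f in frames_bgr]).astype("float32")  # BGR -> RGB
    return model.predict(preprocess_input(rgb), verbose=0)


A = fingerprint_matrix(template_frames)   # a x 512, advertisement template fingerprints
B = fingerprint_matrix(detected_frames)   # b x 512, to-be-detected fingerprints
```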
and matching the image to be detected in the advertisement template image based on the fingerprint characteristic vector, and searching the serial number of the matched frame fingerprint.
Firstly, the matrix A needs to be transposed to obtain the matrix ATThen A isTIs a matrix of 512 rows and a columns, then the matrices B and ATAnd performing cosine similarity operation to obtain a comparison matrix C of b rows and a columns. Specifically, a transposed matrix A of a first fingerprint feature vector matrix A is formedTIs multiplied by each element bit in the first row of the second fingerprint feature vector matrix B and then summed to obtain a first parameter value p11(ii) a Then, the first row of the matrix B is subjected to modulus calculation to obtain a second parameter value q11Transpose matrix ATIs modulo to obtain a third parameter value r11(ii) a Based on the first parameter value p11A second parameter value q11And a value of a third parameter r11To calculate the cosine similarity value t11,t11=p11/q11/r11Will t11The value of (a) is used as the value of the element bit of the first row and the first column of the comparison matrix C; continue to calculate t in the same way12,t13……t1aUp to tbaAnd corresponding to the value of each element bit of the comparison matrix C. Wherein, the cosine similarity formula is: cos ═ a × b)/((| a | b |).
In this example, the value of the element bit in the ith row and the jth column of the comparison matrix C is the cosine similarity between the fingerprint feature vector of the ith frame of the video image to be detected and the fingerprint feature vector of the jth frame in the video image of the advertisement template. Searching a similarity value larger than a first threshold value T from a first row and a first column of a comparison matrix C to a b-th row and an a-th column of the comparison matrix C. In this example, the first threshold T is set to 0.85.
In an ideal case without influence of factors such as data toggling, a value greater than the first threshold T can be found in a specific column of each row, starting from a certain row in the alignment matrix C (where the first element bit greater than the first threshold is located), for example, the rows are listed as [100,15], [101,15+ (101-. However, in the actual recognition process, the data is toggled, which causes the series of the column number arithmetic progression to be interrupted, so that the arithmetic progression is required to be processed to realize the continuity of the arithmetic progression of the column number under the ideal condition. The specific treatment method is as follows:
if the first expected value is found in row i, column r, then the expected value should be found in column s of row j, where s is r + (j-i) K. If the expected position is not found, a search local minimum algorithm is adopted to expand the search range, namely, the desired value is searched within a certain range before and after s, and the search cost is minimum. In this example, it will be found within 20% (i.e., s ± 20% K), and if found, it is still considered to continue the arithmetic progression; if not, jump to the next line to look for, and loop until the appropriate expected value is found.
Due to the interference of data shifting, the row number arithmetic series are broken, and at the moment, the broken part needs to be processed. The specific mode is as follows:
The fragment discarding and break repair comprises:
1) Particularly short fragments, such as fragments of length less than 3, are discarded. Specifically, suppose the rows whose column numbers form the arithmetic progression continue for only three rows, for example rows 99840 to 99842, with nothing following immediately afterwards, and the next run of rows whose column numbers form an arithmetic progression does not begin until much later, for example around row 130000. The three rows 99840 to 99842 then form an 'isolated' run of data rows with a very short column-number arithmetic progression; such rows are likely to be a scene in a television programme that happens to resemble a certain advertisement rather than the target being searched for, so this isolated run is regarded as a 'fragment' and is discarded.
2) Gaps between two adjacent segments are then filled when the time length corresponding to the gap does not exceed a certain proportion, for example 10%, of the advertisement template time length. Specifically, for some provincial television stations an advertisement template can be extremely long in duration; advertisements for health products, medicines, medical instruments and the like may run for more than 40 minutes. During advertisement playback the station may, for example, insert a 5-second time-signal clip on the hour. As another example, in order to evade AI-based advertisement identification, an advertisement publisher may temporarily insert a 10-second public service advertisement after every 20 minutes of the advertisement and then continue playing it. In both of these scenarios, the 5-second time signal within the 40 minutes and the 10-second public service advertisement after 20 minutes appear as an intermittent or transient break in the long run of rows whose column numbers form the arithmetic progression. Such an intermittent or transient break can be ignored, and the advertisement is considered to be still playing continuously; the two sets of data whose column numbers form arithmetic progressions can therefore be merged into one data set, so that the break has no influence. In a preferred embodiment, if the ratio of the time length corresponding to the break to the time length of the advertisement itself does not exceed 10%, the advertisement is considered to be still playing continuously. (An illustrative sketch covering rules 1) and 2) is given after this list.)
3) Because different advertisement templates with similar content may exist, when matched segments in the video to be detected overlap in time, the match with the larger time length can be retained, in accordance with the specific requirements of the user.
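As referenced in the list above, a hedged sketch of rules 1) and 2) follows. It assumes each matched segment is the list of (row, column) hits produced by the progression-tracing sketch, that seconds_per_row is the time between two consecutive sampled frames of the recording (for example M / 25 for a 25 fps recording), and that template_seconds is the advertisement template duration; the "length less than 3" and 10% figures are taken from the examples above, while the function and parameter names are illustrative:

```python
def clean_segments(segments, seconds_per_row: float, template_seconds: float,
                   min_len: int = 3, gap_ratio: float = 0.10):
    """Drop 'fragment' segments and bridge short breaks between adjacent segments."""
    # Rule 1): discard isolated runs shorter than min_len rows.
    kept = [seg for seg in segments if len(seg) >= min_len]
    kept.sort(key=lambda seg: seg[0][0])

    # Rule 2): merge neighbours whose gap is at most gap_ratio of the template length.
    merged = []
    for seg in kept:
        if merged:
            gap_rows = seg[0][0] - merged[-1][-1][0]
            if gap_rows * seconds_per_row <= gap_ratio * template_seconds:
                merged[-1] = merged[-1] + seg   # treat the advertisement as still playing
                continue
        merged.append(seg)
    return merged
```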
Finally, the video advertisement is identified according to the resulting data set whose column numbers form an unbroken arithmetic progression. Specifically, the image frames corresponding to the first expected value and the last expected value in the column-number arithmetic progression are determined; the display time stamps (PTS) corresponding to these image frames are then looked up according to the serial numbers of the two frames, and from the time stamps the start and end seconds of the images in the video file to be detected that correspond to the data set, i.e. the total number of seconds of the matched advertisement, are determined.
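A minimal sketch of mapping a cleaned segment back to start and end seconds. A real implementation would look up the actual display time stamp (PTS) of each frame from the container; reconstructing it as frame index × M / fps assumes a constant frame rate, and the frame rate value and function name are assumptions:

```python
def segment_to_seconds(segment, M: int, fps: float):
    """Map the first and last hits of a segment back to start/end seconds in the recording."""
    first_row, last_row = segment[0][0], segment[-1][0]
    start = first_row * M / fps      # stands in for the PTS of the first matched frame
    end = last_row * M / fps         # stands in for the PTS of the last matched frame
    return start, end, end - start   # start second, end second, total matched seconds

# start, end, total = segment_to_seconds(merged[0], M=12, fps=25.0)
```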
In this example, in order to improve the reliability of advertisement identification, the total number of seconds of the matched advertisement may further be compared with a second threshold: if it is larger than the second threshold, the advertisement identification result is confirmed to be valid; if it is smaller than the second threshold, the advertisement identification result is determined to be invalid. In this example, the second threshold may be established based on accumulated empirical values.
In this example, in order to support the implementation of the video advertisement recognition method, an apparatus for implementing the advertisement recognition method is also provided. The apparatus comprises a memory and one or more processors; the memory is connected with the processor through a communication bus; the processor is configured to execute instructions in the memory; and the memory, serving as the storage medium, stores instructions for carrying out the steps of the above-described method. The processor may be an 8-core, 16-thread CPU chip with a main frequency of 2.1 GHz and an 11 MB L3 cache, and the system memory may be 32 GB. The memory may be an 8 TB mechanical hard disk. The device may also be equipped with a display driven by a GTX 1080Ti discrete graphics card. Experiments show that when the method is implemented on this equipment, the CPU occupancy is only about 15% and the graphics card occupancy is only about 50%, with a maximum occupancy of no more than 85%. The achieved advertisement recognition effect is shown in Table 1.
TABLE 1
It should be understood that the above-mentioned embodiments of the present invention are only examples given to illustrate the present invention clearly and are not intended to limit its embodiments. It will be obvious to those skilled in the art that other variations or modifications may be made on the basis of the above description; not all embodiments can be listed exhaustively here, and all obvious variations or modifications derived therefrom fall within the scope of the present invention.

Claims (10)

1. A method for identifying video advertisements, the method comprising the steps of:
performing feature extraction on an advertisement template image subset constructed according to an advertisement video template file and an image subset to be detected constructed according to a video file to be detected to obtain a first fingerprint feature vector matrix A and a second fingerprint feature vector matrix B;
performing a cosine similarity calculation between the transposed matrix A^T of the first fingerprint feature vector matrix A and the second fingerprint feature vector matrix B to obtain a comparison matrix C;
forming, from the element bits in the comparison matrix C that are larger than a first threshold, a data group whose column numbers form an arithmetic progression;
and, in the case that the data group whose column numbers form the arithmetic progression has no break in the arithmetic progression, determining the video advertisement in the video file to be detected based on the data group whose column numbers form the arithmetic progression.
2. The method of claim 1, wherein the step of constructing the subset of images to be detected comprises:
taking 12 frames as an extraction interval, and acquiring a to-be-processed image subset with a resolution of 360x288 from an originally recorded to-be-detected video file with a resolution of 720x576;
intercepting each image in the image subset to be processed, acquiring an image with the resolution of 224x224 in the middle area of each image, and forming an image subset to be detected;
the construction step of the advertisement template image subset comprises the following steps:
and acquiring an advertisement template image subset from the advertisement video template file by taking the N frames as extraction intervals.
3. The video advertisement recognition method of claim 1, wherein the step of performing a cosine similarity calculation between the transposed matrix A^T of the first fingerprint feature vector matrix A and the second fingerprint feature vector matrix B to obtain a comparison matrix C comprises the following steps:
the matrix A has a rows and x columns, and the matrix B has b rows and x columns;
multiplying the first column of the transposed matrix A^T of the first fingerprint feature vector matrix A element-by-element with the first row of the second fingerprint feature vector matrix B and then summing the products to obtain a first parameter value p11;
taking the modulus of the first row of the second fingerprint feature vector matrix B and the modulus of the first column of the transposed matrix A^T of the first fingerprint feature vector matrix A, respectively, to obtain a second parameter value q11 and a third parameter value r11;
calculating, based on the first parameter value p11, the second parameter value q11 and the third parameter value r11, the cosine similarity value t11 = p11 / q11 / r11, and taking the value of t11 as the value of the element bit in the first row and first column of the comparison matrix C;
and, based on the above steps, calculating t12, t13, ..., t1a up to tba in the same way, corresponding to the values of the remaining element bits of the comparison matrix C.
4. The method of claim 1, wherein the step of forming, from the element bits in the comparison matrix C that are larger than the first threshold, a data group whose column numbers form an arithmetic progression comprises:
starting from the element bit in the first row and first column of the comparison matrix C, comparing the value of each element bit with the first threshold, searching for an element bit whose value is greater than the first threshold, and taking the value of that element bit as a first expected value;
if the first expected value is found at the element bit in the ith row and rth column, searching for the next expected value in the sth column of the jth row, wherein j is larger than i and s = r + (j - i) × K; K = M/N, where M is the frame interval used to extract images from the originally recorded video file to be detected and N is the frame interval used to extract images from the advertisement video template file;
and forming, from all the expected values and the column numbers of their corresponding rows in the comparison matrix C, a data group whose column numbers form an arithmetic progression, wherein the common difference of the arithmetic progression is K.
5. The method of claim 4, wherein, if the first expected value is found at the element bit in the ith row and rth column but the next expected value is not found in the sth column of the jth row:
searching for the next expected value within a preset range before and after the sth column of the jth row based on a local-minimum search algorithm, wherein the preset range is the region of s ± 20% × K;
if the next expected value is found within the preset range, continuing to search for the following expected value with reference to the row in which that expected value is located;
and if the next expected value is not found, jumping to the next row to continue searching until the next expected value is found.
6. The method of claim 1, further comprising the steps of: in the case that the data group whose column numbers form the arithmetic progression is broken, processing the data fragments at the break, continuing the data group whose column numbers form the arithmetic progression, and determining the video advertisement in the video file to be detected based on the continued data group whose column numbers form the arithmetic progression;
wherein, when the data group whose column numbers form the arithmetic progression has a break in the arithmetic progression, the step of processing the data fragments at the break and continuing the data group whose column numbers form the arithmetic progression comprises the following steps:
if the number of data rows whose column numbers form the arithmetic progression is smaller than a third threshold and the distance to the data rows whose column numbers next form the arithmetic progression is larger than K rows, determining the data rows whose column numbers form the arithmetic progression and whose number is smaller than the third threshold to be fragment segments, and discarding the fragment segments; and/or,
and judging the relation between the time length corresponding to the gap between two adjacent data sets whose column numbers form arithmetic progressions and the total time length of the advertisement template; if d ≤ e × T, where d is the time length corresponding to the gap between the two adjacent segments, T is the total time length of the advertisement template, and e is a preset percentage, integrating the two data sets whose column numbers form arithmetic progressions into one data set.
7. The video advertisement identification method according to claim 1 or 6, wherein the step of determining the video advertisement in the video file to be detected based on the data group whose column numbers form the arithmetic progression comprises:
determining image frames in the video file to be detected corresponding to the first expected value and the last expected value in the data group;
searching for the display timestamps corresponding to the image frames based on the position serial numbers, in the video file to be detected, of the image frame corresponding to the first expected value and the image frame corresponding to the last expected value;
determining, according to the timestamps, the start and end times of the images in the video file to be detected that correspond to the data set, namely the total duration of the matched advertisement;
comparing the total duration of the matched advertisement with a second threshold;
and, if the total duration is larger than the second threshold, confirming that the advertisement identification result is valid; if the total duration is smaller than the second threshold, determining that the advertisement identification result is invalid.
8. A video advertisement recognition apparatus, comprising:
an extraction unit, configured to perform feature extraction on an advertisement template image subset constructed according to an advertisement video template file and an image subset to be detected constructed according to a video file to be detected, to obtain a first fingerprint feature vector matrix A and a second fingerprint feature vector matrix B;
a computing unit, configured to perform a cosine similarity calculation between the transposed matrix A^T of the first fingerprint feature vector matrix A and the second fingerprint feature vector matrix B to obtain a comparison matrix C;
a data group building unit, configured to form, from the element bits in the comparison matrix C that are larger than the first threshold, a data group whose column numbers form an arithmetic progression;
and an identification unit, configured to determine the video advertisement in the video file to be detected based on the data group, in the case that the data group whose column numbers form the arithmetic progression has no break in the arithmetic progression.
9. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 8.
10. An apparatus, comprising: a memory and one or more processors; the memory is connected with the processor through a communication bus; the processor is configured to execute instructions in the memory; and the memory has stored therein instructions for carrying out the steps of the method according to any one of claims 1 to 8.
CN202010902794.6A 2020-09-01 2020-09-01 Video advertisement identification method, device, storage medium and equipment Active CN112291616B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010902794.6A CN112291616B (en) 2020-09-01 2020-09-01 Video advertisement identification method, device, storage medium and equipment


Publications (2)

Publication Number Publication Date
CN112291616A true CN112291616A (en) 2021-01-29
CN112291616B CN112291616B (en) 2023-01-06

Family

ID=74419739

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010902794.6A Active CN112291616B (en) 2020-09-01 2020-09-01 Video advertisement identification method, device, storage medium and equipment

Country Status (1)

Country Link
CN (1) CN112291616B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120177296A1 (en) * 2011-01-07 2012-07-12 Alcatel-Lucent Usa Inc. Method and apparatus for comparing videos
CN103235956A (en) * 2013-03-28 2013-08-07 天脉聚源(北京)传媒科技有限公司 Method and device for detecting advertisements
CN107609466A (en) * 2017-07-26 2018-01-19 百度在线网络技术(北京)有限公司 Face cluster method, apparatus, equipment and storage medium
CN110321958A (en) * 2019-07-08 2019-10-11 北京字节跳动网络技术有限公司 Training method, the video similarity of neural network model determine method
CN111339368A (en) * 2020-02-20 2020-06-26 同盾控股有限公司 Video retrieval method and device based on video fingerprints and electronic equipment


Also Published As

Publication number Publication date
CN112291616B (en) 2023-01-06

Similar Documents

Publication Publication Date Title
CN111026915B (en) Video classification method, video classification device, storage medium and electronic equipment
CN109874029B (en) Video description generation method, device, equipment and storage medium
US10133818B2 (en) Estimating social interest in time-based media
CN111464833B (en) Target image generation method, target image generation device, medium and electronic device
CN108184135B (en) Subtitle generating method and device, storage medium and electronic terminal
CN108509611B (en) Method and device for pushing information
CN112291589B (en) Method and device for detecting structure of video file
US9275682B1 (en) Video content alignment
US20180308522A1 (en) Video frame difference engine
CN113837083B (en) Video segment segmentation method based on Transformer
CN104754403A (en) Method and system for video sequential alignment
CN115244939B (en) System and method for data stream synchronization
CN114339360B (en) Video processing method, related device and equipment
CN111263183A (en) Singing state identification method and singing state identification device
CN112291616B (en) Video advertisement identification method, device, storage medium and equipment
CN114694070A (en) Automatic video editing method, system, terminal and storage medium
CN109101964B (en) Method, device and storage medium for determining head and tail areas in multimedia file
CN112651449A (en) Method and device for determining content characteristics of video, electronic equipment and storage medium
US20190279012A1 (en) Methods, systems, apparatuses and devices for facilitating inspection of industrial infrastructure by one or more industry experts
CN111629267A (en) Audio labeling method, device, equipment and computer readable storage medium
CN113039805A (en) Accurately automatically cropping media content by frame using multiple markers
KR20210064587A (en) High speed split device and method for video section
CN113766311B (en) Method and device for determining video segment number in video
Ram et al. Video Analysis and Repackaging for Distance Education
CN106101573A (en) The grappling of a kind of video labeling and matching process

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant