
CN107426585A - A television advertisement broadcast monitoring system based on audio/video information retrieval - Google Patents


Info

Publication number
CN107426585A
Authority
CN
China
Prior art keywords
mrow
msub
audio
image
color
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710648059.5A
Other languages
Chinese (zh)
Inventor
郑丽敏
程国栋
杨璐
田立军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Agricultural University
Original Assignee
China Agricultural University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Agricultural University filed Critical China Agricultural University
Priority to CN201710648059.5A priority Critical patent/CN107426585A/en
Publication of CN107426585A publication Critical patent/CN107426585A/en
Pending legal-status Critical Current

Links

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20: Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23: Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234: Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/23418: Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20: Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23: Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/233: Processing of audio elementary streams
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20: Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23: Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/24: Monitoring of processes or resources, e.g. monitoring of server load, available bandwidth, upstream requests
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20: Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23: Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/24: Monitoring of processes or resources, e.g. monitoring of server load, available bandwidth, upstream requests
    • H04N21/2407: Monitoring of transmitted content, e.g. distribution time, number of downloads
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80: Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81: Monomedia components thereof
    • H04N21/812: Monomedia components thereof involving advertisement data

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Image Analysis (AREA)

Abstract

The present invention relates to a television advertisement broadcast monitoring system based on audio/video information retrieval. A PV988AV professional TV capture card and DirectShow are used to capture video segments containing advertisements from CCTV-1 and CCTV-6. Each advertisement is then manually segmented out as a sample, giving 150 advertisements in total, and the beginning (first three frames) and end (last three frames) of each advertisement are analyzed. The audio data undergoes noise reduction, filtering, pre-emphasis, windowing, Fourier transform, and inverse cosine transform, and audio features such as zero-crossing rate, silence ratio, short-time energy, and MFCC coefficients are extracted. The input images undergo noise reduction and conversion from the RGB color space to the HSV and HSI color spaces, and image features such as the color histogram, color moments, and color coherence vector are extracted. Mel cepstral coefficients and the RGB color histogram are extracted as features, the sample feature vectors are trained with the "one-versus-one" multi-class classification method of support vector machines (SVM) to build an advertisement-head recognition model and an advertisement-tail recognition model, and real-time monitoring of known advertisements is achieved.

Description

A television advertisement broadcast monitoring system based on audio/video information retrieval
Technical field
The present invention relates to audio/video processing and computer vision, and in particular to a television advertisement broadcast monitoring system based on audio/video information retrieval.
Background
With the development and opening up of China's broadcast television industry, the programs offered by TV stations are increasingly diverse. Television advertising, currently the main source of revenue for TV stations, places higher demands on their programming. Advertisement detection and broadcast monitoring is the last link in advertisement management, and the most important one. As broadcast television content management deepens, digital audio/video content of all kinds is expanding rapidly in broadcasters, production and distribution units at every level, and in online audio/video communities; commercial advertisement management in particular accounts for a large share of the workload in the TV industry. Users require precise, complex, and personalized video retrieval. Research on an audio/video retrieval system that works in real time is therefore very important.
Summary of the invention
The object of the invention is to provide a television advertisement broadcast monitoring system based on audio/video information retrieval that solves the above technical problem. The main technical content of the invention is as follows:
A television advertisement broadcast monitoring system based on audio/video information retrieval, comprising the following six steps:
(1) TV signal acquisition: analog TV audio and video signals are captured using DirectShow, with functions including channel tuning, preview, setting the audio sampling properties, sound card buffer size, and video output size, adjusting the video frame rate, and saving the captured data to memory in real time or to a file;
(2) Video file processing: advertisement sample files are opened and played, their properties are read, and the audio and video streams are demultiplexed, decoded, and read;
(3) Audio feature extraction: the audio data undergoes noise reduction, filtering, pre-emphasis, windowing, Fourier transform, and inverse cosine transform, and audio features such as zero-crossing rate, silence ratio, short-time energy, and MFCC coefficients are extracted;
(4) Image feature extraction: the input image undergoes noise reduction and conversion from the RGB color space to the HSV and HSI color spaces, and image features such as the color histogram, color moments, and color coherence vector are extracted;
(5) Pattern recognition: using the "one-versus-one" multi-class classification method of support vector machines, the input training samples are trained, a recognition model is built, and its parameters are saved to a file; the model can classify a single prediction sample input in real time, or perform non-real-time classification given the file path of a large batch of prediction samples in a fixed format;
(6) Result output: each advertisement recognized in real time is saved, and data such as which advertisement it was identified as, its start time, its end time, and whether it was misjudged are displayed.
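Step (4) names an RGB-to-HSV conversion and color moments among the image features without giving formulas for them. The following is a minimal sketch using only the Python standard library; the function names, the scaling of components to [0, 1], and the choice of mean, standard deviation, and cube-root skewness as the three moments are assumptions for illustration, not details taken from the patent.

```python
import colorsys


def rgb_pixels_to_hsv(pixels):
    """Convert 8-bit (r, g, b) tuples to (h, s, v) tuples with components in [0, 1]."""
    return [colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0) for r, g, b in pixels]


def color_moments(values):
    """Return (mean, standard deviation, signed cube root of the third central moment)
    for the values of one color channel. This definition is an assumption."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    third = sum((v - mean) ** 3 for v in values) / n
    sign = 1.0 if third >= 0 else -1.0
    return mean, var ** 0.5, sign * abs(third) ** (1.0 / 3.0)
```

A caller would typically convert a whole frame with `rgb_pixels_to_hsv` and then apply `color_moments` to each of the three resulting channels.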
Advantages of the invention
1. Given a large number of television advertisement samples, the invention can quickly identify them in real-time TV programs.
2. The system has a high advertisement-head recognition rate and strong discriminating ability, which reduces manual intervention.
Brief description of the drawings
Fig. 1 is the system flow chart of the invention;
Fig. 2 is the MFCC extraction process.
Detailed description of the embodiments
To further explain the technical means adopted by the invention to achieve its intended purpose, and their effects, the specific embodiments, structure, features, and effects of the invention are described in detail below with reference to the drawings and preferred embodiments.
A television advertisement broadcast monitoring system based on audio/video information retrieval, comprising the following six steps:
(1) TV signal acquisition: analog TV audio and video signals are captured using DirectShow, with functions including channel tuning, preview, setting the audio sampling properties, sound card buffer size, and video output size, adjusting the video frame rate, and saving the captured data to memory in real time or to a file;
(2) Video file processing: advertisement sample files are opened and played, their properties are read, and the audio and video streams are demultiplexed, decoded, and read;
(3) Audio feature extraction: the audio data undergoes noise reduction, filtering, pre-emphasis, windowing, Fourier transform, and inverse cosine transform, and audio features such as zero-crossing rate, silence ratio, short-time energy, and MFCC coefficients are extracted;
(4) Image feature extraction: the input image undergoes noise reduction and conversion from the RGB color space to the HSV and HSI color spaces, and image features such as the color histogram, color moments, and color coherence vector are extracted;
(5) Pattern recognition: using the "one-versus-one" multi-class classification method of support vector machines, the input training samples are trained, a recognition model is built, and its parameters are saved to a file; the model can classify a single prediction sample input in real time, or perform non-real-time classification given the file path of a large batch of prediction samples in a fixed format;
(6) Result output: each advertisement recognized in real time is saved, and data such as which advertisement it was identified as, its start time, its end time, and whether it was misjudged are displayed.
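Step (3) lists zero-crossing rate, silence ratio, and short-time energy as time-domain audio features, but the text does not define them. The sketch below uses common textbook definitions, which are assumptions here: the zero-crossing rate is the fraction of adjacent sample pairs whose signs differ, the short-time energy is the mean squared amplitude of a frame, and the silence ratio is the fraction of frames whose energy falls below a caller-chosen threshold (a hypothetical parameter).

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (frame needs >= 2 samples)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def silence_ratio(frames, energy_threshold):
    """Fraction of frames whose short-time energy is below energy_threshold."""
    silent = sum(1 for f in frames if short_time_energy(f) < energy_threshold)
    return silent / len(frames)
```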
The materials and formats required by the TV signal acquisition step in step (1) above are as follows.
(a) The TV capture card used by the system is the Tianmin Aolong Yingyawang ("Hard Compression King") TV capture card from Tianmin Technology Development Co., Ltd.; the sound card is the one built into the motherboard; the computer has an Intel CPU clocked at 2.6 GHz, 512 MB of memory, an 80 GB hard disk, and 128 MB of video memory. Visual C++ 6.0 (VC 6.0) is the main development tool.
(b) The advertisement samples used by the system are cut, with the auxiliary tool Ulead VideoStudio 9.0, from recorded CCTV-1 and CCTV-6 video containing advertisements. 150 advertisements with the following properties are chosen as samples: the beginning has no audio, or the audio is identical but the images differ, or the images are identical but the audio differs. The advertisement samples are saved as AVI files with audio at a sampling rate of 11025 Hz, 16 bit, mono, in MPEG Layer-3 compressed format, and video at 25 frames/second, 24-bit sample size, image size 80*60, in DivX MPEG-4 V3 compressed format.
(c) DirectShow, the Tianmin Aolong Yingyawang capture card, and VC++ 6.0 are used to capture the TV audio and video signals and to process the data in real time. The audio captured in real time has a sampling rate of 11025 Hz, 16 bit, mono, in PCM format; the video has a frame rate of 25 frames/second and an image size of 80*60, in 24-bit bitmap format.
The audio feature extraction step in step (3) above, and its principle, are as follows.
The audio processed by the system has a sampling frequency of 11.025 kHz, a sample word length of 16 bit, and a single channel. An audio frame is 1024 samples, and the frame shift is N/t samples (N is the audio sampling frequency and t is the video frame rate). Pre-emphasis is applied to boost the high frequencies, and the signal is windowed to avoid edge effects in the short-time speech segment. Below, consider a segment of audio data, where N is the audio length.
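With the values given above, the frame shift works out to 11025 / 25 = 441 samples, so consecutive 1024-sample frames overlap and one audio frame starts per video frame. A minimal framing sketch follows; the function name and the choice to drop a trailing partial frame are assumptions, since the text does not say how the end of the signal is handled.

```python
SAMPLE_RATE = 11025  # Hz, from the text
VIDEO_FPS = 25       # frames per second, from the text
FRAME_LEN = 1024     # samples per audio frame, from the text
FRAME_SHIFT = SAMPLE_RATE // VIDEO_FPS  # 441 samples: one shift per video frame


def split_into_frames(samples, frame_len=FRAME_LEN, shift=FRAME_SHIFT):
    """Return overlapping frames; a trailing partial frame is dropped (assumption)."""
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += shift
    return frames
```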
Pre-emphasis is defined as follows:
$$x_i = x_i - a\,x_{i-1}, \quad 0.9 \le a \le 1.0$$
The parameter a is set to 0.95. The purpose of pre-emphasis is to compensate for the high-frequency part of the audio signal that is suppressed by the articulatory system. After pre-emphasis the sound is sharper and crisper, and its volume is also lower. Windowing is defined as follows:
$$s_i = x_i\,w(i)$$
where w is the window function. The Hamming window is a commonly used one:
$$w(i) = 0.54 - 0.46\cos\left(\frac{2\pi \cdot i}{N - 1}\right), \quad 0 \le i \le N - 1$$
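The two definitions above can be sketched directly. The code below is an illustration rather than the system's implementation: it builds a new list instead of updating in place, and it passes the first sample through unchanged, which is an assumption because the text does not specify the value of the sample before the first one.

```python
import math


def pre_emphasis(x, a=0.95):
    """y[i] = x[i] - a * x[i-1]; the first sample is kept as-is (assumption)."""
    return [x[0]] + [x[i] - a * x[i - 1] for i in range(1, len(x))]


def hamming(n):
    """w(i) = 0.54 - 0.46 * cos(2*pi*i / (n - 1)) for 0 <= i <= n - 1 (n >= 2)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def apply_window(x):
    """s[i] = x[i] * w(i) with a Hamming window of the same length as x."""
    return [xi * wi for xi, wi in zip(x, hamming(len(x)))]
```

For a constant input the pre-emphasized signal drops to 5% of the original amplitude after the first sample, which matches the remark that the volume becomes lower.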
The MFCC extraction process is shown in Fig. 2. A fast Fourier transform is performed first, which yields a complex result. The energy at each frequency is then computed and its logarithm taken. M Mel band-pass filters $H_m$ are constructed; let o(m), c(m), and h(m) be the lower, center, and upper frequencies of the m-th triangular filter. The lower, center, and upper frequencies of adjacent triangular filters then satisfy:
$$c(m) = o(m+1) = h(m-1), \quad m = 1, 2, \ldots, M$$
where the c(m) are equally spaced on the Mel scale, i.e.
$$c(m) = \frac{f_{\text{mel-h}} - f_{\text{mel-l}}}{M + 1} \cdot m$$
where $f_{\text{mel-h}}$ and $f_{\text{mel-l}}$ are obtained from the following formula.
$$\mathrm{Mel}(f) = 2595 \cdot \log_{10}(1 + f/700)$$
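The Mel mapping and the equally spaced center frequencies can be sketched as follows. The inverse mapping `mel_to_hz` and the offset by the lower Mel limit are additions for illustration; with a lower limit of 0 Hz the spacing reduces to the c(m) formula given above. The filter count of 24 and the upper limit of 5512.5 Hz (the Nyquist frequency at 11025 Hz) used in the usage note are example values, not figures from the text.

```python
import math


def hz_to_mel(f):
    """Mel(f) = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_centers(f_low, f_high, num_filters):
    """Center frequencies in Hz, equally spaced on the Mel scale between the limits."""
    mel_low, mel_high = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (mel_high - mel_low) / (num_filters + 1)
    return [mel_to_hz(mel_low + step * m) for m in range(1, num_filters + 1)]
```

For example, `mel_centers(0.0, 5512.5, 24)` returns 24 increasing center frequencies that are dense at low frequencies and sparse near the upper limit.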
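The frequency response of the m-th triangular filter is defined by:
$$H_m(k) = \begin{cases} 0, & k < o(m) \\ \dfrac{k - o(m)}{c(m) - o(m)}, & o(m) \le k \le c(m) \\ \dfrac{h(m) - k}{h(m) - c(m)}, & c(m) \le k \le h(m) \\ 0, & k > h(m) \end{cases}$$
where
For $S_3(k)$, the M filters produce M outputs, where the output of the m-th filter is:
An inverse discrete cosine transform is applied to the logarithmic energies output by the M Mel band-pass filters to obtain L MFCC coefficients; L is generally about 12 to 16. The MFCC coefficients are:
$$MFCC(i) = \sum_{m=1}^{M} S_4(m)\cos\left(m \cdot (i + 0.5) \cdot \pi / M\right), \quad i = 1, 2, \ldots, L$$
In this system, L = 16.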
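The triangular response and the cosine sum above can be sketched as below. Because the definitions of $S_3(k)$ and $S_4(m)$ are not reproduced in the text, the middle function is an assumption: it treats the filter output as the logarithm of the filter-weighted sum of a power spectrum. The `edges` list (hypothetical) holds strictly increasing FFT bin indices so that neighboring filters share edges, as in the relation c(m) = o(m+1) = h(m-1). The last function follows the MFCC formula exactly as printed.

```python
import math


def triangular_response(k, o, c, h):
    """Response at bin k of a triangular filter with lower, center, upper bins o < c < h."""
    if k < o or k > h:
        return 0.0
    if k <= c:
        return (k - o) / (c - o)
    return (h - k) / (h - c)


def filter_log_energies(power, edges):
    """Log of each filter's weighted energy (assumed definition of the filter output).
    edges[m], edges[m+1], edges[m+2] are the lower, center, upper bins of filter m."""
    out = []
    for m in range(len(edges) - 2):
        o, c, h = edges[m], edges[m + 1], edges[m + 2]
        e = sum(power[k] * triangular_response(k, o, c, h) for k in range(o, h + 1))
        out.append(math.log(max(e, 1e-12)))  # floor avoids log(0)
    return out


def mfcc_from_log_energies(s4, num_coeffs=16):
    """MFCC(i) = sum_{m=1..M} S4(m) * cos(m * (i + 0.5) * pi / M), i = 1..L."""
    M = len(s4)
    return [
        sum(s4[m - 1] * math.cos(m * (i + 0.5) * math.pi / M) for m in range(1, M + 1))
        for i in range(1, num_coeffs + 1)
    ]
```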
The image feature extraction step in step (4) above, and its principle, are as follows.
The system uses the color histogram as the image feature. A color histogram is a statistic that represents the distribution of colors in an image: its horizontal axis represents the color quantization levels, and its vertical axis represents the number of pixels in the image whose color value falls within each quantization interval. It is defined as follows. Let image I have size w × h, and let the quantized values of the R component in image I be $r_1, r_2, \ldots, r_m$. For a pixel $p = (x, y) \in I$, let $r(p)$ denote the value of this component, so that $I_r = \{p \mid r(p) = r\}$. Then for component value $r_i$, $1 \le i \le m$, the R-component color histogram of image I is:
$$H_{r_i}(I) = |I_{r_i}|$$
If the three components (R, G, B) of image I are quantized into l, m, and n levels respectively, the color histogram of image I is:
$$H = \{H_r, H_g, H_b\} = \{H_{r_1}, \ldots, H_{r_l}, H_{g_1}, \ldots, H_{g_m}, H_{b_1}, \ldots, H_{b_n}\}$$
When extracting the RGB color histogram in this system, all three components are quantized into 8 levels, so the total feature dimension is K = 8 + 8 + 8 = 24. The image size is 80*60, which speeds up histogram computation and reduces the influence of TV signal noise.
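A 24-dimensional per-channel histogram of this kind can be sketched as follows. Uniform quantization of the 8-bit range into equal-width bins is an assumption, since the text gives the number of levels but not the bin boundaries; the function name is illustrative.

```python
def rgb_histogram(pixels, levels=8):
    """Concatenated per-channel histogram: [R bins..., G bins..., B bins...].
    pixels is an iterable of 8-bit (r, g, b) tuples; bins are equal-width (assumption)."""
    hist = [0] * (3 * levels)
    bin_width = 256 // levels  # 32 intensity values per bin when levels is 8
    for r, g, b in pixels:
        hist[r // bin_width] += 1
        hist[levels + g // bin_width] += 1
        hist[2 * levels + b // bin_width] += 1
    return hist
```

For an 80*60 frame each of the three channel histograms sums to 4800, the number of pixels.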
The pattern recognition step in step (5) above, and its principle, are as follows.
The system uses support vector machines to train the classification model. First, features are extracted from the audio and the image of each of the first three frames of an advertisement, denoted:
$$AT_i = \{at_1, at_2, \ldots, at_n\}, \quad VT_i = \{vt_1, vt_2, \ldots, vt_m\}$$
Let
$$T_i = \{+k, AT_i, VT_i\} = \{+k, at_1, at_2, \ldots, at_n, vt_1, vt_2, \ldots, vt_m\}$$
where i = 1, 2, 3; n = 1, 2, ..., 16; m = 1, 2, ..., 24;
+k is the advertisement class label. For n advertisement samples, n*3 feature vectors are obtained:
$T_i$ (i = 1, 2, ..., n*3).
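Assembling the labeled vectors described above amounts to concatenating the class label, the 16 audio features, and the 24 image features for each of the three frames. The sketch below shows that layout; the function names and the dictionary input format are hypothetical.

```python
def build_sample(label, audio_feats, image_feats):
    """T_i = {+k, AT_i, VT_i}: class label, then 16 audio and 24 image features."""
    if len(audio_feats) != 16 or len(image_feats) != 24:
        raise ValueError("expected 16 audio features and 24 image features")
    return [label] + list(audio_feats) + list(image_feats)


def build_training_set(ads):
    """ads maps a class label to a list of (audio_feats, image_feats) pairs,
    one pair per frame; three frames per advertisement give n * 3 vectors."""
    samples = []
    for label, frames in ads.items():
        for audio_feats, image_feats in frames:
            samples.append(build_sample(label, audio_feats, image_feats))
    return samples
```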
The one-versus-one multi-class classification method is used here: for a k-class problem, k(k-1)/2 decision functions must be built. Let $f_{m,n}(x)$ be the decision function for the m-th class and the n-th class; it is constructed as follows.
Once the optimal solution is obtained, the optimal bias is given by:
where $x_{+1}$ and $x_{-1}$ are any support vectors from the +1 class and the -1 class, respectively. The decision function is then constructed:
where l = 240 = 6 × (16 + 24).
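The text states how many pairwise decision functions are needed but not how their outputs are combined into a final class. A common choice is max-wins voting, sketched below as an assumption. The `decision_functions` dictionary is hypothetical: it maps each class pair (m, n) to a callable whose positive output votes for m and whose non-positive output votes for n.

```python
from itertools import combinations


def num_pairwise_classifiers(k):
    """Number of decision functions for a k-class one-versus-one scheme: k(k-1)/2."""
    return k * (k - 1) // 2


def one_vs_one_predict(x, classes, decision_functions):
    """Max-wins voting over all class pairs (assumed combination rule)."""
    votes = {c: 0 for c in classes}
    for m, n in combinations(classes, 2):
        winner = m if decision_functions[(m, n)](x) > 0 else n
        votes[winner] += 1
    return max(classes, key=lambda c: votes[c])
```

If each of the 150 sample advertisements were treated as its own class, this scheme would require 11175 pairwise decision functions.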
After the classification model is built with the method above, the steps must be repeated to tune the kernel function and the penalty coefficient. The final system uses the Gaussian radial basis function (RBF) kernel with Gamma = 3.0517578125e-005 and penalty coefficient C = 0.3125.
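The reported kernel parameter, read here as 3.0517578125e-05, is exactly 2^-15, which suggests a search over powers of two, although the text does not describe the search. The sketch below evaluates the standard Gaussian RBF kernel with that value; the formula K(x, y) = exp(-gamma * ||x - y||^2) is the usual definition and is assumed rather than quoted from the text.

```python
import math

GAMMA = 2.0 ** -15  # 3.0517578125e-05, the kernel parameter as read from the text
C = 0.3125          # penalty coefficient reported in the text


def rbf_kernel(x, y, gamma=GAMMA):
    """Gaussian RBF kernel: K(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)
```

With so small a gamma, two feature vectors must differ by a squared distance in the tens of thousands before the kernel value drops noticeably, which is plausible for unnormalized histogram counts.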

Claims (5)

1. A television advertisement broadcast monitoring system based on audio/video information retrieval, characterized by comprising the following steps:
(1) TV signal acquisition: analog TV audio and video signals are captured using DirectShow, with functions including channel tuning, preview, setting the audio sampling properties, sound card buffer size, and video output size, adjusting the video frame rate, and saving the captured data to memory in real time or to a file;
(2) Video file processing: advertisement sample files are opened and played, their properties are read, and the audio and video streams are demultiplexed, decoded, and read;
(3) Audio feature extraction: the audio data undergoes noise reduction, filtering, pre-emphasis, windowing, Fourier transform, and inverse cosine transform, and audio features such as zero-crossing rate, silence ratio, short-time energy, and MFCC coefficients are extracted;
(4) Image feature extraction: the input image undergoes noise reduction and conversion from the RGB color space to the HSV and HSI color spaces, and image features such as the color histogram, color moments, and color coherence vector are extracted;
(5) Pattern recognition: using the "one-versus-one" multi-class classification method of support vector machines, the input training samples are trained, a recognition model is built, and its parameters are saved to a file; the model can classify a single prediction sample input in real time, or perform non-real-time classification given the file path of a large batch of prediction samples in a fixed format;
(6) Result output: each advertisement recognized in real time is saved, and data such as which advertisement it was identified as, its start time, its end time, and whether it was misjudged are displayed.
2. The television advertisement broadcast monitoring system based on audio/video information retrieval according to claim 1, characterized in that the TV signal acquisition step in step (1), and the materials and formats it requires, are as follows.
(2a) The TV capture card used by the system is the Tianmin Aolong Yingyawang ("Hard Compression King") TV capture card from Tianmin Technology Development Co., Ltd.; the sound card is the one built into the motherboard; the computer has an Intel CPU clocked at 2.6 GHz, 512 MB of memory, an 80 GB hard disk, and 128 MB of video memory. Visual C++ 6.0 (VC 6.0) is the main development tool.
(2b) The advertisement samples used by the system are cut, with the auxiliary tool Ulead VideoStudio 9.0, from recorded CCTV-1 and CCTV-6 video containing advertisements. 150 advertisements with the following properties are chosen as samples: the beginning has no audio, or the audio is identical but the images differ, or the images are identical but the audio differs. The advertisement samples are saved as AVI files with audio at a sampling rate of 11025 Hz, 16 bit, mono, in MPEG Layer-3 compressed format, and video at 25 frames/second, 24-bit sample size, image size 80*60, in DivX MPEG-4 V3 compressed format.
(2c) DirectShow, the Tianmin Aolong Yingyawang capture card, and VC++ 6.0 are used to capture the TV audio and video signals and to process the data in real time. The audio captured in real time has a sampling rate of 11025 Hz, 16 bit, mono, in PCM format; the video has a frame rate of 25 frames/second and an image size of 80*60, in 24-bit bitmap format.
3. The television advertisement broadcast monitoring system based on audio/video information retrieval according to claim 1, characterized in that the audio feature extraction step in step (3), and its principle, are as follows.
(3a) The audio processed by the system has a sampling frequency of 11.025 kHz, a sample word length of 16 bit, and a single channel. An audio frame is 1024 samples, and the frame shift is N/t samples (N is the audio sampling frequency and t is the video frame rate). Pre-emphasis is applied to boost the high frequencies, and the signal is windowed to avoid edge effects in the short-time speech segment. Below, consider a segment of audio data, where N is the audio length.
Pre-emphasis is defined as follows:
$$x_i = x_i - a\,x_{i-1}, \quad 0.9 \le a \le 1.0$$
The parameter a is set to 0.95. The purpose of pre-emphasis is to compensate for the high-frequency part of the audio signal that is suppressed by the articulatory system. After pre-emphasis the sound is sharper and crisper, and its volume is also lower. Windowing is defined as follows:
$$s_i = x_i\,w(i)$$
where w is the window function. The Hamming window is a commonly used one:
$$w(i) = 0.54 - 0.46\cos\left(\frac{2\pi \cdot i}{N - 1}\right), \quad 0 \le i \le N - 1$$
(3b) The MFCC extraction process is shown in Fig. 2. A fast Fourier transform is performed first, which yields a complex result. The energy at each frequency is then computed and its logarithm taken. M Mel band-pass filters $H_m$ are constructed; let o(m), c(m), and h(m) be the lower, center, and upper frequencies of the m-th triangular filter. The lower, center, and upper frequencies of adjacent triangular filters then satisfy:
$$c(m) = o(m+1) = h(m-1), \quad m = 1, 2, \ldots, M$$
where the c(m) are equally spaced on the Mel scale, i.e.
$$c(m) = \frac{f_{\text{mel-h}} - f_{\text{mel-l}}}{M + 1} \cdot m$$
where $f_{\text{mel-h}}$ and $f_{\text{mel-l}}$ are obtained from the following formula.
$$\mathrm{Mel}(f) = 2595 \cdot \log_{10}(1 + f/700)$$
The frequency response of the m-th triangular filter is defined by:
$$H_m(k) = \begin{cases} 0, & k < o(m) \\ \dfrac{k - o(m)}{c(m) - o(m)}, & o(m) \le k \le c(m) \\ \dfrac{h(m) - k}{h(m) - c(m)}, & c(m) \le k \le h(m) \\ 0, & k > h(m) \end{cases}$$
where
For $S_3(k)$, the M filters produce M outputs, where the output of the m-th filter is:
An inverse discrete cosine transform is applied to the logarithmic energies output by the M Mel band-pass filters to obtain L MFCC coefficients; L is generally about 12 to 16. The MFCC coefficients are:
$$MFCC(i) = \sum_{m=1}^{M} S_4(m)\cos\left(m \cdot (i + 0.5) \cdot \pi / M\right), \quad i = 1, 2, \ldots, L$$
In this system, L = 16.
4. The television advertisement broadcast monitoring system based on audio/video information retrieval according to claim 1, characterized in that the image feature extraction step in step (4), and its principle, are as follows.
(4a) The system uses the color histogram as the image feature. A color histogram is a statistic that represents the distribution of colors in an image: its horizontal axis represents the color quantization levels, and its vertical axis represents the number of pixels in the image whose color value falls within each quantization interval. It is defined as follows. Let image I have size w × h, and let the quantized values of the R component in image I be $r_1, r_2, \ldots, r_m$. For a pixel $p = (x, y) \in I$, let $r(p)$ denote the value of this component, so that $I_r = \{p \mid r(p) = r\}$. Then for component value $r_i$, $1 \le i \le m$, the R-component color histogram of image I is:
$$H_{r_i}(I) = |I_{r_i}|$$
If the three components (R, G, B) of image I are quantized into l, m, and n levels respectively, the color histogram of image I is:
$$H = \{H_r, H_g, H_b\} = \{H_{r_1}, \ldots, H_{r_l}, H_{g_1}, \ldots, H_{g_m}, H_{b_1}, \ldots, H_{b_n}\}$$
(4b) When extracting the RGB color histogram in this system, all three components are quantized into 8 levels, so the total feature dimension is K = 8 + 8 + 8 = 24. The image size is 80*60, which speeds up histogram computation and reduces the influence of TV signal noise.
5. The television advertisement broadcast monitoring system based on audio/video information retrieval according to claim 1, characterized in that the pattern recognition step in step (5), and its principle, are as follows.
(5a) The system uses support vector machines to train the classification model. First, features are extracted from the audio and the image of each of the first three frames of an advertisement, denoted:
$$AT_i = \{at_1, at_2, \ldots, at_n\}, \quad VT_i = \{vt_1, vt_2, \ldots, vt_m\}$$
Let
$$T_i = \{+k, AT_i, VT_i\} = \{+k, at_1, at_2, \ldots, at_n, vt_1, vt_2, \ldots, vt_m\}$$
where i = 1, 2, 3; n = 1, 2, ..., 16; m = 1, 2, ..., 24;
+k is the advertisement class label. For n advertisement samples, n*3 feature vectors are obtained:
$T_i$ (i = 1, 2, ..., n*3).
(5b) The one-versus-one multi-class classification method is used here: for a k-class problem, k(k-1)/2 decision functions must be built. Let $f_{m,n}(x)$ be the decision function for the m-th class and the n-th class; it is constructed as follows.
Once the optimal solution is obtained, the optimal bias is given by:
where $x_{+1}$ and $x_{-1}$ are any support vectors from the +1 class and the -1 class, respectively. The decision function is then constructed:
where l = 240 = 6 × (16 + 24).
(5c) After the classification model is built with the method above, the steps must be repeated to tune the kernel function and the penalty coefficient. The final system uses the Gaussian radial basis function (RBF) kernel with Gamma = 3.0517578125e-005 and penalty coefficient C = 0.3125.
CN201710648059.5A 2017-08-01 2017-08-01 A kind of television advertising based on audio/video information retrieval supervises broadcast system Pending CN107426585A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710648059.5A CN107426585A (en) 2017-08-01 2017-08-01 A kind of television advertising based on audio/video information retrieval supervises broadcast system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710648059.5A CN107426585A (en) 2017-08-01 2017-08-01 A kind of television advertising based on audio/video information retrieval supervises broadcast system

Publications (1)

Publication Number Publication Date
CN107426585A true CN107426585A (en) 2017-12-01

Family

ID=60436444

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710648059.5A Pending CN107426585A (en) 2017-08-01 2017-08-01 A kind of television advertising based on audio/video information retrieval supervises broadcast system

Country Status (1)

Country Link
CN (1) CN107426585A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101080028A (en) * 2006-05-25 2007-11-28 北大方正集团有限公司 An advertisement video detection method
CN102799605A (en) * 2012-05-02 2012-11-28 天脉聚源(北京)传媒科技有限公司 Method and system for monitoring advertisement broadcast
KR20130094891A (en) * 2012-02-17 2013-08-27 진준형 Advertisement monitor having multi recognition function
CN103617263A (en) * 2013-11-29 2014-03-05 安徽大学 Television advertisement film automatic detection method based on multi-mode characteristics
CN103780916A (en) * 2012-10-25 2014-05-07 合肥林晨信息科技有限公司 Digital television advertisement intelligent identification system

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108156518A (en) * 2017-12-26 2018-06-12 上海亿动信息技术有限公司 A kind of method and device that advertisement progress advertisement orientation dispensing is paid close attention to by user
CN108460633B (en) * 2018-03-05 2022-06-03 北京明略昭辉科技有限公司 Method for establishing advertisement audio acquisition and identification system and application thereof
CN108320190A (en) * 2018-03-05 2018-07-24 北京电广聪信息技术有限公司 A kind of audio collecting system and method
CN108428150A (en) * 2018-03-05 2018-08-21 北京电广聪信息技术有限公司 A method of being used for advertisement audio feature extraction
CN108460633A (en) * 2018-03-05 2018-08-28 北京电广聪信息技术有限公司 A kind of method for building up and application thereof of advertisement audio collection identifying system
CN108540833A (en) * 2018-04-16 2018-09-14 北京交通大学 A kind of television advertising recognition methods based on camera lens
CN108882016A (en) * 2018-07-31 2018-11-23 成都华栖云科技有限公司 A kind of method and system that video gene data extracts
CN111920390A (en) * 2020-09-15 2020-11-13 成都启英泰伦科技有限公司 Snore detection method based on embedded terminal
CN112437340A (en) * 2020-11-13 2021-03-02 广东省广播电视局 Method and system for determining whether variant long advertisements exist in audio and video
CN112437340B (en) * 2020-11-13 2023-02-21 广东省广播电视局 Method and system for determining whether variant long advertisements exist in audio and video
CN113299281A (en) * 2021-05-24 2021-08-24 青岛科技大学 Driver sharp high pitch recognition early warning method and system based on acoustic text fusion
CN113627363A (en) * 2021-08-13 2021-11-09 百度在线网络技术(北京)有限公司 Video file processing method, device, equipment and storage medium
CN113627363B (en) * 2021-08-13 2023-08-15 百度在线网络技术(北京)有限公司 Video file processing method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN107426585A (en) A kind of television advertising based on audio/video information retrieval supervises broadcast system
CN105976809B (en) Identification method and system based on speech and facial expression bimodal emotion fusion
CN101546556B (en) Classification system for identifying audio content
CN110633725B (en) Method and device for training classification model and classification method and device
CN109493881B (en) Method and device for labeling audio and computing equipment
CN106952644A (en) A kind of complex audio segmentation clustering method based on bottleneck characteristic
CN111488489A (en) Video file classification method, device, medium and electronic equipment
WO2016155047A1 (en) Method of recognizing sound event in auditory scene having low signal-to-noise ratio
CN1979491A (en) Method for music mood classification and system thereof
CN113763965B (en) Speaker identification method with multiple attention feature fusion
CN109684506A (en) A kind of labeling processing method of video, device and calculate equipment
CN112819020A (en) Method and device for training classification model and classification method
Benamer et al. Database for arabic speech commands recognition
CN115798459B (en) Audio processing method and device, storage medium and electronic equipment
Omar et al. Fourier Domain Kernel Density Estimation-based Approach for Hail Sound classification
CN110767248B (en) Anti-modulation interference audio fingerprint extraction method
CN103366753A (en) Moving picture experts group audio layer-3 (MP3) audio double-compression detection method under same code rate
CN113516987B (en) Speaker recognition method, speaker recognition device, storage medium and equipment
Yu Research on music emotion classification based on CNN-LSTM network
CN111312215A (en) Natural speech emotion recognition method based on convolutional neural network and binaural representation
CN117558281A (en) Speaker identification method and system based on enhanced self-supervision framework
Singh et al. Application of different filters in mel frequency cepstral coefficients feature extraction and fuzzy vector quantization approach in speaker recognition
Saz et al. Background-tracking acoustic features for genre identification of broadcast shows
Büker et al. Double compressed AMR audio detection using long-term features and deep neural networks
Dong et al. Application of voiceprint recognition based on improved ecapa-tdnn

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20171201