CN107426585A - Television advertisement broadcast monitoring system based on audio/video information retrieval - Google Patents
Television advertisement broadcast monitoring system based on audio/video information retrieval
- Publication number
- CN107426585A (application CN201710648059.5A)
- Authority
- CN
- China
- Prior art keywords
- audio
- image
- color
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/234—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
- H04N21/23418—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/233—Processing of audio elementary streams
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/24—Monitoring of processes or resources, e.g. monitoring of server load, available bandwidth, upstream requests
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/24—Monitoring of processes or resources, e.g. monitoring of server load, available bandwidth, upstream requests
- H04N21/2407—Monitoring of transmitted content, e.g. distribution time, number of downloads
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/81—Monomedia components thereof
- H04N21/812—Monomedia components thereof involving advertisement data
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Business, Economics & Management (AREA)
- Marketing (AREA)
- Image Analysis (AREA)
Abstract
The invention relates to a television advertisement broadcast monitoring system based on audio/video information retrieval. A PV988AV professional TV capture card and DirectShow are used to capture video segments containing advertisements from CCTV-1 and CCTV-6. Each advertisement is then cut out manually as a sample, giving 150 advertisements in total, and the beginning (first three frames) and end (last three frames) of each advertisement are analyzed. The audio data undergo noise reduction, filtering, pre-emphasis, windowing, Fourier transform, and inverse cosine transform, and audio features such as zero-crossing rate, silence ratio, short-time energy, and MFCC coefficients are extracted. The input images undergo noise reduction and RGB-to-HSV and RGB-to-HSI color space conversion, and image features such as color histogram, color moments, and color coherence vector are extracted. Mel cepstral coefficients and RGB color histograms are taken as the sample features, and the sample feature vectors are trained with the "one-against-one" multi-class support vector machine (SVM) method to build an advertisement-head recognition model and an advertisement-tail recognition model, which together enable real-time broadcast monitoring of known advertisements.
Description
Technical field
The invention relates to audio/video processing and computer vision, and in particular to a television advertisement broadcast monitoring system based on audio/video information retrieval.
Background
With the development and opening-up of China's broadcast television industry, TV programming has become increasingly diverse. Television advertising, currently the main source of revenue for TV stations, places higher demands on their programming. Advertisement detection and broadcast monitoring is the last link in advertisement management, and one of the most important. As broadcast content management deepens, digital audio/video content of all kinds is expanding rapidly in radio and television production and distribution organizations at every level and in internet audio/video communities; in the TV industry in particular, commercial advertisement management accounts for a large share of the workload. Users also demand accurate, complex, and personalized video retrieval. Research on audio/video retrieval systems that operate in real time is therefore very important.
Summary of the invention
The object of the invention is to provide a television advertisement broadcast monitoring system based on audio/video information retrieval that addresses the technical problem described above. Its main technical content is as follows.
A television advertisement broadcast monitoring system based on audio/video information retrieval, comprising the following six steps:
(1) TV signal acquisition: analog TV audio and video signals are captured using DirectShow, with functions for channel tuning, preview, setting the audio sampling attributes, sound-card buffer size, and video output size, adjusting the video frame rate, and saving the captured data in real time to memory or to a file;
(2) Video file processing: advertisement sample files are opened and played, their attributes are read, the audio and video streams are split and decoded, and the audio and video data are read;
(3) Audio feature extraction: the audio data undergo noise reduction, filtering, pre-emphasis, windowing, Fourier transform, and inverse cosine transform, and audio features such as zero-crossing rate, silence ratio, short-time energy, and MFCC coefficients are extracted;
(4) Image feature extraction: the input image undergoes noise reduction and RGB-to-HSV and RGB-to-HSI color space conversion, and image features such as color histogram, color moments, and color coherence vector are extracted;
(5) Pattern recognition: using the "one-against-one" multi-class support vector machine method, the input training samples are trained, a recognition model is built, and its parameters are saved to a file; the resulting model can classify a single prediction sample input in real time, or perform non-real-time (batch) classification given the file path of a large set of prediction samples in a fixed format;
(6) Result output: advertisements recognized in real time are saved, and the system displays which advertisement was recognized, its start time, its end time, whether it was a misjudgment, and similar data.
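Step (3) names zero-crossing rate, silence ratio, and short-time energy without defining them. The sketch below uses common textbook definitions, which are an assumption and not taken from the patent; the function names and the energy threshold parameter are likewise hypothetical.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    if not frame:
        return 0.0
    return sum(s * s for s in frame) / len(frame)


def silence_ratio(frames, energy_threshold):
    """Share of frames whose energy is below the threshold (assumed definition)."""
    if not frames:
        return 0.0
    silent = sum(1 for f in frames if short_time_energy(f) < energy_threshold)
    return silent / len(frames)
```

Each function works on plain lists of samples, so it can be applied to the audio frames of an advertisement head or tail regardless of how those frames were captured.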
Advantages of the invention
1. Given a large set of television advertisement samples, the invention can quickly recognize them in real-time TV programming.
2. The system achieves a high advertisement-head recognition rate and strong discrimination, reducing manual intervention.
Brief description of the drawings
Fig. 1 is the system flow chart of the invention;
Fig. 2 shows the MFCC extraction process.
Detailed description
To further explain the technical means adopted by the invention to achieve its intended purpose, and their effects, the embodiments, structure, features, and effects of the invention are described in detail below with reference to the drawings and preferred embodiments.
The system comprises the six steps set out in the Summary of the invention: (1) TV signal acquisition; (2) video file processing; (3) audio feature extraction; (4) image feature extraction; (5) pattern recognition; (6) result output. Steps (1), (3), (4), and (5) are detailed below.
The materials and formats required for TV signal acquisition in step (1) are as follows.
(a) The TV capture card used by the system is the "Aolong Yingyawang" TV capture card from Tianmin Technology Development Co., Ltd.; the sound card is the one built into the motherboard; the computer has an Intel CPU clocked at 2.6 GHz, 512 MB of memory, an 80 GB hard disk, and 128 MB of video memory. VC 6.0 is the main development tool.
(b) The advertisement samples were cut, using the auxiliary tool Ulead VideoStudio 9.0, from recorded CCTV-1 and CCTV-6 video containing advertisements and split into files. 150 advertisements with the following properties were chosen as sample advertisements: the opening has no audio, or the audio is identical but the images differ, or the images are identical but the audio differs. The samples were saved as AVI files with audio at a sampling rate of 11025 Hz, 16 bit, mono, MPEG Layer-3 compression, and video at 25 frames/s, 24-bit sample size, 80×60 image size, DivX MPEG-4 V3 compression.
(c) Capture and real-time processing of the TV audio and video signals are implemented with DirectShow, the Tianmin "Aolong Yingyawang" capture card, and VC++ 6.0. The audio captured in real time has a sampling rate of 11025 Hz, 16 bit, mono, PCM format; the video is 25 frames/s, 80×60 image size, 24-bit bitmap format.
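As a quick check on what the real-time capture formats in (c) imply, the following sketch works out the uncompressed data rates. The constant and function names are illustrative only; the figures themselves come from the formats stated above.

```python
AUDIO_RATE_HZ = 11025        # audio sampling rate
AUDIO_BYTES_PER_SAMPLE = 2   # 16 bit
AUDIO_CHANNELS = 1           # mono
VIDEO_WIDTH, VIDEO_HEIGHT = 80, 60
VIDEO_BYTES_PER_PIXEL = 3    # 24-bit bitmap
VIDEO_FPS = 25


def raw_audio_bytes_per_second():
    """Uncompressed PCM audio rate."""
    return AUDIO_RATE_HZ * AUDIO_BYTES_PER_SAMPLE * AUDIO_CHANNELS


def raw_video_bytes_per_second():
    """Uncompressed bitmap video rate."""
    return VIDEO_WIDTH * VIDEO_HEIGHT * VIDEO_BYTES_PER_PIXEL * VIDEO_FPS


def audio_samples_per_video_frame():
    """Audio samples that elapse during one video frame."""
    return AUDIO_RATE_HZ // VIDEO_FPS
```

At these settings the raw audio stream is 22050 bytes/s and the raw video stream is 360000 bytes/s, and exactly 441 audio samples correspond to one video frame.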
The audio feature extraction steps and principles in step (3) are as follows.
The audio processed by the system has a sampling frequency of 11.025 kHz, a 16-bit sample word length, and one channel (mono). Each audio frame is 1024 samples, and the frame shift is N/t samples (N is the audio sampling frequency and t is the video frame rate). Pre-emphasis boosts the high frequencies, and the signal is windowed to avoid edge effects at the boundaries of each short-time segment. In what follows, x is a segment of audio data and N denotes its length.
Pre-emphasis is defined as:
$x_i = x_i - a\,x_{i-1}, \quad 0.9 \le a \le 1.0$
The parameter a is set to 0.95. The purpose of pre-emphasis is to compensate for the high-frequency part of the audio signal that is suppressed by the articulatory system. After pre-emphasis the sound is sharper and crisper, but its volume is also lower. Windowing is defined as:
$s_i = x_i\,w(i)$
where w is the window function. The Hamming window is a common choice:
$w(i) = 0.54 - 0.46\cos\left(\frac{2\pi i}{N-1}\right), \quad 0 \le i \le N-1$
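The framing, pre-emphasis, and windowing steps above can be sketched as follows. This is a minimal illustration, not the system's VC++ implementation; the function names are hypothetical, and passing the first sample through unchanged is an assumption, since the text does not say how the first sample is handled.

```python
import math

SAMPLE_RATE = 11025                      # Hz, from the text
VIDEO_FPS = 25                           # frames/s, from the text
FRAME_LEN = 1024                         # samples per audio frame
FRAME_SHIFT = SAMPLE_RATE // VIDEO_FPS   # N/t = 441 samples


def pre_emphasis(x, a=0.95):
    """y[i] = x[i] - a * x[i-1]; the first sample is kept as-is (assumption)."""
    if not x:
        return []
    return [x[0]] + [x[i] - a * x[i - 1] for i in range(1, len(x))]


def hamming(n):
    """w(i) = 0.54 - 0.46 * cos(2*pi*i / (n - 1)) for 0 <= i <= n - 1."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def split_frames(x, frame_len=FRAME_LEN, shift=FRAME_SHIFT):
    """Overlapping frames; a trailing partial frame is dropped."""
    return [x[s:s + frame_len] for s in range(0, len(x) - frame_len + 1, shift)]


def windowed_frames(x):
    """Pre-emphasize, split into frames, and apply the Hamming window."""
    w = hamming(FRAME_LEN)
    return [[s * wi for s, wi in zip(f, w)] for f in split_frames(pre_emphasis(x))]
```

With a frame shift of 441 samples, one audio frame starts per video frame, which keeps the audio and image features aligned in time.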
The MFCC extraction process is shown in Fig. 2. A fast Fourier transform is performed first, which yields a complex result. The energy at each frequency is then computed and its logarithm is taken. M Mel band-pass filters $H_m(k)$, m = 1, 2, ..., M, are constructed; o(m), c(m), and h(m) are the lower-limit, center, and upper-limit frequencies of the m-th triangular filter, so that adjacent triangular filters are related by:
$c(m) = o(m+1) = h(m-1), \quad m = 1, 2, \ldots, M$
where the centers c(m) are equally spaced on the Mel scale, i.e.
$c(m) = \frac{f_{mel\text{-}h} - f_{mel\text{-}l}}{M+1} \cdot m$
where $f_{mel\text{-}h}$ and $f_{mel\text{-}l}$ are obtained from the following formula:
$Mel(f) = 2595 \cdot \log_{10}(1 + f/700)$
The frequency response of the m-th triangular filter is defined by:
$H_m(k) = \begin{cases} 0, & k < o(m) \\ \dfrac{k - o(m)}{c(m) - o(m)}, & o(m) \le k \le c(m) \\ \dfrac{h(m) - k}{h(m) - c(m)}, & c(m) \le k \le h(m) \\ 0, & k > h(m) \end{cases}$
For the spectrum $S_3(k)$, the M filters produce M outputs; the output of the m-th filter, $S_4(m)$, is:
[formula not preserved in the extracted text]
An inverse discrete cosine transform is then applied to the logarithmic energies output by the M Mel band-pass filters, giving L MFCC coefficients. L is generally about 12 to 16, and the MFCC coefficients are:
$MFCC(i) = \sum_{m=1}^{M} S_4(m)\cos\left(m \cdot (i + 0.5) \cdot \pi / M\right), \quad i = 1, 2, \ldots, L$
In this system L = 16.
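The Mel-scale filter layout and the final cosine sum can be sketched as below. Several things here are assumptions not stated in the text: the number of filters (24 in the usage note), the band limits (0 Hz to the Nyquist frequency 5512.5 Hz), the inverse mapping `mel_to_hz` (the algebraic inverse of the printed Mel formula), and adding the lower band edge as an offset to the equally spaced centers (which reduces to the printed formula when the lower edge is 0). All function names are hypothetical. The cosine sum follows the formula exactly as printed.

```python
import math


def hz_to_mel(f):
    """Mel(f) = 2595 * log10(1 + f / 700), as given in the text."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    """Algebraic inverse of hz_to_mel (not stated in the text)."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def filter_edges(num_filters, f_low, f_high):
    """M + 2 points equally spaced on the Mel scale, returned in Hz.

    Filter m (1-based) uses edges[m - 1], edges[m], edges[m + 1] as
    o(m), c(m), h(m), which gives c(m) = o(m + 1) = h(m - 1).
    """
    mel_lo, mel_hi = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (mel_hi - mel_lo) / (num_filters + 1)
    return [mel_to_hz(mel_lo + step * m) for m in range(num_filters + 2)]


def triangle(k, lo, centre, hi):
    """Triangular response H_m(k) for one filter with distinct edges."""
    if k < lo or k > hi:
        return 0.0
    if k <= centre:
        return (k - lo) / (centre - lo)
    return (hi - k) / (hi - centre)


def mfcc_from_log_energies(s4, num_coeffs=16):
    """MFCC(i) = sum_{m=1..M} S4(m) * cos(m * (i + 0.5) * pi / M), i = 1..L."""
    big_m = len(s4)
    return [
        sum(s4[m - 1] * math.cos(m * (i + 0.5) * math.pi / big_m)
            for m in range(1, big_m + 1))
        for i in range(1, num_coeffs + 1)
    ]
```

For example, `filter_edges(24, 0.0, 5512.5)` returns 26 edge frequencies, and `mfcc_from_log_energies` applied to 24 filter outputs returns the 16 coefficients used per frame.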
The image feature extraction steps and principles in step (4) are as follows.
The system uses the color histogram as its image feature. A color histogram is a statistic describing the distribution of colors in an image: its horizontal axis is the color quantization level, and its vertical axis is the number of pixels in the image whose color value falls in that quantization interval. It is defined as follows. Let image I have size w × h, and let the quantized R-component color values in I be $r_1, r_2, \ldots, r_m$. For a pixel p = (x, y) ∈ I, let r(p) denote its R-component value, and let $I_r = \{p \mid r(p) = r\}$. Then for component color $r_i$, i ∈ {1, ..., m}, the R-component color histogram of image I is:
$H_{r_i}(I) = |I_{r_i}|$
If the three components (R, G, B) of image I are quantized into l, m, and n levels respectively, the color histogram of image I is:
$H = \{H_r, H_g, H_b\} = \{H_{r_1}, \ldots, H_{r_l}, H_{g_1}, \ldots, H_{g_m}, H_{b_1}, \ldots, H_{b_n}\}$
When the system extracts the RGB color histogram, each of the three components is quantized into 8 levels, so the total feature dimension is K = 8 + 8 + 8 = 24. The image size is 80×60, which speeds up histogram computation and reduces the influence of TV signal noise.
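A minimal sketch of the 24-dimensional RGB histogram described above, assuming 8-bit channel values and uniform quantization intervals (the text gives the number of levels but not the interval boundaries); the function name is hypothetical.

```python
def rgb_histogram(pixels, bins=8):
    """Concatenated per-channel histogram [R bins..., G bins..., B bins...].

    pixels: iterable of (r, g, b) tuples with values in 0..255.
    bins:   quantization levels per channel; assumed to divide 256 evenly.
    """
    width = 256 // bins
    hist = [0] * (3 * bins)
    for r, g, b in pixels:
        hist[r // width] += 1
        hist[bins + g // width] += 1
        hist[2 * bins + b // width] += 1
    return hist
```

For an 80×60 frame, each of the three 8-bin blocks sums to 4800, the number of pixels in the frame.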
The pattern recognition steps and principles in step (5) are as follows.
The system trains its classification models with a support vector machine. First, audio and image features are extracted from each of the first three frames of an advertisement, denoted:
$AT_i = \{at_1, at_2, \ldots, at_n\}, \quad VT_i = \{vt_1, vt_2, \ldots, vt_m\}$
Let $T_i = \{+k, AT_i, VT_i\} = \{+k, at_1, at_2, \ldots, at_n, vt_1, vt_2, \ldots, vt_m\}$
where i = 1, 2, 3; the audio feature index runs over 1, 2, ..., 16 and the image feature index over 1, 2, ..., 24;
+k is the advertisement class label. For n advertisement samples, n×3 feature vectors are obtained:
$T_i \; (i = 1, 2, \ldots, n \times 3)$.
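The construction of the labelled vectors T_i can be sketched as follows, assuming 16 MFCC values and 24 histogram values per frame as stated above. The function names and the plain-list representation are hypothetical, chosen only for illustration.

```python
AUDIO_DIM = 16   # MFCC coefficients per frame
IMAGE_DIM = 24   # RGB histogram bins per frame


def make_sample(label, audio_feats, image_feats):
    """Build T = [label, at_1..at_16, vt_1..vt_24] for one frame."""
    if len(audio_feats) != AUDIO_DIM or len(image_feats) != IMAGE_DIM:
        raise ValueError("expected 16 audio and 24 image features")
    return [label] + list(audio_feats) + list(image_feats)


def head_samples(label, frames):
    """One labelled vector per frame; frames is a list of (audio, image) pairs."""
    return [make_sample(label, audio, image) for audio, image in frames]
```

Each advertisement head contributes three such vectors (one per frame), so n advertisements yield 3n training vectors of 40 features each, plus the label.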
A one-against-one multi-class method is used here: a k-class problem requires k(k−1)/2 decision functions. Let $f_{m,n}(x)$ be the decision function separating class m from class n; it is constructed as follows.
[formula not preserved in the extracted text]
Once the optimal solution is obtained, the optimal bias is given by:
[formula not preserved in the extracted text]
where $x_{+1}$ and $x_{-1}$ are any support vectors from the +1 class and the −1 class, respectively. The decision function is then constructed:
[formula not preserved in the extracted text]
where l = 240 = 6 × (16 + 24).
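The one-against-one scheme can be illustrated with a small voting sketch. The patent's own decision-function formulas are not preserved in this text, so the `decision` callback below is a hypothetical stand-in for the pairwise functions f_{m,n}(x), and majority voting is the usual way such pairwise outputs are combined (an assumption here, not a statement of the patented method).

```python
from collections import Counter
from itertools import combinations


def num_pairwise_classifiers(k):
    """A k-class one-against-one scheme needs k(k-1)/2 decision functions."""
    return k * (k - 1) // 2


def one_vs_one_predict(x, classes, decision):
    """Majority vote over all class pairs.

    decision(m, n, x) is a hypothetical callback standing in for f_{m,n}(x):
    a positive value votes for class m, anything else votes for class n.
    """
    votes = Counter()
    for m, n in combinations(classes, 2):
        votes[m if decision(m, n, x) > 0 else n] += 1
    return votes.most_common(1)[0][0]
```

If each of the 150 sample advertisements were treated as its own class, this would mean 11175 pairwise decision functions per model.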
After a classification model has been built in this way, the steps above must be repeated while adjusting the kernel function and the penalty coefficient. The final system uses the Gaussian radial basis function (RBF) kernel with Gamma = 3.0517578125e-005 and penalty coefficient C = 0.3125.
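For reference, the Gaussian RBF kernel in its common form K(x, y) = exp(−γ‖x − y‖²) can be sketched as below. The kernel formula itself is not printed in the text, so this form is an assumption, as is reading the printed Gamma value as 3.0517578125e-005, which equals 2^−15.

```python
import math

GAMMA = 2.0 ** -15   # 3.0517578125e-05, assumed reading of the printed value
PENALTY_C = 0.3125   # penalty coefficient C from the text


def rbf_kernel(x, y, gamma=GAMMA):
    """Gaussian RBF kernel exp(-gamma * ||x - y||^2) (common form, assumed)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))
```

With such a small gamma the kernel decays slowly, which suits unnormalized features such as histogram counts whose squared distances can be large.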
Claims (5)
1. A television advertisement broadcast monitoring system based on audio/video information retrieval, characterized in that it comprises the following steps:
(1) TV signal acquisition: analog TV audio and video signals are captured using DirectShow, with functions for channel tuning, preview, setting the audio sampling attributes, sound-card buffer size, and video output size, adjusting the video frame rate, and saving the captured data in real time to memory or to a file;
(2) Video file processing: advertisement sample files are opened and played, their attributes are read, the audio and video streams are split and decoded, and the audio and video data are read;
(3) Audio feature extraction: the audio data undergo noise reduction, filtering, pre-emphasis, windowing, Fourier transform, and inverse cosine transform, and audio features such as zero-crossing rate, silence ratio, short-time energy, and MFCC coefficients are extracted;
(4) Image feature extraction: the input image undergoes noise reduction and RGB-to-HSV and RGB-to-HSI color space conversion, and image features such as color histogram, color moments, and color coherence vector are extracted;
(5) Pattern recognition: using the "one-against-one" multi-class support vector machine method, the input training samples are trained, a recognition model is built, and its parameters are saved to a file; the resulting model can classify a single prediction sample input in real time, or perform non-real-time (batch) classification given the file path of a large set of prediction samples in a fixed format;
(6) Result output: advertisements recognized in real time are saved, and the system displays which advertisement was recognized, its start time, its end time, whether it was a misjudgment, and similar data.
2. The television advertisement broadcast monitoring system based on audio/video information retrieval according to claim 1, characterized in that the TV signal acquisition step in step (1) and the materials and formats it requires are as follows.
(2a) The TV capture card used by the system is the "Aolong Yingyawang" TV capture card from Tianmin Technology Development Co., Ltd.; the sound card is the one built into the motherboard; the computer has an Intel CPU clocked at 2.6 GHz, 512 MB of memory, an 80 GB hard disk, and 128 MB of video memory. VC 6.0 is the main development tool.
(2b) The advertisement samples were cut, using the auxiliary tool Ulead VideoStudio 9.0, from recorded CCTV-1 and CCTV-6 video containing advertisements and split into files. 150 advertisements with the following properties were chosen as sample advertisements: the opening has no audio, or the audio is identical but the images differ, or the images are identical but the audio differs. The samples were saved as AVI files with audio at a sampling rate of 11025 Hz, 16 bit, mono, MPEG Layer-3 compression, and video at 25 frames/s, 24-bit sample size, 80×60 image size, DivX MPEG-4 V3 compression.
(2c) Capture and real-time processing of the TV audio and video signals are implemented with DirectShow, the Tianmin "Aolong Yingyawang" capture card, and VC++ 6.0. The audio captured in real time has a sampling rate of 11025 Hz, 16 bit, mono, PCM format; the video is 25 frames/s, 80×60 image size, 24-bit bitmap format.
3. The television advertisement broadcast monitoring system based on audio/video information retrieval according to claim 1, characterized in that the audio feature extraction steps and principles in step (3) are as follows.
(3a) The audio processed by the system has a sampling frequency of 11.025 kHz, a 16-bit sample word length, and one channel (mono). Each audio frame is 1024 samples, and the frame shift is N/t samples (N is the audio sampling frequency and t is the video frame rate). Pre-emphasis boosts the high frequencies, and the signal is windowed to avoid edge effects at the boundaries of each short-time segment. In what follows, x is a segment of audio data and N denotes its length.
Pre-emphasis is defined as:
$x_i = x_i - a\,x_{i-1}, \quad 0.9 \le a \le 1.0$
The parameter a is set to 0.95. The purpose of pre-emphasis is to compensate for the high-frequency part of the audio signal that is suppressed by the articulatory system. After pre-emphasis the sound is sharper and crisper, but its volume is also lower. Windowing is defined as:
$s_i = x_i\,w(i)$
where w is the window function. The Hamming window is a common choice:
$w(i) = 0.54 - 0.46\cos\left(\frac{2\pi i}{N-1}\right), \quad 0 \le i \le N-1$
(3b) The MFCC extraction process is shown in Fig. 2. A fast Fourier transform is performed first, which yields a complex result. The energy at each frequency is then computed and its logarithm is taken. M Mel band-pass filters $H_m(k)$, m = 1, 2, ..., M, are constructed; o(m), c(m), and h(m) are the lower-limit, center, and upper-limit frequencies of the m-th triangular filter, so that adjacent triangular filters are related by:
$c(m) = o(m+1) = h(m-1), \quad m = 1, 2, \ldots, M$
where the centers c(m) are equally spaced on the Mel scale, i.e.
$c(m) = \frac{f_{mel\text{-}h} - f_{mel\text{-}l}}{M+1} \cdot m$
where $f_{mel\text{-}h}$ and $f_{mel\text{-}l}$ are obtained from the following formula:
$Mel(f) = 2595 \cdot \log_{10}(1 + f/700)$
The frequency response of the m-th triangular filter is defined by:
$H_m(k) = \begin{cases} 0, & k < o(m) \\ \dfrac{k - o(m)}{c(m) - o(m)}, & o(m) \le k \le c(m) \\ \dfrac{h(m) - k}{h(m) - c(m)}, & c(m) \le k \le h(m) \\ 0, & k > h(m) \end{cases}$
For the spectrum $S_3(k)$, the M filters produce M outputs; the output of the m-th filter, $S_4(m)$, is:
[formula not preserved in the extracted text]
An inverse discrete cosine transform is then applied to the logarithmic energies output by the M Mel band-pass filters, giving L MFCC coefficients. L is generally about 12 to 16, and the MFCC coefficients are:
$MFCC(i) = \sum_{m=1}^{M} S_4(m)\cos\left(m \cdot (i + 0.5) \cdot \pi / M\right), \quad i = 1, 2, \ldots, L$
In this system L = 16.
4. The television advertisement broadcast monitoring system based on audio/video information retrieval according to claim 1, characterized in that the image feature extraction steps and principles in step (4) are as follows.
(4a) The system uses the color histogram as its image feature. A color histogram is a statistic describing the distribution of colors in an image: its horizontal axis is the color quantization level, and its vertical axis is the number of pixels in the image whose color value falls in that quantization interval. It is defined as follows. Let image I have size w × h, and let the quantized R-component color values in I be $r_1, r_2, \ldots, r_m$. For a pixel p = (x, y) ∈ I, let r(p) denote its R-component value, and let $I_r = \{p \mid r(p) = r\}$. Then for component color $r_i$, i ∈ {1, ..., m}, the R-component color histogram of image I is:
$H_{r_i}(I) = |I_{r_i}|$
If the three components (R, G, B) of image I are quantized into l, m, and n levels respectively, the color histogram of image I is:
$H = \{H_r, H_g, H_b\} = \{H_{r_1}, \ldots, H_{r_l}, H_{g_1}, \ldots, H_{g_m}, H_{b_1}, \ldots, H_{b_n}\}$
(4b) When the system extracts the RGB color histogram, each of the three components is quantized into 8 levels, so the total feature dimension is K = 8 + 8 + 8 = 24. The image size is 80×60, which speeds up histogram computation and reduces the influence of TV signal noise.
5. The television advertisement broadcast monitoring system based on audio/video information retrieval according to claim 1, characterized in that the pattern recognition steps and principles in step (5) are as follows.
(5a) The system trains its classification models with a support vector machine. First, audio and image features are extracted from each of the first three frames of an advertisement, denoted:
$AT_i = \{at_1, at_2, \ldots, at_n\}, \quad VT_i = \{vt_1, vt_2, \ldots, vt_m\}$
Let $T_i = \{+k, AT_i, VT_i\} = \{+k, at_1, at_2, \ldots, at_n, vt_1, vt_2, \ldots, vt_m\}$
where i = 1, 2, 3; the audio feature index runs over 1, 2, ..., 16 and the image feature index over 1, 2, ..., 24;
+k is the advertisement class label. For n advertisement samples, n×3 feature vectors are obtained:
$T_i \; (i = 1, 2, \ldots, n \times 3)$.
(5b) A one-against-one multi-class method is used here: a k-class problem requires k(k−1)/2 decision functions. Let $f_{m,n}(x)$ be the decision function separating class m from class n; it is constructed as follows.
[formula not preserved in the extracted text]
Once the optimal solution is obtained, the optimal bias is given by:
[formula not preserved in the extracted text]
where $x_{+1}$ and $x_{-1}$ are any support vectors from the +1 class and the −1 class, respectively. The decision function is then constructed:
[formula not preserved in the extracted text]
where l = 240 = 6 × (16 + 24).
(5c) After a classification model has been built in this way, the steps above must be repeated while adjusting the kernel function and the penalty coefficient. The final system uses the Gaussian radial basis function (RBF) kernel with Gamma = 3.0517578125e-005 and penalty coefficient C = 0.3125.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710648059.5A CN107426585A (en) | 2017-08-01 | 2017-08-01 | A kind of television advertising based on audio/video information retrieval supervises broadcast system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107426585A (en) | 2017-12-01 |
Family
ID=60436444
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710648059.5A Pending CN107426585A (en) | 2017-08-01 | 2017-08-01 | A kind of television advertising based on audio/video information retrieval supervises broadcast system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107426585A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101080028A (en) * | 2006-05-25 | 2007-11-28 | 北大方正集团有限公司 | An advertisement video detection method |
CN102799605A (en) * | 2012-05-02 | 2012-11-28 | 天脉聚源(北京)传媒科技有限公司 | Method and system for monitoring advertisement broadcast |
KR20130094891A (en) * | 2012-02-17 | 2013-08-27 | 진준형 | Advertisement monitor having multi recognition function |
CN103617263A (en) * | 2013-11-29 | 2014-03-05 | 安徽大学 | Television advertisement film automatic detection method based on multi-mode characteristics |
CN103780916A (en) * | 2012-10-25 | 2014-05-07 | 合肥林晨信息科技有限公司 | Digital television advertisement intelligent identification system |
- 2017-08-01: CN application CN201710648059.5A (patent/CN107426585A/en), active, Pending
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108156518A (en) * | 2017-12-26 | 2018-06-12 | 上海亿动信息技术有限公司 | A kind of method and device that advertisement progress advertisement orientation dispensing is paid close attention to by user |
CN108460633B (en) * | 2018-03-05 | 2022-06-03 | 北京明略昭辉科技有限公司 | Method for establishing advertisement audio acquisition and identification system and application thereof |
CN108320190A (en) * | 2018-03-05 | 2018-07-24 | 北京电广聪信息技术有限公司 | A kind of audio collecting system and method |
CN108428150A (en) * | 2018-03-05 | 2018-08-21 | 北京电广聪信息技术有限公司 | A method of being used for advertisement audio feature extraction |
CN108460633A (en) * | 2018-03-05 | 2018-08-28 | 北京电广聪信息技术有限公司 | A kind of method for building up and application thereof of advertisement audio collection identifying system |
CN108540833A (en) * | 2018-04-16 | 2018-09-14 | 北京交通大学 | A kind of television advertising recognition methods based on camera lens |
CN108882016A (en) * | 2018-07-31 | 2018-11-23 | 成都华栖云科技有限公司 | A kind of method and system that video gene data extracts |
CN111920390A (en) * | 2020-09-15 | 2020-11-13 | 成都启英泰伦科技有限公司 | Snore detection method based on embedded terminal |
CN112437340A (en) * | 2020-11-13 | 2021-03-02 | 广东省广播电视局 | Method and system for determining whether variant long advertisements exist in audio and video |
CN112437340B (en) * | 2020-11-13 | 2023-02-21 | 广东省广播电视局 | Method and system for determining whether variant long advertisements exist in audio and video |
CN113299281A (en) * | 2021-05-24 | 2021-08-24 | 青岛科技大学 | Driver sharp high pitch recognition early warning method and system based on acoustic text fusion |
CN113627363A (en) * | 2021-08-13 | 2021-11-09 | 百度在线网络技术(北京)有限公司 | Video file processing method, device, equipment and storage medium |
CN113627363B (en) * | 2021-08-13 | 2023-08-15 | 百度在线网络技术(北京)有限公司 | Video file processing method, device, equipment and storage medium |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20171201 |