CN104679902A - Information summary extraction method combined with cross-media fusion

Info

Publication number: CN104679902A
Application number: CN201510123093.1A
Authority: CN
Inventors: 裴廷睿; 赵津锋; 李哲涛; 崔荣峻; 吴相润; 关屋大雄
Original assignee: Xiangtan University
Current assignee: Xiangtan University
Priority date: 2015-03-20
Filing date: 2015-03-20
Publication date: 2015-06-03
Anticipated expiration: 2035-03-20
Also published as: CN104679902B

Abstract

The invention proposes an information summary extraction method combined with cross-media fusion. First, the input multimedia data (text, image, audio, video, etc.) is classified by data type. Next, the different-dimensional multimedia data is converted to the same dimension and text labels are established for the data, yielding same-dimensional images and text labels. The same-dimensional image data is then clustered and a text label relevance check is performed. After that, the same-dimensional images of each class are fused into one image. Finally, a cross-media information summary is generated. Through the information summary, users can view the fused image of each class of information and quickly access the corresponding multimedia data.

Description

An Information Summary Extraction Method Combined with Cross-Media Fusion

Technical field

The invention relates to an information summary extraction method combined with cross-media fusion, and belongs to the field of information extraction.

Background

We live in an information age in which the volume of information grows rapidly. The Internet adds a large amount of new information every day, and the ways information is stored are increasingly diverse: text, images, audio, and video are the basic forms of multimedia resources. Today many types of media data coexist and their organization is complex, yet different types of media data express the same semantics from different perspectives. Information extraction therefore needs to cross from one medium to another by following the various connections between media. How to cross the boundaries between media, and how to extract the latent correlations among multiple media, have become challenges for information extraction.

For big data in which multiple media forms coexist, existing methods rely mainly on feature recognition within a single medium and struggle to bridge the semantic gap between media. For example, the visual features of images and the auditory features of audio have different dimensionality, so their similarity cannot be measured directly. As a result, existing information extraction methods cannot readily provide users with intuitive thumbnails (or information summaries). How to classify and extract large amounts of mixed multimedia data is one of the key technical problems that information extraction urgently needs to solve, and it is an active research topic.

Mature techniques such as text mining, image feature extraction, audio scene recognition, speech recognition, video scene segmentation, and key frame extraction can each extract the semantic information of a single medium. The open question is how to combine these algorithms to extract feature information of different dimensionality and form an information extraction system that handles multimedia. We address this problem by using the image as a medium of intermediate dimensionality.

Summary of the invention

In view of the above problems, the invention proposes an information summary extraction method combined with cross-media fusion. By converting data of different dimensions into images of the same dimension, it addresses the difficulty of bridging the multimedia semantic gap. Through image clustering, the multimedia data is indirectly classified and extracted, and a cross-media information summary is generated.

The method proceeds as follows. First, the input multimedia data (text, image, audio, video, etc.) is classified by data type. Next, the different-dimensional multimedia data is converted to the same dimension and text labels are established for the data, yielding same-dimensional images and text labels. The same-dimensional image data is then clustered and a text label relevance check is performed. After that, the same-dimensional images of each class are fused into one image. Finally, a cross-media information summary is generated. Through the information summary, users can view the fused image of each class of information and quickly access the corresponding multimedia data.

The invention proposes an information summary extraction method combined with cross-media fusion, comprising the following steps:

Step 1: Classify the input multimedia data (text, image, audio, video) by data type into raw text data, raw image data, raw audio data, and raw video data;

Step 2: Set a standard value for the image data dimension (image pixels), build a same-dimensional image sample library with text labels, and convert the different-dimensional multimedia data to the same dimension, using the processing method that corresponds to each data type;

Step 3: For the processed same-dimensional image data, determine a threshold according to the accuracy required for clustering, cluster the data with an image clustering algorithm, and perform a text label relevance check using the text labels of each class. Data that fails the check is clustered again until the number of failing data items is less than the threshold. This yields the addresses of the same-dimensional image data of each class, i.e. the index;

Step 4: Fuse the clustered same-dimensional image data according to a fusion rule to obtain a fused image for each class of same-dimensional image data;

Step 5: Generate the information summary from the fused image and the index of each class of same-dimensional image data.

Compared with existing methods, the advantages of the invention are:

1. The semantics of different-dimensional multimedia data are expressed as same-dimensional image data, which crosses the boundaries between media and allows image processing algorithms to be applied to multimedia data;

2. Image clustering is combined with a text label relevance check, which supports the accuracy of the classification and a strong correlation among the data in each class.

Description of the drawings

Fig. 1 is a flow chart of the invention;

Fig. 2 is a flow chart of the method for converting different-dimensional data to the same dimension;

Fig. 3 is a schematic diagram of same-dimensional image data clustering and the text label relevance check.

Detailed description of the embodiments

The invention is described in further detail below with reference to the accompanying drawings and specific embodiments:

Step 1: Classify the input multimedia data (text, image, audio, video) by data type into raw text data, raw image data, raw audio data, and raw video data.
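As an illustrative sketch of Step 1, the classification by data type could be approximated with file-extension-based MIME type guessing. The function name `classify_by_type` and the sample file names are hypothetical, and the patent does not prescribe this mechanism.

```python
import mimetypes
from collections import defaultdict

MEDIA_TYPES = {"text", "image", "audio", "video"}

def classify_by_type(paths):
    """Group file paths by the major part of their guessed MIME type."""
    groups = defaultdict(list)
    for path in paths:
        mime, _ = mimetypes.guess_type(path)
        major = mime.split("/")[0] if mime else "unknown"
        # Anything outside the four media types goes to "unknown".
        groups[major if major in MEDIA_TYPES else "unknown"].append(path)
    return dict(groups)

# Hypothetical input files, one per media type.
groups = classify_by_type(["notes.txt", "cat.jpg", "talk.mp3", "clip.mp4"])
```

A production system would more likely inspect file contents than trust extensions; the extension-based guess is used here only to keep the sketch self-contained.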

Step 2: Referring to Fig. 2, set a standard value for the image data dimension (image pixels), build a same-dimensional image sample library with text labels, and convert the different-dimensional multimedia data to the same dimension, using the processing method that corresponds to each data type.

The classification in Step 1 yields groups of raw text data, raw image data, raw audio data, and raw video data. The groups of raw text data, raw image data, raw audio data, and raw video data are each processed into same-dimensional image data. The detailed steps are as follows:

1) Processing raw text data into same-dimensional image data:

a) Preprocessing: use a text mining technique (for example, text mining based on semantic understanding) to extract the keywords of each group of text paragraphs in the raw text data as labels;

b) Map each group of text data to same-dimensional image data according to the label keywords and the sample library. One group of text may correspond to multiple labels and therefore to multiple same-dimensional sample images.
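The text branch can be sketched as follows, under loud assumptions: keyword extraction is replaced by simple word counting against the sample library's label vocabulary (a stand-in, not semantic text mining), and `SAMPLE_LIBRARY`, its paths, and both function names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical sample library: text label -> same-dimensional sample image.
SAMPLE_LIBRARY = {
    "cat": "samples/cat.png",
    "dog": "samples/dog.png",
    "car": "samples/car.png",
}

def extract_labels(text, vocabulary, top_k=2):
    """Stand-in keyword extraction: most frequent words found in the vocabulary."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(word for word in words if word in vocabulary)
    return [word for word, _ in counts.most_common(top_k)]

def text_to_sample_images(text, library=SAMPLE_LIBRARY):
    """Map one group of text to its labels and the matching sample images."""
    labels = extract_labels(text, library.keys())
    return labels, [library[label] for label in labels]

labels, images = text_to_sample_images("The cat chased another cat past a parked car.")
```

Here one group of text yields two labels and two sample images, which matches the one-to-many correspondence described in item b).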

2) Processing raw image data into same-dimensional image data:

a) Preprocess the raw image data, using a suitable algorithm to enhance key features (for example, removing background regions), to obtain the processed image;

b) Scale the processed image to same-dimensional image data (the same dimension as the sample library) using an image scaling technique (for example, bicubic interpolation or inverse wavelet interpolation);

c) Compare the same-dimensional image data with the sample library using a recognition method (for example, an image feature extraction algorithm based on visual information) to obtain the text label of the image, and store the result.
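A minimal sketch of items b) and c) follows. It deliberately swaps in simpler techniques than the ones named above: nearest-neighbour scaling instead of bicubic or wavelet interpolation, and a sum-of-squared-differences comparison instead of a visual feature extraction algorithm. Images are plain nested lists of grayscale values, and `STANDARD_SIZE`, the library contents, and all function names are hypothetical.

```python
STANDARD_SIZE = (4, 4)  # assumed standard dimension (height, width)

def resize_nearest(image, size=STANDARD_SIZE):
    """Nearest-neighbour scaling of a grayscale image stored as a list of rows."""
    src_h, src_w = len(image), len(image[0])
    dst_h, dst_w = size
    return [
        [image[r * src_h // dst_h][c * src_w // dst_w] for c in range(dst_w)]
        for r in range(dst_h)
    ]

def distance(a, b):
    """Sum of squared pixel differences between two same-dimensional images."""
    return sum(
        (pa - pb) ** 2
        for row_a, row_b in zip(a, b)
        for pa, pb in zip(row_a, row_b)
    )

def label_image(image, library):
    """Scale the image, then take the label of the closest sample image."""
    scaled = resize_nearest(image)
    label = min(library, key=lambda name: distance(scaled, library[name]))
    return scaled, label

# Hypothetical two-entry sample library of 4x4 images.
library = {
    "dark": [[0] * 4 for _ in range(4)],
    "bright": [[255] * 4 for _ in range(4)],
}
raw = [[200] * 8 for _ in range(8)]  # an 8x8 raw image
scaled, label = label_image(raw, library)
```

Because every image is brought to the same size before comparison, the distance function never has to deal with mismatched dimensions, which is the point of the same-dimension conversion.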

3) Processing raw audio data into same-dimensional image data:

a) Preprocess the raw audio data, using suitable algorithms to extract key features such as the audio scene (for example, audio scene recognition based on probabilistic latent semantic analysis) and the spoken-language semantics (for example, neural-network-based speech recognition), to obtain the extracted text labels;

b) Match the extracted text labels against the sample library to obtain same-dimensional image data. One group of audio may correspond to multiple labels and therefore to multiple same-dimensional sample images.

4) Processing raw video data into same-dimensional image data:

a) Preprocess the raw video data, using a scene segmentation algorithm (for example, a semantics-based video scene segmentation algorithm) to split each video into a number of video segments, one per scene;

b) For each video segment, apply a key frame extraction method (for example, multi-feature fusion key frame extraction based on a clustering algorithm) to obtain a key frame image. The key frame images of each video are collected into a set;

c) For each key frame image, use a suitable algorithm to enhance key features (for example, removing background regions);

d) Scale the processed image to same-dimensional image data (the same dimension as the sample library) using an image scaling technique (for example, bicubic interpolation or inverse wavelet interpolation);

e) Compare the same-dimensional image data with the sample library using a recognition method to obtain the text label of the image, and store the result.
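Item b) of the video branch can be illustrated with a deliberately simplified key frame selector: each segment is treated as a single cluster and the frame closest to the segment's mean feature vector is taken as its key frame. This is an assumption made for brevity, not the multi-feature fusion method mentioned above; the scene segmentation is assumed to have been done already, and the feature vectors are made up.

```python
def key_frame_index(segment):
    """Index of the frame whose feature vector is closest to the segment mean."""
    count = len(segment)
    dims = len(segment[0])
    centroid = [sum(frame[d] for frame in segment) / count for d in range(dims)]

    def squared_distance(frame):
        return sum((x - c) ** 2 for x, c in zip(frame, centroid))

    # min() returns the first index on ties, so the earliest frame wins.
    return min(range(count), key=lambda i: squared_distance(segment[i]))

def key_frames(video_segments):
    """One key frame per segment; together they form the video's key frame set."""
    return [segment[key_frame_index(segment)] for segment in video_segments]

# Hypothetical video already split into two scenes of 2-D frame features.
segments = [
    [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]],
    [[10.0, 0.0], [10.0, 4.0]],
]
frames = key_frames(segments)
```

The selected key frames would then go through the same enhancement, scaling, and labelling steps as ordinary images.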

Step 3: Referring to Fig. 3, for the processed same-dimensional image data, determine a threshold according to the accuracy required for clustering, cluster the data with an image clustering algorithm (for example, image clustering based on a genetic algorithm), and perform a text label relevance check using the text labels of each class. Data that fails the check is clustered again until the number of failing data items is less than the threshold. This yields the index, which holds the addresses of the same-dimensional image data of each class. The detailed steps are as follows:

1) For the processed same-dimensional image data, determine the threshold according to the accuracy required for clustering. The smaller the threshold, the more classes there are and the more precise the classification; the larger the threshold, the fewer classes there are;

2) Cluster the data with an image clustering algorithm (for example, image clustering based on a genetic algorithm) and store the addresses of the clustered same-dimensional images. For the same-dimensional images in each cluster, extract their text labels and check the relevance between the text labels and the image clustering result;

3) For the clustered same-dimensional image data, if the number of items that fail the text label relevance check is greater than the threshold, remove the failing items from their class so that they become unclustered same-dimensional image data again, and cluster them again with the same or a different clustering method, until the number of failing items is less than the threshold;

4) Store the classification result in the form of addresses to obtain the index, which holds the addresses of the same-dimensional image data of each class.
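The loop in items 2) to 4) can be sketched as below, with several assumptions stated plainly: the clustering is a greedy one-dimensional gap rule rather than a genetic algorithm, the relevance check keeps items that carry the cluster's most common label, rejected items are re-clustered with a halved radius, and leftover items below the threshold become singleton classes. None of these choices comes from the patent, and all names and data are hypothetical.

```python
from collections import Counter

def greedy_cluster(items, radius):
    """Stand-in clustering: sort by feature, split where the gap exceeds radius."""
    clusters = []
    for item in sorted(items, key=lambda it: it["feature"]):
        if clusters and item["feature"] - clusters[-1][-1]["feature"] <= radius:
            clusters[-1].append(item)
        else:
            clusters.append([item])
    return clusters

def split_by_label_check(cluster):
    """Keep items that carry the cluster's most common label; reject the rest."""
    counts = Counter(label for item in cluster for label in item["labels"])
    if not counts:
        return list(cluster), []
    top_label = counts.most_common(1)[0][0]
    kept = [it for it in cluster if top_label in it["labels"]]
    rejected = [it for it in cluster if top_label not in it["labels"]]
    return kept, rejected

def cluster_with_label_check(items, threshold, radius=1.0, max_rounds=10):
    """Cluster, check labels, and re-cluster rejects until few enough remain."""
    accepted, pending = [], list(items)
    for _ in range(max_rounds):
        rejected = []
        for cluster in greedy_cluster(pending, radius):
            kept, bad = split_by_label_check(cluster)
            accepted.append(kept)
            rejected.extend(bad)
        pending = rejected
        if len(pending) < threshold:
            break
        radius /= 2  # assumed: tighten the clustering on each retry
    accepted.extend([item] for item in pending)  # assumed: leftovers stay alone
    return [[item["address"] for item in cluster] for cluster in accepted]

items = [
    {"address": "a", "feature": 0.0, "labels": {"cat"}},
    {"address": "b", "feature": 0.5, "labels": {"cat"}},
    {"address": "c", "feature": 0.8, "labels": {"dog"}},
    {"address": "d", "feature": 5.0, "labels": {"dog"}},
    {"address": "e", "feature": 5.4, "labels": {"dog"}},
]
index = cluster_with_label_check(items, threshold=1)
```

In this example item `c` is visually close to the cat images but labelled `dog`, so the relevance check removes it from the first class and the second round places it in a class of its own. A fuller implementation might try to merge it into the existing `dog` class instead.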

Step 4: Fuse the clustered same-dimensional image data according to a fusion rule (for example, selecting the image that contains the most targets) to obtain a fused image for each class of same-dimensional image data.

Using the index, retrieve the same-dimensional image data of each class in turn and fuse it according to the fusion rule, obtaining a fused image for each class of same-dimensional image data.
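A small sketch of this step, using the example rule of selecting the image with the most targets. The `target_count` field, the image store, and the index contents are hypothetical placeholders; how targets are detected and counted is outside the sketch.

```python
def fuse_select_most_targets(images):
    """Example fusion rule: pick the image that contains the most targets."""
    return max(images, key=lambda image: image["target_count"])

def fuse_classes(index, store, rule=fuse_select_most_targets):
    """Retrieve each class by its addresses and apply the fusion rule."""
    return [rule([store[address] for address in addresses]) for addresses in index]

# Hypothetical image store keyed by address, and a two-class index.
store = {
    "img/a.png": {"name": "img/a.png", "target_count": 1},
    "img/b.png": {"name": "img/b.png", "target_count": 3},
    "img/c.png": {"name": "img/c.png", "target_count": 2},
}
index = [["img/a.png", "img/b.png"], ["img/c.png"]]
fused = fuse_classes(index, store)
```

Passing the rule as a parameter keeps the sketch open to other fusion rules, such as pixel-wise blending, without changing the retrieval loop.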

Step 5: Generate the information summary from the fused image and the index of each class of same-dimensional image data.

The fused images and the index are combined into the information summary, through which the user can view the fused images and access the corresponding multimedia data.
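One possible shape for the information summary is a JSON document with one entry per class, pairing the fused image with the source media it summarizes. The field names, the `origin` mapping from same-dimensional images back to the original files, and all paths are assumptions made for illustration.

```python
import json

def build_summary(fused_images, index, origin):
    """Pair each class's fused image with the original media behind its images."""
    entries = []
    for class_id, (fused, addresses) in enumerate(zip(fused_images, index)):
        sources = sorted({origin[address] for address in addresses})
        entries.append(
            {"class_id": class_id, "fused_image": fused, "sources": sources}
        )
    return json.dumps(entries, indent=2)

# Hypothetical inputs: a two-class index, fused image paths, and image origins.
index = [["img/a.png", "img/b.png"], ["img/c.png"]]
fused_images = ["fused/class0.png", "fused/class1.png"]
origin = {
    "img/a.png": "media/report.txt",
    "img/b.png": "media/talk.mp3",
    "img/c.png": "media/clip.mp4",
}
summary = build_summary(fused_images, index, origin)
```

A viewer could render each `fused_image` as a thumbnail and link it to the listed `sources`, which is the quick-access behaviour the step describes.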

Claims

1. An information summary extraction method combined with cross-media fusion, characterized in that: first, the input multimedia data (text, image, audio, video, etc.) is classified by data type; next, the different-dimensional multimedia data is converted to the same dimension and text labels are established for the data, yielding same-dimensional images and text labels; the same-dimensional image data is then clustered and a text label relevance check is performed; the same-dimensional images of each class are then fused into one image; finally, a cross-media information summary is generated; the method comprises at least the following steps:

Step 1: classify the input multimedia data (text, image, audio, video) by data type into raw text data, raw image data, raw audio data, and raw video data;

Step 2: set a standard value for the image data dimension (image pixels), build a same-dimensional image sample library with text labels, and convert the different-dimensional multimedia data to the same dimension, using the processing method that corresponds to each data type;

Step 3: for the processed same-dimensional image data, determine a threshold according to the accuracy required for clustering, cluster the data with an image clustering algorithm, perform a text label relevance check using the text labels of each class, and cluster the data that fails the check again until the number of failing data items is less than the threshold, thereby obtaining the addresses of the same-dimensional image data of each class, i.e. the index;

Step 4: fuse the clustered same-dimensional image data according to a fusion rule to obtain a fused image for each class of same-dimensional image data;

Step 5: generate the information summary from the fused image and the index of each class of same-dimensional image data.

2. The information summary extraction method combined with cross-media fusion according to claim 1, characterized in that the processing of raw text data into same-dimensional image data in Step 2 further comprises at least the following steps:

1) preprocessing: using a text mining technique, extract the keywords of each group of text paragraphs in the raw text data as labels;

2) map each group of text data to same-dimensional image data according to the label keywords and the sample library, wherein one group of text may correspond to multiple labels and to multiple same-dimensional sample images.

3. The information summary extraction method combined with cross-media fusion according to claim 1, characterized in that the processing of raw image data into same-dimensional image data in Step 2 further comprises at least the following steps:

1) preprocess the raw image data, using a suitable algorithm to enhance key features, to obtain the processed image;

2) scale the processed image to same-dimensional image data (the same dimension as the sample library) using an image scaling technique;

3) compare the same-dimensional image data with the sample library using an image recognition method to obtain the text label of the image, and store the result.

4. The information summary extraction method combined with cross-media fusion according to claim 1, characterized in that the processing of raw audio data into same-dimensional image data in Step 2 further comprises at least the following steps:

1) preprocess the raw audio data, using suitable algorithms to extract key features such as the audio scene and the spoken-language semantics, to obtain the extracted text labels;

2) match the extracted text labels against the sample library to obtain same-dimensional image data, wherein one group of audio may correspond to multiple labels and to multiple same-dimensional sample images.

5. The information summary extraction method combined with cross-media fusion according to claim 1, characterized in that the processing of raw video data into same-dimensional image data in Step 2 further comprises at least the following steps:

1) preprocess the raw video data, using a scene segmentation algorithm to split each video into a number of video segments, one per scene;

2) for each video segment, apply a key frame extraction method to obtain a key frame image, the key frame images of each video being collected into a set;

3) for each key frame image, use a suitable algorithm to enhance key features;

4) scale the processed image to same-dimensional image data (the same dimension as the sample library) using an image scaling technique;

5) compare the same-dimensional image data with the sample library using an image recognition method to obtain the text label of the image, and store the result.

6. The information summary extraction method combined with cross-media fusion according to claim 1, characterized in that the clustering of same-dimensional images and the building of the index in Step 3 further comprise at least the following steps:

1) for the processed same-dimensional image data, determine the threshold according to the accuracy required for clustering, wherein the smaller the threshold, the more classes there are and the more precise the classification, and the larger the threshold, the fewer classes there are;

2) cluster the data with an image clustering algorithm and store the addresses of the clustered same-dimensional images; for the same-dimensional images in each cluster, extract their text labels and check the relevance between the text labels and the image clustering result;

3) for the clustered same-dimensional image data, if the number of items that fail the check is greater than the threshold, remove the failing items from their class so that they become unclustered same-dimensional image data again, and cluster them again with the same or a different method, until the number of failing items is less than the threshold;

4) store the classification result in the form of addresses to obtain the index, which holds the addresses of the same-dimensional image data of each class.

7. The information summary extraction method combined with cross-media fusion according to claim 1, characterized in that the fusion of the same-dimensional image data of each class into one image in Step 4 further comprises at least the following step:

1) using the index, retrieve the same-dimensional image data of each class in turn and fuse it according to a fusion rule, thereby obtaining a fused image for each class of same-dimensional image data.

8. The information summary extraction method combined with cross-media fusion according to claim 1, characterized in that the generation of the information summary from the fused same-dimensional image and the index of each class of information in Step 5 further comprises at least the following step:

1) generate the information summary from the obtained fused images and the index.