CN111797326B

CN111797326B - False news detection method and system integrating multi-scale visual information

Info

Publication number: CN111797326B
Application number: CN202010459132.6A
Authority: CN
Inventors: 曹娟; 亓鹏; 谢添; 刘浩远; 郭俊波
Original assignee: Institute of Computing Technology of CAS
Current assignee: Institute of Computing Technology of CAS
Priority date: 2020-05-27
Filing date: 2020-05-27
Publication date: 2023-05-12
Anticipated expiration: 2040-05-27
Also published as: CN111797326A

Abstract

The invention provides a false news detection method integrating multi-scale visual information, which comprises the following steps: a frequency domain characteristic acquisition step, namely constructing a frequency domain sub-network model by using a convolutional neural network, and acquiring a frequency domain characteristic representation of an input image through the frequency domain sub-network model; a semantic feature acquisition step, namely constructing a pixel domain sub-network model by using a convolutional neural network, and acquiring semantic feature representation of the input image through the pixel domain sub-network model; and an image detection step, wherein the frequency domain feature representation and the semantic feature representation are fused to obtain an image representation of the input image, and the prediction probability of the input image as a false news picture is obtained according to the image representation. The invention also provides a false news detection system fused with the multi-scale visual information, a computer readable storage medium and a data processing device comprising the computer readable storage medium.

Description

False news detection method and system integrating multi-scale visual information

Technical Field

The invention relates to the field of news credibility authentication research, in particular to a false news detection method integrating multi-scale visual information.

Background

In recent years, social media has become an important news platform thanks to its timeliness, low cost, strong interactivity, and low barrier to entry, and people have grown used to getting the latest news on social media and freely publishing their own views. However, the convenience and openness of social media also greatly facilitate the spread of false news, which has many negative social impacts. For example, during the month preceding a large voting campaign, each participant read on average 1-3 false news items published by well-known media. Such false news inevitably misleads voters and may even affect the outcome of the vote. Therefore, how to automatically detect false news by technical means has become a pressing problem in the self-media era.

Advances in multimedia technology have driven the transition from traditional text-based news to news with multimedia content. Compared with plain text, multimedia content describes news events better, appears more credible, and attracts readers' attention more easily. However, this trend also creates new opportunities for false news. False news often uses highly misleading or even tampered pictures to attract and mislead readers, which accelerates its spread. Statistically, more than 40% of the false news in a microblog dataset is accompanied by pictures. Visual content has thus become a non-negligible part of false news.

Existing false news detection methods mainly focus on text content and social context. With the popularity of multimedia content, researchers have begun to detect false news in combination with visual information. These visual-information-based works can be divided into three categories: those based on visual statistical features, visual forensic features, and visual semantic features.

Work based on visual statistical features uses statistics of the pictures in news, such as the number of attached pictures, picture popularity, and picture type, to help identify false news. However, these statistics are too basic to characterize the complex visual patterns of false news.

Visual forensic features are commonly used to detect picture tampering. To verify the authenticity of news pictures, some work has used visual forensic features, such as blocking artifacts, to aid in the detection of false news. For example, the multimedia verification task held by MediaEval in 2015 and 2016 provided 7 visual forensic features to help detect tampering and misuse of multimedia content. Based on these forensic features, L. Wu et al. designed higher-level forensic features and combined them with text features and user features to solve the news verification problem. However, most forensic features are manually designed to detect specific tampering traces and cannot detect false news pictures that are real, untampered pictures. In addition, these hand-crafted features require expert design, are labor-intensive, and cannot capture complex patterns. These limitations cause visual forensic features to perform poorly in actual false news detection tasks.

With the popularity of convolutional neural networks, most multimedia-content-based works use pre-trained deep convolutional neural networks to obtain a generic visual representation and fuse it with textual information to detect false news. Jin et al. were the first to fuse multimodal content through a deep neural network to solve the false news detection problem; Wang et al. proposed an event adversarial neural network that uses multimodal features to detect newly emerging false news events; Dhruv et al. proposed an autoencoder-based approach that learns a shared representation of multimodal information for false news detection. However, these works focus more on how to fuse information from different modalities and neglect effective modeling of the visual modality itself. Lacking task-related information, the generic visual representations adopted by these works cannot reflect the essential characteristics of false news pictures, which weakens the contribution of visual content to false news detection.

Disclosure of Invention

Aiming at the problems, the invention provides a false news detection method integrating multi-scale visual information, which comprises the following steps: a frequency domain characteristic acquisition step, namely constructing a frequency domain sub-network model by using a convolutional neural network, and acquiring a frequency domain characteristic representation of an input image through the frequency domain sub-network model; a semantic feature acquisition step, namely constructing a pixel domain sub-network model by using a convolutional neural network, and acquiring semantic feature representation of the input image through the pixel domain sub-network model; and an image detection step, wherein the frequency domain feature representation and the semantic feature representation are fused to obtain an image representation of the input image, and the prediction probability of the input image as a false news picture is obtained according to the image representation.

In the false news detection method of the invention, the frequency domain feature acquisition step specifically comprises: constructing a large-scale network of the frequency domain sub-network model by using a convolutional neural network; performing a block discrete cosine transform on the input image to obtain large-scale histograms of the input image corresponding to a plurality of frequencies; sampling the large-scale histograms to obtain a plurality of large-scale multidimensional vectors; fusing the plurality of large-scale multidimensional vectors through the large-scale network to obtain a large-scale frequency domain feature representation $l_{large}$ of the input image; constructing a small-scale network of the frequency domain sub-network model by using a convolutional neural network; dividing the input image into a plurality of image blocks of the same size, and performing a block discrete cosine transform on the image blocks to obtain small-scale histograms corresponding to the image blocks at a plurality of frequencies; selecting a plurality of small-scale histograms in a high frequency band and sampling them to obtain a plurality of small-scale multidimensional vectors; fusing the plurality of small-scale multidimensional vectors through the small-scale network to obtain a small-scale frequency domain feature representation $l_{small}$ of the input image; and concatenating and fusing $l_{large}$ and $l_{small}$ to obtain the frequency domain feature representation $l_F$ of the input image.

In the false news detection method of the invention, the semantic feature acquisition step specifically comprises: constructing a cyclic fusion network by using a convolutional neural network; acquiring first feature maps of the input of the cyclic fusion network at multiple scales, up-sampling the first feature maps to obtain second feature maps of the same size, and performing channel concatenation on the second feature maps to obtain a global context knowledge representation as the output of the cyclic fusion network; taking the output of the cyclic fusion network in the current round as the input of the cyclic fusion network in the next round, and connecting a plurality of cyclic fusion networks in series to form the pixel domain sub-network model; and taking the input image as the input of the pixel domain sub-network model, and taking the global context knowledge representation obtained after a preset number of iterations as the semantic feature representation $l_P$ of the input image.

In the false news detection method of the invention, the image detection step specifically comprises: obtaining the image representation $u$ from the frequency domain feature representation $l_F$ and the semantic feature representation $l_P$, where $u = \alpha l_F + (1-\alpha) l_P$; projecting the image representation $u$ onto the false news picture target space and the true news picture target space with a fully connected layer to obtain the prediction probability $p$, and taking the cross-entropy error $L$ between the prediction probability $p$ and the true value $y$ as the loss function, where $p = \mathrm{softmax}(W_c u + b_c)$ and $L = -\sum [y \log p + (1-y) \log(1-p)]$; wherein $\alpha$ is a normalized weight obtained by applying softmax to the scores $F(l_F)$ and $F(l_P)$,

$F(l_F) = v^T \tanh(W_F l_F + b_F)$, $F(l_P) = v^T \tanh(W_F l_P + b_F)$, $W_c$ and $W_F$ are weight matrices, $b_c$ and $b_F$ are biases, $v^T$ is the transposed weight vector, and softmax and tanh are activation functions.

The invention also provides a false news detection system integrating multi-scale visual information, which comprises: a frequency domain feature acquisition module, used for constructing a frequency domain sub-network model by using a convolutional neural network, and obtaining the frequency domain feature representation of the input image through the frequency domain sub-network model; a semantic feature acquisition module, used for constructing a pixel domain sub-network model by using a convolutional neural network, and obtaining the semantic feature representation of the input image through the pixel domain sub-network model; and an image detection module, used for fusing the frequency domain feature representation with the semantic feature representation to obtain the image representation of the input image, and obtaining, according to the image representation, the prediction probability that the input image is a false news picture.

In the false news detection system of the invention, the frequency domain feature acquisition module specifically comprises: a large-scale frequency domain feature representation acquisition module, used for acquiring the large-scale frequency domain feature representation of the input image by constructing a large-scale network of the frequency domain sub-network model using a convolutional neural network, performing a block discrete cosine transform on the input image to obtain large-scale histograms of the input image corresponding to a plurality of frequencies, sampling the large-scale histograms to obtain a plurality of large-scale multidimensional vectors, and fusing the plurality of large-scale multidimensional vectors through the large-scale network to obtain the large-scale frequency domain feature representation $l_{large}$ of the input image; a small-scale frequency domain feature representation acquisition module, used for acquiring the small-scale frequency domain feature representation of the input image by constructing a small-scale network of the frequency domain sub-network model using a convolutional neural network, dividing the input image into a plurality of image blocks of the same size, performing a block discrete cosine transform on the image blocks to obtain small-scale histograms corresponding to the image blocks at a plurality of frequencies, selecting a plurality of small-scale histograms in a high frequency band and sampling them to obtain a plurality of small-scale multidimensional vectors, and fusing the plurality of small-scale multidimensional vectors through the small-scale network to obtain the small-scale frequency domain feature representation $l_{small}$ of the input image; and a concatenation and fusion module, used for concatenating and fusing $l_{large}$ and $l_{small}$ to obtain the frequency domain feature representation $l_F$ of the input image.

In the false news detection system of the invention, the semantic feature acquisition module specifically comprises: a cyclic fusion network construction module, used for constructing a cyclic fusion network by using a convolutional neural network, acquiring first feature maps of the input of the cyclic fusion network at multiple scales, up-sampling the first feature maps to obtain second feature maps of the same size, and performing channel concatenation on the second feature maps to obtain a global context knowledge representation as the output of the cyclic fusion network; a cyclic fusion network serial connection module, used for taking the output of the cyclic fusion network in the current round as the input of the cyclic fusion network in the next round, and connecting a plurality of cyclic fusion networks in series to form the pixel domain sub-network model; and a semantic feature acquisition module, used for taking the input image as the input of the pixel domain sub-network model, and taking the global context knowledge representation obtained after a preset number of iterations as the semantic feature representation $l_P$ of the input image.

In the false news detection system of the invention, the image detection module specifically comprises: an image representation acquisition module, used for obtaining the image representation $u$ from the frequency domain feature representation $l_F$ and the semantic feature representation $l_P$, where $u = \alpha l_F + (1-\alpha) l_P$; and a prediction probability acquisition module, used for projecting the image representation $u$ onto the false news picture target space and the true news picture target space with a fully connected layer to obtain the prediction probability $p$, and taking the cross-entropy error $L$ between the prediction probability $p$ and the true value $y$ as the loss function, where $p = \mathrm{softmax}(W_c u + b_c)$ and $L = -\sum [y \log p + (1-y) \log(1-p)]$; wherein $\alpha$ is a normalized weight obtained by applying softmax to the scores $F(l_F) = v^T \tanh(W_F l_F + b_F)$ and $F(l_P) = v^T \tanh(W_F l_P + b_F)$, $W_c$ and $W_F$ are weight matrices, $b_c$ and $b_F$ are biases, $v^T$ is the transposed weight vector, and softmax and tanh are activation functions.

The present invention also proposes a computer readable storage medium storing computer executable instructions for performing false news detection incorporating multi-scale visual information as described above.

The invention also proposes a data processing apparatus comprising a computer readable storage medium as described above, a processor of the data processing apparatus retrieving and executing computer executable instructions in the computer readable storage medium to perform false news detection incorporating multi-scale visual information.

Drawings

Fig. 1 is a flow chart of a false news detection method of the present invention.

Fig. 2 is a schematic diagram of a false information detection model of the present invention.

FIG. 3 is a schematic diagram of a data processing apparatus of the present invention.

Detailed Description

In order to make the purposes, technical schemes and advantages of the invention more clear, the false news detection method and system for fusing multi-scale visual information provided by the invention are further described in detail below with reference to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating the invention, are intended for purposes of illustration only and are not intended to limit the scope of the invention.

While studying the visual patterns of false news (that is, the pictures accompanying false news), the inventors found that false news pictures include not only maliciously tampered pictures but also real pictures that are misused to represent irrelevant events. The prior art is suitable only for modeling one particular type of false news picture and cannot capture the essential characteristics of false news pictures. The inventors found that false news pictures have distinctive characteristics at both the physical and the semantic level, which manifest clearly in the frequency domain and the pixel domain, respectively. Therefore, the invention designs a corresponding deep learning model tailored to the characteristics of false news pictures, mines the latent visual patterns of a picture in the frequency domain and the pixel domain, and represents and fuses them efficiently, thereby improving the automatic screening of false news from visual content.

The invention aims to effectively and automatically detect false news, and mainly solves the technical problem of how to establish an effective deep learning model for false news detection based on visual contents of news.

The key point of the invention is the design of a deep learning model that fully captures and fuses the multi-scale visual information of a picture in the frequency domain and the pixel domain, thereby enabling automatic detection of false news from visual content. Specifically, it comprises two key points, modeling the physical characteristics of false news pictures and modeling their semantic characteristics:

1) A multi-scale Convolutional Neural Network (CNN) for the frequency domain information is designed for capturing the physical characteristics of different levels of false news pictures.

The false news pictures have the characteristic of low quality on the physical level, such as multiple compression traces, tamper traces and the like, and often have certain periodicity on the frequency domain, so that the modeling can be performed by using CNN. For a typical false news picture, such as a tampered picture, the tampered area of the picture tends to undergo more compression than the untampered area, which results in different portions of the tampered picture exhibiting different compression characteristics. Therefore, in order to comprehensively consider the overall characteristic and the local abnormal characteristic of the picture, the invention designs a multi-scale CNN network aiming at the frequency domain information, which is used for capturing the physical characteristics of different layers of the false news picture.

2) A cyclic fusion network aiming at pixel domain information is designed for effectively extracting and fusing the characteristics of false news pictures on different semantic levels.

At the semantic level, false news pictures exhibit stylistic characteristics such as visual impact and emotional provocation, and these characteristics appear in visual features at different levels, so multi-scale visual features should be considered together to better model the semantic characteristics of false news pictures. Different layers of a CNN model learn multi-scale features at different levels of abstraction, but when a CNN learns multi-scale visual features layer by layer, the limited receptive field leads to a lack of context information, which limits the representational power of the learned features. Therefore, the invention designs a cyclic fusion network that uses global context knowledge to guide the feature learning of the CNN and fuses the multi-scale CNN features, thereby effectively extracting and fusing the characteristics of false news pictures at different semantic levels.

The invention is described below with reference to the drawings and the detailed description.

One of the main goals of the invention is to use visual content to automatically screen news posted by users for false information, so the specific task can be defined as determining, from its visual content, whether a news item is false news.

False news pictures have distinctive characteristics in both the frequency domain and the pixel domain. Therefore, to fully model the visual characteristics of false news pictures, the invention designs a deep learning model that mines the latent visual patterns of a picture in the frequency domain and the pixel domain and represents and fuses them efficiently, thereby improving the automatic screening of false news from visual content.

Fig. 1 is a flow chart of a false news detection method of the present invention. As shown in fig. 1, the false news detection method of the present invention includes:

Step S1, constructing a frequency domain sub-network model by using a convolutional neural network, and obtaining a frequency domain feature representation of an input image through the frequency domain sub-network model; the frequency domain sub-network model consists of two CNN models with similar structures and is used for extracting physical characteristics of the input image at different scales;

The frequency domain sub-network model consists of two similar CNN networks: a small-scale network and a large-scale network. The invention uses the complete input image to train the large-scale network and uses the 128 × 128 pixel image blocks into which the input image is divided to train the small-scale network. The two single-scale sub-networks have similar model architectures. Taking the large-scale network as an example, a block discrete cosine transform (DCT) is first applied to the input image to obtain histograms of the DCT coefficients of the picture at 64 frequencies. In particular, the invention performs a one-dimensional Fourier transform on these histograms to enhance the effect of the CNN. Because the CNN requires a fixed-size input, these histograms are sampled to obtain 64 250-dimensional vectors, denoted $\{H_0, H_1, \ldots, H_{63}\}$. After this preprocessing, each input vector $H_i$ is fed into the weight-sharing large-scale CNN network to obtain the corresponding feature representation $w_i$. The CNN network consists of three convolution blocks and a fully connected layer, and each convolution block consists of a one-dimensional convolution layer and a max-pooling layer. To accelerate model convergence, the number of filters in the convolution layers is set to increase progressively. The feature vectors $\{w_0, w_1, \ldots, w_{63}\}$ of the 64 frequencies are concatenated and fused to obtain the large-scale frequency domain feature representation $l_{large}$ of the input image. In the small-scale network, a block DCT is applied to each 128 × 128 image block; to reduce the number of parameters, only the first 9 high-frequency components of the 64 frequencies are selected for computing the DCT coefficient histograms. All 128 × 128 image blocks are fed into the small-scale CNN network, and the resulting feature vectors are concatenated and fused to obtain the small-scale frequency domain feature representation $l_{small}$ of the input image. Finally, $l_{large}$ and $l_{small}$ are concatenated and fused to obtain the final frequency domain feature representation $l_F$ of the input image, which serves as an input to the fusion sub-network.
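The preprocessing for the large-scale branch (block DCT, per-frequency histograms, one-dimensional Fourier transform, sampling to a fixed length) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the 8 × 8 block size is inferred from the 64 frequencies, while the histogram bin count and range, the use of the Fourier magnitude, and the uniform index sampling are choices made only for this example. The function names are hypothetical.

```python
import numpy as np

def dct_matrix(n: int = 8) -> np.ndarray:
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n).reshape(-1, 1)   # frequency index (rows)
    x = np.arange(n).reshape(1, -1)   # sample index (columns)
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)        # DC row uses a different scale factor
    return m

def block_dct_coefficients(gray: np.ndarray, block: int = 8) -> np.ndarray:
    """2-D DCT of every non-overlapping block; returns (num_blocks, block*block)."""
    h, w = gray.shape
    h, w = h - h % block, w - w % block   # drop incomplete border blocks
    tiles = (gray[:h, :w]
             .reshape(h // block, block, w // block, block)
             .transpose(0, 2, 1, 3)
             .reshape(-1, block, block)
             .astype(np.float64))
    d = dct_matrix(block)
    coeffs = d @ tiles @ d.T              # separable 2-D DCT, broadcast over blocks
    return coeffs.reshape(-1, block * block)

def frequency_vectors(gray: np.ndarray, out_dim: int = 250, bins: int = 1001,
                      value_range=(-500.5, 500.5)) -> np.ndarray:
    """One fixed-length vector per DCT frequency; returns (64, out_dim)."""
    coeffs = block_dct_coefficients(gray)
    vectors = []
    for f in range(coeffs.shape[1]):
        # Histogram of this frequency's coefficient over all blocks
        # (bin count and range are assumptions; out-of-range values are ignored).
        hist, _ = np.histogram(coeffs[:, f], bins=bins, range=value_range)
        # 1-D Fourier transform of the histogram; the magnitude is kept (assumption).
        spectrum = np.abs(np.fft.fft(hist))
        # Uniform index sampling down to out_dim values (assumption).
        idx = np.linspace(0, spectrum.size - 1, out_dim).astype(int)
        vectors.append(spectrum[idx])
    return np.stack(vectors)
```

Calling `frequency_vectors` on a grayscale array yields a 64 × 250 array whose rows play the role of $H_0, \ldots, H_{63}$; each row would then be passed through the weight-sharing one-dimensional CNN described above.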

Step S2, constructing a pixel domain sub-network model by using a convolutional neural network, and obtaining the semantic feature representation of the input image through the pixel domain sub-network model; the pixel domain sub-network consists of a cyclic fusion network, which comprises two stages, GCK-guided feature extraction (GCK: global context knowledge) and multi-scale feature fusion, used respectively for extracting and fusing feature maps of the input image at different semantic levels;

The pixel domain sub-network model consists of a cyclic fusion network. The backbone of the cyclic fusion network is a simple CNN network; on top of it, a representation of global context knowledge (GCK) is constructed by fusing multi-scale features, and cyclic connections are built between the GCK and different layers of the CNN. Assume that the basic CNN backbone consists of $L$ layers, each of which produces a feature map $X$. $X^l$ is the output of the $l$-th CNN layer, which can be written as

$$X^l = f^l(W^l * X^{l-1}), \quad l \in [1, L]$$

where $*$ denotes the convolution operation; $W^l$ is the weight (including the bias term) of the $l$-th convolution layer, which is randomly initialized and optimized during training; and $f^l(\cdot)$ is a composition of several specific functions, such as activation and pooling. $X^0$ and $X^L$ denote the input and the final output of the CNN. Four layers are selected from the $L$ layers and fused using the cyclic fusion network. The network comprises two stages: multi-scale feature fusion and GCK-guided feature extraction. Let $S = \{r_m, m \in [1, 4]\}$ denote the set of selected layers, where $r_m \in [1, L]$ is the index of a selected layer. In the multi-scale feature fusion stage, a representation GCK of global context knowledge is first obtained. Specifically, after the input image passes through the CNN, a group of multi-scale feature maps $\{X^r, r \in S\}$ is obtained. The invention applies a 1 × 1 convolution to reduce the number of channels of these feature maps and upsamples the feature maps of different scales to the same size. Then, all the upsampled feature maps $\{F^r, r \in S\}$ are concatenated along the channel dimension, and a 1 × 1 convolution is applied to promote information fusion among channels and reduce the feature dimension, finally yielding the GCK. Formally, the GCK is defined as follows:

$$\mathrm{GCK} = \sigma\left(W * \mathrm{Cat}\left(\{F^r, r \in S\}\right)\right)$$

where Cat is the channel concatenation operation, $*$ denotes the convolution operation, $W$ is the weight matrix, and $\sigma$ is the activation function. In the GCK-guided feature extraction stage, a cyclic connection is constructed between the GCK and each selected CNN layer. With the cyclic connection, the input of each selected CNN layer includes both the output of the previous layer and the GCK. Let $t$ denote the time step of the recurrent network (that is, the number of cycles); then $X^l$ ($l \in S$) can be rewritten as

where $X^l(t)$ and $\mathrm{GCK}(t)$ denote the output of the $l$-th CNN layer and the GCK at time step $t$, respectively; $*$ denotes the convolution operation; $W^l$ and $f^l$ are the weight matrix and the composite function (including the activation function, pooling operation, and so on) that pass the feature map of layer $l-1$ to layer $l$; $U^l$ and $g^l$ are the weight matrix and the composite function used to obtain the GCK for the $l$-th layer; $V^l$ is the weight matrix of the 1 × 1 convolution layer of the $l$-th layer; $\sigma$ is the activation function; and Cat is the channel concatenation operation. The model parameters are shared across time steps. After $t$ iterations, the global context knowledge representation $\mathrm{GCK}(t)$ of the last time step is taken as the final semantic feature representation $l_P$ of the pixel domain sub-network, which serves as an input to the fusion sub-network.
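The multi-scale feature fusion stage described above (1 × 1 convolution to reduce channels, upsampling to a common size, channel concatenation, and a second 1 × 1 convolution) can be illustrated with a small NumPy sketch. Several details are assumptions made only for this example: nearest-neighbour upsampling, ReLU as the activation $\sigma$, square feature maps whose sizes divide the largest one, and the hypothetical function names. The GCK-guided feedback into the selected CNN layers is not shown.

```python
import numpy as np

def conv1x1(x: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """1x1 convolution: x has shape (C_in, H, W), weight has shape (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, x)

def upsample_nearest(x: np.ndarray, size: int) -> np.ndarray:
    """Nearest-neighbour upsampling of (C, H, W) to (C, size, size).

    Assumes size is an integer multiple of both H and W.
    """
    _, h, w = x.shape
    return x.repeat(size // h, axis=1).repeat(size // w, axis=2)

def global_context_knowledge(feature_maps, reduce_weights, fuse_weight):
    """Fuse multi-scale feature maps into a single GCK tensor.

    feature_maps:   list of arrays (C_r, H_r, W_r), one per selected layer
    reduce_weights: list of arrays (C_reduced, C_r) for the channel-reducing 1x1 convs
    fuse_weight:    array (C_out, number_of_maps * C_reduced) for the fusing 1x1 conv
    """
    size = max(f.shape[1] for f in feature_maps)
    # Reduce channels, then bring every map to the largest spatial size.
    resized = [upsample_nearest(conv1x1(f, w), size)
               for f, w in zip(feature_maps, reduce_weights)]
    stacked = np.concatenate(resized, axis=0)      # channel concatenation (Cat)
    fused = conv1x1(stacked, fuse_weight)          # mixes information across channels
    return np.maximum(fused, 0.0)                  # sigma taken to be ReLU (assumption)
```

For instance, four feature maps with 16, 32, 64, and 128 channels at spatial sizes 32, 16, 8, and 4, each reduced to 8 channels and fused with a 16 × 32 weight, give a GCK tensor of shape (16, 32, 32).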

Step S3, fusing the frequency domain feature representation and the semantic feature representation to obtain an image representation of the input image, and obtaining, according to the image representation, the prediction probability that the input image is a false news picture; the fusion sub-network dynamically fuses the feature vectors obtained from the frequency domain and pixel domain sub-networks by using an attention mechanism, and classifies the input image as a false news picture or a true news picture;

The physical and semantic features of a picture are complementary in detecting false news, so the invention fuses these features through the fusion sub-network, that is, it uses the output $l_F$ of the frequency domain sub-network and the output $l_P$ of the pixel domain sub-network to predict whether the input picture is a false news picture. Intuitively, not all features contribute equally to false news detection: some visual features play a more important role in assessing whether a given picture is a false news picture or a true news picture. For example, for tampered pictures with obvious tampering traces, physical features perform better than semantic features in detecting false news; for misleading pictures that have not undergone severe recompression, semantic features are more effective. The invention therefore highlights these valuable features through an attention mechanism, and the enhanced image representation $u$ is calculated as follows:

$$F(l_F) = v^T \tanh(W_F l_F + b_F)$$

$$F(l_P) = v^T \tanh(W_F l_P + b_F)$$

$$u = \alpha l_F + (1-\alpha) l_P$$

where $W_F$ denotes a weight matrix, $b_F$ denotes the bias, $v^T$ denotes the transposed weight vector, tanh is the activation function, and $F(\cdot)$ is the scoring function that measures the importance of each feature vector. The normalized weights $\alpha$ and $1-\alpha$ corresponding to the feature vectors $l_F$ and $l_P$ are then obtained through a softmax activation function, and the weighted sum of the feature vectors is computed as the high-level representation $u$ of the image. The vector $v$ is randomly initialized and optimized during network training.
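The attention-based fusion can be sketched in a few lines of NumPy. The sketch assumes that $l_F$ and $l_P$ have the same dimension (required for the weighted sum) and that the softmax is taken over the two scores $F(l_F)$ and $F(l_P)$, as the text describes; the function names and the parameter shapes are hypothetical.

```python
import numpy as np

def score(l: np.ndarray, W_F: np.ndarray, b_F: np.ndarray, v: np.ndarray) -> float:
    """Scoring function F(l) = v^T tanh(W_F l + b_F)."""
    return float(v @ np.tanh(W_F @ l + b_F))

def attention_fuse(l_F, l_P, W_F, b_F, v):
    """Return the fused representation u and the weight alpha given to l_F."""
    scores = np.array([score(l_F, W_F, b_F, v), score(l_P, W_F, b_F, v)])
    scores = scores - scores.max()                  # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum() # softmax over the two scores
    alpha = float(weights[0])
    u = alpha * l_F + (1.0 - alpha) * l_P
    return u, alpha
```

When both feature vectors receive the same score, `alpha` is 0.5 and `u` is their plain average; a higher score for one vector shifts the weight toward it.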

The feature vector u is then projected into two types of target spaces using a fully connected layer with Softmax activation: false news pictures and true news pictures, and obtaining probability distribution:

$$p = \mathrm{softmax}(W_c u + b_c),$$

where $W_c$ denotes a weight matrix and $b_c$ denotes the bias. The loss function is defined as the cross-entropy error between the predicted probability distribution and the true value:

$$L = -\sum [y \log p + (1-y) \log(1-p)]$$

where y is the true value of the input image, 1 represents the false news picture, 0 represents the true news picture, and p represents the prediction probability of the false news picture.
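The classification layer and the loss can be illustrated as follows. This is a minimal sketch: the ordering of the two output classes (index 1 for the false news picture) and the clipping constant `eps` are assumptions introduced for the example, and the function names are hypothetical.

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

def predict_fake_probability(u: np.ndarray, W_c: np.ndarray, b_c: np.ndarray) -> float:
    """p = softmax(W_c u + b_c); returns the probability of the false news class.

    W_c has shape (2, dim(u)); class index 1 is taken to be 'false news' (assumption).
    """
    return float(softmax(W_c @ u + b_c)[1])

def cross_entropy(y, p, eps: float = 1e-12) -> float:
    """L = -sum[y log p + (1 - y) log(1 - p)] over a batch of labels and predictions."""
    y = np.asarray(y, dtype=float)
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)  # avoid log(0)
    return float(-np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))
```

With all-zero weights the predicted probability is 0.5, and the loss for a single false news picture (y = 1) at p = 0.5 equals log 2; the loss shrinks as p approaches the true label.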

The invention also provides a false news detection system, whose overall framework is shown in Fig. 2 and mainly comprises three parts: a frequency domain sub-network, a pixel domain sub-network, and a fusion sub-network. The frequency domain sub-network consists of two CNN models with similar structures and is used for extracting physical characteristics of the input image at different scales. The pixel domain sub-network consists of a cyclic fusion network, which comprises two stages, GCK-guided feature extraction (GCK: global context knowledge) and multi-scale feature fusion, used respectively for extracting and fusing feature maps of the input image at different semantic levels. The fusion sub-network dynamically fuses the feature vectors obtained from the frequency domain and pixel domain sub-networks using an attention mechanism and classifies the input image as a false news picture or a true news picture.

1. Frequency domain sub-network model

Details of the frequency domain sub-network are shown in the upper half of Fig. 2. The model consists of two similar CNN networks: a small-scale network and a large-scale network. The invention uses the complete input image to train the large-scale network, and uses the 128 × 128 pixel image blocks into which the input image is divided to train the small-scale network. The two single-scale sub-networks have similar architectures. Taking the large-scale network as an example, a block discrete cosine transform (DCT) is first applied to the input image to obtain histograms of the DCT coefficients of the picture at 64 frequencies. In particular, the invention applies a one-dimensional Fourier transform to these histograms to enhance the effectiveness of the CNN. Because a CNN requires fixed-size input, the histograms are sampled to yield 64 vectors of 250 dimensions each, denoted {H_0, H_1, …, H_63}. After this preprocessing, each input vector H_i is fed into the large-scale CNN with shared weights to obtain the corresponding feature representation w_i. The CNN consists of three convolution blocks and a fully connected layer, each convolution block comprising a one-dimensional convolution layer and a max pooling layer. To accelerate convergence of the model, the number of filters in the convolution layers is set to increase with depth. The feature vectors {w_0, w_1, …, w_63} of the 64 frequencies are concatenated to obtain the large-scale frequency domain feature representation l_large of the input image.

In the small-scale network, a block DCT is applied to each 128 × 128 image block; to reduce the number of parameters, only the first 9 high-frequency components of the 64 frequencies are selected for drawing the DCT coefficient histograms. All 128 × 128 image blocks are fed into the small-scale CNN, and the resulting feature vectors are concatenated to obtain the small-scale frequency domain feature representation l_small of the input image. Finally, l_large and l_small are concatenated to obtain the final frequency domain feature representation l_F of the input image, which serves as input to the fusion sub-network.
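By way of illustration only, the preprocessing described above (block DCT, per-frequency coefficient histograms, one-dimensional Fourier transform, sampling to 250 dimensions) might be sketched in Python with NumPy as follows. The 8 × 8 block size is inferred from the 64 frequencies; the histogram range, the number of raw bins, the even-spaced sampling scheme, and the indices used for the 9 high-frequency components are assumptions that the description does not specify, and all function names are hypothetical.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n)[:, None]              # frequency index
    x = np.arange(n)[None, :]              # sample index
    m = np.cos(np.pi * (2 * x + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    m[0, :] = np.sqrt(1.0 / n)
    return m

def block_dct(gray, block=8):
    """2-D DCT of every non-overlapping block x block tile.

    Returns shape (num_tiles, block * block): one row per tile,
    one column per frequency.
    """
    h, w = gray.shape
    h, w = h - h % block, w - w % block    # crop to a multiple of the block size
    tiles = gray[:h, :w].reshape(h // block, block, w // block, block)
    tiles = tiles.transpose(0, 2, 1, 3).reshape(-1, block, block)
    d = dct_matrix(block)
    coeffs = d @ tiles @ d.T               # separable 2-D DCT applied per tile
    return coeffs.reshape(len(tiles), -1)

def frequency_vectors(gray, freqs=None, raw_bins=2048,
                      value_range=(-1024.0, 1024.0), out_dim=250):
    """One out_dim-dimensional vector per selected DCT frequency."""
    coeffs = block_dct(np.asarray(gray, dtype=np.float64))
    if freqs is None:
        freqs = range(coeffs.shape[1])     # all 64 frequencies (large scale)
    vectors = []
    for f in freqs:
        # Histogram of this frequency's coefficient over all tiles.
        hist, _ = np.histogram(coeffs[:, f], bins=raw_bins, range=value_range)
        # Magnitude of the 1-D Fourier transform of the histogram.
        spectrum = np.abs(np.fft.rfft(hist))
        # Assumed sampling scheme: evenly spaced entries of the spectrum.
        idx = np.linspace(0, len(spectrum) - 1, out_dim).astype(int)
        vectors.append(spectrum[idx])
    return np.stack(vectors)

def split_patches(gray, size=128):
    """Non-overlapping size x size image blocks for the small-scale network."""
    h, w = gray.shape
    return [gray[i:i + size, j:j + size]
            for i in range(0, h - size + 1, size)
            for j in range(0, w - size + 1, size)]

# Usage on a random grayscale image (placeholder data).
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(256, 256))
large_inputs = frequency_vectors(image)                      # H_0 ... H_63
small_inputs = [frequency_vectors(p, freqs=range(55, 64))    # placeholder indices
                for p in split_patches(image)]
```

Each row of `large_inputs` would then be passed through the shared-weight one-dimensional CNN; that network is not shown here.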

2. Pixel domain subnetwork model

Details of the pixel domain sub-network are shown in the lower half of Fig. 2; it mainly comprises a cyclic fusion network. The backbone of the network is a simple CNN. On this basis, a representation of global context knowledge (GCK) is constructed by fusing multi-scale features, and recurrent connections are built between the GCK and different layers of the CNN. Assume that the basic CNN backbone consists of L layers, each of which produces a feature map X. X^l, the output of the l-th CNN layer, can be written as

X^l = f^l(W^l * X^(l-1)), l ∈ [1, L]

where * represents the convolution operation; W^l is the weight (including the bias term) of the l-th convolution layer, which is randomly initialized and optimized during training; and f^l(·) is a combination of specific functions such as activation and pooling. X^0 and X^L represent the input and the final output of the CNN. Four of the L layers are selected and fused using the cyclic fusion network. The network comprises two stages: multi-scale feature fusion and GCK-guided feature extraction. Let S = {r_m, m ∈ [1, 4]} denote the set of selected layers, where r_m ∈ [1, L] is the index of a selected layer. In the multi-scale feature fusion stage, a representation GCK of the global context knowledge is first obtained. Specifically, after the input image passes through the CNN, a set of multi-scale feature maps {X^r, r ∈ S} is obtained. The invention applies a 1 × 1 convolution to reduce the number of channels of these feature maps and upsamples the feature maps of different scales to the same size. All upsampled feature maps {F^r, r ∈ S} are then concatenated along the channel dimension, and a 1 × 1 convolution is applied to promote information fusion between channels and reduce the feature dimension, finally yielding the GCK. The GCK is formally defined as follows:

Wherein X is ^l (t) and GCK (t) represent the output of the first layer CNN and GCK, respectively, at time step t, and x represents the convolution operation, W ^l And f ^l Is a weight matrix and a combination function (including an activation function, a pooling operation and the like) for transferring the feature map of the layer (l-1) to the layer (l), U ^l And g ^l Is a weight matrix and a combination function for obtaining GCK of the first layer，V ^l Is the weight matrix of the 1*1 convolution layer of the first layer, σ is the activation function, cat is the channel splice operation. The model parameters for the multiple time steps are shared. After t iterations, the global context knowledge representation GCK (t) of the last time step is obtained as the final semantic feature representation l of the pixel domain sub-network _p Further as input to the converged subnetwork.

3. Fusion sub-network model

F(l_F) = v^T tanh(W_F l_F + b_F)

F(l_P) = v^T tanh(W_F l_P + b_F)

u = α l_F + (1 - α) l_P

where W_F represents a weight matrix, b_F represents the bias, tanh is the activation function, v^T represents the transposed weight vector, and F(·) is a scoring function that measures the importance of each feature vector. A softmax activation function then yields the normalized weights α and 1-α corresponding to the feature vectors l_F and l_P, and the weighted sum of the two feature vectors is computed as the high-level representation u of the image. The vector v is randomly initialized and optimized during network training.
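For illustration, the attention-based fusion above might be sketched as follows. The sketch assumes that l_F and l_P have already been brought to the same dimension, as the shared W_F and the weighted sum require; the function names are hypothetical.

```python
import numpy as np

def score(l, W_F, b_F, v):
    """Scoring function F(l) = v^T tanh(W_F l + b_F)."""
    return float(v @ np.tanh(W_F @ l + b_F))

def attention_fuse(l_F, l_P, W_F, b_F, v):
    """Return the fused representation u and the weight alpha."""
    s = np.array([score(l_F, W_F, b_F, v), score(l_P, W_F, b_F, v)])
    s = s - s.max()                        # shift for numerical stability
    weights = np.exp(s) / np.exp(s).sum()  # softmax over the two scores
    alpha = float(weights[0])
    u = alpha * l_F + (1.0 - alpha) * l_P
    return u, alpha

# Usage with placeholder vectors and randomly initialized parameters.
rng = np.random.default_rng(0)
dim, hidden = 6, 4
l_F, l_P = rng.standard_normal(dim), rng.standard_normal(dim)
W_F = rng.standard_normal((hidden, dim))
b_F = rng.standard_normal(hidden)
v = rng.standard_normal(hidden)
u, alpha = attention_fuse(l_F, l_P, W_F, b_F, v)
```

Because α comes from a softmax over two scores, it always lies strictly between 0 and 1, so u is a convex combination of the two feature vectors.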

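The feature vector u is then projected into a two-class target space (false news pictures and true news pictures) using a fully connected layer with a softmax activation function, yielding the probability distribution:

p = softmax(W_c u + b_c),

and the loss function is defined as the cross-entropy error between the predicted probability distribution and the true value:

L = -∑ [y log p + (1 - y) log(1 - p)]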
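A minimal sketch of the classification layer and loss is given below for illustration. It assumes that the second softmax output corresponds to the false news picture class (the description does not fix the index) and clips probabilities to avoid log(0); the function names are hypothetical.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax with a stability shift."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def predict_fake_probability(u, W_c, b_c):
    """u: (batch, d), W_c: (2, d), b_c: (2,). Returns p per image."""
    probs = softmax(u @ W_c.T + b_c)
    return probs[:, 1]                     # assumed: index 1 = false news picture

def cross_entropy(y, p, eps=1e-12):
    """L = -sum[y log p + (1 - y) log(1 - p)], with y = 1 for false news."""
    p = np.clip(p, eps, 1.0 - eps)
    return float(-np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

# Usage on a placeholder batch of fused representations u.
rng = np.random.default_rng(0)
u_batch = rng.standard_normal((3, 6))
W_c = rng.standard_normal((2, 6))
b_c = np.zeros(2)
y = np.array([1.0, 0.0, 1.0])
p = predict_fake_probability(u_batch, W_c, b_c)
loss = cross_entropy(y, p)
```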

FIG. 3 is a schematic diagram of a data processing apparatus of the present invention. As shown in fig. 3, the embodiment of the present invention further provides a computer-readable storage medium, and a data processing apparatus. The computer readable storage medium of the present invention stores computer executable instructions that, when executed by a processor of a data processing apparatus, implement the above-described false news detection method that fuses multi-scale visual information. Those of ordinary skill in the art will appreciate that all or a portion of the steps of the above-described methods may be performed by a program that instructs associated hardware (e.g., processor, FPGA, ASIC, etc.), which may be stored on a readable storage medium such as read only memory, magnetic or optical disk, etc. All or part of the steps of the embodiments described above may also be implemented using one or more integrated circuits. Accordingly, each module in the above embodiments may be implemented in the form of hardware, for example, by an integrated circuit, or may be implemented in the form of a software functional module, for example, by a processor executing a program/instruction stored in a memory to implement its corresponding function. Embodiments of the invention are not limited to any specific form of combination of hardware and software.

The invention effectively discriminates false news based on the visual content of news messages and, compared with the prior art, greatly improves performance without requiring additional data. In particular, for the task of detecting false news from visual content, the invention improves accuracy by at least 11.8 percentage points over the prior art on publicly available datasets in the field.

The above embodiments are only for illustrating the present invention, not for limiting the present invention, and various changes and modifications may be made by one of ordinary skill in the relevant art without departing from the spirit and scope of the present invention, and therefore, all equivalent technical solutions are also within the scope of the present invention, and the scope of the present invention is defined by the claims.

Claims

1. A false news detection method integrating multi-scale visual information is characterized by comprising the following steps:

a frequency domain characteristic acquisition step, namely constructing a large-scale network of a frequency domain sub-network model by using a convolutional neural network; performing block discrete cosine transform on an input image to obtain a large-scale histogram corresponding to the input image at a plurality of frequencies; sampling the large-scale histogram to obtain a plurality of large-scale multidimensional vectors; fusing the plurality of large-scale multidimensional vectors through the large-scale network to obtain a large-scale frequency domain feature representation of the input image; constructing a small-scale network of the frequency domain sub-network model by using a convolutional neural network; dividing the input image into a plurality of image blocks with the same size, and performing block discrete cosine transform on the image blocks to obtain small-scale histograms corresponding to the image blocks on a plurality of frequencies; selecting a plurality of small-scale histograms in a high frequency band for sampling to obtain a plurality of small-scale multidimensional vectors; fusing the plurality of small-scale multidimensional vectors through the small-scale network to obtain a small-scale frequency domain feature representation of the input image; splicing and fusing the large-scale frequency domain feature representation and the small-scale frequency domain feature representation to obtain a frequency domain feature representation of the input image;

a semantic feature acquisition step, namely constructing a pixel domain sub-network model by using a convolutional neural network, and acquiring semantic feature representation of the input image through the pixel domain sub-network model;

and an image detection step, wherein the frequency domain feature representation and the semantic feature representation are fused to obtain an image representation of the input image, and the prediction probability of the input image as a false news picture is obtained according to the image representation.

2. The false news detection method of claim 1, wherein the semantic feature acquisition step specifically includes:

constructing a cyclic fusion network by using a convolutional neural network; acquiring a first characteristic diagram of the input of the cyclic fusion network on multiple scales, up-sampling the first characteristic diagram to obtain a second characteristic diagram with the same size, and performing channel splicing on the second characteristic diagram to obtain a global context knowledge representation as the output of the cyclic fusion network;

taking the output of the cyclic fusion network of the current round as the input of the cyclic fusion network of the next round, and connecting a plurality of the cyclic fusion networks in series to form the pixel domain sub-network model;

taking the input image as the input of the pixel domain sub-network model, and taking the global context knowledge representation obtained after a preset number of iteration rounds as the semantic feature representation l_P of the input image.

3. The false news detection method of claim 1, wherein the image detection step specifically includes:

obtaining the image representation u from the frequency domain feature representation l_F and the semantic feature representation l_P, where u = α l_F + (1 - α) l_P;

projecting the image representation u into a false news picture target space and a true news picture target space, respectively, using a fully connected layer to obtain the prediction probability p, and taking the cross-entropy error L between the prediction probability p and the true value y as the loss function, where p = softmax(W_c u + b_c) and L = -∑ [y log p + (1 - y) log(1 - p)];

Wherein alpha is a normalized weight,

4. A false news detection system incorporating multi-scale visual information, comprising:

the frequency domain feature acquisition module is used for constructing a frequency domain sub-network model by using a convolutional neural network, and obtaining the frequency domain feature representation of the input image through the frequency domain sub-network model; the module comprises a large-scale frequency domain feature representation acquisition module, a small-scale frequency domain feature representation acquisition module and a splicing and fusing module, wherein:

the large-scale frequency domain feature representation acquisition module is used for acquiring a large-scale frequency domain feature representation of the input image; constructing a large-scale network of the frequency domain sub-network model by using a convolutional neural network; performing a block discrete cosine transform on the input image to obtain a large-scale histogram of the input image corresponding to a plurality of frequencies; sampling the large-scale histogram to obtain a plurality of large-scale multidimensional vectors; fusing the plurality of large-scale multidimensional vectors through the large-scale network to obtain a large-scale frequency domain feature representation of the input image;

the small-scale frequency domain feature representation acquisition module is used for acquiring a small-scale frequency domain feature representation of the input image; constructing a small-scale network of the frequency domain sub-network model by using a convolutional neural network; dividing the input image into a plurality of image blocks with the same size, and performing block discrete cosine transform on the image blocks to obtain small-scale histograms corresponding to the image blocks on a plurality of frequencies; selecting a plurality of small-scale histograms in a high frequency band for sampling to obtain a plurality of small-scale multidimensional vectors; fusing the plurality of small-scale multidimensional vectors through the small-scale network to obtain a small-scale frequency domain feature representation of the input image;

the splicing and fusing module is used for splicing and fusing the large-scale frequency domain feature representation and the small-scale frequency domain feature representation to obtain the frequency domain feature representation of the input image;

the semantic feature acquisition module is used for constructing a pixel domain sub-network model by using a convolutional neural network, and acquiring semantic feature representation of the input image through the pixel domain sub-network model;

and the image detection module is used for fusing the frequency domain feature representation with the semantic feature representation to obtain the image representation of the input image, and obtaining the prediction probability of the input image as a false news picture according to the image representation.

5. The false news detection system of claim 4, wherein the semantic feature acquisition module specifically includes:

the cyclic fusion network construction module is used for constructing a cyclic fusion network by using a convolutional neural network; acquiring a first characteristic diagram of the input of the cyclic fusion network on multiple scales, up-sampling the first characteristic diagram to obtain a second characteristic diagram with the same size, and performing channel splicing on the second characteristic diagram to obtain a global context knowledge representation as the output of the cyclic fusion network;

the cyclic fusion network serial module is used for taking the output of the cyclic fusion network of the current round as the input of the cyclic fusion network of the next round, and connecting a plurality of cyclic fusion networks in series to form the pixel domain sub-network model;

the semantic feature acquisition module is used for taking the input image as the input of the pixel domain sub-network model, and taking the global context knowledge representation obtained after a preset number of iteration rounds as the semantic feature representation l_P of the input image.

6. The false news detection system of claim 4, wherein the image detection module specifically includes:

an image representation acquisition module for obtaining the image representation u from the frequency domain feature representation l_F and the semantic feature representation l_P, where u = α l_F + (1 - α) l_P;

a prediction probability obtaining module for projecting the image representation u into the false news picture target space and the true news picture target space, respectively, using the fully connected layer to obtain the prediction probability p, and taking the cross-entropy error L between the prediction probability p and the true value y as the loss function, where p = softmax(W_c u + b_c) and L = -∑ [y log p + (1 - y) log(1 - p)];

Wherein alpha is a normalized weight,

7. A computer readable storage medium storing computer executable instructions for performing the false news detection method integrating multi-scale visual information as claimed in any one of claims 1 to 3.

8. A data processing apparatus comprising the computer readable storage medium of claim 7, the processor of the data processing apparatus retrieving and executing computer executable instructions in the computer readable storage medium to perform false news detection incorporating multi-scale visual information.