
CN112819645B - Social network false information propagation detection method based on degree of motif - Google Patents

Social network false information propagation detection method based on degree of motif

Info

Publication number
CN112819645B
Authority
CN
China
Prior art keywords
motif
network
propagation
information
degree
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110309368.6A
Other languages
Chinese (zh)
Other versions
CN112819645A (en)
Inventor
许小可
徐铭达
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian Minzu University
Original Assignee
Dalian Minzu University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian Minzu University
Priority to CN202110309368.6A
Publication of CN112819645A
Application granted
Publication of CN112819645B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Business, Economics & Management (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Business, Economics & Management (AREA)
  • Tourism & Hospitality (AREA)
  • Strategic Management (AREA)
  • Primary Health Care (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Medical Informatics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a social network false information propagation detection method based on motif degree, and relates to the field of information detection. The invention first proposes a motif degree algorithm based on the directed motif structures of complex networks, and uses it to statistically analyze the breadth propagation and depth propagation mechanisms of a social network, which helps in analyzing how false information propagates in online social networks. The motif-based analysis of false information propagation characteristics can be applied to practical scenarios such as detecting false information during social media information propagation and suppressing the early spread of false news. Detection based on network structure can also contribute to identifying important nodes and detecting bots in news propagation, thereby providing a novel and feasible approach to false information detection based on network structural features.

Description

Social network false information propagation detection method based on degree of motif
Technical Field
The invention relates to the field of information detection, in particular to a social network false information propagation detection method based on a degree of motif.
Background
Social networks are typical complex networks that contain a series of person-to-person connections, in which individual users can be abstracted as nodes and the connections as links between nodes [2]. In an online social network, massive amounts of information are transmitted through users' interaction and forwarding behaviors. With social media as the carrier of information, people can quickly share information streams and obtain current news. This facilitates the exchange of ideas and information between people, but it can also become an important channel for spreading false news.
False news is often filled with rumors and misleading information. Faced with a large amount of information on the network, most people cannot accurately judge whether it is true, so false information spreads widely. Sina Weibo is an important platform on which users in China read news and share their daily lives. Users act as both producers and propagators of information: they are influenced by information and can exert influence themselves by posting. By distributing false information on social media, users may affect many areas such as public opinion, politics, and the economy.
False news is usually accompanied by subjective prejudice and emotion, so the outbreak of a hot event is often an important opportunity for false news to incubate and spread, and false news has become an almost unavoidable byproduct of information transmission. How to avoid taking in large amounts of false information, how to accurately identify the authenticity of information sources, and how to reveal measures of propagation importance for microblog information and mechanisms for detecting false news have become hot research directions in the field of complex networks. Deeply mining the network propagation characteristics of false information helps in analyzing its propagation mechanism in online social networks, so research on this mechanism has important scientific significance and practical value.
Disclosure of Invention
In view of the above, the invention provides a social network false information propagation detection method based on motif degree. The invention first proposes a motif degree algorithm based on the directed motif structures of complex networks, and uses it to statistically analyze the breadth propagation and depth propagation mechanisms of a social network, which helps in analyzing how false information propagates in online social networks. The motif-based analysis of false information propagation characteristics can be applied to practical scenarios such as detecting false information during social media information propagation and suppressing the early spread of false news. Detecting false information from the network structure can also contribute to identifying important nodes and detecting bots in news propagation, thereby providing a novel and feasible approach to false information detection based on network structural features.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
A social network false information propagation detection method based on motif degree, characterized by comprising the following steps:
reading the complete forwarding data of real information and false information, and cleaning the data;
constructing temporal forwarding networks for the false information and for the real information;
designing a motif degree algorithm to calculate the breadth motif degree and the depth motif degree of a single network;
for all the forwarding data, applying the motif degree algorithm and the built-in network statistics methods of NetworkX to extract data features of the real information forwarding networks and the false information forwarding networks;
training a classification model using the data features;
and detecting false information in the network data using the classification model.
Preferably, the training steps of the classification model are as follows:
constructing an XGBoost classification model, labeling the real information and the false information according to the data features, inputting the labeled training set into the XGBoost model for training with MSE selected as the loss function, and obtaining the classification model after iterative parameter tuning.
Preferably, the false information detection process is as follows: the information to be detected is input into the classification model, which outputs a classification probability according to a binary logistic regression objective function; with a threshold of 0.5 as the decision boundary, the result is judged to be class 0 if the output probability is less than 0.5 and class 1 otherwise; the result is evaluated by accuracy, thereby realizing false information detection on the data in the network.
Preferably, the data features include fused motif features, structural heterogeneity features, structural virality features, and propagation features.
Preferably, the specific flow of the motif algorithm is as follows:
reading single event forwarding data;
constructing the forwarding network topology of the event, wherein the constructed network is an unweighted, acyclic largest connected network whose root node is the publisher of the event information;
initializing a node breadth motif list, a node depth motif list and a node storage list;
traversing all nodes in the network: treating each node as a parent or child node, calculating the number of breadth propagation motifs and depth propagation motifs that it can generate, and recording these as the node breadth motif degree and the node depth motif degree;
summing all node breadth motif degrees and all node depth motif degrees in the respective lists to obtain the motif degrees of the network: the breadth motif degree and the depth motif degree.
Compared with the prior art, the social network false information propagation detection method based on motif degree has the following beneficial effects:
1) A motif degree algorithm is designed and implemented to measure the breadth propagation and depth propagation modes of a social network;
2) The propagation characteristics of real and false information networks are analyzed qualitatively;
3) Multiple network topological features based on motif degree are fused, achieving higher false information detection accuracy, so that false information can be identified effectively in the early period of information propagation.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present invention, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of false information detection based on the motif characteristic in the present invention;
FIG. 2 is a flow chart of the motif degree calculation in the present invention;
FIG. 3 is a diagram showing the correlation between network structural features in the present invention;
FIG. 4a is a distribution diagram of microblog propagation scale in the present invention;
FIG. 4b is a distribution diagram of structural virality in the present invention;
FIGS. 5 (a) -5 (e) are diagrams showing network structures with the same propagation scale in the present invention;
FIG. 6a is a heat map of the motif degree scatter distribution for real information in the present invention;
FIG. 6b is a heat map of the motif degree scatter distribution for false information in the present invention;
FIG. 7a is a scatter distribution diagram of the breadth motif degree in the present invention;
FIG. 7b is a scatter distribution diagram of the depth motif degree in the present invention;
FIG. 8a is a distribution diagram of the breadth motif degree in the present invention;
FIG. 8b is a distribution diagram of the depth motif degree in the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The embodiment of the invention discloses a social network false information propagation detection method based on motif degree which, as shown in FIG. 1, comprises the following steps:
reading the complete forwarding data of real information and false information, and cleaning the data;
constructing temporal forwarding networks for the false information and for the real information;
designing a motif degree algorithm to calculate the breadth motif degree and the depth motif degree of a single network;
for all the forwarding data, applying the motif degree algorithm and the built-in network statistics methods of NetworkX to extract data features of the real information forwarding networks and the false information forwarding networks;
training a classification model using the data features;
and detecting false information in the network data using the classification model.
The training steps of the classification model are as follows:
constructing an XGBoost classification model, labeling the real information and the false information according to the data features, inputting the labeled training set into the XGBoost model for training with MSE selected as the loss function, and obtaining the classification model after iterative parameter tuning.
The false information detection process is as follows: the information to be detected is input into the classification model, which outputs a classification probability according to a binary logistic regression objective function. With a threshold of 0.5 as the decision boundary, the result is judged to be class 0, that is, false information, if the output probability is less than 0.5; otherwise it is judged to be class 1, that is, real information. The result is evaluated by accuracy, thereby realizing false information detection on the data in the network.
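The decision rule and the accuracy evaluation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `classify` and `accuracy` and the sample values are hypothetical, and the probabilities are assumed to come from an already trained model.

```python
def classify(probabilities, threshold=0.5):
    """Map each predicted probability to class 0 (false) or class 1 (real)."""
    return [0 if p < threshold else 1 for p in probabilities]


def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("predicted and actual must be non-empty and equal in length")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


# Hypothetical model outputs and ground-truth labels, for illustration only.
probs = [0.12, 0.50, 0.91, 0.49]
labels = [0, 1, 1, 1]
preds = classify(probs)
score = accuracy(preds, labels)
```

A probability exactly equal to 0.5 falls into class 1 here, because the text assigns class 0 only when the probability is strictly smaller than the threshold.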
The data features include fused motif features, structural heterogeneity features, structural virality features, and propagation features.
As shown in FIG. 2, the specific flow of the motif degree algorithm is as follows:
reading single event forwarding data;
constructing the forwarding network topology of the event, wherein the constructed network is an unweighted, acyclic largest connected network whose root node is the publisher of the event information;
initializing a node breadth motif list, a node depth motif list and a node storage list;
traversing all nodes in the network: treating each node as a parent or child node, calculating the number of breadth propagation motifs and depth propagation motifs that it can generate, and recording these as the node breadth motif degree and the node depth motif degree;
summing all node breadth motif degrees and all node depth motif degrees in the respective lists to obtain the motif degrees of the network: the breadth motif degree and the depth motif degree.
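The flow above can be sketched as follows. The motif definitions used here are assumptions, since this passage does not spell them out: a breadth motif is taken to be one parent together with two of its direct forwarders, and a depth motif a two-step chain (parent, child, grandchild). These choices reproduce the theoretical maxima stated later in the description for star and chain networks. The function name `motif_degrees` and the plain-dictionary representation are hypothetical; the method itself refers to NetworkX for network statistics.

```python
from collections import defaultdict


def motif_degrees(edges):
    """Return (breadth motif degree, depth motif degree) of a forwarding tree.

    edges: iterable of (parent, child) pairs.
    """
    children = defaultdict(set)
    for parent, child in edges:
        if parent != child:          # ignore self-forwards
            children[parent].add(child)

    nodes = set(children) | {c for kids in children.values() for c in kids}
    node_breadth, node_depth = [], []
    for node in nodes:
        kids = children.get(node, ())
        k = len(kids)
        # Assumed breadth motif: an unordered pair of direct forwarders.
        node_breadth.append(k * (k - 1) // 2)
        # Assumed depth motif: node -> child -> grandchild.
        node_depth.append(sum(len(children.get(c, ())) for c in kids))
    return sum(node_breadth), sum(node_depth)


star = [(0, i) for i in range(1, 5)]      # 5 nodes, pure breadth propagation
chain = [(i, i + 1) for i in range(4)]    # 5 nodes, pure depth propagation
```

For the 5-node star the sketch gives a breadth motif degree of 6 and a depth motif degree of 0; for the 5-node chain it gives 0 and 3.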
Specifically, this embodiment uses the public event-based microblog dataset collected by Ma et al., which contains forwarding propagation data for false rumors and real news. The complete forwarding data cover 4664 source microblog events in total, of which 2313 are false information microblogs and 2351 are real information microblogs, involving 2746818 user nodes and 3805656 microblog forwardings. The false information comes from the official microblog website: if the post of a microblog's source user has been reported as false news, the microblog is regarded as false information. The dataset includes the source microblog id, the id of the superior (upstream) user, the id of the forwarding user, the posting and forwarding times, the microblog text content, and other information. To create a cascading forwarding network, microblog users are selected as network nodes, repeated edges between two nodes are removed, and only one valid forwarding is kept. The forwarding behavior among users forms a chained propagation relationship, and the following data are first extracted to obtain the various features of false news and real news:
1) Microblog superior user: the parent node of information propagation, which stands in a forwarding relationship with the forwarding user. The source user node, the initial node from which the information spreads, is also marked in the network.
2) Microblog forwarding user: the unique user node that forwards the superior user's microblog. The forwarding user propagates the information and extends the influence of the source microblog.
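A minimal sketch of this cleaning step is given below. It is not the patent's code: the record layout `(superior_user_id, forwarding_user_id, timestamp)`, the function name `build_forwarding_edges`, and the rule of keeping each user's earliest forwarding are assumptions made so that the result is an acyclic network rooted at the source user.

```python
def build_forwarding_edges(records, source_user):
    """Return deduplicated (parent, child) edges reachable from source_user.

    records: iterable of (superior_user_id, forwarding_user_id, timestamp).
    """
    parent_of = {}
    for superior, forwarder, _ts in sorted(records, key=lambda r: r[2]):
        if forwarder == superior or forwarder == source_user:
            continue                      # drop self-forwards and edges into the root
        if forwarder in parent_of:
            continue                      # assumed rule: keep the earliest forwarding only
        parent_of[forwarder] = superior

    edges = []
    for child, parent in parent_of.items():
        # Walk up the chain of parents; keep the edge only if it ends at the source.
        node, seen = child, set()
        while node in parent_of and node not in seen:
            seen.add(node)
            node = parent_of[node]
        if node == source_user:
            edges.append((parent, child))
    return edges


# Hypothetical records: a repeated edge, a detached pair, and a self-forward.
records = [("s", "a", 1), ("s", "a", 2), ("a", "b", 3), ("x", "y", 4), ("b", "b", 5)]
edges = build_forwarding_edges(records, "s")
```

Users whose chain of superiors never reaches the source (here `y`) are discarded, which keeps only the connected part of the network that contains the publisher.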
Because individual microblogs involve very large numbers of user nodes and long propagation times (the source microblogs of a few hot events propagated for 2 to 4 years), and such extreme values are atypical of the news propagation process, their complex network structures would strongly affect the computed values and the overall distribution. Data sampled within a given range are representative, so this embodiment discusses the propagation characteristics of false information in the microblog network only within the same propagation range.
Therefore, of the 4664 microblog events in total, only those with no more than 2000 user nodes are studied, and the final dataset contains 2133 false information microblog events and 2213 real information microblog events. The sample accounts for 93.4% of the total data, can reflect the general pattern within a given propagation range, and has essentially no effect on the analysis of the overall data.
If the propagation mode of a microblog is breadth propagation, its propagation depth is usually low. If the propagation has a depth propagation characteristic, the information is forwarded over multiple levels and therefore has a large propagation depth. To explore the propagation characteristics captured by the motif indexes of the microblog network, this embodiment computes measures such as propagation depth, propagation scale, and structural virality for each microblog event propagation network, and performs a Pearson-coefficient correlation analysis of these measures together with the network breadth motif degree and depth motif degree.
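The per-network measures named above can be sketched as follows. The definitions are assumptions, because the text does not give formulas: propagation scale is taken as the number of user nodes, propagation depth as the largest distance from the source, and structural virality as the mean shortest-path distance over all pairs of nodes (the usual definition in the literature). The function name `cascade_features` is hypothetical.

```python
from collections import defaultdict, deque


def cascade_features(edges, root):
    """Return (scale, depth, structural virality) of a forwarding tree."""
    neighbours = defaultdict(set)
    for parent, child in edges:
        neighbours[parent].add(child)
        neighbours[child].add(parent)

    def bfs(start):
        """Hop distances from start to every reachable node."""
        dist = {start: 0}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in neighbours[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        return dist

    from_root = bfs(root)
    n = len(from_root)
    depth = max(from_root.values())
    if n < 2:
        return n, depth, 0.0
    # Sum of distances over all ordered pairs, divided by the number of ordered pairs.
    total = sum(sum(bfs(u).values()) for u in from_root)
    return n, depth, total / (n * (n - 1))


star_scale, star_depth, star_virality = cascade_features([(0, i) for i in range(1, 5)], 0)
```

The all-pairs computation is quadratic in the number of nodes, which is acceptable for the networks of at most 2000 nodes studied here.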
For the constructed microblog propagation networks, the degree of linear correlation between two variables is measured with the Pearson correlation coefficient. The Pearson correlation coefficient is widely used in clustering and feature analysis and is defined as:
$$\rho_{X,Y} = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{X_i-\bar{X}}{S_X}\right)\left(\frac{Y_i-\bar{Y}}{S_Y}\right)$$
where $\bar{X}$ and $\bar{Y}$ are the means of the sample values of variables X and Y, $S_X$ and $S_Y$ are the standard deviations of the sample values of X and Y, and n is the number of samples. The Pearson correlation coefficient takes values in [-1, 1]. When Y increases as X increases, the variables are positively correlated and ρ > 0; when Y decreases as X increases, they are negatively correlated and ρ < 0; a coefficient of 0 indicates no linear correlation between the two variables.
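As an illustration of the coefficient, the sketch below computes it directly from two lists of sample values. The function name `pearson` is hypothetical. The code uses sums of squared deviations rather than explicit standard deviations; this is algebraically the same quantity, because the n − 1 factors cancel.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


rho_up = pearson([1, 2, 3], [2, 4, 6])      # perfectly increasing relationship
rho_down = pearson([1, 2, 3], [3, 2, 1])    # perfectly decreasing relationship
```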
Based on the computed structural feature indexes of the false information and real information microblog networks, the feature correlation heat map is shown in FIG. 3, where the color depth represents the strength of the correlation between the corresponding row and column elements.
According to the results in FIG. 3, among the structural features of the microblog propagation network, the breadth motif degree has a strong positive correlation with the propagation scale of the microblog. After a microblog is posted, a large audience accelerates the broadcasting of the information, so the information diffuses around the information source and the breadth motif degree increases accordingly. The correlation analysis also shows that the depth motif degree has a certain positive correlation with both the propagation scale and the propagation depth. Across all sample data, as the propagation scale grows, the microblog network structure becomes more complex and the network depth increases.
The complementary cumulative distribution functions (CCDF) of networks with a propagation scale greater than 100 in real information and false information microblogs are shown in FIG. 4a and FIG. 4b. The CCDF on the vertical axis reflects the probability distribution of the corresponding variable: it is the sum of the occurrence probabilities of all discrete values greater than a given value on the x axis. FIG. 4a and FIG. 4b thus show the cumulative probability that the propagation scale and the structural virality, respectively, exceed a given value. Relative to real information, the propagation scale of false information can become very large, and false news is more likely to attract exponential forwarding and spreading. The structural virality of true and false news in FIG. 4b likewise reflects a large difference in the propagation process, and the difference in the distribution of its value range shows that the network structures of true and false news differ markedly overall.
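The CCDF described above, the probability that a variable exceeds a given value, can be computed empirically as in this sketch. The function name `ccdf` and the sample values are hypothetical.

```python
def ccdf(values):
    """Return (xs, ps): the distinct sorted values and P(X > x) for each of them."""
    ordered = sorted(values)
    n = len(ordered)
    xs, ps = [], []
    for i, v in enumerate(ordered):
        # Emit one point per distinct value, at its last occurrence.
        if i + 1 == n or ordered[i + 1] != v:
            xs.append(v)
            ps.append((n - (i + 1)) / n)   # share of samples strictly greater than v
    return xs, ps


# Hypothetical propagation scales, for illustration only.
scales, tail = ccdf([100, 200, 200, 400])
```

Plotting `tail` against `scales` on logarithmic axes gives the kind of curve shown in FIG. 4a and FIG. 4b.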
The difference between the propagation modes of true news and false news is reflected directly in the network structures they form. In terms of structural virality, false news is more viral than true news (K-S test = 0.610, p-value ≈ 0) and has a longer average path; in terms of propagation depth, the average depth of false information is greater than that of true information (K-S test = 0.438, p-value ≈ 0). These results also hold on the entire microblog dataset, so they have a certain generality. Table 1 lists details of the real and false news data, where each value is the mean of the corresponding network structural feature for false information and for real information:
TABLE 1
Network structural feature    False information    Real information
Propagation scale             334                  419
Propagation depth             6.2                  3.5
Structural virality           3.6                  2.4
Breadth motif degree          45722                117275
Depth motif degree            161.9                77.8
The data in Table 1 show that, for samples within the same propagation scale range, false information and real information networks differ clearly. Structurally, false information networks have greater propagation depth and relatively smaller propagation scale. In terms of motif degree, real information tends to have a larger breadth motif degree, while its depth motif degree is smaller than that of false information networks.
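The K-S comparisons cited before Table 1 rest on the two-sample Kolmogorov-Smirnov statistic, the largest vertical gap between two empirical cumulative distribution functions. The sketch below computes only that statistic, not the p-value, and is not the patent's code; the function name `ks_statistic` and the sample data are hypothetical.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|."""
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    largest_gap = 0.0
    for x in set(a) | set(b):
        # Empirical CDF value: share of observations less than or equal to x.
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


# Hypothetical propagation depths of a few false and real cascades.
d_separated = ks_statistic([5, 6, 7], [1, 2, 3])
d_overlap = ks_statistic([1, 2], [2, 3])
```

A statistic close to 1 means the two distributions barely overlap, and a value of 0 means the empirical distributions coincide.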
In the early stages of news diffusion, the microblog network structure is unstable, and false news can also form a star-shaped structure; over the whole life cycle, however, the structural characteristics of the two types of information differ, as shown in Table 1. False news spreads in a deeper and more complex network structure, possibly because strong ties among acquaintances lead to more forwarding. Most true news spreads around an information source, such as verified "big V" users, official accounts, government organizations, and other authoritative institutions; some non-official users also spread true news, producing burst-like broadcasting of the news. It can be said that false news is more mobile, and is biased and unstable, so its propagation differs significantly from that of true news.
Of course, the spread of false news also touches on journalism, psychology, and sociology, and its propagation pattern does not depend only on text content and node information. For example, research has found that rumors tend to concern topics that attract more interest and are perceived as more important. The more important the topic, the more widely false information circulates, and the more dramatic and sensational a rumor is, the more attention it attracts. People tend to prefer dramatic or entertaining news, and most false news runs counter to people's knowledge of objective facts, which may be one reason it attracts attention so easily. A fabricated rumor that runs contrary to people's expectations is more likely to be forwarded, because people often advance the propagation of a rumor unintentionally while taking part in the discussion of the topic, which results in the spread of false news.
Microblogs of approximately the same propagation scale can have distinctly different network structures. To quantitatively analyze the factors that influence propagation at approximately the same scale, five microblogs whose propagation scale lies in the interval 99-111 were selected and their propagation hierarchies constructed, as shown in FIG. 5. The networks in FIG. 5 (a) and 5 (b) are real information microblogs, and the networks in FIG. 5 (c) -5 (e) are false information microblogs. Analysis of the network structures shows that, at approximately equal propagation scale, information diffusion can be driven entirely by breadth propagation, with all users receiving the message from one source; information can also be passed through multiple generations and branches to form longer chain structures.
In the microblog network of FIG. 5 (a), the information is entirely broadcast-driven. Such a microblog is typically public information posted by a verified account, such as People's Daily or other official media accounts, and attracts a large amount of low-depth forwarding, so the information is broadcast to all listeners without triggering multi-level deep forwarding. In the networks of FIG. 5 (c) -5 (e), propagation typically appears as small-scale forwarding among friends accompanied by multi-level one-to-one transmission, and the resulting microblog network shows a strong depth propagation characteristic, often forming a complex network with several star structures or long propagation chains. The networks above are special cases among all microblog networks. In practice, most microblog propagation processes end up with network structures of the forms in FIG. 5 (b) and 5 (c); that is, a mixture of the two propagation modes is the main driver of information diffusion, which indicates that information propagation is driven jointly by the breadth and depth propagation mechanisms.
Because microblogs differ in life cycle and information content, the topologies they finally form differ markedly. To describe the motif degree distribution and the relationship between motif degree and propagation scale, this embodiment projects the breadth motif degree and the depth motif degree of each true and false information microblog onto a two-dimensional plane, and uses the normalized Euclidean distance from the projected point to the origin of the coordinate system as an index of network propagation importance, which reflects the influence generated during microblog propagation. Content and events with stronger influence more easily attract large amounts of forwarding, and breadth propagation and depth propagation in turn shape the microblog network structure. The scatter heat maps of motif degree are shown in FIG. 6a and FIG. 6b, where the red solid line is the mean depth motif degree and the green solid line is the mean breadth motif degree. In the overall distribution, the depth motif degree of false information microblogs is higher than that of real information, and the mean breadth motif degree of real information microblogs is higher than that of false information. Networks with strong propagation importance tend to result from the joint action of breadth propagation and depth propagation, with breadth propagation dominant.
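One possible reading of the propagation importance index is sketched below. The normalization is an assumption, since the text does not say how the distance is normalized: each network's distance from the origin is divided by the largest distance in the set. The function name `propagation_importance` and the sample points are hypothetical.

```python
import math


def propagation_importance(points):
    """points: list of (breadth motif degree, depth motif degree) per network.

    Returns each network's Euclidean distance to the origin, scaled so that
    the most distant network has importance 1.0 (assumed normalization).
    """
    distances = [math.hypot(breadth, depth) for breadth, depth in points]
    largest = max(distances, default=0.0)
    if largest == 0:
        return [0.0 for _ in distances]
    return [d / largest for d in distances]


# Hypothetical motif degrees of two networks.
importance = propagation_importance([(3, 4), (6, 8)])
```

Because breadth motif degrees are typically orders of magnitude larger than depth motif degrees (see Table 1), an index of this form is dominated by breadth propagation, which agrees with the observation above.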
To explore how the main drivers of propagation scale differ between false-information and real-information networks, and to analyze the regularities behind this difference, Figs. 7a and 7b show the correlation between motif degree and propagation scale for the two kinds of information. The black solid lines are the theoretical maxima of the motif degree at the current propagation scale, attained by the star cascade network and the chain cascade network respectively: for a network of scale n, the theoretical maximum of the breadth motif degree $B_M$ is (n-1) × (n-2)/2, and the theoretical maximum of the depth motif degree is n-2. True and false news in the microblog networks are clearly separated by motif degree at similar propagation scales, and the breadth motif degree of the star networks among the real information is linearly related to the corresponding propagation scale. In Fig. 7a, at the same propagation scale, the breadth motif degree of real news is larger overall than that of false news and its distribution is more concentrated, so real news is driven more by breadth-type propagation. In Fig. 7b, the depth motif degree of false-news networks lies closer to the depth motif maximum, while that of true news is widely scattered, indicating that the structure of false-news networks is dominated by depth propagation.
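As a check on the two bounds, assume (the text does not define the motifs explicitly) that a breadth motif is one node together with a pair of its direct forwarders, and that a depth motif is a two-hop forwarding chain. In a star of n nodes the hub has n-1 children and every pair of them forms one breadth motif; in a chain of n nodes every interior node is the middle of exactly one depth motif:

```latex
B_M^{\max} = \binom{n-1}{2} = \frac{(n-1)(n-2)}{2},
\qquad
D_M^{\max} = n-2 .
```

For n = 5, for example, this gives a maximum breadth motif degree of 6 and a maximum depth motif degree of 3.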
To observe the difference in motif degree distribution more intuitively, the network motif degrees are min-max normalized, scaling them into the interval [0,1], and the proportion of networks falling in each motif degree interval is counted, which reflects the likelihood that a given motif degree corresponds to true news. As shown in Figs. 8a and 8b, the depth motif degree of false-information networks is concentrated in the region of larger values, more markedly so than for true information, whereas among true-information networks those with a larger breadth motif degree account for a larger share. Combined with the definition of depth motif degree, if the depth motif degree of a network approaches its theoretical maximum, the probability that the corresponding content is false news is also greater. This also reveals that the structure of false-news networks is more complex, while true-news networks show a more stable structural layout in which breadth propagation from a single source dominates the overall propagation process.
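A minimal sketch of this step, assuming equal-width intervals (the number of intervals is not stated in the text, so `bins` is a free parameter here) and hypothetical function names:

```python
def min_max(values):
    """Scale raw motif degrees into [0, 1] (min-max normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]

def interval_proportions(scaled, bins=5):
    """Share of networks whose scaled motif degree falls in each
    equal-width interval of [0, 1]."""
    counts = [0] * bins
    for v in scaled:
        # A value of exactly 1.0 is placed in the last interval.
        idx = min(int(v * bins), bins - 1)
        counts[idx] += 1
    return [c / len(scaled) for c in counts]

depth_degrees = [2, 4, 6, 8, 10]          # illustrative values only
scaled = min_max(depth_degrees)           # [0.0, 0.25, 0.5, 0.75, 1.0]
print(interval_proportions(scaled, bins=4))  # [0.2, 0.2, 0.2, 0.4]
```

Running the same computation separately on the true-news and false-news networks would give two histograms that can be compared interval by interval.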
From the perspective of text information, user attributes and time-series characteristics, existing methods detect rumors by identifying the text features of posts, user attributes and temporal trends, and by applying machine learning and deep learning classification algorithms. Such methods generally achieve high classification accuracy but neglect the role of network structural features in false-news detection. The structural heterogeneity feature proposed by Zhao et al. is a network-structure-based measure that reflects the difference between a propagation network and a star network of the same size; this method achieves relatively high identification accuracy for microblog networks of unknown type within a relatively short forwarding time.
This embodiment extracts the motif degree features of the microblog propagation network and constructs a supervised classification model based on XGBoost to classify true and false information. The XGBoost model serves as a wrapper-style feature selection method: by training the classifier, features can be evaluated according to the classifier's performance. The detection method based on breadth and depth motif degree features is compared with detection methods based on the structural heterogeneity feature and the structural virality feature. Features are computed both over the complete life cycle of the microblog news data (from the first forward to the last forward) and over the first 3 hours after the news is released, and the Accuracy metric is used to compare the false-information classification accuracy obtained with each of the 3 network structure features and with the fusion of all 3. As shown in Table 2, the false-information detection method based on motif degree features achieves higher classification accuracy than the structural heterogeneity and structural virality features, both over the complete life cycle and in the early propagation period of the microblog network. After the 3 network structure features are fused, the prediction accuracy of the model improves further.
TABLE 2
To further verify the validity and generality of false-information detection with motif-degree-based structural features on propagation network data from other social media platforms, the experiments use two public Twitter datasets released by Ma et al.: Twitter15 and Twitter16. The datasets subdivide rumor data into 4 types, and a cascade propagation tree can be constructed from the forwarding relations and time order of the Twitter rumor information. A statistical summary of the datasets is shown in Table 3.
TABLE 3
Statistics                 Twitter15   Twitter16
Number of user nodes       276663      173487
Number of source tweets    1490        818
Non-rumors                 374         205
False rumors               370         205
True rumors                372         205
Unverified rumors          374         203
In this embodiment, the Twitter15 and Twitter16 data are preprocessed and merged, and all 2308 source tweets are used to compare the detection accuracy of Twitter false news with an XGBoost multi-class model, using the same feature extraction method as for the microblog data. Table 4 summarizes the results of the 4-class classification. In the comparison of classification accuracy across the 3 network structure features, the motif degree features again give better recognition accuracy, and the fused network structure features detect false news in the Twitter network more effectively.
TABLE 4
The results on the microblog and Twitter platforms show that network structure features can achieve high false-information detection accuracy, and can detect false news early in its propagation, even without constructing text features, user attribute features or time-series features.
In this embodiment, the key indices are defined as follows:
Definition 1. Breadth motif degree (BM). The breadth motif reflects the broadcast effect in the information propagation process and is the main building block of the star network structure. The typical diffusion pattern is caused by a single influential node i. The number of breadth propagation motifs that node i can generate is the node breadth motif degree $bm_i$, and the sum of $bm_i$ over all n nodes in the network is the breadth motif degree $B_M$ of the network. Its range is $B_M \in [0, (n-1)(n-2)/2]$; when $B_M$ takes its maximum value, the network is a complete star topology of depth 1.
Definition 2. Depth motif degree (DM). The depth motif reflects the effect of depth propagation in the information propagation process, in which a node directly affects only its adjacent branch. Depth propagation motifs carry information to deeper layers of the network, making the network structure more complex and the distances between nodes longer. Similarly, the number of depth propagation motifs that node i can generate is the node depth motif degree $dm_i$, and the sum of $dm_i$ over all n nodes is the depth motif degree $D_M$ of the network, with range $D_M \in [0, n-2]$. $D_M$ takes its maximum value in two cases: the network is a pure chain, or the root node is forwarded only once and the child nodes spread the information only by breadth propagation.
The breadth motif degree and the depth motif degree are global structural features of the propagation network. These indices ignore attribute differences between nodes and examine only the macroscopic features of the whole propagation network.
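The two definitions can be sketched in Python under one concrete reading that the text does not spell out: a breadth motif is taken to be a node with a pair of direct forwarders, and a depth motif a two-hop chain (grandparent, parent, child) counted at its middle node. This reading reproduces the stated ranges, but the function name `motif_degrees`, the edge-list input and the per-node attribution are assumptions.

```python
from collections import defaultdict

def motif_degrees(edges, root):
    """edges: (parent, child) forwarding pairs of a propagation tree.
    Returns (breadth motif degree, depth motif degree) of the network."""
    children = defaultdict(list)
    nodes = {root}
    for parent, child in edges:
        children[parent].append(child)
        nodes.update((parent, child))

    bm, dm = {}, {}
    for node in nodes:
        k = len(children.get(node, []))
        # Assumed breadth motif: any pair of this node's children.
        bm[node] = k * (k - 1) // 2
        # Assumed depth motif: parent -> node -> child; the root has no parent.
        dm[node] = 0 if node == root else k
    return sum(bm.values()), sum(dm.values())

# A mixed tree: node 0 is the publisher.
mixed = [(0, 1), (0, 2), (1, 3), (1, 4), (3, 5)]
print(motif_degrees(mixed, root=0))  # (2, 3)

# Root forwarded once, then the child broadcasts: depth degree reaches n - 2.
once = [(0, 1), (1, 2), (1, 3), (1, 4)]
print(motif_degrees(once, root=0))  # (3, 3)
```

Under this reading the depth motif degree equals the number of forwards that do not originate from the root, which is why it is maximal whenever the root is forwarded exactly once.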
Definition 3. Microblog propagation depth (depth). Assuming that the microblog forms a directed unweighted graph during propagation, with the distance between adjacent nodes equal to 1, the longest distance from the source user node to any other node is the propagation depth of the microblog.
Definition 4. Microblog propagation scale (scale). The propagation scale of a microblog is defined as the total number of nodes in the microblog propagation network.
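Definitions 3 and 4 can be computed together with one breadth-first traversal from the source node. This is a minimal sketch that assumes the forwarding data is available as (parent, child) pairs and that every node is reachable from the source; the function name is hypothetical.

```python
from collections import defaultdict, deque

def depth_and_scale(edges, root):
    """Return (propagation depth, propagation scale) of a forwarding tree."""
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)

    dist = {root: 0}              # hop distance from the source user node
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in dist:
                dist[child] = dist[node] + 1   # adjacent nodes are 1 apart
                queue.append(child)
    # Depth: longest distance from the source; scale: number of nodes reached.
    return max(dist.values()), len(dist)

print(depth_and_scale([(0, 1), (1, 2), (0, 3)], root=0))  # (2, 4)
```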
Definition 5. Structural virality feature (structural virality). This index is based on the average shortest-path distance between all pairs of nodes and is defined as:

$$\nu = \frac{1}{n(n-1)} \sum_{s \in V} \sum_{t \in V,\, t \neq s} d(s,t)$$

where V is the set of all nodes, d(s, t) is the shortest path length between node s and node t in the network, and n is the number of nodes in the network. As the structural virality approaches 2, the network structure approaches a pure broadcast star structure. The structural virality proposed by Goel et al. mainly considers the network structural characteristics caused by the propagation mechanism. The propagation capacity of news may depend not only on the propagation scale; the complexity of the network structure formed by propagation can also reflect the viral propagation characteristics of the information, and structural virality measures the diversity and complexity of the propagation structure.
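Taking structural virality to be the mean shortest-path distance over all ordered pairs of distinct nodes (an assumption that follows the usual formulation attributed to Goel et al. and matches the limit of 2 for a star), it can be sketched with a breadth-first search from every node of the undirected propagation tree. The function name and edge-list input are illustrative.

```python
from collections import defaultdict, deque

def structural_virality(edges):
    """Mean shortest-path distance over ordered pairs of distinct nodes."""
    adj = defaultdict(set)
    for u, v in edges:
        # Distances are measured on the undirected version of the tree.
        adj[u].add(v)
        adj[v].add(u)
    nodes = list(adj)
    n = len(nodes)
    if n < 2:
        return 0.0

    total = 0
    for s in nodes:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())   # sum of d(s, t) over all t
    return total / (n * (n - 1))

star = [(0, 1), (0, 2), (0, 3), (0, 4)]
print(structural_virality(star))  # 1.6, i.e. 2(n-1)/n for n = 5
```

For a star the value is 2(n-1)/n, which tends to 2 as n grows; longer chains give larger values.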
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined in this embodiment may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (4)

1. The social network false information propagation detection method based on the degree of the motif is characterized by comprising the following steps of:
reading the complete forwarding data of the real information and the false information, and cleaning the data;
constructing a time sequence false information forwarding network and a real information forwarding network;
designing a motif algorithm to calculate the breadth motif and the depth motif of a single network;
for all the forwarding data, applying the motif degree algorithm and NetworkX built-in network statistics methods to extract data features of the real information forwarding network and the false information forwarding network;
training a classification model by utilizing the data characteristics;
performing false information detection on data in the network by using the binary classification model;
the specific flow of the motif algorithm is as follows:
reading single event forwarding data;
constructing the event forwarding network topology, wherein the constructed network is an unweighted, acyclic, largest connected network, and the root node is the event information publisher;
initializing a node breadth motif list, a node depth motif list and a node storage list;
traversing all nodes in the network: taking each node in turn as a parent or child node, calculating the number of breadth propagation motifs and depth propagation motifs that the node can generate, and recording them as the node breadth motif degree and the node depth motif degree;
summing all node breadth motif degrees in the list and all node depth motif degrees in the list respectively to obtain the motif degrees of the network: the breadth motif degree and the depth motif degree.
2. The social network false information propagation detection method based on the degree of motif according to claim 1, wherein the training step of the classification model is as follows:
constructing an XGBoost classification model, labeling real information and false information according to the data characteristics, inputting a training set of the labeled real information and the labeled false information into the XGBoost classification model for training, selecting MSE as a loss function, and obtaining the classification model after iterative parameter adjustment.
3. The social network false information propagation detection method based on the degree of motif according to claim 1, wherein the false information detection process is specifically as follows: inputting the information to be detected into the classification model, outputting a classification probability according to a binary-classification logistic regression objective function, setting a threshold of 0.5 as the classification boundary, judging the classification result to be class 0 if the output probability is smaller than the threshold of 0.5 and class 1 otherwise, and evaluating the result by accuracy, thereby realizing false information detection on the data in the network.
4. The social network false information propagation detection method based on the degree of motif according to claim 1, wherein the data features include fusion degree features, structural heterogeneity features, structural virality features and propagation features.
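The decision rule and evaluation described in claim 3 can be illustrated with a small sketch. The probabilities below are hypothetical stand-ins for the output of a trained classifier; no XGBoost call is shown, and the function names are illustrative.

```python
def classify(probabilities, threshold=0.5):
    """Class 0 if the output probability is below the threshold, else class 1."""
    return [0 if p < threshold else 1 for p in probabilities]

def accuracy(predicted, actual):
    """Fraction of samples whose predicted class matches the label."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)

probs = [0.1, 0.5, 0.8, 0.3]    # hypothetical classifier outputs
labels = [0, 1, 1, 1]           # hypothetical ground-truth classes
preds = classify(probs)         # [0, 1, 1, 0]
print(accuracy(preds, labels))  # 0.75
```

Note that a probability exactly equal to 0.5 falls into class 1, since only values strictly below the threshold are assigned to class 0.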

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110309368.6A CN112819645B (en) 2021-03-23 2021-03-23 Social network false information propagation detection method based on degree of motif

Publications (2)

Publication Number Publication Date
CN112819645A CN112819645A (en) 2021-05-18
CN112819645B CN112819645B (en) 2024-03-29

Family

ID=75862322

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110309368.6A Active CN112819645B (en) 2021-03-23 2021-03-23 Social network false information propagation detection method based on degree of motif

Country Status (1)

Country Link
CN (1) CN112819645B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114218457B (en) * 2021-11-22 2024-04-12 西北工业大学 False news detection method based on forwarding social media user characterization
CN115205061B (en) * 2022-07-22 2023-05-05 福建师范大学 Social network important user identification method based on network motif

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107832353A (en) * 2017-10-23 2018-03-23 同济大学 A kind of social media platform deceptive information recognition methods
CN108153884A (en) * 2017-12-26 2018-06-12 厦门大学 A kind of analysis method of microblogging gossip propagation
CN109685153A (en) * 2018-12-29 2019-04-26 武汉大学 A kind of social networks rumour discrimination method based on characteristic aggregation
CN110493045A (en) * 2019-08-19 2019-11-22 大连民族大学 A kind of directed networks link prediction method merging multimode body information
CN111428151A (en) * 2020-04-20 2020-07-17 浙江工业大学 False message identification method and device based on network acceleration
CN111460144A (en) * 2020-03-12 2020-07-28 南京理工大学 Rumor early detection algorithm based on time sequence cutting and fusion
CN111639252A (en) * 2020-05-18 2020-09-08 华中科技大学 False news identification method based on news-comment relevance analysis

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11310162B2 (en) * 2019-02-22 2022-04-19 Sandvine Corporation System and method for classifying network traffic

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Sign Prediction by Motif Naive Bayes Model in Social Networks;Si-Yuan Liu, et al;《Information Sciences》;第1-19页 *

Also Published As

Publication number Publication date
CN112819645A (en) 2021-05-18


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant