WO2020000847A1

WO2020000847A1 - News big data-based method and system for monitoring and analyzing risk perception index

Info

Publication number: WO2020000847A1
Application number: PCT/CN2018/113857
Authority: WO
Inventors: 郑晴晓; 程国艮
Original assignee: 中译语通科技股份有限公司
Priority date: 2018-06-25
Filing date: 2018-11-03
Publication date: 2020-01-02
Also published as: CN112765442A

Abstract

A method and system for monitoring and analyzing a risk perception index based on news big data, comprising: applying social risk amplification theory and the propagation of panic psychology to perform real-time statistics on data in a database according to the dimensions and indicators into which the risk perception index is divided, so as to obtain specific values of the indicators; and using a neural network model to construct a calculation model of the risk perception index, inputting a corpus, matching the weight of each dimension by means of machine learning, and comprehensively calculating the indicators to determine the risk perception index. Building on big data monitoring technology, the data in the database is further processed after collection, and the concept of social panic is quantified on the theoretical basis of sociology and psychology, so that the risk perception index becomes a measurable indicator of the degree of social panic. The present invention can show the psychological state of a society more simply and quickly and can guide decision-making in all aspects of politics and economy.

Description

Method and system for monitoring and analyzing panic index based on news big data

Technical field

The invention belongs to the technical field of big data monitoring analysis and emotion measurement, and particularly relates to a panic index monitoring analysis method and system based on news big data.

Background technique

Social panic refers to widespread fear and anxiety caused by an unexpected event, such as the public safety panic caused by “911” or the public health panic caused by “bird flu”. Through various media, these events have affected the global public to varying degrees, and the level of public panic changes with time and other factors. The well-known Chicago Board Options Exchange Volatility Index (VIX), the China Volatility Index (iVIX), and GRPI are all called "panic indexes". The biggest difference between VIX and iVIX on the one hand and the GRPI global panic index on the other is that VIX and iVIX are compiled from the prices of call and put options in the near month and the next-near month of the S&P 500 index, that is, from the implied volatility of the options, whereas GRPI can measure panic essentially independently of indexes such as the S&P 500. GRPI has a completely separate data warehouse: it uses global news big data and is calculated through complex algorithms, and the efficiency of GRPI calculation depends greatly on the scale and structure of that big data.

The GRPI Global Panic Index (Global Risk Index) is an index standard used to measure how strongly the panic of global media and netizens about events fluctuates, both historically and in real time. It is calculated from media report data and data on the trails of netizens' social activity. Social panic refers to widespread fear and anxiety caused by an unexpected event, such as the public safety panic caused by "911" or the public health panic caused by "avian flu"; these events have affected the global public to varying degrees through various media, and the degree of public panic changes with time and other factors. Internet public opinion is the body of influential and tendentious views that the public expresses over the Internet on hot spots and focal issues in real life. It is mainly reflected and reinforced through online news and social media.

With the rapid development of the Internet, online media, as a new form of information dissemination, has penetrated into people's daily lives. People are used to receiving and publishing information over the Internet, and once a major domestic or international event occurs, online public opinion forms immediately. When people express their opinions and spread their ideas through the Internet, they sometimes form a huge social force. Because online media is timely and highly interactive, social panic is reflected in concentrated form in online public opinion. In recent years, with the development of big data technology, some methods and systems for monitoring and analyzing public opinion based on big data have emerged. For example, an existing big data-based public opinion analysis method is built on the Hadoop distributed computing platform: it collects network data, preprocesses the data, extracts hot events, and then performs public opinion analysis and deduction. Hadoop is an open source distributed computing platform whose core includes HDFS (Hadoop Distributed File System). The high fault tolerance and high scalability of HDFS allow users to deploy Hadoop on low-cost hardware and build distributed clusters. HBase (Hadoop Database) is a distributed database system built on HDFS that provides high reliability, high performance, column-oriented storage, scalability, and real-time reads and writes; it is mainly used to store unstructured and semi-structured loose data. Existing big data-based Internet public opinion monitoring and analysis systems obtain the collected network information through a network information collation module and perform keyword extraction on it. A website credit evaluation module evaluates the source website of the network information in real time, and a public opinion tendency analysis module takes the resulting credibility weight into account when calculating emotional tendency. Such systems obtain public opinion information from the network, classify it by keyword, and make an overall judgment on public opinion based on emotional tendency.

At present, there are many monitoring systems for news or social media platforms at home and abroad, but their results are limited to simple presentation of hot news and topics, forecasting of popularity trends, and the like. The information they provide is relatively superficial, and making a decision requires a great deal of further manual analysis and processing. Monitoring of the degree of panic caused by special social events has focused mainly on crowd panic caused by emergencies in the field of public safety, including image analysis of surveillance footage and simulation of crowd behavior in panic situations. This is micro-level monitoring of a panicking crowd, used mainly to keep the consequences of an accident, such as a stampede, from escalating; it is not applicable to major events. In addition, the indicators of current global monitoring systems are not comprehensive enough, and their calculation accuracy is low.

Summary of the invention

In view of the problems in the prior art, the present invention provides a method and system for monitoring and analyzing network emotional fluctuations based on news big data. It aims to solve the following problems: current methods monitor the degree of panic caused by special social events from a micro perspective; their results, which concern panicking crowds, are mainly used to limit the escalation of accidents such as trampling and cannot show the impact of an event on society as a whole at the macro level; and their accuracy is low.

The present invention is achieved as follows. A method for monitoring and analyzing Internet emotional fluctuations based on news big data applies social risk amplification theory and the propagation of panic psychology: it performs real-time statistics on the data in the database according to the dimensions and indicators into which the panic index is divided, obtaining a specific value for each indicator; it then builds a calculation model of the panic index using a neural network model, inputs a corpus, matches the weight of each dimension by machine learning, and comprehensively calculates the indicators to determine the panic index.

Further, the method for monitoring and analyzing online mood fluctuation based on news big data specifically includes:

Step 1: Establish a massive database, including:

a) Collection: Use existing technology to collect news media and social media. Social media is collected in full, mainly in the Python programming language, through the open data interfaces of Weibo or Facebook, and is stored directly in the data center via message queues. News media is collected from domestic and overseas news web pages using a scheduler, collectors, a task manager, text parsing, storage, data management, and so on: a given news data source is traversed to find the list pages to be collected further, the scheduler sends the list pages to each collector through the task manager, and the collector crawls each list page to get the html pages of the articles.

b) Processing: The data is structured through the data management algorithm of the data center to obtain structured data;

c) Storage: The structured data is stored in the NoSQL database, and the data in the database is then imported into the data governance module through the message queue. The data governance module labels the data accordingly through the governance algorithm and applies a mainstream sentiment algorithm to each article to obtain a sentiment label. For example, news media labels include title, abstract, body text, keywords, time, and emotion; social media labels include account name, post content, repost content, comment content, likes, fans, post time, emotion, and so on. The processed data can then be used for data calls, mining, and machine learning as needed.

d) Use big data to collect all relevant media information to form a database. Collect all news and social media data containing the monitoring keywords. Each collected item is tagged with its data content, the publishing media or social network account, the publishing time, and regional attributes, and is put into the database. Let the database collection be arranged in chronological order, where the news media information is $N = \{n_1, n_2, n_3, \ldots, n_n\}$ and the social media information is $S = \{s_1, s_2, s_3, \ldots, s_m\}$, giving the data set $W = \{N, S\}$.

Step 2: Divide the dimensions and indicators according to the panic index and perform real-time statistics to obtain the specific values of each indicator;

Step 3: Use the neural network model to build a calculation model of the panic index, input the existing corpus, and rely on machine learning to train the panic index model repeatedly. This includes:

A. Neural network-based machine learning model construction: Before the neural network model is used to build the calculation model of the panic index, the data needs to be normalized. According to the normalized features, the model is trained using a multilayer fully connected neural network structure, where Layer $L_1$ is the input layer, which holds the value of each feature; Layer $L_2$ is the hidden layer, which computes the hidden features; and Layer $L_3$ is the output layer, which outputs the final result;

B. Manually label a batch of corpora for machine learning: Put no fewer than 10,000 corpus items into the neural network-based machine learning model for learning and training, so that the result calculated from each group of a1 ～ h4 data equals the GRPI result corresponding to that group; that is, after passing through the model, the "machine-calculated index result" approaches the "expert group score result" arbitrarily closely. Based on the trained model, for all news and social media texts at the current moment, the panic index calculation model extracts the corresponding features. The extracted features are fed to the input layer of the multi-layer fully connected neural network, the result of each layer is obtained by the forward propagation algorithm and used as the input of the next layer, and after the three layers have been computed, the final panic value is obtained.
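As a minimal sketch of the normalization and three-layer forward pass described in steps A and B, the following pure-Python example assumes min-max normalization, a sigmoid activation, and hand-picked weights. None of these choices is specified in the text, and the function names and sample values are hypothetical.

```python
import math

def min_max_normalize(values, lows, highs):
    """Scale each raw indicator into [0, 1] (assumed normalization scheme)."""
    return [
        0.0 if hi == lo else (v - lo) / (hi - lo)
        for v, lo, hi in zip(values, lows, highs)
    ]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dense(inputs, weights, biases):
    """One fully connected layer: sigmoid(W * inputs + b), row by row."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def forward(features, hidden_w, hidden_b, out_w, out_b):
    """L1 (input) -> L2 (hidden) -> L3 (output); returns a single score."""
    hidden = dense(features, hidden_w, hidden_b)
    return dense(hidden, out_w, out_b)[0]

# Illustrative values only: three raw indicators and made-up weights.
raw = [120.0, 3.0, 0.4]
features = min_max_normalize(raw, lows=[0.0, 0.0, 0.0], highs=[1000.0, 5.0, 1.0])
hidden_w = [[0.5, -0.2, 0.8], [0.1, 0.9, -0.3]]
hidden_b = [0.0, 0.1]
out_w = [[1.2, -0.7]]
out_b = [0.05]
panic_value = forward(features, hidden_w, hidden_b, out_w, out_b)
```

In a trained model the weights would come from fitting the labeled corpus rather than being written by hand; the sketch only shows how each layer's output becomes the next layer's input.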

Another object of the present invention is to provide a security monitoring system using the method for monitoring and analyzing network mood fluctuations based on news big data.

Another object of the present invention is to provide a security analysis system using the method for monitoring and analyzing network mood fluctuations based on news big data.

Another object of the present invention is to provide a security early warning system using the method for monitoring and analyzing network mood fluctuations based on news big data.

The advantages and positive effects of the invention are:

(1) The integrated monitoring method for news media and social media of the present invention adds text-based semantic analysis and feature extraction to the statistics and analysis of public opinion volume, emotions, and hotspots. This enriches the dimensions of public opinion monitoring and allows the degree of panic caused by an event to be tracked and monitored more accurately, which resolves the problem that the indicators of monitoring systems are not comprehensive enough. In addition, after the model is trained, the time needed in practical applications to compute with the model parameters and weights of the present invention is comparable to that of current monitoring systems, reducing the complexity of the model in actual use. The invention analyzes the degree of public panic by monitoring online public opinion and thereby assists various kinds of decision-making. For enterprises, the degree of social panic in related fields is an important indicator of market changes and an important criterion for investment and development; if social panic is ignored, the survival of the company may be affected. For the relevant government departments, real-time monitoring of the degree of public panic in online public opinion is of great practical significance for actively resolving crises of online public opinion, maintaining social stability, and promoting national development.

(2) The present invention uses crawler technology and other data sources to cover the network and other types of data. Data is collected automatically, analyzed intelligently, fully structured, and stored at scale by computer, which solves the problem of covering, analyzing, and accumulating massive information sources. To improve the accuracy of the monitoring results, the invention continuously updates its stored data and the basis for iterative algorithm learning. The monitoring process centers on the keywords entered by the user and gathers statistics on time, content, quantity, identity, and other dimensions of the dissemination of public opinion. By comprehensively analyzing the transmission characteristics and the multi-factor and joint effects of panic in the dissemination of public opinion, it makes the monitoring results more accurate.

(3) The present invention compares real-time and historical data against a database through semantic analysis technology, covering more of the details of public opinion and analyzing more comprehensively the tendencies in what users say, so that the degree of panic can be grasped better. It collects and analyzes massive data through big data technology, enlarges the sample data and cases analyzed, and makes full use of a large number of historically accumulated cases. Starting from the social amplification theory of risk, it divides panic into a statistical model of multiple indicators and then generates the panic index calculation model through neural network learning, which is more scientific and reasonable. The statistical indicators and the calculation model are continuously improved and reach a certain degree of accuracy.

(4) The present invention adds a statistical module and a calculation module to existing big data network monitoring technology (a "collection and pre-processing system" that is based on the Hadoop distributed computing platform, collects network data, and then pre-processes it). Using preset monitoring indicators, standardized statistical models, and the intelligent algorithm model of a neural network, these modules monitor the degree of panic in the opinions the public expresses as a specific event occurs and develops.

(5) The present invention integrates automatic collection and feature extraction to determine multiple dimensions and multiple monitoring indicators of an event. Through statistical analysis of the news and social media text obtained within a certain time range, a real-time panic index for a specific event is obtained. Through the data service provided by the present invention, governments, enterprises, and related organizations can grasp changes in the panic index of an incident as soon as they occur and respond reasonably and promptly when the panic value exceeds a certain range.

(6) Building on a big data monitoring system, the present invention monitors the social panic caused by events from a macro perspective through real-time big data collection technology, big data database technology, big data processing and statistical technology, and neural network algorithms. The invention overcomes the shortcoming that, after a big data monitoring system presents its data, the data must be combed, discriminated, and analyzed manually, which is inefficient and depends on individual knowledge and experience. It also expands the traditional scope of social panic monitoring, which is no longer limited to panic at a specific time and place. By measuring the level of social panic through big data, semantic analysis technology, and neural network algorithms, it greatly improves the recognition accuracy, the discrimination efficiency, and the range of applicable scenarios for social panic.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a flowchart of a method for monitoring and analyzing a panic index based on news big data provided by the present invention.

FIG. 2 is a diagram of training a model using a multi-layer fully connected neural network structure according to normalized features, provided by the implementation of the present invention.

In the figure: Layer $L_1$ is the input layer, which holds the value of each feature; Layer $L_2$ is the hidden layer, which computes the hidden features; Layer $L_3$ is the output layer.

FIG. 3 is a diagram of a forward propagation algorithm for outputting a final result in a training model using a multi-layer fully connected neural network structure provided by the implementation of the present invention.

FIG. 4 is a schematic diagram of an operation process in the calculation of the global panic index provided by the implementation of the present invention.

FIG. 5 is a schematic diagram of a panic index monitoring and analysis system based on news big data provided by the present invention.

In the figure: 1. Database formation module; 2. Panic index value acquisition module; 3. Panic index acquisition module.

Detailed description

In order to make the objectives, technical solutions, and advantages of the present invention clearer, the present invention will be further described in detail below with reference to the embodiments. It should be understood that the specific embodiments described herein are only used to explain the present invention and are not intended to limit the present invention.

The present invention adds text-based semantic analysis and feature extraction to the statistics and analysis of global volume, emotion, and content, covering the subject and length of news reports, social commentary, news and social sentiment, regional characteristics, transmission time and path, and stock market indexes. This enriches the dimensions of big data monitoring and allows sensitive and accurate tracking of the degree of panic for the global network as a whole, for a specific event, or within a specified time range, resolving the problem that the indicators of current global monitoring systems are not comprehensive enough and their calculation accuracy is low. After the model is trained, the time needed in practical applications to compute with the model parameters and weights of the present invention is comparable to that of current monitoring systems, reducing the complexity of the model in actual use.

The invention collects data through information channels such as the network and establishes a corresponding database. The database is built on thesauri compiled by linguistic experts: ① a multilingual thesaurus of words that express "protesting behavior", "radical speech", "political controversy", and "political mobilization"; ② a multilingual thesaurus of expressions including "collective petitions", "collective strikes", "violent group fights", "violent assaults", "political rallies", "demonstrations", "ethnic conflicts", "religious conflicts" and "turmoil"; ③ a multilingual thesaurus of words expressing helplessness, such as "helplessness", "can't change" and "what can be done"; ④ a multilingual thesaurus of words expressing incomprehension, such as "how can", "don't understand", "don't understand", and "unscientific"; ⑤ a multilingual thesaurus of words expressing anxiety, such as "worry", "annoyance" and "anxiety".

Based on big data monitoring, the present invention quantifies the concept of panic into a panic index. Drawing on the theory of social risk amplification and the spread of panic psychology, it divides the measurement of panic into multiple dimensions and indicators in a statistical model of the data. A neural network model is used to build the panic index calculation model: the existing corpus is input, and machine learning is relied on to form the complete algorithm. The panic index algorithm provided by the embodiment of the present invention is based on big data monitoring technology. After the data is collected, the data in the database is further processed, and the concept of social panic is quantified on the theoretical foundation of sociology and psychology, so that the panic index becomes a measurable indicator of social panic. It can show the psychological state of society more simply and conveniently and can guide decision-making in all aspects of politics and economy.

The monitoring and analysis method provided by the present invention is described in detail below with reference to the accompanying drawings.

As shown in FIG. 1, a method for monitoring and analyzing a panic index based on news big data according to an embodiment of the present invention includes the following steps:

S101: Establish a massive database.

S102: According to the theory of social risk amplification and the theory of panic psychology, real-time statistics are performed on the data in the database along the multiple dimensions and indicators into which the panic index is divided, and a specific value is obtained for each indicator.

S103: Use the neural network model to build a calculation model of the panic index, input the existing corpus, and rely on machine learning to train the panic index model repeatedly so as to match the weights of the various indicators.

S104: Each time the panic index of a target object is calculated, statistics for the multiple indicators are computed for that target from the massive data, the statistical results are fed into the trained panic model, and the panic index is output.

In step S101, the method for forming a database mainly includes the following steps:

The first step is collection: use existing technology to collect news media and social media. Social media is collected in full, mainly in the Python programming language, through the open data interfaces of Weibo or Facebook, and is stored directly in the data center through message queues. News media is collected from domestic and overseas news web pages by a breadth-first traversal of a given news data source, using a scheduler, collectors, a task manager, text parsing, storage, data management, and so on. The scheduler sends each list page to a collector through the task manager, and the collector crawls the list page to get the html pages of the articles.

The second step is processing: through the data management algorithm of the data center, the data is structured to obtain structured data. For example, the text parsing module performs text parsing on the html webpage, extracts the article title, article publication time, and article body content, and removes garbled characters in the article.

The third step is storage: the processed data is stored in the NoSQL database, and the data in the database is imported into the data management module through the message queue. The data management module labels the data accordingly through the management algorithm and applies a mainstream sentiment algorithm to each article to obtain a sentiment label. For example, news media labels include title, abstract, body text, keywords, time, and emotion; social media labels include account name, post content, repost content, comment content, likes, fans, post time, emotion, and so on. The processed data can then be used for data calls, mining, and machine learning as needed.

The fourth step is to use big data to collect all relevant media information to form a database. Collect all news and social media data containing the monitoring keywords. Each collected item is tagged with its data content, the publishing media or social network account, the publishing time, and regional attributes, and is put into the database. Let the database collection be arranged in chronological order, where the news media information is $N = \{n_1, n_2, n_3, \ldots, n_n\}$ and the social media information is $S = \{s_1, s_2, s_3, \ldots, s_m\}$, giving the data set $W = \{N, S\}$.
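The data set W{N, S} described in the fourth step can be sketched as two chronologically sorted lists of tagged records. The record type `Item`, its field names, and the sample entries below are hypothetical; the text only lists which attributes each record carries.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class Item:
    content: str         # data content
    source: str          # publishing media or social network account
    published: datetime  # publishing time
    region: str          # regional attribute

def build_dataset(news, social):
    """Return W = {N, S} with both lists sorted by publishing time."""
    def by_time(item):
        return item.published
    return {"N": sorted(news, key=by_time), "S": sorted(social, key=by_time)}

# Made-up sample records for illustration.
news = [
    Item("Flood warning issued", "Example Daily", datetime(2018, 6, 2, 9, 0), "Beijing"),
    Item("Heavy rain expected", "Example Daily", datetime(2018, 6, 1, 8, 0), "Beijing"),
]
social = [
    Item("Stay safe everyone", "@example_user", datetime(2018, 6, 2, 10, 30), "Shanghai"),
]
W = build_dataset(news, social)
```

Keeping N and S sorted by time makes the first and the most recent item of each set directly available to the duration and concentration indicators.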

In step S102, the eight dimensions divided by the panic index are: degree of harm, degree of attention, degree of concentration, degree of subjectivity, degree of out of control, degree of strangeness, degree of agitation, and degree of trust. The specific calculation methods of the eight dimensions are as follows:

(1) Harm

The direct harm caused by the panic incident includes the number of people affected, the number of casualties, the size of the affected area, the length of time affected (whether it will harm future generations), and the direct economic loss and social consequences.

(1.1) Number of people affected $a_1$:

$a_1 = \arg\max(TF_{a_1})$;

Capture the keyword "$a_1$ people are affected" in N. There will be zero or more values of $a_1$ in N; let $TF_{a_1}$ denote the frequency with which each value of $a_1$ occurs. Then $a_1 = \arg\max(TF_{a_1})$ means that $a_1$ takes the value that occurs most frequently.

(1.2) Number of casualties $a_2$, $a_3$:

Number of severe injuries $a_2 = \arg\max(TF_{a_2})$;

Number of deaths $a_3 = \arg\max(TF_{a_3})$;

In the same way, the keyword "$a_2$ people are seriously injured" is captured in N, $TF_{a_2}$ denotes the frequency with which each value of $a_2$ occurs, and $a_2 = \arg\max(TF_{a_2})$ is the value of $a_2$ that occurs most frequently. Likewise, the keyword "$a_3$ deaths" is captured in all the news reports collected, $TF_{a_3}$ denotes the frequency with which each value of $a_3$ occurs, and $a_3 = \arg\max(TF_{a_3})$ is the value of $a_3$ that occurs most frequently.
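The "most frequent reported value" rule used for the counts above can be sketched with a regular expression and a frequency counter. The English pattern `AFFECTED`, the helper name, and the sample reports are assumptions made for illustration; a real system would need patterns for each language and keyword.

```python
import re
from collections import Counter

def most_frequent_value(texts, pattern):
    """Return the numeric value that appears most often, or None if nothing matches."""
    counts = Counter()
    for text in texts:
        for match in re.findall(pattern, text):
            counts[int(match.replace(",", ""))] += 1
    if not counts:
        return None
    return counts.most_common(1)[0][0]

# Hypothetical English pattern for the keyword "a1 people are affected".
AFFECTED = r"(\d[\d,]*) people (?:are|were) affected"

reports = [
    "Officials said 1,200 people were affected by the outbreak.",
    "About 1,200 people were affected, local media reported.",
    "Early estimates suggested 900 people were affected.",
]
a1 = most_frequent_value(reports, AFFECTED)
```

The same helper would apply to the injury and death counts by swapping in a different pattern.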

(1.3) Hazardous area size $a_4$:

When the monitoring area is limited to a certain country, the hazardous area size is:

$$a_4 = \begin{cases} 1, & Z \text{ occurs most frequently} \\ 2, & S \text{ occurs most frequently} \\ 3, & Sh \text{ occurs most frequently} \\ 4, & G \text{ occurs most frequently} \end{cases}$$

where Z = "town / county / township / district", S = "city", Sh = "province", and G = "country".

In the set N, named entity recognition is used to identify the area information, and the frequencies with which "town / county / township / district", "city", "province", and "country" occur are counted. When "town / county / township / district" occurs most frequently, $a_4 = 1$; when "city" occurs most frequently, $a_4 = 2$; when "province" occurs most frequently, $a_4 = 3$; and when "country" occurs most frequently, $a_4 = 4$.
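A minimal sketch of the single-country rule follows. It assumes that named entity recognition has already mapped each recognized place name to one of the four region types (Z, S, Sh, G); that upstream step is not shown, and the function name and sample tags are hypothetical.

```python
from collections import Counter

# Region type -> value of a4, following the mapping stated in the text.
LEVELS = {"Z": 1, "S": 2, "Sh": 3, "G": 4}

def harm_area_level(region_types):
    """Return a4 for a list of region-type tags, one per recognized place name."""
    counts = Counter(t for t in region_types if t in LEVELS)
    if not counts:
        return None
    top_type, _ = counts.most_common(1)[0]
    return LEVELS[top_type]

# Made-up tags: "city" (S) is the most frequent type here.
tags = ["S", "Z", "S", "Sh", "S", "G"]
a4 = harm_area_level(tags)
```

Ties between region types are not addressed in the text; this sketch simply returns the type that `Counter.most_common` lists first.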

When the monitoring area is global:

Count the areas in which IP addresses appear in {W}, taking the country as the unit. If {W} appears in one country, it is counted as f (1); if it appears in two countries, it is counted as f (2); and so on. With the number of countries denoted by x, the result is:

Harm area size:

(1.4) Hazard duration $a_5$:

$a_5 = \arg\max(TF_{a_5})$;

Capture the keyword "expected to recover in $a_5$" in {N}, and take the value of $a_5$ that occurs most frequently.

Direct economic loss $a_6$:

$a_6 = \arg\max(TF_{a_6})$;

Capture the keyword "loss of $a_6$ yuan" in {N}, and take the value of $a_6$ that occurs most frequently.

Direct social consequences $a_7$:

Here, the protest lexicon K is the collection of words describing the incident at the stage of public opinion, and the resistance lexicon F is the collection of words describing the escalation of the event into the stage of action.

(2) Attention

Attention is used to calculate the degree of news media attention brought by the panic incident.

(2.1) The number of relevant news reports $b_1$ is the number of news media items in which the set keywords appear. Given $N = \{n_1, n_2, n_3, \ldots, n_n\}$, $b_1$ is the count of N:

$b_1 = n$.

(2.2) The number of related social discussions $b_2$ is the number of social media items in which the set keywords appear. Given $S = \{s_1, s_2, s_3, \ldots, s_m\}$, $b_2$ is the count of S:

$b_2 = m$.

(2.3) Report type of related news $b_3$:

Average word count per article:

$b_3 = \frac{1}{n} \sum_{i=1}^{n} Z_i$

where n is the total number of news reports ($b_1$ articles) and $Z_i$ is the word count of the i-th report.

(2.4) News report duration $b_4$:

$b_4 = T_{N_n} - T_{N_1}$

where $T_{N_n}$ is the time of the current news report and $T_{N_1}$ is the time of the first relevant news report.

(2.5) Social discussion duration $b_5$:

$b_5 = T_{S_m} - T_{S_1}$

where $T_{S_m}$ is the time of the current relevant social media post and $T_{S_1}$ is the time of the first relevant social media post.
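The five attention indicators can be sketched together. This assumes that the average word count is the plain mean over the news reports and that each duration is the time between the first and the most recent item, which is how the definitions above read. Counting words by whitespace is a further simplification (Chinese text would need a tokenizer), and the function name and sample data are hypothetical.

```python
from datetime import datetime

def attention_metrics(news, social):
    """news/social: chronologically sorted lists of (timestamp, text) pairs."""
    b1 = len(news)                      # number of news reports
    b2 = len(social)                    # number of social discussions
    # Mean word count per news report (whitespace tokenization is an assumption).
    b3 = sum(len(text.split()) for _, text in news) / b1 if b1 else 0.0
    # Durations: most recent item minus first item.
    b4 = news[-1][0] - news[0][0] if news else None
    b5 = social[-1][0] - social[0][0] if social else None
    return b1, b2, b3, b4, b5

# Made-up sample data.
news = [
    (datetime(2018, 6, 1, 8), "Heavy rain expected tonight"),
    (datetime(2018, 6, 2, 8), "Flood warning issued for the river basin"),
]
social = [
    (datetime(2018, 6, 1, 9), "stay safe"),
    (datetime(2018, 6, 1, 21), "roads closed"),
]
b1, b2, b3, b4, b5 = attention_metrics(news, social)
```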

(3) Concentration

The suddenness of panic outbreaks is reflected in the sharp increase or slow increase in the number of reports. This indicator uses the concept of half-life statistics, that is, the time taken from the beginning of the report to half of the total of all reports so far.

(3.1) News concentration c ₁ :

The amount of time it took from the beginning of news coverage to half of all current news.

(3.2) Social concentration c ₂ :

The time it took from the start of social discussions to half of all current social discussions.
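The half-life idea behind both concentration indicators can be sketched as follows. Rounding the halfway count up for an odd number of items is an assumption, since the text does not say how to treat that case; the function name and timestamps are hypothetical.

```python
from datetime import datetime, timedelta

def half_life(timestamps):
    """Time from the first report until half of all reports so far had appeared."""
    if not timestamps:
        return None
    ordered = sorted(timestamps)
    half_count = (len(ordered) + 1) // 2   # round up for odd totals (assumption)
    return ordered[half_count - 1] - ordered[0]

# Made-up example: six reports, half of them within the first two hours.
start = datetime(2018, 6, 1)
news_times = [start + timedelta(hours=h) for h in (0, 1, 2, 10, 30, 50)]
c1 = half_life(news_times)
```

A short half-life relative to the total duration indicates a sudden burst of coverage, while a long one indicates a slow build-up.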

(4) Subjectivity

The subjective attitude toward panic events of individuals in the social media $S = \{s_1, s_2, s_3, \ldots, s_m\}$.

(4.1) The sentiment $d_1$ of the relevant social comments is the mean sentiment of the social media content in which the keywords appear:

$d_1 = \frac{1}{m} \sum_{E=1}^{5} E \cdot r_E$, $E \in (1, 5)$;

Let the emotion of each item in the set S be $E = \{E_1, E_2, E_3, \ldots, E_m\}$, $i \in [1, m]$, $E \in (1, 5)$, where each sentiment value from 1 to 5 has a specified vocabulary, $r_E$ is the frequency (number of articles) corresponding to each value of E, and m is the total number of social media articles.
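As a sketch of the mean sentiment, the following assumes that it is the frequency-weighted average of the sentiment values 1 to 5, which is what the definitions of r and m suggest; the exact formula is not legible in the text. The function name and sample scores are hypothetical.

```python
from collections import Counter

def mean_sentiment(scores):
    """Frequency-weighted mean of sentiment values in 1..5 over m posts."""
    m = len(scores)
    if m == 0:
        return None
    r = Counter(scores)                 # r[E] = number of posts scored E
    return sum(E * r[E] for E in range(1, 6)) / m

# Made-up per-post sentiment labels.
scores = [1, 2, 2, 3, 5, 5]
d1 = mean_sentiment(scores)
```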

(4.2) The proportion of negative words in social media $d_2$:

$d_2 = \frac{Z_{neg}}{Z}$

where $Z_{neg}$ is the total number of negative words and Z is the total number of words.

(4.3) Proportion of the number of speeches from important urban users in the social discussion to the total number of social media speeches d ₃ :

Tag = {New York, Washington, Silicon Valley, London, Paris, Tokyo, Beijing, Shanghai, Shenzhen ...}.
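The three subjectivity indicators can be illustrated as follows. This sketch assumes that d₁ is the frequency-weighted mean of per-article sentiment scores, that d₂ is a word-level ratio, and that each post carries a single city label; the function names and the whitespace tokenization are hypothetical.

```python
from collections import Counter

def mean_sentiment(scores):
    """d1: mean of per-article sentiment scores, via frequencies r of each value E."""
    counts = Counter(scores)            # r for each sentiment value E
    m = len(scores)                     # total number of social media articles
    return sum(e * r for e, r in counts.items()) / m

def negative_ratio(texts, negative_words):
    """d2: share of negative words among all words (whitespace tokens assumed)."""
    tokens = [tok for text in texts for tok in text.lower().split()]
    z_neg = sum(1 for tok in tokens if tok in negative_words)
    return z_neg / len(tokens) if tokens else 0.0

def major_city_share(post_cities, tag):
    """d3: share of posts whose user location is in the major-city set Tag."""
    return sum(1 for city in post_cities if city in tag) / len(post_cities)

d1 = mean_sentiment([1, 5, 5, 3])
d2 = negative_ratio(["bad news today", "good day"], {"bad"})
d3 = major_city_share(["London", "Leeds", "Tokyo", "Paris"],
                      {"London", "Tokyo", "Paris"})
```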

(5) Out of control

The judgment of individuals in social media S {s₁, s₂, s₃ ... s_m} as to whether they can control or influence how the panic event develops.

(5.1) Proportion of uncertain and helpless vocabulary in related social media posts e:

dic3 = UC = {helpless, can't change, what can be done, ...};

e = P_{UC>2} / m

where P_{UC>2} is the number of articles in which any word in the set UC appears twice or more, and m is the total number of social media articles.

(6) Unfamiliarity

The degree to which individuals in social media S {s₁, s₂, s₃ ... s_m} understand the event, that is, whether the event is new or has occurred before, and whether it can be explained by known scientific principles.

(6.1) Proportion of vocabulary expressing the unknown f₁:

UK = {"unprecedented", "first appearance", ...};

f₁ = P_{UK>2} / m

where P_{UK>2} is the number of articles in which any word in the set UK appears twice or more, and m is the total number of social media articles.

(6.2) Proportion of vocabulary expressing incomprehension f₂:

DU = {"how can I", "don't understand", "don't understand", "unscientific", ...};

f₂ = P_{DU>2} / m

where P_{DU>2} is the number of articles in which any word in the set DU appears twice or more, and m is the total number of social media articles.

(7) Agitation

The degree of agitation and anxiety with which netizens in social media S {s₁, s₂, s₃ ... s_m} react to the panic event.

(7.1) Proportion of worry vocabulary g₁:

AN = {"worried", "annoying", "anxious", ...}

g₁ = P_{AN>2} / m

where P_{AN>2} is the number of articles in which any word in the set AN appears twice or more, and m is the total number of social media articles;

(7.2) Proportion of blame vocabulary g₂:

BL = {"all blame", "responsible", "take responsibility", "shift the blame", ...}

g₂ = P_{BL>2} / m

where P_{BL>2} is the number of articles in which any word in the set BL appears twice or more, and m is the total number of social media articles;

(7.3) Proportion of protest vocabulary g₃:

PR = {"protest", "opposition", "rejection", ...}

g₃ = P_{PR>2} / m

where P_{PR>2} is the number of articles in which any word in the set PR appears twice or more, and m is the total number of social media articles.
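The lexicon-based indicators in dimensions (5) to (7) share one pattern: count the articles in which words from a lexicon occur at least twice, then divide by the total number of articles m. A minimal sketch follows, assuming that the threshold applies to the combined number of hits from the whole lexicon (the text could also be read as requiring a single word to appear twice) and that substring matching is adequate. The name `lexicon_share` and the `min_hits` parameter are hypothetical.

```python
def lexicon_share(posts, lexicon, min_hits=2):
    """Share of posts in which lexicon terms occur at least min_hits times in total."""
    terms = [term.lower() for term in lexicon]

    def hits(text):
        lowered = text.lower()
        return sum(lowered.count(term) for term in terms)

    if not posts:
        return 0.0
    p = sum(1 for post in posts if hits(post) >= min_hits)   # articles over threshold
    return p / len(posts)                                     # divide by m

AN = {"worried", "anxious"}
posts = ["I am worried, so worried", "a bit anxious", "fine day"]
g1 = lexicon_share(posts, AN)
```

The same call with the UC, UK, DU, BL, or PR lexicon would give the other indicators of this kind.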

(8) Trust

Represents the degree of public trust in government and experts in social media S {s1, s2, s3 ... sm}.

(8.1) Number of official statements by government and social organizations h₁

A = {A₁, A₂, ..., A_u}

G = {"Government", "Association", "Organization", ...}

where A = the set of all social account names;

G = the lexicon of words indicating official identity;

(8.2) Proportion of opposing vocabulary in official comments of government and social organizations h₂

B = {B₁, B₂, ..., B_v}

Y = {"disagree", "don't believe", "trouble to explain", "please explain", ...}

where B = all social content whose account name contains official identity words;

Y = the lexicon of all words with an opposing meaning;

(8.3) Number of expert statements h₃

A = {A₁, A₂, ..., A_u}

X = {"Expert", "Teacher", "Scholar", ...}

where A = the set of all social account names;

X = the lexicon of all scholarly identity words;

(8.4) Proportion of opposing vocabulary in comments made by experts h₄

C = {C₁, C₂, ..., C_q}

Y = {"Opposition", "I don't believe", "trouble to explain", "please explain", ...}

where C = all social content whose account name contains scholarly identity words;

Y = the lexicon of all words with an opposing meaning;
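The selection of official and expert content can be sketched as a filter on account names. The exact formulas for h₁ to h₄ are not reproduced in the text, so this sketch only illustrates the sets B and C (content whose account name contains an identity word) and assumes that h₁ and h₃ are the sizes of those sets. The record layout and the name `posts_by_identity` are hypothetical.

```python
def posts_by_identity(posts, identity_words):
    """Keep posts whose account name contains any identity word (case-insensitive)."""
    words = [word.lower() for word in identity_words]
    return [post for post in posts
            if any(word in post["account"].lower() for word in words)]

G = {"Government", "Association", "Organization"}   # official identity lexicon
X = {"Expert", "Teacher", "Scholar"}                # scholarly identity lexicon

posts = [
    {"account": "City Government Office", "text": "Official notice"},
    {"account": "bob", "text": "please explain"},
    {"account": "Dr. Lee, Scholar", "text": "I don't believe this"},
]
B = posts_by_identity(posts, G)   # official content
C = posts_by_identity(posts, X)   # expert content
h1, h3 = len(B), len(C)
```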

In step S103, a calculation model of the panic index is established by using a neural network model, inputting an existing corpus, and relying on machine learning to train the panic index model repeatedly, specifically including:

(1) Machine learning model based on neural network:

(1.1) Before the neural network model is used to build the calculation model of the panic index, the data is normalized according to the following formula:

x' = (x − x_mean) / (x_max − x_min)

where x is the value of the current sample, x_mean is the average value of the current feature, and x_max and x_min are the maximum and minimum values of that feature over the samples;
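A minimal sketch of the normalization step, assuming it is mean normalization, (x − x_mean) / (x_max − x_min), applied to one feature at a time, which is what the listed symbols suggest. The function name `mean_normalize` and the handling of a constant feature are assumptions.

```python
def mean_normalize(values):
    """Scale one feature column to (x - mean) / (max - min)."""
    x_mean = sum(values) / len(values)
    span = max(values) - min(values)
    if span == 0:
        # Constant feature: no spread to scale by, so map every value to 0.
        return [0.0 for _ in values]
    return [(x - x_mean) / span for x in values]

normalized = mean_normalize([0, 5, 10])
```

After this step every feature lies within a range of width 1 centred on its mean, so indicators with very different units (article counts, durations, ratios) contribute on a comparable scale.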

(1.2) According to the normalized features, a multi-layer fully connected neural network structure is used to train the model. Its model structure is shown in Figure 2 below. Layer L ₁ is an input layer, and in the present invention represents a value corresponding to each feature. Layer L ₂ is a hidden layer that calculates hidden features. Layer L ₃ is the output layer and outputs the final result.

(1.3) The training phase of the multi-layer fully connected neural network structure training model uses a forward propagation algorithm and a back propagation algorithm:

First, the calculation formula of the forward propagation algorithm:

Z ^(l) = W ^(l-1) x ^(l-1) + b ^(l-1) ;

a ^(l) = f (Z ^(l) );

h _{W, b} (x) = a ^(L-1) ;

where:

l is the layer index, L is the last layer, x^(1) is the input feature vector, W and b are the weights and biases, and h_{W,b}(x) is the output.
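The forward propagation equations can be illustrated with a small pure-Python sketch. The text does not specify the activation function f, so a sigmoid is assumed here; the helper names `dense` and `forward` and the toy weights are hypothetical.

```python
import math

def sigmoid(z):
    """Assumed activation f; the source does not name one."""
    return 1.0 / (1.0 + math.exp(-z))

def dense(weights, bias, inputs):
    """One layer: a = f(W x + b), with W given as a list of rows."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, bias)]

def forward(features, layers):
    """Propagate the features through each (W, b) pair in turn."""
    activation = features
    for weights, bias in layers:
        activation = dense(weights, bias, activation)
    return activation

# Toy network: 2 input features -> 2 hidden units -> 1 output.
layers = [
    ([[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0]),   # hidden layer
    ([[1.0, -1.0]], [0.0]),                   # output layer
]
output = forward([1.0, 2.0], layers)
```

With zero hidden weights both hidden activations are 0.5, so the output layer computes f(0.5 − 0.5) = f(0) = 0.5.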

Second, the back-propagation algorithm:

The multi-layer fully connected neural network training model uses the back-propagation algorithm to optimize the objective function J and obtain the optimal model:

The parameters are then updated according to the back-propagation algorithm:

Update parameters:

where the weight term is the weight corresponding to the i-th node of the l-th layer, ∂J/∂W is the partial derivative of the objective function J with respect to the weight W, m is the number of samples, b is the bias value, and λ is the regularization parameter, set to 0.1.

For all news and social media text at the current moment, panic prediction extracts the corresponding features by the feature extraction method; the extracted features are fed into the input layer of the multi-layer fully connected neural network training model, and the result of each layer is obtained through the forward propagation algorithm and used as the input of the next layer; after the calculation of the three-layer model, the final panic value is obtained.
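The objective function and update formulas did not survive in the text, so the following is only a sketch of one common form of regularized gradient descent, not the update as filed. It assumes an L2 penalty, so that each weight moves by the learning rate times (gradient + λ·weight). The value λ = 0.1 comes from the text; the learning rate `lr` and the function name `regularized_step` are hypothetical.

```python
def regularized_step(weights, grads, lr=0.01, lam=0.1):
    """One gradient descent step with an assumed L2 penalty (weight decay)."""
    return [w - lr * (g + lam * w) for w, g in zip(weights, grads)]

# One weight of 1.0 with gradient 0.5, learning rate 0.1, lambda 0.1.
updated = regularized_step([1.0], [0.5], lr=0.1, lam=0.1)
```

Biases are usually updated with the plain gradient, without the λ term; the same function can be called with `lam=0` for that case.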

(2) Manually mark a batch of corpora that can be used for machine learning.

(2.1) Machine learning based on neural network models is trained through a corpus. The larger the corpus, the higher the accuracy of the trained model.

The corpus covers the following fields (23 in total); for each field, different event themes are set and the matching text is retrieved.

For example, for the topic of "Politics", retrieve X press releases on "a certain incident, June to August 2017, media of a certain country". Statistics are computed on the X releases according to step ② to obtain the values of the multiple indicators for "a certain incident, June to August 2017, media of a certain country". The expert group then manually scores the panic sentiment of each group of events against a unified standard and records the basis for each score.

According to the similar operations above, the labeled corpus is used for training of sentiment analysis models based on machine learning. The distribution of categories in the labeled corpus should be as uniform as possible, suitable for training of classifiers, and most articles in the labeled corpus must have polarity. Since most corpus resources come from the Internet, there are often some irregularities in its encoding, format, and content, such as: abuse of punctuation marks, multiple spaces, typos, and so on. Therefore, it is necessary to correct these irregular formats before labeling, and uniformly adopt UTF-8 encoding.

(2.2) Model training: No fewer than 10,000 corpora are put into the neural network-based machine learning model for training, so that the result calculated from each group of a₁ to h₄ data equals the corresponding GRPI score for that group. The aim is that, once the machine's multi-indicator statistics pass through the model, the output approaches the expert group's score as closely as possible.

(2.3) Panic prediction: Based on the trained model, for all news and social media texts at the current moment, the panic index calculation model uses feature extraction to extract corresponding features; the extracted features are input into a multi-layer fully connected neural network structure training model In the input layer, the result is obtained through the forward propagation algorithm and used as the input of the next layer model; the calculation of the three layer model is used to obtain the final panic value.

In order to further prove the feasibility and scientificity of the monitoring method provided by the present invention, the theoretical basis and design principle of the present invention will be further described below with reference to the accompanying drawings.

FIG. 2 is a diagram of the multi-layer fully connected neural network structure used to train the model on the normalized features, according to an embodiment of the present invention.

FIG. 3 is a diagram of the forward propagation algorithm by which the multi-layer fully connected neural network training model outputs the final result.

(1) Theory and algorithm principle

In short, the indicators in GRPI's algorithm are defined on the basis of the principles of journalism, psychosocial risk, and fluctuations in online sentiment. Dimensions and measures that strongly influence fluctuations in online sentiment are selected first, and their weights are measured and analyzed. When the indicators are converted into a mathematical model, large-scale machine learning is used to set indicator weights that are consistent with the goal. Finally, the GRPI index is calculated through data mining and data analysis on global news big data, that is, by applying an existing neural network-based machine learning algorithm. The machine learning setup is shown in Figure 3: Layer L1 corresponds to the multiple indicators, Layer L2 is a hidden layer, and Layer L3, h_{W,b}(x), corresponds to the output result. For training, the scale is set to 10,000 groups: values are manually entered for 10,000 groups at Layer L1 and matched with the 10,000 corresponding Layer L3 output values; the machine then learns in the hidden Layer L2, matches the coefficient weights, and finally outputs the result.

(2) Factors and indicators

The main factors considered by the GRPI index are: global news volume and total news volume; global social volume and total social volume; the growth rates of global news volume and social volume; the growth rates of strongly or moderately positive and negative news and social content; trigger factors and keywords that measure the direct physical impact of an event (such as the extent of death and injury, the duration of the impact, economic loss, and the geographical scope affected); the degree of attention paid to the event or subject in news media reports; the concentration of topics on the event or subject; the decay cycle; and a series of individual and group attitudes expressed through media and social network behavior, such as the subject's ability to control developments, the degree of agitation and anxiety, and the degree of trust in those in power. This is a general description of the multiple indicators.

As shown in Figure 4, the running process is as follows. When the panic index of a target event or subject is analyzed, the background first retrieves N articles related to the event or subject, then selects the relevant dimensions based on the attributes of the event and performs the calculation. When the future trend of the panic index is predicted, the background applies statistical machine learning methods, such as regression or classification, to the historical data of the various factors, and then uses the machine learning model to calculate the panic trend indicator.

S201: Enter the subject / event subject / target area / target time, etc., and set it as Corpus 1;

S202: The background retrieves X news articles and social content items about Corpus 1 (with news media information N {n₁, n₂, n₃ ... n_n} and social media information S {s₁, s₂, s₃ ... s_m}, this gives W {N, S});

S203: The system performs data statistics on multiple index contents of corpus 1 to obtain data group C1;

S204: Manually score the panic index of Corpus 1 to obtain score E1;

S205: Perform process S201 a second time, select Corpus 2, and repeat processes S201-S204 to obtain C2 and E2;

S206: Repeat process S205 to obtain C3, E3; C4, E4; ...; Cn, En (the larger n is, the better);

S207: C1 to Cn and E1 to En are used to train the neural network-based machine learning model so that the result calculated for each group C equals the corresponding E, with the aim that the result obtained for each new group C approaches E as closely as possible;

S208: The model is ready for use after training. Enter the subject / event subject / target area / target time; for example, enter "driverless", set the panic index monitoring period to January 2018, and set the region to any country;

S209: The background retrieves W news articles and social content items about "driverless";

S210: The system extracts and counts data on the topic of "driverless" for this region and time range to obtain a data group;

S211: The statistical result is input into the machine, which calculates a panic index for the topic of "driverless";

S212: The machine outputs a panic value for "driverless".
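The training-data assembly in S201 to S207 can be sketched as pairing each corpus's indicator statistics C_i with its expert score E_i and then measuring how far a model's outputs are from those scores. Everything below is a hypothetical stand-in: `count_indicators` replaces the multi-indicator statistics, `toy_model` replaces the trained network, and the data is invented for illustration only.

```python
def build_training_set(corpora, count_indicators, expert_scores):
    """Pair each corpus's indicator vector C_i with its expert score E_i."""
    return [(count_indicators(docs), expert_scores[name])
            for name, docs in corpora.items()]

def mean_abs_error(model, pairs):
    """Average gap between the model output for C_i and the expert score E_i."""
    return sum(abs(model(c) - e) for c, e in pairs) / len(pairs)

# Toy stand-ins for the retrieved corpora and the expert panel's scores.
corpora = {"corpus 1": ["a b", "c"], "corpus 2": ["d e f"]}
expert_scores = {"corpus 1": 2.0, "corpus 2": 1.0}

def count_indicators(docs):
    return [len(docs)]          # placeholder for the multi-indicator statistics

def toy_model(c):
    return float(c[0])          # placeholder for the trained network

pairs = build_training_set(corpora, count_indicators, expert_scores)
error = mean_abs_error(toy_model, pairs)
```

Training in S207 amounts to adjusting the model until this kind of error is as small as possible; prediction in S208 to S212 then applies the same indicator statistics and the trained model to a new topic.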

As shown in FIG. 5, a panic index monitoring and analysis system based on news big data according to an embodiment of the present invention includes:

Database formation module 1 collects all media information related to big data to form a database;

The panic index indicator value acquisition module 2 performs real-time statistics on the data in the database according to the multiple dimensions and multiple indicators into which the panic index is divided, to obtain the specific value of each indicator;

The panic index acquisition module 3 uses a neural network model and a machine learning algorithm to match the weights of each dimension and form a complete model. When the panic index is calculated, the required data is retrieved from the database, the indicator statistics are computed, and the panic index is output.

The following specific application examples further prove the feasibility and reliability of the monitoring method provided by the present invention, which has strong theoretical and practical value.

The invention relies on big data resources from news media and social platforms in more than 200 countries and more than 60 languages worldwide, combines panic and social risk perception models, and applies these theoretical models at the algorithm level. It can calculate the panic index and monitor the development status of events on all kinds of customizable topics.

The present invention is based on mature theories of social panic and risk perception. The indicators used to calculate the panic index draw on the research results of leading scholars at home and abroad on social panic and on network monitoring and measurement. The algorithm is updated in two ways: on the one hand, leading experts and scholars at home and abroad regularly discuss and test the measurement of the indicators and propose amendments; on the other hand, through background technical means, the present invention uses a large amount of authoritative panic-related data for automatic comparison and matching by machine learning, in order to verify and update the algorithm weights.

The invention adopts a customized visual application, which can perform macro-monitoring with real-time semantic retrieval and calculation in 5 major risk areas (environment, economy, society, regional politics, and technology) and 30 panic topics. For example, in the economic field it mainly monitors risk labels such as "energy price shocks", "unemployment", "asset bubble", "deflation", and "financial crisis"; in the technical field it mainly monitors risk labels such as "network attacks" and "data fraud or theft".

The invention can also perform micro-customization, and the panic monitoring platform can provide customized search according to the needs of different users. Panic monitoring platforms can provide decision-making references in national security, corporate security, corporate crisis emergency management, and industry development crisis scenarios.

For example, at the national security level, a change in the panic index caused by an event is one such case. The security issues raised by Bitcoin once caused panic among investors. In essence, Bitcoin's surge and slump are the result of a concentrated release of worry, fear, and greed.

At the industry level, panic monitoring platforms can show expected changes in the market, such as the real estate industry prices and panic indexes in Hong Kong and Xi'an, respectively.

Example 1: Monitoring of panic index of enterprise development

The development of an enterprise is closely related to its panic index. The picture shows LeEco's panic index and number of users. In August 2016, a certain company's capital chain crisis broke out. In January 2017, "Sunac China" invested 15 billion yuan for a stake. In July 2017, a certain company completely withdrew. The company's corporate value crisis then broke out: unpaid employee salaries, substantial losses in its financial reports, and similar problems led multiple funds to lower their valuations of the company, and its panic index fluctuated significantly.

For example, on July 5th, a certain company pledged its equity to a certain company, and the change of the company's manager caused a large swing in the panic index. At the same time, the company's number of users continued to fall, which to some extent reflected users' distrust of the company. The company's monthly user coverage shows that the overall number of users declined in 2017; in July, the number of users fell by 32% from the previous month, a severe loss of users, and the panic index showed extremely large turbulence.

Example 2: Monitoring of social emergency panic index:

Since GRPI's algorithm incorporates most of the tags of news big data, the tags can also be split out and used separately. For example, the named entity and semantic recognition technology behind GRPI can help people quickly obtain or compare, across all historical data, the scope of an earthquake and the extent of death and injury (grades 1-10) or the degree of protest (grades 1-10), and analyze the relationship between fluctuations in online public sentiment and social phenomena such as immigration or economic phenomena such as the rise and fall of Bitcoin.

Example 3: Stock market panic index monitoring:

The GRPI index, as an index for monitoring and predicting the volatility of public sentiment, offers some guidance for the stock market. The data show a certain negative correlation between the two: when the GRPI index of a listed company is at a high level, its stock price is often in a downward trend, and when the stock index strengthens, the GRPI index is mostly at a low level. Extreme states of the GRPI index deserve particular attention. When the GRPI is at a high extreme, it usually indicates that major events are brewing or under way. For example, in the global GRPI monitoring shown in the picture above, the high extreme appeared in 2008, when the global financial crisis broke out.

In the above embodiments, the method and system provided by the present invention may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in whole or in part in the form of a computer program product, the computer program product includes one or more computer instructions. When the computer program instructions are loaded or executed on a computer, the processes or functions according to the embodiments of the present invention are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wirelessly (such as infrared, radio, or microwave). The computer-readable storage medium may be any available medium that a computer can access, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)).

The above description covers only the preferred embodiments of the present invention and is not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims

A panic index monitoring and analysis method based on big data, using the theory of social risk amplification and the theory of panic psychology propagation, characterized by comprising:

Step 1: forming the collected media data into a database;

Step 2: according to the eight dimensions and 27 indicators into which the panic index is divided, performing real-time statistics on the data in the database to obtain the specific value of each indicator;

Step 3: using a neural network model to build a panic index calculation model, inputting a corpus, and using machine learning to match the weights of each dimension and comprehensively calculate the indicators to obtain the panic index.
The method according to claim 1, wherein step 1 specifically comprises:

a) Collection: collect news media and social media data. For social media, the Python programming language is used to collect the full data through the open data interfaces of the social media platforms, and the data is stored directly in the data center by means of message queues. For news media, domestic and overseas news is collected from web pages by means of a scheduler, collectors, a task manager, text parsing, storage, and data management: a given news data source is traversed breadth-first to find the list pages for further collection, the scheduler sends the list pages to each collector through the task manager, and each collector crawls the list pages to obtain the HTML pages of the articles;

b) Processing: the collected data is processed by the data management algorithm of the data center to obtain structured data;

c) Storage: store the structured data in a NoSQL database, and import the data in the database into the data management module through the message queue; the data management module tags the data by means of the management algorithm and applies the sentiment algorithm to each article to obtain sentiment labels;

d) Use big data to collect all related media information to form a database: collect all news and social media data containing the monitoring keywords, with the collected fields including data content, publishing media outlet or social network account, release time, and regional attributes, and put the data into the database; let the database set be W, arranged in chronological order, where the news media information is N {n₁, n₂, n₃ ... n_n} and the social media information is S {s₁, s₂, s₃ ... s_m}, giving a data set W {N, S}.
The method according to claim 1, wherein step 3 specifically comprises:

A. Construction of a neural network-based machine learning model: before the neural network model is used to build the panic index calculation model, the data is normalized; a multi-layer fully connected neural network structure is then used to train the model on the normalized features, wherein Layer L₁ is the input layer, representing the value corresponding to each indicator feature; Layer L₂ is the hidden layer, which calculates the hidden features; and Layer L₃ is the output layer, which outputs the final result;

B. Manually labeling a batch of corpora that can be used for machine learning: no fewer than 10,000 corpora are put into the neural network-based machine learning model for training;

Based on the trained model, for all news and social media texts at the current moment, the panic index calculation model uses feature extraction to extract corresponding features; the extracted features are input to the input layer of the multi-layer fully connected neural network structure training model, The result is obtained by the forward propagation algorithm and used as the input of the next layer model. After the calculation of the three layer model, the final panic value is obtained.
The method according to claim 1 or 2 or 3, wherein the eight dimensions are: degree of harm, degree of attention, degree of concentration, degree of subjectivity, degree of loss of control, degree of unfamiliarity, degree of agitation, and degree of trust;

wherein the indicators of harm include: the number of affected people a₁, the number of serious injuries a₂, the number of deaths a₃, the size of the affected area a₄, the duration of the harm a₅, the direct economic loss a₆, and the direct social consequences a₇;

the indicators of attention include: the number of relevant news reports b₁, the number of relevant social discussions b₂, the average word count per report b₃, the news report duration b₄, and the social discussion duration b₅;

the indicators of concentration include: news concentration c₁ and social concentration c₂;

the indicators of subjectivity include: the sentiment of relevant social comments d₁, the proportion of negative words in social media d₂, and the proportion d₃ of posts by users in major cities among all social media posts in the discussion;

the indicators of loss of control include: the proportion e of uncertain and helpless vocabulary in related social media posts;

the indicators of unfamiliarity include: the proportion of vocabulary expressing the unknown f₁ and the proportion of vocabulary expressing incomprehension f₂;

the indicators of agitation include: the proportion of worry vocabulary g₁, the proportion of blame vocabulary g₂, and the proportion of protest vocabulary g₃;

the indicators of trust include: the number of official statements by government and social organizations h₁, the proportion of opposing vocabulary in official comments of government and social organizations h₂, the number of expert statements h₃, and the proportion of opposing vocabulary in comments made by experts h₄.
The method according to claim 4, wherein the indicators included in the degree of harm are calculated as follows:

1) Number of affected people a₁ = arg max (TFa₁)

TFa₁ represents the frequency of occurrence of each a₁ value; a₁ = arg max (TFa₁) is the a₁ value that occurs most frequently;

2) Number of serious injuries a₂ = arg max (TFa₂)

TFa₂ represents the frequency of occurrence of each a₂ value; a₂ = arg max (TFa₂) is the a₂ value that occurs most frequently;

3) Number of deaths a₃ = arg max (TFa₃)

TFa₃ represents the frequency of occurrence of each a₃ value; a₃ = arg max (TFa₃) is the a₃ value that occurs most frequently;

4) Harm area size a 4 :

When the monitoring area is any country

Hazardous area size:

Among them, Z = "town / county / township / district", S = "city", Sh = "province", and G = "country";

When the monitoring area is global,

Count the regions in which IP addresses appear in W, with the country as the unit: IP addresses appearing in 1 country count as f(1), in 2 countries as f(2), and so on, where x is the number of countries in which they appear:

Hazardous area size:

5) Harm duration a₅ = arg max (TFa₅)

Capture the keyword pattern "expected recovery in a₅" in N, and take the a₅ value with the highest frequency;

6) Direct economic loss a₆ = arg max (TFa₆)

Capture the keyword pattern "loss of a₆ yuan" in N, and take the a₆ value with the highest frequency;

7) Direct social consequences a₇:

where the protest lexicon K is the collection of words indicating that the event is at the public opinion stage:

and the resistance lexicon F is the collection of words indicating that the event has escalated to the action stage:
The method according to claim 4, wherein the indicators included in the degree of attention are calculated as follows:

The total number of relevant news reports b₁ is the number of news media articles in which the keywords appear;

Given N {n₁, n₂ ... n_n}, b₁ is the count of N:

b₁ = n;

The number of relevant social discussions b₂ is the number of social media articles in which the keywords appear;

Given S {s₁, s₂ ... s_m}, b₂ is the count of S:

b₂ = m;

Report type of related news b₃, the average word count per report:

b₃ = (1/n) Σ_{i=1}^{n} Z_i

where n = the total number of related news reports b₁ and Z_i = the number of words in the i-th report;

News report duration b₄:

b₄ = T_Nn − T_N1

where T_Nn = time of the current news report and T_N1 = time of the first relevant news report;

Duration of social discussion b₅:

b₅ = T_Sm − T_S1

where T_Sm = time of the current relevant social media post and T_S1 = time of the first relevant social media post.
The method according to claim 4, wherein the indicators included in the degree of concentration are calculated as follows:

News concentration c₁: the time taken from the start of news coverage until half of all current news coverage had appeared;

Social concentration c₂: the time taken from the start of social discussion until half of all current social discussions had appeared.
The method according to claim 4, wherein the indicators included in the degree of subjectivity are calculated as follows:

Subjectivity indicates the subjective attitude of individuals in social media S {s₁, s₂ ... s_m} towards the panic event;

The sentiment d₁ of the relevant social comments is the mean sentiment of the social media content in which the keywords appear:

d₁ = (1/m) Σ (E × r), E ∈ (1, 5);

Let the sentiment of each article in the set S be E = {E₁, E₂ ... E_m}, E ∈ (1, 5), with a designated vocabulary for each sentiment value from 1 to 5, where r is the number of articles corresponding to each E value and m is the total number of social media articles;

Proportion of negative words in social media d₂:

d₂ = Z_neg / Z

where Z_neg = total number of negative words and Z = total number of words;

The proportion d₃ of posts by users in major cities among all social media posts in the discussion:

Tag = {New York, Washington, Silicon Valley, London, Paris, Tokyo, Shanghai, Beijing, Shenzhen}.
The method according to claim 4, wherein the indicator included in the degree of loss of control is calculated as follows:

The degree of loss of control indicates the judgment of individuals in social media S {s₁, s₂ ... s_m} as to whether they can control or influence how the panic event develops;

Proportion of uncertain or helpless vocabulary in related social media posts e:

dic3 = UC = {helpless, can't change, what can be done};

e = P_{UC>2} / m

where P_{UC>2} is the number of articles in which any word in the set UC appears twice or more, and m is the total number of social media articles.
The method according to claim 4, wherein the indicators included in the strangeness are calculated as follows:

Strangeness indicates the degree of individuals' understanding of the event in social media S = {s_1, s_2, ..., s_m};

Proportion of vocabulary expressing the unknown, f_1:

UK = {unprecedented, first appearance};

P_{UK>2} indicates the number of articles in which "any vocabulary in the set UK appears more than twice", and m indicates the total number of social media articles;

Proportion of vocabulary expressing lack of understanding, f_2:

DU = {how, don't understand, don't understand, unscientific};

P_{DU>2} indicates the number of articles in which "any vocabulary in the set DU appears twice or more", and m indicates the total number of articles in social media.
The method according to claim 4, wherein the indicators included in the irritability are calculated as follows:

Irritability indicates the degree of anxiety and anxious reaction of netizens to the panic event in social media S = {s_1, s_2, ..., s_m};

Proportion of vocabulary expressing anxiety, g_1:

AN = {Worry, Annoyance, Anxiety};

P_{AN>2} indicates the number of articles in which "any vocabulary in the set AN appears twice or more", and m indicates the total number of social media articles;

Proportion of blame vocabulary, g_2:

BL = {all blame, responsible, bear responsibility, shift the blame};

P_{BL>2} indicates the number of articles in which "any vocabulary in the set BL appears more than twice", and m indicates the total number of articles in social media;

Proportion of protest vocabulary, g_3:

PR = {protest, objection, rejection};

P_{PR>2} indicates the number of articles in which "any vocabulary in the set PR appears twice or more", and m indicates the total number of articles in social media.
The method according to claim 4, wherein the indicators included in the trust degree are calculated as follows:

The trust degree indicates the degree of public trust in government and experts in social media S = {s_1, s_2, ..., s_m};

Number of official posts from government and social organizations, h_1:

A = {A_1, A_2, ..., A_u};

G = {Government, Association, Organization};

where A is the set of all account names;

G is the thesaurus of official-identity words;

Proportion of opposing-meaning vocabulary in the comments of official government and social-organization posts, h_2:

B = {B_1, B_2, ..., B_v};

Y = {disagree, do not believe, please explain, please explain};

where B is all social content whose account name contains official-identity words;

Y is the database of all "opposing" meaning words;

Number of experts' posts, h_3:

A = {A_1, A_2, ..., A_u};

X = {expert, teacher, scholar};

where A is a database of all social account names;

X is a database of all scholarly vocabulary;

Proportion of opposing-meaning vocabulary in comments made by experts, h_4:

C = {C_1, C_2, ..., C_q};

Y = {disagree, do not believe, please explain, please explain};

where C is all social content whose account name contains scholarly vocabulary;

Y is the database of all "opposing" meaning words.
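A rough sketch of the trust-degree indicators follows. It assumes that h_1 and h_3 count the posts whose account name contains a word from G or X, and that h_2 and h_4 are the share of the selected content containing at least one term from Y. The denominators are not stated in this text, so these forms, the `account` and `text` field names, and the function names are all hypothetical.

```python
G = ["government", "association", "organization"]    # official-identity words
X = ["expert", "teacher", "scholar"]                  # scholarly vocabulary
Y = ["disagree", "do not believe", "please explain"]  # opposing-meaning words

def posts_by_identity(posts, identity_words):
    """Select posts whose account name contains any identity word."""
    return [p for p in posts
            if any(w in p["account"].lower() for w in identity_words)]

def opposing_share(texts, opposing_words):
    """Assumed form: texts containing an opposing term / all texts."""
    if not texts:
        return 0.0
    flagged = sum(1 for t in texts
                  if any(w in t.lower() for w in opposing_words))
    return flagged / len(texts)

posts = [
    {"account": "City Government Office", "text": "Situation is under control."},
    {"account": "Dr. Lee, public health expert", "text": "I disagree with the early estimate."},
    {"account": "random_user", "text": "Please explain the numbers."},
]

official = posts_by_identity(posts, G)  # set B
expert = posts_by_identity(posts, X)    # set C
h1 = len(official)
h2 = opposing_share([p["text"] for p in official], Y)
h3 = len(expert)
h4 = opposing_share([p["text"] for p in expert], Y)
```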
The method according to any one of claims 3 to 12, characterized in that before the calculation model of the panic index is constructed using a neural network model, the data is normalized:

where x is the value of the current sample, x_mean represents the average value of the current feature, x_max represents the maximum value in the current sample, and x_min represents the minimum value of the current sample;
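The quantities defined above (x, x_mean, x_max, x_min) match the usual mean normalization, x' = (x - x_mean) / (x_max - x_min). The normalization formula is not reproduced in this text, so the sketch below assumes that form; the function name `mean_normalize` is hypothetical.

```python
def mean_normalize(values):
    """Mean normalization of one feature column (assumed form):
    x' = (x - x_mean) / (x_max - x_min).
    """
    if not values:
        return []
    x_mean = sum(values) / len(values)
    span = max(values) - min(values)
    if span == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0] * len(values)
    return [(x - x_mean) / span for x in values]

normalized = mean_normalize([1.0, 2.0, 3.0])
```

With this form each feature is centred on zero and has a range of width 1, which keeps indicators of very different scales (counts versus proportions) comparable at the input layer.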

According to the normalized features, a multi-layer fully connected neural network structure is used to train the model, where layer L_1 is the input layer, which holds the value corresponding to each feature; layer L_2 is the hidden layer, which computes the hidden features; and layer L_3 is the output layer, which outputs the final result;

The training phase of the multi-layer fully connected neural network model includes a forward propagation algorithm and a back-propagation algorithm:

Formulas of the forward propagation algorithm:

$z^{(l)} = W^{(l-1)} x^{(l-1)} + b^{(l-1)}$;

$a^{(l)} = f(z^{(l)})$;

$h_{W,b}(x) = a^{(L-1)}$;

where:

l denotes the l-th layer, L is the last layer, $x^{(l)}$ is the input feature, W and b are the weight and offset, and $h_{W,b}(x)$ is the output;
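The forward propagation formulas can be sketched in plain Python as follows. The activation function f is not specified in this text, so a sigmoid is assumed here; the layer sizes, the zero-valued example weights and the function names are hypothetical.

```python
import math

def sigmoid(z):
    """Assumed activation f; the text does not name one."""
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(W, b, x, f=sigmoid):
    """One layer: z = W x + b, then a = f(z), computed per output node."""
    return [f(sum(w * xi for w, xi in zip(row, x)) + bj)
            for row, bj in zip(W, b)]

def forward(params, x):
    """Feed x through each (W, b) pair in turn; the output of one layer
    becomes the input of the next."""
    a = x
    for W, b in params:
        a = layer_forward(W, b, a)
    return a

# Input layer with 2 features, hidden layer with 2 nodes, 1 output node.
params = [
    ([[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0]),  # input -> hidden
    ([[0.0, 0.0]], [0.0]),                   # hidden -> output
]
output = forward(params, [1.0, 2.0])
```

With all weights and offsets at zero every z is 0, so the sigmoid output is 0.5 regardless of the input.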

Back-propagation calculation formula:

The multi-layer fully connected neural network model uses the back-propagation algorithm to optimize the objective function and obtain the optimal model:

The parameters are then updated according to the back-propagation algorithm to obtain the optimal model:

Update parameters:

where $W_i^{(l)}$ is the weight corresponding to the i-th node of the l-th layer,
$\partial J / \partial W$ is the partial derivative of the objective function J with respect to the weight W, m is the number of samples, b is the offset value, and λ is the regularization parameter, taken as λ = 0.1;
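The objective function and update formulas are not reproduced in this text. As an illustration only, the sketch below assumes the common gradient-descent step with L2 regularization, W := W - α(ΔW / m + λW) and b := b - α(Δb / m), where ΔW and Δb are gradients summed over the m samples. The learning rate α (`lr`) and the function name are hypothetical; λ = 0.1 is taken from the text.

```python
def update_layer(W, b, grad_W_sum, grad_b_sum, m, lr=0.5, lam=0.1):
    """One regularized gradient-descent step for a single layer (assumed form).

    grad_W_sum and grad_b_sum hold the per-parameter gradients of J summed
    over the m training samples, as produced by back-propagation.
    """
    new_W = [[w - lr * (gw / m + lam * w) for w, gw in zip(row, grad_row)]
             for row, grad_row in zip(W, grad_W_sum)]
    # The offset is conventionally left out of the regularization term.
    new_b = [bj - lr * (gb / m) for bj, gb in zip(b, grad_b_sum)]
    return new_W, new_b

new_W, new_b = update_layer(W=[[1.0]], b=[0.0],
                            grad_W_sum=[[2.0]], grad_b_sum=[4.0], m=2)
```

In this example the averaged weight gradient is 1.0 and the regularization term adds 0.1, so the weight moves from 1.0 to about 0.45, while the offset moves from 0.0 to -1.0.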

In panic prediction, the corresponding features are first extracted; the extracted features are fed to the input layer of the trained multi-layer fully connected neural network model, and the result of each layer is obtained through the forward propagation algorithm and used as the input of the next layer; after calculation through the three-layer model, the final panic value is obtained.
A big data-based panic index monitoring and analysis system capable of implementing the monitoring and analysis method according to any one of claims 1-3, wherein the system includes:

A database formation module, which uses big data to collect all relevant media information and form a database;

A panic index indicator value acquisition module, which performs real-time statistics on the data in the database according to the eight dimensions and 27 indicators into which the panic index is divided, to obtain the specific value of each indicator;

A panic index acquisition module, which uses neural network models and machine learning algorithms to match the weight of each dimension and form a complete model; when the panic index is calculated, the required data is retrieved from the database, the indicators are computed, and the statistical results of each dimension are fed into the model, which finally outputs the panic index.
A safety monitoring system capable of implementing the method according to any one of claims 1-13.
A security analysis system capable of implementing the method according to any one of claims 1-13.
A security early warning system capable of implementing the method according to any one of claims 1-13.