
CN114265962A - Method and system for analyzing target event based on social topic - Google Patents


Info

Publication number
CN114265962A
Authority
CN
China
Prior art keywords
text data
target text
target
keywords
event
Prior art date
Legal status
Pending
Application number
CN202111419966.5A
Other languages
Chinese (zh)
Inventor
赵菁淳
董亮亮
周珅珅
梁宵
Current Assignee
Aisino Corp
Original Assignee
Aisino Corp
Priority date
2021-11-26
Filing date
2021-11-26
Publication date
2022-04-01
2021-11-26 Application filed by Aisino Corp
2021-11-26 Priority to CN202111419966.5A
2022-04-01 Publication of CN114265962A
Status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and a system for analyzing a target event based on social topics. The method comprises: determining a target event in a social topic, and extracting an initial keyword of the target event; identifying target text data associated with the initial keyword within a predetermined time period; processing the target text data according to its type; performing word segmentation on the processed target text data, and extracting keywords from the target text data for each of the different time periods; and visually displaying, along the time dimension, how the target event develops across the different time periods by means of its keywords.

Description

Method and system for analyzing target event based on social topic
Technical Field
The invention relates to the technical field of information technology application, in particular to a method and a system for analyzing a target event based on social topics.
Background
With the rapid development of internet technology, people increasingly learn about recent hot events through social media. However, several problems remain: (1) topics are updated quickly every day, so the overall trend of how a particular hot event builds up over time cannot be grasped quickly and intuitively; (2) traditional topic discovery only detects hot topics within an already collected data set, so the process from data collection to result analysis cannot be fully automated; and (3) existing topic discovery techniques are designed to mine hot events from among many different events, so their granularity is coarse and their results are difficult to use for analysis and decision-making.
The prior art (application publication No. CN113064990A) discloses an automated system covering everything from data collection to hot event mining and analysis. It preprocesses a text and divides its content into phrases; vectorizes the phrase-divided text to form a vectorized event set; aggregates the vectorized event set with an unsupervised clustering algorithm to form hot event clusters; vectorizes each event cluster with a deep learning algorithm and aggregates the clusters again with an unsupervised clustering algorithm; and generates a description of each topic cluster with a new word discovery algorithm. However, this prior art cannot analyze how a hot topic develops and changes, and it offers no control over the accuracy of keyword extraction.
Therefore, two problems urgently need to be solved: how to automate the whole process of analyzing hot topics at a fine granularity, and how to discover hot social topics and track their evolution along the time dimension.
Disclosure of Invention
The technical solution of the invention provides a method and a system for analyzing a target event based on social topics, with the aim of solving the problem of how to analyze a target event within a social topic.
In order to solve the above problem, the present invention provides a method for analyzing a target event based on social topics, the method comprising:
determining a target event in a social topic, and extracting an initial keyword of the target event;
identifying target text data associated with the initial keyword within a predetermined time period;
processing the target text data according to its type; performing word segmentation on the processed target text data, and extracting keywords from the target text data for each of the different time periods;
and visually displaying, along the time dimension, how the target event develops across the different time periods by means of its keywords.
Preferably, the target text data associated with the initial keyword is identified within a predetermined time period by a Python crawler, and the target text data is stored in a database.
Preferably, processing the target text data according to its type includes:
determining the data format of the target text data; and
parsing each item of target text data according to its data format to obtain its body content.
Preferably, processing the target text data according to its type includes:
cleaning the target text data to filter out invalid text data.
Preferably, processing the target text data according to its type includes:
converting Traditional Chinese text in the target text data into Simplified Chinese text.
Preferably, performing word segmentation on the processed target text data includes:
segmenting the target text data with a bidirectional LSTM word segmentation algorithm optimized by an attention mechanism.
Preferably, extracting keywords from the target text data for the different time periods includes:
extracting the keywords for each time period based on a stop word dictionary with a user-defined threshold.
Preferably, the keywords of the target event are visually displayed along the time dimension of the different time periods, and the display modes include a pie chart, a heat map and a bar chart.
According to another aspect, the present invention provides a system for analyzing a target event based on social topics, the system comprising:
an initial unit configured to determine a target event in a social topic and extract an initial keyword of the target event;
an identifying unit configured to identify target text data associated with the initial keyword within a predetermined time period;
a processing unit configured to process the target text data according to its type, perform word segmentation on the processed target text data, and extract keywords from the target text data for each of the different time periods;
and a result unit configured to visually display, along the time dimension, how the target event develops across the different time periods by means of its keywords.
Preferably, the initial unit is further configured to identify the target text data associated with the initial keyword within a predetermined time period by a Python crawler, and to store the target text data in a database.
Preferably, in processing the target text data according to its type, the processing unit is further configured to:
determine the data format of the target text data; and
parse each item of target text data according to its data format to obtain its body content.
Preferably, in processing the target text data according to its type, the processing unit is further configured to:
clean the target text data to filter out invalid text data.
Preferably, in processing the target text data according to its type, the processing unit is further configured to:
convert Traditional Chinese text in the target text data into Simplified Chinese text.
Preferably, in performing word segmentation on the processed target text data, the processing unit is further configured to:
segment the target text data with a bidirectional LSTM word segmentation algorithm optimized by an attention mechanism.
Preferably, in extracting keywords from the target text data for the different time periods, the processing unit is further configured to:
extract the keywords for each time period based on a stop word dictionary with a user-defined threshold.
Preferably, the result unit is configured to visually display the keywords of the target event along the time dimension of the different time periods, and the display modes include a pie chart, a heat map and a bar chart.
The technical solution of the invention provides a method and a system for analyzing a target event based on social topics. The method comprises: determining a target event in the social topic, and extracting an initial keyword of the target event; identifying target text data associated with the initial keyword within a predetermined time period; processing the target text data according to its type; performing word segmentation on the processed target text data, and extracting keywords from the target text data for each of the different time periods; and visually displaying, along the time dimension, how the target event develops across the different time periods by means of its keywords. The invention aims to analyze hot events in social topics automatically. Its hot event discovery method uses a stop word dictionary with a user-defined threshold, so the filtering granularity can be controlled flexibly, and it segments text with a bidirectional LSTM algorithm optimized by an attention mechanism, so the data can be segmented and recognized accurately. Compared with a traditional hot event detection system, the fully automatic analysis system of the invention processes the whole flow automatically, which is more convenient and efficient, and its interactive page-based operation makes information acquisition more efficient.
Drawings
A more complete understanding of exemplary embodiments of the present invention may be had by reference to the following drawings in which:
FIG. 1 is a flowchart of a method for analyzing a target event based on social topics in accordance with a preferred embodiment of the present invention;
FIG. 2 is a flow diagram of a method for data processing of target text data in accordance with a preferred embodiment of the present invention; and
FIG. 3 is a diagram illustrating a system for analyzing a target event based on social topics according to a preferred embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention will now be described with reference to the accompanying drawings. The present invention may, however, be embodied in many different forms and is not limited to the embodiments described herein; these embodiments are provided so that this disclosure will be thorough and complete and will fully convey the scope of the invention to those skilled in the art. The terminology used in the exemplary embodiments illustrated in the accompanying drawings is not intended to limit the invention. In the drawings, the same units/elements are denoted by the same reference numerals.
Unless otherwise defined, terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Further, terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the relevant art, and should not be interpreted in an idealized or overly formal sense.
Fig. 1 is a flowchart of a method for analyzing a target event based on social topics according to a preferred embodiment of the present invention. To analyze hot events automatically, this embodiment acquires keywords accurately and analyzes how a hot event develops along the time dimension.
As shown in fig. 1, the present invention provides a method for analyzing a target event based on social topics, which includes:
Step 101: determining a target event in the social topic, and extracting an initial keyword of the target event. Preferably, the target text data associated with the initial keyword is identified within a predetermined time period by a Python crawler and stored in the database.
Step 102: identifying target text data associated with the initial keyword within a predetermined time period.
In this method, the hot event keywords submitted from the front-end page are passed to the Python crawler, which identifies the target text data related to those keywords and stores it in the database.
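The description does not show the crawler or the database layout. As a minimal sketch, assuming the crawler yields (publication time, text) pairs, storage and the time-window lookup for an initial keyword could use Python's built-in `sqlite3` module; the table layout and the function names `store_posts` and `find_target_texts` are hypothetical.

```python
import sqlite3
from datetime import datetime

# Hypothetical table for crawled posts; the patent does not give a schema.
SCHEMA = """
CREATE TABLE IF NOT EXISTS posts (
    id INTEGER PRIMARY KEY,
    published_at TEXT NOT NULL,  -- ISO-8601 timestamp, same format for every row
    content TEXT NOT NULL
)
"""


def store_posts(conn, posts):
    """Store crawled (published_at, content) pairs in the database."""
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO posts (published_at, content) VALUES (?, ?)",
        [(ts.isoformat(), text) for ts, text in posts],
    )
    conn.commit()


def find_target_texts(conn, keyword, start, end):
    """Return posts that mention `keyword` and were published in [start, end)."""
    rows = conn.execute(
        "SELECT published_at, content FROM posts "
        "WHERE published_at >= ? AND published_at < ? AND instr(content, ?) > 0 "
        "ORDER BY published_at",
        (start.isoformat(), end.isoformat(), keyword),
    )
    return [(datetime.fromisoformat(ts), text) for ts, text in rows]


# Usage: 暴雨 ("rainstorm") is the initial keyword; the window is one week.
conn = sqlite3.connect(":memory:")
store_posts(conn, [
    (datetime(2021, 11, 1, 9), "某地发生暴雨，交通受阻"),
    (datetime(2021, 11, 3, 20), "暴雨过后救援持续进行"),
    (datetime(2021, 11, 10, 8), "暴雨预警解除"),      # outside the window
    (datetime(2021, 11, 2, 12), "新品发布会今日举行"),  # no keyword
])
hits = find_target_texts(conn, "暴雨", datetime(2021, 11, 1), datetime(2021, 11, 8))
```

Storing timestamps as uniformly formatted ISO-8601 strings keeps lexical comparison in SQL equivalent to chronological comparison.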
Step 103: processing the target text data according to its type; performing word segmentation on the processed target text data, and extracting keywords from the target text data for each of the different time periods. Preferably, processing the target text data according to its type includes:
determining the data format of the target text data; and
parsing each item of target text data according to its data format to obtain its body content.
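The description elsewhere names HTML, PDF and Word as possible formats but does not specify the parsers. A minimal, standard-library-only sketch of the format check and of HTML body extraction follows; the magic-number rules and the names `detect_format` and `extract_body` are assumptions, and PDF and Word parsing, which would need third-party libraries, is deliberately left out.

```python
from html.parser import HTMLParser


def detect_format(raw: bytes) -> str:
    """Guess a document's format from its leading bytes (magic numbers)."""
    if raw.startswith(b"%PDF-"):
        return "pdf"
    if raw.startswith(b"PK\x03\x04"):
        return "docx"  # assumption: any ZIP container is treated as a Word .docx file
    head = raw[:512].lstrip().lower()
    if head.startswith((b"<!doctype html", b"<html")):
        return "html"
    return "text"


class _BodyText(HTMLParser):
    """Collect text nodes, skipping <head>, <script> and <style> content."""

    SKIP = {"head", "script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def extract_body(raw: bytes, encoding: str = "utf-8") -> str:
    """Return the body text of an HTML or plain-text document."""
    fmt = detect_format(raw)
    if fmt == "html":
        parser = _BodyText()
        parser.feed(raw.decode(encoding, errors="replace"))
        parser.close()
        return " ".join(parser.parts)
    if fmt == "text":
        return raw.decode(encoding, errors="replace").strip()
    # PDF and .docx need third-party parsers, which this sketch does not include.
    raise NotImplementedError(f"no parser wired up for {fmt!r}")


page = (b"<!DOCTYPE html><html><head><title>T</title><style>p{}</style></head>"
        b"<body><p>Hello</p><script>var x = 1;</script><p>world &amp; co</p></body></html>")
body = extract_body(page)
```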
Preferably, processing the target text data according to its type includes:
cleaning the target text data to filter out invalid text data.
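The description does not define which texts count as invalid. A minimal sketch under assumed rules: strip URLs, collapse whitespace, then drop texts that are too short or exact repeats. The rules, the `min_chars` parameter and the function name `clean_texts` are hypothetical.

```python
import re

URL_RE = re.compile(r"https?://\S+")
WHITESPACE_RE = re.compile(r"\s+")


def clean_texts(texts, min_chars=5):
    """Normalize texts and drop the ones treated as invalid.

    Hypothetical validity rules: after URLs are stripped and whitespace is
    collapsed, a text is invalid if it has fewer than `min_chars` characters
    or exactly repeats a text that was already kept.
    """
    seen = set()
    kept = []
    for text in texts:
        t = URL_RE.sub(" ", text)
        t = WHITESPACE_RE.sub(" ", t).strip()
        if len(t) < min_chars or t in seen:
            continue
        seen.add(t)
        kept.append(t)
    return kept


# "转发" ("repost") is too short; the last item duplicates the first after cleaning.
cleaned = clean_texts(["暴雨 http://t.cn/abc 交通受阻了", "   ", "转发", "暴雨   交通受阻了"])
```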
Preferably, processing the target text data according to its type includes:
converting Traditional Chinese text in the target text data into Simplified Chinese text.
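The conversion tool is not named. A character-level table lookup with `str.translate` illustrates the idea; the ten-character table and the name `to_simplified` are a hypothetical subset, and character-by-character mapping misses context-dependent conversions that a full conversion dictionary (for example OpenCC's) handles.

```python
# Toy Traditional -> Simplified table (hypothetical subset). A complete
# dictionary would also cover phrase-level, context-dependent conversions.
T2S = str.maketrans({
    "熱": "热", "點": "点", "話": "话", "題": "题", "發": "发",
    "門": "门", "網": "网", "絡": "络", "氣": "气", "後": "后",
})


def to_simplified(text: str) -> str:
    """Convert Traditional characters to Simplified, character by character."""
    return text.translate(T2S)


converted = to_simplified("網絡熱門話題")  # "trending online topics"
```

Characters already in Simplified form, or absent from the table, pass through unchanged.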
Preferably, performing word segmentation on the processed target text data includes:
segmenting the target text data with a bidirectional LSTM word segmentation algorithm optimized by an attention mechanism.
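The description does not say how the bidirectional LSTM's output is turned into words. BiLSTM segmenters are commonly trained as character taggers with B/M/E/S labels (begin, middle, end of a word, single-character word); assuming that scheme, the predicted tag sequence could be decoded into words as below. The model itself is not shown, and `decode_bmes` is a hypothetical name.

```python
def decode_bmes(chars, tags):
    """Turn per-character B/M/E/S tags into a list of words.

    B = begin, M = middle, E = end of a multi-character word,
    S = single-character word.
    """
    if len(chars) != len(tags):
        raise ValueError("chars and tags must have the same length")
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        if tag == "S":
            if current:  # tolerate an unterminated B/M run before S
                words.append(current)
                current = ""
            words.append(ch)
        elif tag == "B":
            if current:
                words.append(current)
            current = ch
        elif tag == "M":
            current += ch
        elif tag == "E":
            words.append(current + ch)
            current = ""
        else:
            raise ValueError(f"unknown tag {tag!r}")
    if current:  # flush an unterminated word at the end
        words.append(current)
    return words


# Tags as a hypothetical tagger might predict them for "暴雨过后救援持续".
words = decode_bmes("暴雨过后救援持续", list("BEBEBEBE"))
```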
Preferably, extracting keywords from the target text data for the different time periods includes:
extracting the keywords for each time period based on a stop word dictionary with a user-defined threshold.
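The description does not explain how the user-defined threshold enters the stop word dictionary. One plausible reading, used here purely as an assumption, is a document-frequency cut-off: words appearing in more than a chosen fraction of documents are added to a base stop word list, and the remaining words are ranked by frequency within each time period. The names `build_stopwords`, `keywords_by_period`, `df_threshold` and `top_k` are hypothetical.

```python
from collections import Counter, defaultdict


def build_stopwords(base_stopwords, tokenized_docs, df_threshold):
    """Return the base stop words plus every word whose document frequency
    (fraction of documents containing it) exceeds `df_threshold`.
    A lower threshold turns more words into stop words, i.e. filters more aggressively."""
    n_docs = len(tokenized_docs)
    doc_freq = Counter(word for doc in tokenized_docs for word in set(doc))
    too_common = {w for w, c in doc_freq.items() if n_docs and c / n_docs > df_threshold}
    return set(base_stopwords) | too_common


def keywords_by_period(docs, stopwords, top_k=3):
    """docs: iterable of (period_label, tokens). Returns {period: top-k keywords}."""
    counts = defaultdict(Counter)
    for period, tokens in docs:
        counts[period].update(t for t in tokens if t not in stopwords)
    return {p: [w for w, _ in c.most_common(top_k)] for p, c in sorted(counts.items())}


# Pre-segmented posts ("暴雨" = rainstorm, "救援" = rescue, "的" = a function word).
docs = [
    ("2021-11-01", ["暴雨", "的", "交通", "受阻"]),
    ("2021-11-01", ["暴雨", "的", "积水"]),
    ("2021-11-02", ["暴雨", "的", "救援", "救援"]),
    ("2021-11-02", ["暴雨", "救援", "物资"]),
]
stopwords = build_stopwords({"的"}, [tokens for _, tokens in docs], df_threshold=0.9)
trend = keywords_by_period(docs, stopwords, top_k=2)
```

Under this rule the event keyword itself (暴雨, present in every post) is filtered out, which leaves the per-period lists free to show how the discussion shifts.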
The acquired social target text data is cleaned: invalid texts are filtered out, Traditional Chinese text is converted into Simplified Chinese, and so on. The format of each item of target text data is analyzed (it may be, for example, HTML, PDF or Word), and its body text is obtained from the parsed data. The text is then segmented with a bidirectional LSTM word segmentation algorithm optimized by an attention mechanism, keywords are screened with a stop word dictionary whose threshold is user-defined, and the hot topics are finally extracted. Because the threshold of the stop word dictionary is user-defined, the filtering granularity can be controlled flexibly.
Step 104: visually displaying, along the time dimension, how the target event develops across the different time periods by means of its keywords.
Preferably, the keywords of the target event are visually displayed along the time dimension of the different time periods, and the display modes include a pie chart, a heat map and a bar chart.
The method uses the open-source ECharts framework to display the extracted text data as pie charts, heat maps, bar charts and similar views. The charts show tooltips on mouse hover, clicking an entry opens a page with the details of that data, and the whole course of a hot event, from beginning to end, is displayed along the time dimension.
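ECharts is configured with a JSON-style `option` object. Assuming the back end is written in Python and emits that object, a keyword-by-period heat map could be prepared as below. The function name `heatmap_option` and its counting input are hypothetical; `tooltip`, `xAxis`, `yAxis`, `visualMap` and a `series` entry of type `heatmap` with `[x_index, y_index, value]` data are standard ECharts option fields.

```python
import json


def heatmap_option(periods, keywords, counts):
    """Build an ECharts option for a keyword-by-period heat map.

    counts maps (period, keyword) -> number of occurrences; missing pairs count as 0.
    """
    data = [
        [x, y, counts.get((p, k), 0)]
        for x, p in enumerate(periods)
        for y, k in enumerate(keywords)
    ]
    max_count = max((v for _, _, v in data), default=0)
    return {
        "tooltip": {"position": "top"},  # hover tooltip
        "xAxis": {"type": "category", "data": list(periods)},
        "yAxis": {"type": "category", "data": list(keywords)},
        "visualMap": {"min": 0, "max": max_count, "calculable": True,
                      "orient": "horizontal", "left": "center", "bottom": 0},
        "series": [{"name": "keyword frequency", "type": "heatmap", "data": data}],
    }


option = heatmap_option(["11-01", "11-02"], ["暴雨", "救援"],
                        {("11-01", "暴雨"): 5, ("11-02", "救援"): 3})
print(json.dumps(option, ensure_ascii=False))
```

On the page, the option would be passed to `echarts.init(container).setOption(option)`, and a handler registered with `chart.on('click', ...)` could open the detail page for the clicked entry.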
The invention provides a fully automatic analysis method covering data acquisition, analysis and display. It designs a stop word dictionary with a user-defined threshold and optimizes a bidirectional LSTM word segmentation algorithm with an attention mechanism, which improves the precision of word segmentation.
The automatic hot event analysis system based on social topics integrates data acquisition, data storage, data analysis and visual display into a fully automatic flow, removing the inconvenience of manual data processing in traditional analysis methods. In addition, to improve word segmentation precision, an attention mechanism is used to optimize the bidirectional LSTM word segmentation algorithm, which improves the accuracy of the algorithm.
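The description does not specify how the attention mechanism is attached to the bidirectional LSTM. One common arrangement applies self-attention over the BiLSTM's per-character hidden states and feeds each state, together with its attention context, to the tag classifier. A dependency-free sketch of the scaled dot-product attention step follows; it assumes queries, keys and values are the hidden states themselves (a trained model would use learned projections), omits the LSTM and the classifier, and uses hypothetical names.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(states):
    """Scaled dot-product self-attention over a sequence of vectors.

    `states` stands in for BiLSTM hidden states, one vector per character.
    Returns one context vector per position and the attention weights.
    """
    d = len(states[0])
    contexts, weights = [], []
    for q in states:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in states]
        w = softmax(scores)
        ctx = [sum(wj * v[i] for wj, v in zip(w, states)) for i in range(d)]
        contexts.append(ctx)
        weights.append(w)
    return contexts, weights


# Toy 2-dimensional "hidden states": positions 0 and 2 are identical.
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
contexts, weights = self_attention(states)
```

Positions with similar hidden states attend to each other more strongly, which is how attention lets a character's tag depend on related characters elsewhere in the sentence.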
Fig. 2 is a flowchart of a method of data processing of target text data according to a preferred embodiment of the present invention.
The acquired target text data is cleaned to remove invalid texts, and the cleaned target text data is segmented into words. A stop word dictionary is constructed, with a threshold defined according to the word segmentation precision. Stop words are filtered out of the segmented target text data using the stop word dictionary, which screens out the hot event keywords. Finally, a visualization data structure is constructed and the hot event keywords are displayed across different time dimensions.
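The description does not say how word segmentation precision is measured or how it maps to a threshold. Segmenters are usually scored by comparing the character spans of predicted words against a gold segmentation; a minimal sketch of span-level precision, recall and F1 follows. The names are hypothetical, and how the score then sets the threshold is left open here.

```python
def word_spans(words):
    """Convert a segmented sentence into a set of (start, end) character spans."""
    spans, pos = set(), 0
    for w in words:
        spans.add((pos, pos + len(w)))
        pos += len(w)
    return spans


def segmentation_prf(predicted, gold):
    """Span-level precision, recall and F1 of a predicted segmentation."""
    p, g = word_spans(predicted), word_spans(gold)
    correct = len(p & g)
    precision = correct / len(p) if p else 0.0
    recall = correct / len(g) if g else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# The prediction splits the gold word 救援持续 in two: 2 of 4 predicted spans are correct.
precision, recall, f1 = segmentation_prf(["暴雨", "过后", "救援", "持续"],
                                         ["暴雨", "过后", "救援持续"])
```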
Fig. 3 is a diagram illustrating a system for analyzing a target event based on social topics according to a preferred embodiment of the present invention. As shown in fig. 3, the present invention provides a system for analyzing a target event based on social topics, which includes:
an initial unit 301, configured to determine a target event in a social topic and extract an initial keyword of the target event. The initial unit 301 is further configured to identify target text data associated with the initial keyword within a predetermined time period by a Python crawler, and to store the target text data in the database.
An identifying unit 302, configured to identify target text data associated with the initial keyword within a predetermined time period.
The hot event keywords submitted from the front-end page are passed to the Python crawler, which identifies the target text data related to those keywords and stores it in the database.
A processing unit 303, configured to process the target text data according to its type, perform word segmentation on the processed target text data, and extract keywords from the target text data for each of the different time periods.
Preferably, in processing the target text data according to its type, the processing unit 303 is further configured to:
determine the data format of the target text data; and
parse each item of target text data according to its data format to obtain its body content.
Preferably, in processing the target text data according to its type, the processing unit 303 is further configured to:
clean the target text data to filter out invalid text data.
Preferably, in processing the target text data according to its type, the processing unit 303 is further configured to:
convert Traditional Chinese text in the target text data into Simplified Chinese text.
Preferably, in performing word segmentation on the processed target text data, the processing unit 303 is further configured to:
segment the target text data with a bidirectional LSTM word segmentation algorithm optimized by an attention mechanism.
Preferably, in extracting keywords from the target text data for the different time periods, the processing unit 303 is further configured to:
extract the keywords for each time period based on a stop word dictionary with a user-defined threshold.
The acquired social target text data is cleaned: invalid texts are filtered out, Traditional Chinese text is converted into Simplified Chinese, and so on. The format of each item of target text data is analyzed (it may be, for example, HTML, PDF or Word), and its body text is obtained from the parsed data. The text is then segmented with a bidirectional LSTM word segmentation algorithm optimized by an attention mechanism, keywords are screened with a stop word dictionary whose threshold is user-defined, and the hot topics are finally extracted. Because the threshold of the stop word dictionary is user-defined, the filtering granularity can be controlled flexibly.
A result unit 304, configured to visually display, along the time dimension, how the target event develops across the different time periods by means of its keywords.
Preferably, the result unit 304 is configured to visually display the keywords of the target event along the time dimension of the different time periods, and the display modes include a pie chart, a heat map and a bar chart.
The system uses the open-source ECharts framework to display the extracted text data as pie charts, heat maps, bar charts and similar views. The charts show tooltips on mouse hover, clicking an entry opens a page with the details of that data, and the whole course of a hot event, from beginning to end, is displayed along the time dimension.
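The description names the four units 301 to 304 but not their interfaces. A minimal sketch of how they could be wired into one pipeline, with each unit as a pluggable callable, is shown below; the class name, field types and the stub units in the usage are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Post = Tuple[str, str]  # (period label, text)


@dataclass
class AnalysisSystem:
    """Hypothetical wiring of the four units; each field is a pluggable callable."""
    initial_unit: Callable[[str], List[str]]                        # event -> initial keywords
    identifying_unit: Callable[[List[str]], List[Post]]             # keywords -> target texts
    processing_unit: Callable[[List[Post]], Dict[str, List[str]]]   # texts -> keywords per period
    result_unit: Callable[[Dict[str, List[str]]], dict]             # keywords per period -> chart spec

    def run(self, event: str) -> dict:
        keywords = self.initial_unit(event)
        texts = self.identifying_unit(keywords)
        per_period = self.processing_unit(texts)
        return self.result_unit(per_period)


# Usage with stub units over a tiny pre-segmented corpus.
corpus = [("11-01", "暴雨 交通"), ("11-02", "暴雨 救援"), ("11-02", "新品 发布")]
system = AnalysisSystem(
    initial_unit=lambda event: event.split(),
    identifying_unit=lambda kws: [p for p in corpus if any(k in p[1] for k in kws)],
    processing_unit=lambda posts: {
        period: [w for w in text.split() if w != "暴雨"] for period, text in posts
    },
    result_unit=lambda kw: {"xAxis": sorted(kw), "series": [kw[p] for p in sorted(kw)]},
)
spec = system.run("暴雨")
```

Keeping the units as separate callables mirrors the unit structure of Fig. 3 and lets each stage (crawler, cleaner, segmenter, chart builder) be replaced independently.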
The invention has been described with reference to a few embodiments. However, other embodiments of the invention than the one disclosed above are equally possible within the scope of the invention, as would be apparent to a person skilled in the art from the appended patent claims.
Generally, all terms used in the claims are to be interpreted according to their ordinary meaning in the technical field, unless explicitly defined otherwise herein. All references to "a/an/the [device, component, etc.]" are to be interpreted openly as referring to at least one instance of the device, component, etc., unless explicitly stated otherwise. The steps of any method disclosed herein do not have to be performed in the exact order disclosed, unless explicitly stated.

Claims (16)

1. A method of analyzing a target event based on social topics, the method comprising:
determining a target event in a social topic, and extracting an initial keyword of the target event;
identifying target text data associated with the initial keyword within a predetermined time period;
processing the target text data according to its type; performing word segmentation on the processed target text data, and extracting keywords from the target text data for each of the different time periods;
and visually displaying, along the time dimension, how the target event develops across the different time periods by means of its keywords.
2. The method of claim 1, wherein the target text data associated with the initial keyword is identified within a predetermined time period by a Python crawler and stored in a database.
3. The method of claim 1, wherein processing the target text data according to its type comprises:
determining the data format of the target text data; and
parsing each item of target text data according to its data format to obtain its body content.
4. The method of claim 1, wherein processing the target text data according to its type comprises:
cleaning the target text data to filter out invalid text data.
5. The method of claim 1, wherein processing the target text data according to its type comprises:
converting Traditional Chinese text in the target text data into Simplified Chinese text.
6. The method of claim 1, wherein performing word segmentation on the processed target text data comprises:
segmenting the target text data with a bidirectional LSTM word segmentation algorithm optimized by an attention mechanism.
7. The method of claim 1, wherein extracting keywords from the target text data for the different time periods comprises:
extracting the keywords for each time period based on a stop word dictionary with a user-defined threshold.
8. The method of claim 1, wherein the keywords of the target event are visually displayed along the time dimension of the different time periods, and the display modes include a pie chart, a heat map and a bar chart.
9. A system for analyzing a target event based on social topics, the system comprising:
an initial unit configured to determine a target event in a social topic and extract an initial keyword of the target event;
an identifying unit configured to identify target text data associated with the initial keyword within a predetermined time period;
a processing unit configured to process the target text data according to its type, perform word segmentation on the processed target text data, and extract keywords from the target text data for each of the different time periods;
and a result unit configured to visually display, along the time dimension, how the target event develops across the different time periods by means of its keywords.
10. The system of claim 9, wherein the initial unit is further configured to identify the target text data associated with the initial keyword within a predetermined time period by a Python crawler and to store the target text data in a database.
11. The system of claim 9, wherein, in processing the target text data according to its type, the processing unit is further configured to:
determine the data format of the target text data; and
parse each item of target text data according to its data format to obtain its body content.
12. The system of claim 9, wherein, in processing the target text data according to its type, the processing unit is further configured to:
clean the target text data to filter out invalid text data.
13. The system of claim 9, wherein, in processing the target text data according to its type, the processing unit is further configured to:
convert Traditional Chinese text in the target text data into Simplified Chinese text.
14. The system of claim 9, wherein, in performing word segmentation on the processed target text data, the processing unit is further configured to:
segment the target text data with a bidirectional LSTM word segmentation algorithm optimized by an attention mechanism.
15. The system of claim 9, wherein, in extracting keywords from the target text data for the different time periods, the processing unit is further configured to:
extract the keywords for each time period based on a stop word dictionary with a user-defined threshold.
16. The system of claim 9, wherein the result unit is configured to visually display the keywords of the target event along the time dimension of the different time periods, and the display modes include a pie chart, a heat map and a bar chart.

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202111419966.5A (CN114265962A) | 2021-11-26 | 2021-11-26 | Method and system for analyzing target event based on social topic

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202111419966.5A (CN114265962A) | 2021-11-26 | 2021-11-26 | Method and system for analyzing target event based on social topic

Publications (1)

Publication Number | Publication Date
CN114265962A | 2022-04-01

Family

ID: 80825747

Family Applications (1)

Application Number | Status | Priority Date | Filing Date | Title
CN202111419966.5A (CN114265962A) | Pending | 2021-11-26 | 2021-11-26 | Method and system for analyzing target event based on social topic

Country Status (1)

Country | Link
CN (1) | CN114265962A

Citations (3)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN104536956A * | 2014-07-23 | 2015-04-22 | Institute of Computing Technology, Chinese Academy of Sciences (中国科学院计算技术研究所) | Microblog platform based event visualization method and system
CN111046141A * | 2019-12-03 | 2020-04-21 | Xinhua Zhiyun Technology Co., Ltd. (新华智云科技有限公司) | Text library keyword refining method based on historical time characteristics
CN113468868A * | 2021-07-07 | 2021-10-01 | Northwest University (西北大学) | NLP-based real-time network hotspot content analysis method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination