CN104281607A - Microblog hot topic analyzing method - Google Patents

Info

Publication number
CN104281607A
Authority
CN
China
Prior art keywords
microblog
analysis
data
hot
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201310284081.8A
Other languages
Chinese (zh)
Inventor
肖江
严时浪
肖伦文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SHANGHAI RUIYING SOFTWARE TECHNOLOGY Co Ltd
Original Assignee
SHANGHAI RUIYING SOFTWARE TECHNOLOGY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SHANGHAI RUIYING SOFTWARE TECHNOLOGY Co Ltd
Priority to CN201310284081.8A
Publication of CN104281607A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention discloses a microblog hot topic analysis method. The method comprises the following steps: a microblog collection module obtains microblog data according to a collection strategy by combining a web crawler with a microblog third-party API; keywords and sensitive words are loaded from a word bank and, using word segmentation, the keywords and sensitive words are identified in the microblog text data; the microblog web page text data are filtered according to the identified keywords, sensitive words and emotional tendency words; a hot topic module marks the content enclosed between a pair of # symbols or between [ and ] as a topic through cluster analysis and counts the number of microblog comments; a hot people module analyzes the number of microblog fans and the number of comments through cluster analysis; a microblog early warning module identifies microblog information related to the keywords and sensitive words among the microblogs on the network; an analysis and statistics module automatically generates a brief report from the relevant data analyzed by the system. The method improves the accuracy of topic analysis and the efficiency of detection.

Description

Microblog hot topic analysis method
Technical Field
The invention relates to an analysis method, in particular to a microblog hot topic analysis method.
Background
A microblog is a platform for sharing, spreading and acquiring information based on user relationships. A user can post updates of about 140 characters through the WEB, WAP and various client applications and share them instantly. As a network platform for fast sharing and spreading, the microblog is characterized by a huge volume of information and by widely scattered, diverse content. In China, the Sina microblog and the Tencent microblog are the most popular microblog systems; according to public data, the Sina microblog has more than 200 million registered users and the Tencent microblog has more than 300 million. A public opinion analysis system based on the microblog social network can gather hot topics from microblog opinions, track and analyze them, and provide a public opinion early warning function. At present, the main ways of discovering discussion hotspots on a microblog platform are hot topic discovery based on word frequency and text classification.
Word frequency statistics is currently the main way of finding discussion hotspots on a microblog platform. The method derives from the traditional tf-idf indexing method. Within a given time range, the platform performs word segmentation and word screening on the microblogs posted by all users and builds an inverted index; the words are then sorted by frequency, and the highest-ranked words become the hot topics of the microblog. Users can take the words provided by the platform and find the related microblog entries through the internal inverted index. The frequency statistics method used by traditional hot word discovery systems is simple and easy to implement, works efficiently under manual intervention, and is widely adopted by service providers at present. However, the method is essentially unable to handle semantic phenomena such as synonymy and polysemy, both of which interfere heavily with its results. Methods based on word matching produce false alarms and misses in text matching. On a microblog platform, where the volume of content is large and users write in highly individual ways, the accuracy of hot topic discovery based on text matching cannot be well guaranteed. In addition, a single hot word gives the user only one-sided information and is more likely to serve as an index to information than as the information itself. To improve the user experience, a certain amount of manual screening has to be added, which reduces the efficiency of the system. Moreover, the frequency statistics method offers almost no help in meeting the growing demand for personalized recommendation.
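The word frequency approach described above can be illustrated with a minimal Python sketch. It assumes that word segmentation has already been done; the sample posts and the function names build_inverted_index and hot_words are hypothetical and are not taken from any particular platform.

```python
from collections import Counter, defaultdict


def build_inverted_index(posts):
    """Map each word to the set of post ids that contain it."""
    index = defaultdict(set)
    for post_id, words in posts.items():
        for word in words:
            index[word].add(post_id)
    return index


def hot_words(posts, top_n=3):
    """Rank words by how many times they occur across all posts."""
    counts = Counter(word for words in posts.values() for word in words)
    return [word for word, _ in counts.most_common(top_n)]


# Hypothetical posts, already segmented into words.
posts = {
    1: ["earthquake", "rescue", "news"],
    2: ["earthquake", "donation"],
    3: ["football", "news"],
}
index = build_inverted_index(posts)
top = hot_words(posts, top_n=2)
related_posts = index[top[0]]  # posts that mention the hottest word
```

Because the ranking only counts exact word forms, two different words for the same event would be counted separately, which is the synonym weakness noted in the background.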
Traditional text classification methods can also be applied to a microblog platform for screening hotspot information. Automatic classifiers widely used at present include the Bayesian classifier, the instance-based kNN classifier, the support vector machine and so on. Because the number of microblog users is very large, the topics they follow are very broad, and users clearly influence one another, the user network as a whole captures hot events very quickly. If a classifier could be designed to fit the current hot event, the changing trend of information in that category could be detected in real time. However, hot events and topics are unknown before they occur, so the problem turns into fixed monitoring of certain specific, sensitive topics. The classifier approach works well for screening specific topics, but because the text content on a microblog covers a very wide range, it is almost impossible to design a complete, dictionary-style classifier under which all information falls into specific categories. Hot topic discovery requires rapid capture of many different topics, and a general classifier is not adequate for such tasks. In addition, because news is bursty and uncertain, tracking the changing trend of hot information on a microblog requires that the classifier results be monitored at low cost.
As described above, conventional microblog hot topic analysis algorithms have the following two problems:
First, the traditional microblog hot topic analysis method pays no attention to the word-level accuracy of the search results. Because it is limited to the connections among the segmented words themselves, it is essentially unable to handle strongly interfering phenomena such as synonymy and polysemy, which affects the user experience to a great extent. Because the wording people use when narrating is highly random and uncertain, users searching massive amounts of information are often troubled by results whose text is similar but whose content is substantially irrelevant. Microblog hot topic analysis must consider the word-level accuracy of the search results, and the search results must take the differences between similar words into account.
Second, the traditional microblog hot topic analysis method pays no attention to the real-time nature of the search results; that is, the time at which a hot topic analysis result was generated has little or no influence on its ranking. However, microblog messages are highly time-sensitive and are generated dynamically by microblog users, and their content often concerns events as they happen. A microblog hot topic analysis method must therefore consider the real-time nature of the search results, and the generation time of a result must be used as a basis for ranking.
However, research related to microblog hot topic analysis methods is limited, and current work mainly focuses on passive data acquisition for known topics, so the timeliness of microblog public opinion discovery cannot be guaranteed. Public opinion analysis and early warning usually require a large number of web crawlers to collect massive data that must be read and written, and traditional file storage or database storage cannot meet the performance requirements of public opinion analysis.
Disclosure of Invention
The invention aims to provide a microblog hot topic analysis method that solves the technical problems described above.
The invention solves the technical problems through the following technical scheme: a microblog hot topic analysis method, characterized by comprising the following steps:
step one, a microblog acquisition module acquires microblog data according to an acquisition strategy by combining a web crawler with a microblog third-party API;
step two, keywords and sensitive words are loaded from a word bank and, using word segmentation, the keywords and sensitive words are identified in the microblog text data;
step three, the microblog web page text data are filtered according to the identified keywords, sensitive words and emotional tendency words, and the filtering records are stored;
step four, a hot topic module marks the content enclosed between a pair of # symbols or between [ and ] as a topic through cluster analysis, and analyzes the current hot topics according to statistics such as the number of microblog comments and the number of forwards, which greatly improves the accuracy of topic analysis;
step five, a hot people module analyzes the number of microblog fans and the number of comments through cluster analysis to determine the hot people under specified conditions;
step six, a microblog early warning module identifies microblog information related to the keywords and sensitive words among the microblogs on the network and promptly sends an early warning notification to the user;
step seven, an analysis and statistics module automatically generates a brief report from the relevant data analyzed in the system, for use in analysis.
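The topic marking in step four can be sketched as follows. This is a minimal Python illustration under stated assumptions: the regular expression, the post fields text, comments and forwards, and the unweighted sum used as a score are all hypothetical, since the description does not specify how the statistics are combined, and the clustering part of the step is not shown.

```python
import re
from collections import defaultdict

# Text between a pair of '#' symbols, or between '[' and ']'.
TOPIC_PATTERN = re.compile(r"#([^#]+)#|\[([^\[\]]+)\]")


def extract_topics(text):
    """Return the topic strings marked in a microblog text."""
    topics = []
    for hash_topic, bracket_topic in TOPIC_PATTERN.findall(text):
        topic = (hash_topic or bracket_topic).strip()
        if topic:
            topics.append(topic)
    return topics


def rank_topics(posts):
    """Score each topic by summed comments and forwards (an assumed formula)."""
    scores = defaultdict(int)
    for post in posts:
        for topic in set(extract_topics(post["text"])):
            scores[topic] += post["comments"] + post["forwards"]
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical sample posts.
sample_posts = [
    {"text": "#A# first post", "comments": 2, "forwards": 3},
    {"text": "[B] second post", "comments": 10, "forwards": 0},
    {"text": "more on #A#", "comments": 1, "forwards": 1},
]
ranking = rank_topics(sample_posts)
```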
Preferably, the data collected in step one include not only the domestic Sina and Tencent microblogs but also the foreign Twitter microblog.
Preferably, in step two, apart from the sensitive words specified by relevant national laws and regulations, the keywords are defined by the user.
Preferably, in step four, a hot topic of interest can be viewed not only by content but also by source and propagation trend.
Preferably, the early warning notification in step six is sent by e-mail, website prompt or mobile phone.
Preferably, after the required information has been analyzed in step seven, a user of the microblog system binds to the system through a microblog account.
Preferably, the microblog hot topic analysis method is applied to a microblog early warning system comprising a microblog acquisition module, a microblog analysis module, a microblog service module and a microblog data warehouse.
The positive effects of the invention are as follows. The invention provides a breadth-first web page acquisition technique based on time judgment. A time analyzer added to the web page collection process judges whether the time in a page to be collected is earlier than a preset time point, and thereby determines whether that page is subjected to breadth collection only. This avoids collecting old, useless information, improves collection efficiency and ensures collection coverage. The invention also provides an agglomerative hierarchical clustering algorithm for topic detection. Given the flexible wording used in microblogs, a cluster analysis model is used to analyze the current hot topics, which greatly improves the accuracy of topic analysis as well as the efficiency and quality of topic detection. The invention further provides a method by which a microblog early warning system monitors microblog information: data are collected from three microblog systems on the Internet, namely Sina, Tencent and Twitter; word segmentation, sensitive word processing and text cluster analysis are performed on the collected mass data; and the current hot topics are analyzed, so that a user can browse the latest microblog hotspots promptly and conveniently, trace the source of a microblog, view sensitive microblogs and trend analysis, receive early warning of dangerous information, and finally configure the content of interest to be shown in a statistical report.
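Agglomerative hierarchical clustering for topic detection can be sketched in a few lines of Python. The description does not state the similarity measure, the linkage rule or the stopping condition, so the Jaccard similarity over word sets, the average linkage and the threshold of 0.3 used below are assumptions chosen only to make the idea concrete.

```python
def jaccard(a, b):
    """Similarity of two word sets: shared words over all words."""
    return len(a & b) / len(a | b) if (a or b) else 0.0


def cluster_similarity(c1, c2, docs):
    """Average pairwise similarity between two clusters (average linkage)."""
    pairs = [(i, j) for i in c1 for j in c2]
    return sum(jaccard(docs[i], docs[j]) for i, j in pairs) / len(pairs)


def agglomerative_cluster(docs, threshold=0.3):
    """Start with one cluster per document and repeatedly merge the two most
    similar clusters until no pair reaches the threshold."""
    clusters = [[i] for i in range(len(docs))]
    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                sim = cluster_similarity(clusters[x], clusters[y], docs)
                if best is None or sim > best[0]:
                    best = (sim, x, y)
        if best[0] < threshold:
            break
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


# Hypothetical microblogs, already segmented into word sets.
docs = [
    {"earthquake", "rescue"},
    {"earthquake", "rescue", "team"},
    {"football", "match"},
    {"football", "match", "final"},
]
clusters = agglomerative_cluster(docs)
```

Each resulting cluster groups microblogs that share most of their words, so differently worded posts about one event can end up under one topic.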
The invention applies web page collection, text analysis and text mining to microblog public opinion analysis, studies a discovery model for network hot topics, and implements a public opinion analysis system based on the microblog social network. It meets the current need for microblog public opinion analysis and fills the gap in mining important public opinion sources.
Drawings
FIG. 1 is a flowchart of a microblog hot topic analysis method.
Detailed Description
The following provides a detailed description of the preferred embodiments of the present invention with reference to the accompanying drawings.
As shown in FIG. 1, the microblog hot topic analysis method of the invention comprises the following steps:
step one, a microblog acquisition module acquires microblog data according to an acquisition strategy by combining a web crawler with a microblog third-party API;
step two, keywords and sensitive words are loaded from a word bank and, using word segmentation, the keywords and sensitive words are identified in the microblog text data;
step three, the microblog web page text data are filtered according to the identified keywords, sensitive words and emotional tendency words, and the filtering records are stored;
step four, a hot topic module marks the content enclosed between a pair of # symbols or between [ and ] as a topic through cluster analysis, and analyzes the current hot topics according to statistics such as the number of microblog comments and the number of forwards, which greatly improves the accuracy of topic analysis;
step five, a hot people module analyzes the number of microblog fans and the number of comments through cluster analysis to determine the hot people under specified conditions;
step six, a microblog early warning module identifies microblog information related to the keywords and sensitive words among the microblogs on the network and promptly sends an early warning notification to the user;
step seven, an analysis and statistics module automatically generates a brief report from the relevant data analyzed in the system, for use in analysis.
The data collected in step one include not only the domestic Sina and Tencent microblogs but also the foreign Twitter microblog.
In step two, apart from the sensitive words specified by relevant national laws and regulations, the user can define keywords and sensitive words.
In step four, a hot topic of interest can be viewed not only by content but also by source and propagation trend.
The early warning notification in step six can be sent in various ways, such as by e-mail, website prompt or mobile phone.
After the required information has been analyzed in step seven, a user of the microblog system can bind to the system through a microblog account and perform operations similar to those on the Sina, Tencent and Twitter microblogs, such as following, commenting and posting a microblog.
Given that microblog information is highly time-sensitive, is updated and transmitted quickly, and involves strong user interaction, the invention designs a breadth-first web page acquisition technique based on time judgment. The core idea has two aspects. First, link information is automatically extracted from web pages by following the link relationships among the microblog's pages, the original pages are automatically fetched through those links, and by repeating this cycle the original pages of the whole microblog are collected. Second, if all the information on a page is dated earlier than a preset time, no deep acquisition is performed from that page; only breadth acquisition is performed through it.
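One possible reading of the time-judged breadth-first acquisition is sketched below in Python. It assumes that "no deep acquisition" means the links of a stale page are not followed; the in-memory pages dictionary stands in for the real web, and the names crawl, times, links and cutoff are hypothetical.

```python
from collections import deque


def crawl(pages, start_url, cutoff):
    """Breadth-first traversal that does not descend below stale pages.

    pages is a hypothetical stand-in for the web:
    url -> {"times": [post timestamps], "links": [urls]}.
    """
    queue = deque([start_url])
    seen = {start_url}
    visited = []
    while queue:
        url = queue.popleft()
        page = pages[url]
        visited.append(url)
        # Time judgment: if every post on the page predates the cutoff,
        # its links are not followed any deeper.
        if page["times"] and all(t < cutoff for t in page["times"]):
            continue
        for link in page["links"]:
            if link in pages and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


# Hypothetical site; timestamps are plain numbers for brevity.
pages = {
    "home": {"times": [10, 12], "links": ["a", "b"]},
    "a": {"times": [1, 2], "links": ["old"]},
    "b": {"times": [11], "links": ["new"]},
    "old": {"times": [0], "links": []},
    "new": {"times": [13], "links": []},
}
order = crawl(pages, "home", cutoff=5)
```

In this example the page "old" is never fetched, because the only page linking to it contains nothing newer than the cutoff.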
The invention can be applied to a microblog early warning system. Through the system user interface, the system can be configured as a microblog early warning and monitoring system for colleges and universities, which monitors all microblog information related to the institutions, follows their hot topics and hot people, promptly tracks emergencies involving them, and gives early warning of microblog content that negatively affects a designated institution, thereby protecting the image of the institution, improving the quality of education and maintaining social harmony and stability.
The microblog early warning system to which the invention is applied comprises a microblog acquisition module, a microblog analysis module, a microblog service module, a microblog data warehouse and the like.
(I) The microblog acquisition module is responsible for real-time acquisition, tracking and monitoring of the three microblog systems Sina, Tencent and Twitter on the Internet. A key technology in the microblog acquisition module is intelligent information acquisition: intelligent, distributed, cooperative crawlers are used, the number of crawler servers and the number of crawlers can be configured dynamically, and the computing resources used for acquisition are increased or decreased dynamically under different acquisition requirements. Microblog information is acquired from the Internet by a crawler module in the web page acquisition subsystem. Constraints such as the number of crawlers, the acquisition speed, the initial URL, the regular expression that URLs must match to be collected, and the termination condition of the crawler threads can be set on the crawler module to acquire the relevant web page information. The acquired web pages are then cleaned by a web page cleaning module, which removes noise data such as copyright notices and extracts data such as the microblog text, link addresses and acquisition time from the relevant pages.
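The crawler constraints listed above could be grouped in a small configuration object, as in the following Python sketch. The class name CrawlerConfig, its field names, the page-count termination condition and the example.com URLs are all hypothetical; the description names the kinds of constraint but not their concrete form.

```python
import re
from dataclasses import dataclass


@dataclass
class CrawlerConfig:
    crawler_count: int          # number of crawler workers
    requests_per_second: float  # acquisition speed limit
    seed_urls: list             # initial URLs
    url_pattern: str            # regular expression a URL must match
    max_pages: int              # assumed termination condition per thread

    def accepts(self, url):
        """True if the URL meets the acquisition requirement."""
        return re.fullmatch(self.url_pattern, url) is not None

    def should_stop(self, pages_fetched):
        """True once a crawler thread has reached its page budget."""
        return pages_fetched >= self.max_pages


config = CrawlerConfig(
    crawler_count=4,
    requests_per_second=2.0,
    seed_urls=["https://example.com/u/1"],
    url_pattern=r"https://example\.com/u/\d+",
    max_pages=1000,
)
```

Keeping the constraints in one object makes it straightforward to change the number of crawlers or the speed limit at run time, which matches the dynamic configuration the module describes.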
(II) The microblog analysis module performs information deduplication, propagation chain analysis, trend analysis and the like on the information obtained by the microblog acquisition module, in order to obtain valuable microblog information, analyze public opinion hotspots in real time and grasp the trends of microblog information. The microblog analysis module specifically comprises:
page filtering, which analyzes and filters the content of microblog web pages, automatically removes useless information and accurately obtains the main information of the target content;
propagation chain analysis, which tracks the source, number of reposts, publisher and other related information elements of a hot topic over a period of time and finally forms a propagation chain analysis graph;
automatic classification, which traverses and scans microblog content according to user-defined keyword rules, identifies the microblogs containing the keywords and labels them by category automatically; a classification feature vector space model is obtained by training on samples, and microblogs are then automatically classified and labeled according to their feature vectors;
multiple clustering, which applies a multiple clustering algorithm to the content of microblogs and performs intelligent classification of massive microblog information;
hotspot and keyword discovery, which analyzes the heat of microblogs with a hotspot weight calculation model, automatically discovers hot words in microblogs and helps the user understand network hotspots intuitively;
trend analysis, which, for high-attention events triggered by microblogs, grasps their outbreak points and development in time and provides the hot events of different time periods;
tendency analysis, which applies text clustering and commendatory/derogatory analysis to netizens' comments on microblogs, analyzes and summarizes the netizens' main viewpoints and counts the distribution of commendatory and derogatory tendencies;
public opinion research and judgment, which builds on the analysis functions above and performs source analysis, authenticity analysis, classification analysis, directional analysis, correction analysis and the like, so that hotspots and public opinion trends can be understood and grasped comprehensively and in time, and social emergencies and public opinion crises can be handled flexibly.
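The tendency analysis item above counts how commendatory and derogatory comments are distributed. The description does not say how polarity is determined, so the following Python sketch substitutes a simple lexicon count as a stand-in; the word lists and the function names polarity and tendency_distribution are hypothetical.

```python
from collections import Counter

# Hypothetical polarity lexicons; a real system would use a much larger list.
POSITIVE = {"good", "great", "support"}
NEGATIVE = {"bad", "terrible", "oppose"}


def polarity(words):
    """Label a segmented comment by comparing positive and negative hits."""
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def tendency_distribution(comments):
    """Count how many comments fall into each polarity class."""
    return Counter(polarity(words) for words in comments)


# Hypothetical comments, already segmented into words.
comments = [
    ["great", "idea"],
    ["terrible", "bad", "good"],
    ["no", "opinion"],
]
distribution = tendency_distribution(comments)
```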
(III) The microblog service module is the part that the user experiences directly. Through it the user can clearly understand the functions of the microblog early warning system, learn the latest hotspots of the whole microblog more specifically and conveniently through his or her own operations, set keywords for matters of interest, search on those keywords and obtain the required information in time. The microblog service module specifically comprises:
monitoring settings, which monitor the related information of microblog users through keyword settings, key people settings, region settings and key monitoring word settings;
topic tracking, in which the microblog system analyzes hot topics from the microblogs collected from the network;
hot people, in which the microblog system analyzes hot people from the microblogs collected from the network;
emergencies, namely events that occur within a short time (within 24 hours) and cause a strong reaction on the network;
microblog search, in which a user can search all microblogs captured by the microblog system to obtain the microblog data he or she wants;
statistical analysis, in which the corresponding modules of the microblog system produce the following statistics: marking statistics, marking reports, topic statistics, topic reports, monitoring word statistics and user behavior statistics;
microblog early warning, in which the microblog system analyzes microblogs according to the keywords set by the user and displays them on a microblog early warning page;
online microblog, in which a user of the microblog system can perform operations similar to those on the Sina, Tencent and Twitter microblogs, such as following, commenting and posting microblogs.
(IV) The microblog data warehouse can store massive unstructured information. It adopts a real-time dynamic indexing technique, so that indexes are updated quickly and synchronously when data are added, deleted or modified, and neither the whole index nor a partial index needs to be rebuilt. Data can therefore be retrieved immediately after they change, which ensures the real-time performance and effectiveness of information search and meets the core retrieval requirement of public opinion applications. The microblog data warehouse specifically comprises:
a database storage service, which can store massive unstructured information and make the information in the database available at any time;
a data index service, which adopts the real-time dynamic indexing technique and ensures the real-time performance and effectiveness of information search.
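The idea of an index that stays searchable while data are added, deleted and modified, without any rebuild, can be shown with a small in-memory inverted index in Python. This is only a sketch of the principle; the class DynamicIndex and its methods are hypothetical and say nothing about how the data warehouse is actually built.

```python
from collections import defaultdict


class DynamicIndex:
    """Inverted index updated in place on every add, update or delete."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> document ids
        self.docs = {}                    # document id -> set of terms

    def add(self, doc_id, terms):
        """Insert a document, replacing any earlier version of it."""
        self.delete(doc_id)
        self.docs[doc_id] = set(terms)
        for term in self.docs[doc_id]:
            self.postings[term].add(doc_id)

    def delete(self, doc_id):
        """Remove a document and only the postings that refer to it."""
        for term in self.docs.pop(doc_id, set()):
            self.postings[term].discard(doc_id)
            if not self.postings[term]:
                del self.postings[term]

    def search(self, term):
        """Return the ids of documents that currently contain the term."""
        return set(self.postings.get(term, set()))


index = DynamicIndex()
index.add(1, ["flood", "warning"])
index.add(2, ["flood"])
before_update = index.search("flood")
index.add(1, ["storm"])  # modifying document 1 takes effect immediately
after_update = index.search("flood")
```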
The microblog early warning system has the following specific functions:
(1) Microblog information acquisition: data are collected from the three microblog systems Sina, Tencent and Twitter on the Internet, and the collected data are sent to (2) for analysis.
(2) Microblog analysis: information deduplication, propagation chain analysis, trend analysis and the like are performed on the acquired information. Effective intelligence data are extracted and then passed to (3) for intelligence mining and analysis.
(3) Intelligence mining: the intelligence is mined further, for example for information about a target and its activity, and is then processed further through (4) and (5).
(4) Microblog service: the information required by the user is displayed through an interface according to the user's needs. The functions available to the user include monitoring settings, topic tracking, hot people, emergencies, microblog search, statistical analysis, online microblog, microblog early warning and the like.
(5) Microblog data warehouse: the mined intelligence is stored in the microblog data warehouse, ready to be searched and used by the user at any time, which ensures the real-time performance and effectiveness of information search.
Compared with the prior art, the invention has the following advantages and beneficial effects. The microblog acquisition module collects data from the three microblog systems Sina, Tencent and Twitter on the Internet and passes the data to the microblog analysis module for information deduplication, trend analysis and the like. After effective information has been extracted, it is displayed to the user through an interface; the functions available to the user include monitoring settings, topic tracking, hot people, emergencies, microblog search, statistical analysis, online microblog, microblog early warning and the like. The interface is user-friendly and offers many functions: the microblog system can be monitored comprehensively, hot topics can be fed back in real time, and extreme remarks can be tracked and flagged for early warning. The system adopts an intelligent information acquisition technology with intelligent, distributed, cooperative crawlers; the number of crawler servers and the number of crawlers can be configured dynamically, and the computing resources used for acquisition are increased or decreased dynamically under different acquisition requirements. The system acquires microblog information from the Internet through a crawler module in the web page acquisition subsystem, and constraints such as the number of crawlers, the acquisition speed, the initial URL, the regular expression that URLs must match to be collected, and the termination condition of the crawler threads can be set on the crawler module to acquire the relevant web page information.
For each acquired web page, a web page cleaning module removes noise data such as advertisements, navigation information, pictures and copyright notices, extracts data such as the microblog text, link addresses and acquisition time from the relevant page, and stores the data in a database.
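Storing the extracted data in a relational database might look like the following Python sketch, which uses the standard sqlite3 module with an in-memory database. The two tables mirror the split into user data and microblog data used elsewhere in this description; the table names, column names and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        fans INTEGER DEFAULT 0
    );
    CREATE TABLE tweets (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id),
        content TEXT NOT NULL,
        link TEXT,
        collected_at TEXT NOT NULL
    );
    """
)

# Hypothetical sample rows.
conn.execute("INSERT INTO users VALUES (1, 'alice', 120)")
conn.execute(
    "INSERT INTO tweets VALUES (?, ?, ?, ?, ?)",
    (10, 1, "#World Cup# kickoff", "https://example.com/t/10", "2013-07-08T10:00:00"),
)

# Associated query: join each microblog with its author.
rows = conn.execute(
    "SELECT users.name, tweets.content "
    "FROM tweets JOIN users ON users.id = tweets.user_id"
).fetchall()
```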
The following operations are performed on each piece of microblog data acquired by the microblog search engine:
step 1-1) the acquired data are stored mainly as two types: user data, User, and microblog data, Tweet;
step 1-2) a relational database is used to store the User and Tweet data for subsequent associated queries;
step 2-1) Chinese word segmentation is used to segment the microblog content field, content, of the Tweet data;
step 2-2) a full-text retrieval technique is used to build an inverted index, which serves as the query index for data analysis;
step 2-3) while the index is built for the content field, the content tags enclosed by "#" and "[ ]" symbols are extracted from the content;
step 2-4) an inverted index is built for the tag field;
step 3-1) a timer program is set up that queries the Tweet data every hour and counts all tag data collected within the past hour; the query condition is time = [now() - 1h TO now()] & facet.field = tag;
step 3-2) the tags are sorted in descending order of their data volume, tag_count, and the top 100 tags are taken;
step 4-1) the 100 tags extracted in step 3-2) are traversed and segmented using Chinese word segmentation; each word obtained after segmentation is a term;
step 4-2) the full-text retrieval server is then queried. When the number of terms is 3 or fewer, all terms must match; when it is greater than 3, at least 75% of the terms must match. If the number of terms is less than or equal to 3, the query condition is (content = term1 AND term2 AND term3) & time = [now() - 24h TO now()]; if the number of terms is greater than 3, the query condition is (content = (term1 AND term2 AND term3) OR (term4 OR term5 …)) & time = [now() - 24h TO now()];
step 4-3) the current microblog data corresponding to the 100 tags are queried in this way, and the tags are then sorted in descending order of the number of corresponding microblogs, t_count, to obtain the 100 hot topics of the day.
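Steps 3-1 to 4-3 can be sketched in plain Python without a full-text retrieval server. The sketch assumes that tags and terms have already been extracted and segmented, that "at least 75%" is rounded up to a whole number of terms, and that tweets are dictionaries with time, tags and terms fields; all names are hypothetical.

```python
import math
from collections import Counter
from datetime import datetime, timedelta


def top_tags(tweets, now, limit=100):
    """Steps 3-1 and 3-2: count tags seen in the past hour, most frequent first."""
    start = now - timedelta(hours=1)
    counts = Counter()
    for tweet in tweets:
        if start <= tweet["time"] <= now:
            counts.update(tweet["tags"])
    return counts.most_common(limit)


def required_matches(term_count):
    """Step 4-2: all terms for 3 or fewer, otherwise at least 75% of them."""
    if term_count <= 3:
        return term_count
    return math.ceil(0.75 * term_count)


def tag_matches(tag_terms, content_terms):
    """True if enough of the tag's terms occur in the microblog content."""
    terms = set(tag_terms)
    if not terms:
        return False
    hits = len(terms & set(content_terms))
    return hits >= required_matches(len(terms))


def rank_hot_topics(terms_by_tag, tweets, now):
    """Step 4-3: order tags by t_count, the matching tweets of the last 24 hours."""
    start = now - timedelta(hours=24)
    recent = [t for t in tweets if start <= t["time"] <= now]
    t_count = {
        tag: sum(tag_matches(terms, t["terms"]) for t in recent)
        for tag, terms in terms_by_tag.items()
    }
    return sorted(t_count.items(), key=lambda item: item[1], reverse=True)
```

A timer would call top_tags once an hour, segment each returned tag into terms, and pass the result to rank_hot_topics to obtain the day's ordering.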
The advantages of the invention are as follows: cluster analysis improves the accuracy of current microblog retrieval results. The calculation method used for analysis and statistics is simple and efficient and markedly improves real-time performance, so that the microblog system can be monitored comprehensively and in time, hot topics can be fed back in real time, and extreme remarks can be intelligently tracked and flagged for early warning.
In one embodiment, the collector may collect microblog messages periodically. However, collecting from every user at the same fixed period makes the collector inefficient: a large share of microblog users post infrequently, for example updating only once every few days, and if there are many such users, collecting from each of them every 3 minutes, for example, greatly reduces efficiency.
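The cost described above can be made concrete with a small calculation. The helper below is a hypothetical illustration, not part of the described system: it estimates how many collection requests are spent on one user for each new post, given a fixed collection interval and the user's typical posting period.

```python
def polls_per_new_post(poll_interval_minutes, posting_period_days):
    """Average number of collection requests issued per new post of one user."""
    minutes_between_posts = posting_period_days * 24 * 60
    return minutes_between_posts / poll_interval_minutes
```

For a user who posts once every 3 days and is collected every 3 minutes, `polls_per_new_post(3, 3)` gives 1440 requests per new post, so all but one of those requests return nothing new.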
Various modifications and changes may be made to the present invention by those skilled in the art. Thus, it is intended that the present invention cover the modifications and variations of this invention provided they come within the scope of the appended claims and their equivalents.

Claims (7)

1. A microblog hot topic analysis method, characterized by comprising the following steps:
step one, a microblog acquisition module acquires microblog data according to an acquisition strategy by combining a web crawler with third-party microblog API technology;
step two, keywords and sensitive words are retrieved from a word bank and, using a word segmentation processing technology, are identified in the microblog text data;
step three, the microblog webpage text data is filtered according to the identified keywords, sensitive words and emotional tendency words, and the filtering records are stored;
step four, a hot topic module uses a cluster analysis technology to mark the content enclosed between the # # and [ ] symbols as a topic, and determines the current hot topics from statistics such as the number of microblog comments and the number of forwards, thereby greatly improving the accuracy of topic analysis;
step five, a hot figure module uses a cluster analysis technology to analyze the number of microblog fans and the number of comments, and determines the hot figures under specified conditions;
step six, a microblog early warning module identifies microblog information on the network that is related to the keywords and sensitive words, and promptly sends an early warning notification to the user;
step seven, an analysis and statistics module automatically generates a brief report from the related data analyzed in the system, for use in analysis.
2. The microblog-based emergency analysis method according to claim 1, wherein the data collected in step one includes not only data from the domestic Sina and Tencent microblogs but also data from the foreign Twitter microblog.
3. The microblog-based emergency analysis method according to claim 1, wherein the keywords in step two include, in addition to the sensitive words specified by relevant national laws and regulations, keywords and sensitive words defined by the user.
4. The microblog-based emergency analysis method according to claim 1, wherein in step four the user can view not only the content of a hot topic of interest but also its source and propagation trend.
5. The microblog-based emergency analysis method according to claim 1, wherein the early warning notification in step six is sent by email, website prompt, and mobile phone.
6. The microblog-based emergency analysis method according to claim 1, wherein in step seven, after the required information has been analyzed, a user of the microblog system is bound to the system through a microblog account.
7. The microblog-based emergency analysis method according to claim 1, wherein the microblog hot topic analysis method is applied to a microblog early warning system, and the microblog early warning system comprises a microblog acquisition module, a microblog analysis module, a microblog service module and a microblog data warehouse.
CN201310284081.8A 2013-07-08 2013-07-08 Microblog hot topic analyzing method Pending CN104281607A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310284081.8A CN104281607A (en) 2013-07-08 2013-07-08 Microblog hot topic analyzing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310284081.8A CN104281607A (en) 2013-07-08 2013-07-08 Microblog hot topic analyzing method

Publications (1)

Publication Number Publication Date
CN104281607A true CN104281607A (en) 2015-01-14

Family

ID=52256483

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310284081.8A Pending CN104281607A (en) 2013-07-08 2013-07-08 Microblog hot topic analyzing method

Country Status (1)

Country Link
CN (1) CN104281607A (en)

Cited By (43)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104965894A (en) * 2015-06-19 2015-10-07 成都国腾实业集团有限公司 Data analysis system for IDC hazardous information monitoring platform
CN104899324B (en) * 2015-06-19 2018-09-11 成都国腾实业集团有限公司 One kind monitoring systematic sample training system based on IDC harmful informations
CN104899324A (en) * 2015-06-19 2015-09-09 成都国腾实业集团有限公司 Sample training system based on IDC (internet data center) harmful information monitoring system
WO2016206395A1 (en) * 2015-06-25 2016-12-29 中兴通讯股份有限公司 Weekly report information processing method and device
CN106339389A (en) * 2015-07-09 2017-01-18 天津市国瑞数码安全系统股份有限公司 Control method of sensitive information based on microblog website
CN105791091A (en) * 2016-03-02 2016-07-20 四川长虹电器股份有限公司 System and method for evaluating operation quality of official microblog and wechat public numbers
CN107341160B (en) * 2016-05-03 2020-09-01 北京京东尚科信息技术有限公司 Crawler intercepting method and device
CN107341160A (en) * 2016-05-03 2017-11-10 北京京东尚科信息技术有限公司 A kind of method and device for intercepting reptile
CN106980692A (en) * 2016-05-30 2017-07-25 国家计算机网络与信息安全管理中心 A kind of influence power computational methods based on microblogging particular event
CN106980692B (en) * 2016-05-30 2020-12-08 国家计算机网络与信息安全管理中心 Influence calculation method based on microblog specific events
CN106202222A (en) * 2016-06-28 2016-12-07 北京小米移动软件有限公司 The determination method and device of focus incident
CN106302407A (en) * 2016-08-02 2017-01-04 四川秘无痕信息安全技术有限责任公司 A kind of method monitoring wechat circle of friends transmission data
CN106302407B (en) * 2016-08-02 2019-05-17 四川秘无痕信息安全技术有限责任公司 A method of monitoring wechat circle of friends sends data
CN106354846A (en) * 2016-08-31 2017-01-25 成都广电视讯文化传播有限公司 Intelligent news manuscript selection method and system based on big data
CN106779827A (en) * 2016-12-02 2017-05-31 上海晶樵网络信息技术有限公司 A kind of Internet user's behavior collection and the big data method of analysis detection
CN106777236A (en) * 2016-12-27 2017-05-31 北京百度网讯科技有限公司 The exhibiting method and device of the Query Result based on depth question and answer
CN106777236B (en) * 2016-12-27 2020-11-03 北京百度网讯科技有限公司 Method and device for displaying query result based on deep question answering
CN108335110B (en) * 2017-01-17 2022-04-12 阿里巴巴集团控股有限公司 Chat information processing method and device
CN108335110A (en) * 2017-01-17 2018-07-27 阿里巴巴集团控股有限公司 Chat message processing method and processing device
CN106886579B (en) * 2017-01-23 2020-01-14 北京航空航天大学 Real-time streaming text grading monitoring method and device
CN106886579A (en) * 2017-01-23 2017-06-23 北京航空航天大学 Real-time streaming textual hierarchy monitoring method and device
CN107168943A (en) * 2017-04-07 2017-09-15 平安科技(深圳)有限公司 The method and apparatus of topic early warning
US11205046B2 (en) 2017-04-07 2021-12-21 Ping An Technology (Shenzhen) Co., Ltd. Topic monitoring for early warning with extended keyword similarity
CN107622333A (en) * 2017-11-02 2018-01-23 北京百分点信息科技有限公司 A kind of event prediction method, apparatus and system
CN107622333B (en) * 2017-11-02 2020-08-18 北京百分点信息科技有限公司 Event prediction method, device and system
CN109241380A (en) * 2018-08-24 2019-01-18 北京信息科技大学 A kind of acquisition method of the microblog data combined based on web crawlers and Sina API
CN110083701A (en) * 2019-03-20 2019-08-02 重庆邮电大学 A kind of cyberspace Mass disturbance early warning system based on average influence
CN111931098A (en) * 2019-04-28 2020-11-13 北京仝睿科技有限公司 Monitoring object determination method and device and electronic equipment
CN110502703A (en) * 2019-07-12 2019-11-26 北京邮电大学 Social networks incident detection method based on character string dictionary building
CN111401648B (en) * 2020-03-20 2021-01-19 李惠芳 Event prediction method under condition of mutual influence of internet hotspots
CN111401648A (en) * 2020-03-20 2020-07-10 李惠芳 Event prediction method under condition of mutual influence of internet hotspots
CN111783468A (en) * 2020-06-28 2020-10-16 百度在线网络技术(北京)有限公司 Text processing method, device, equipment and medium
CN111783468B (en) * 2020-06-28 2023-08-15 百度在线网络技术(北京)有限公司 Text processing method, device, equipment and medium
CN112115263A (en) * 2020-09-08 2020-12-22 浙江嘉兴数字城市实验室有限公司 NLP-based social management big data monitoring and early warning method
CN112632361A (en) * 2020-12-29 2021-04-09 中科院计算技术研究所大数据研究院 Iterative data acquisition method
CN112818234A (en) * 2021-02-02 2021-05-18 中慧绿浪科技(天津)集团有限公司 Network public opinion information analysis processing method and system
CN112929235A (en) * 2021-02-06 2021-06-08 珠海市鸿瑞信息技术股份有限公司 Network monitoring system based on internet
CN113010689A (en) * 2021-03-22 2021-06-22 平安科技(深圳)有限公司 Buddhism knowledge discrimination method, device, equipment and storage medium
CN113127576A (en) * 2021-04-15 2021-07-16 微梦创科网络科技(中国)有限公司 Hotspot discovery method and system based on user content consumption analysis
CN113127576B (en) * 2021-04-15 2024-05-24 微梦创科网络科技(中国)有限公司 Hot spot discovery method and system based on user content consumption analysis
CN117093762A (en) * 2023-07-18 2023-11-21 南京特尔顿信息科技有限公司 Public opinion data evaluation analysis system and method
CN117093762B (en) * 2023-07-18 2024-02-13 南京特尔顿信息科技有限公司 Public opinion data evaluation analysis system and method
CN117216418A (en) * 2023-11-08 2023-12-12 一网互通(北京)科技有限公司 Method and device for extracting popular phrase data in real time based on emotion and propagation force

Similar Documents

Publication Publication Date Title
CN104281607A (en) Microblog hot topic analyzing method
CN105447184B (en) Information extraction method and device
Vargiu et al. Exploiting web scraping in a collaborative filtering-based approach to web advertising.
CN102208992B (en) The malicious information filtering system of Internet and method thereof
US10269024B2 (en) Systems and methods for identifying and measuring trends in consumer content demand within vertically associated websites and related content
CN105022827B (en) A kind of Web news dynamic aggregation method of domain-oriented theme
CN103218431B (en) A kind ofly can identify the system that info web gathers automatically
CN106383887A (en) Environment-friendly news data acquisition and recommendation display method and system
CN102254265A (en) Rich media internet advertisement content matching and effect evaluation method
CN105718587A (en) Network content resource evaluation method and evaluation system
CN101751458A (en) Network public sentiment monitoring system and method
US20080104034A1 (en) Method For Scoring Changes to a Webpage
CN105117484A (en) Internet public opinion monitoring method and system
WO2013030133A1 (en) Search and discovery system
WO2007015990A2 (en) Techniques for analyzing and presenting information in an event-based data aggregation system
CN110705288A (en) Big data-based public opinion analysis system
CN103365924A (en) Method, device and terminal for searching information
CN103365839A (en) Recommendation search method and device for search engines
CN104778208A (en) Method and system for optimally grasping search engine SEO (search engine optimization) website data
CN111447575B (en) Short message pushing method, device, equipment and storage medium
WO2018237098A1 (en) Methods and systems for identifying markers of coordinated activity in social media movements
Dongo et al. A qualitative and quantitative comparison between Web scraping and API methods for Twitter credibility analysis
CN107330076B (en) Network public opinion information display system and method
CN116401459A (en) Internet information processing method, system and recording medium
Zhao et al. Web information credibility: From web 1.0 to web 2.0

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20150114