CN103488683B - Microblog data management system and implementation method thereof - Google Patents
- Publication number
- CN103488683B (application number CN201310367762.0A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/958—Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention relates to a microblog data management system and an implementation method thereof, which provide microblog users with a service for managing microblog data through automatic friend grouping. The system comprises a user authorization module, a data extraction module, a community structure discovery module, a grouping analysis and display module, a feedback module and a microblog data management module. The system and method solve the problems of traditional manual microblog data management, which is time-consuming, labor-intensive and hard to maintain. The user's friends are grouped intelligently by a community discovery technique, which offers high accuracy and can find overlapping communities. The grouping result is analyzed to give the user an intuitive, easy-to-understand basis for each friend group. In addition, the system provides a feedback mechanism that brings user feedback into the system to further improve its reliability.
Description
Technical Field
The invention relates to a microblog data management system based on a community discovery technology and an implementation method thereof, and belongs to the technical field of data mining.
Background
In social networks such as microblogs, users face a large amount of information every day as their number of friends grows. For users with many microblog friends, a good data management method is to establish groups that mirror the user's real-life social circles and to manage friends according to the groups they belong to. Once the groups are established, content filtering, privacy settings and the like can be applied per group. At present, the main microblog service providers, such as Tencent Weibo and Sina Weibo, all provide this mechanism for managing data. However, the existing method relies mainly on the user grouping and managing friends manually. This is time-consuming and demands a great deal of manual work from the user. When the user gains new friends, the groups are hard to keep up to date. Manual management is also prone to mistakes.
Disclosure of Invention
The technical problem solved by the invention is to provide a system and a method that mine potential grouping information efficiently and accurately, so that users can conveniently manage their microblog data.
The technical solution of the invention is a microblog data management system which, as shown in fig. 1, includes:
A user authorization module: authorization is performed using the OAuth protocol. Owing to the security mechanism that OAuth provides, the system never comes into contact with the user's private information.
A data capture module: acquires the relationship data among the user's friends and the user profile data through the API provided by the microblog service. First, the user's friends are fetched. Then, for each friend, the friends that this friend has in common with the user are fetched, which yields the relationships among all friends and forms a user social relationship network made up of these friend relationships. The input of the module is the user's user name on the microblog and its output is the user's social relationship network. Each node in the network represents a friend of the user, and an edge between two nodes represents a relationship between two of the user's friends. The resulting network is written to a database, from which the community structure mining module reads it.
A community structure mining module: takes the graph of friend relationships obtained by the data capture module and, using a community detection technique, mines the potential community structures among the friends from their social relationships; these serve as the basis for grouping. A community is a set of friends such that friend relationships are dense among the friends inside the community and sparse between friends inside the community and friends outside it. The community detection technique consists of two parts, basic community structure search and community aggregation, and requires no parameters to be set by the user. The input of the module is the friend relationship network obtained by the data capture module, and the friend groups it produces are output to the grouping analysis display module.
A grouping analysis display module: analyzes the user friend groups generated by the community structure mining module in order to mine the semantic information of each group. According to this semantic information, groups are abstracted into four categories: celebrities, friends, classmates and colleagues. For each group generated by the community structure mining module, the analysis part determines the category of the group from the profile information of its members, the content of their microblogs and their forwarding relationships. The display part then presents the output of the analysis part to the user as the grouping analysis result.
A feedback module: provides a feedback entry for each user friend group and collects user evaluations. The user scores the result produced by the system; the user id, the grouping result and the user's feedback are stored as one record in a database, providing a basis for improving the system and the user experience in the future.
A microblog data management method comprises the following implementation steps:
(1) User authorization: authorization is performed using the OAuth protocol to obtain the user's user name on the microblog;
(2) Data capture: according to the user's user name on the microblog, the relationship data among the user's friends and the user profile data are acquired through the API (application programming interface) provided by the microblog service. First, the user's friends are fetched; then, for each friend, the friends that this friend has in common with the user are fetched, which yields the relationships among all friends and forms a user social relationship network made up of these friend relationships. Each node in the network represents a friend of the user, an edge between two nodes represents a relationship between two of the user's friends, and the resulting network is finally written to a database;
(3) Community structure mining: for the friend relationship network obtained in step (2), a community detection technique is applied. A depth-first search is first carried out on the network to mine its basic community structures; the basic community structures are then aggregated hierarchically, so that the potential community structures of the network are mined from the social relationships among friends and serve as the basis for grouping. A community is a set of friends such that friend relationships are dense among friends inside the community and sparse between friends inside and outside it. The user friend groups are thereby obtained;
(4) Grouping analysis and display: the user friend groups generated in step (3) are analyzed in order to mine the semantic information of each group. Groups are abstracted into four categories: celebrities, friends, classmates and colleagues. For each user friend group generated in step (3), the category is determined from the profile information of the group's members, the content of their microblogs and their forwarding relationships, and is shown to the user as the basis of the grouping;
(5) Feedback: a feedback entry is provided for each user friend group, and user feedback information is collected to provide a basis for improving the system and the user experience in the future.
Compared with the prior art, the invention has the advantages that:
(1) The invention automatically analyzes the user's friend relationships and discovers the user's potential groups, so that microblog data can be managed by group. The whole process requires no manual participation, sparing the user a large amount of tedious, repetitive work, saving time and improving efficiency.
(2) In automatically discovering the groups, the invention adopts community detection theory and techniques and uses only the friend relationship information among users, not information such as user profile data, thereby avoiding grouping errors caused by incomplete or out-of-date profile data.
(3) The invention resolves the grouping result into categories that are easy for the user to understand, so the user can manage microblog data intuitively by category.
Drawings
FIG. 1 is an architectural diagram of the system of the present invention;
FIG. 2 is a flow diagram of a data capture module implementation of the present invention;
FIG. 3 is a flow chart of an implementation of a community structure mining module in the present invention;
FIG. 4 is a flow chart of the implementation of the grouping analysis display module in the present invention.
Detailed Description
As shown in fig. 1, the microblog data management system of the invention, based on the community discovery technology, is composed of a user authorization module, a data capture module, a community structure mining module, a grouping analysis display module and a feedback module.
The specific implementation process of each module is as follows:
1. User authorization module
(1) The user enters account information;
(2) The account information is sent to the microblog server for verification. If verification succeeds, an access token is returned and authorization is complete; the data capture module uses the access token to obtain permission to fetch data.
2. Data capture module, as shown in fig. 2:
(1) Initialize a hash table H for storing the user's social relationship network data, and obtain the user's following list;
(2) For each uid in the following list obtained in (1), fetch the common following list, list2, that this uid shares with the user, and initialize the hash table entry whose key is uid to the empty set;
(3) Add each uid2 in the common following list list2 obtained in step (2) to the set stored under the key uid in the hash table;
(4) Repeat step (3) until every item in the common following list has been processed;
(5) Repeat step (2) until every item in the user's following list has been processed;
(6) Write the data in the hash table to a database, output it to the community structure mining module, and finish data capture.
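The capture steps above can be sketched as follows. This is a minimal illustration, not the patented implementation: the two callables `get_following` and `get_common_following` are hypothetical stand-ins for whatever calls the microblog API actually offers, and the database write of step (6) is left out.

```python
from typing import Callable, Dict, Iterable, Set


def build_friend_network(
    user_id: str,
    get_following: Callable[[str], Iterable[str]],
    get_common_following: Callable[[str, str], Iterable[str]],
) -> Dict[str, Set[str]]:
    """Build hash table H: friend uid -> set of uids shared with the user."""
    network: Dict[str, Set[str]] = {}                      # step (1): hash table H
    for uid in get_following(user_id):                     # steps (2) and (5)
        network[uid] = set()                               # entry starts as the empty set
        for uid2 in get_common_following(user_id, uid):    # steps (3) and (4)
            network[uid].add(uid2)
    return network


# Toy data standing in for API responses (hypothetical, not a real service).
FOLLOWING = {"me": ["a", "b", "c"]}
COMMON = {("me", "a"): ["b"], ("me", "b"): ["a"], ("me", "c"): []}

network = build_friend_network(
    "me",
    lambda user: FOLLOWING[user],
    lambda user, friend: COMMON[(user, friend)],
)
print(network)
```

Passing the API calls in as parameters keeps the sketch independent of any particular microblog SDK and makes it easy to exercise with canned data.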
3. Community structure mining module, as shown in fig. 3:
(1) Initialize an empty set c, take the hash table obtained by the data capture module, pick one table entry, and execute steps (2) to (3);
(2) This step is a recursive depth-first traversal. Add the key of the table entry selected in step (1) to the set c. Then take each uid in turn from the value stored under the selected key and check whether it is already in the set. If it is not, check whether the uid appears in the hash value of every uid already in the set, that is, whether it is related to every current member. If so, add it to the set and continue executing (2) starting from this uid;
(3) If the set now contains more than 3 elements, a community structure c has been found and is stored in the community set S. Return to step (1) and continue with the next table entry;
(4) Set threshold = 0.99. For the community structures obtained in the previous three steps, calculate the similarity between any two community structures c_i and c_j. [The similarity formula is an image in the original publication and is not reproduced in this text.]
In the formula, E is the set of all users in the user social relationship network and V is the set of following relationships among all users. X_{m,n} indicates whether user m and user n of the community structures c_i, c_j are linked by a following relationship: X_{m,n} = 1 if they are, and X_{m,n} = 0 otherwise;
(4.1) If the similarity of two community structures is greater than threshold, merge them. After every pair of community structures has been processed, execute step (4.2);
(4.2) Reduce the value of threshold by a step of 0.05;
(4.3) Check the value of threshold: if it is greater than 0.27, go to step (4.1); otherwise execute step (5);
(5) Output the resulting community structures to the grouping analysis display module.
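The two stages of this module, basic community search and threshold-driven aggregation, might look like the sketch below. Two points are assumptions rather than facts from the text: the similarity formula is not available here, so `link_ratio` (the share of member pairs across two communities that are identical or related) is only an illustrative stand-in built on the X_{m,n} indicator; and step (4.2) is read as lowering the threshold by 0.05 on each pass. All function names are hypothetical.

```python
from typing import Callable, Dict, List, Set

Graph = Dict[str, Set[str]]


def grow(graph: Graph, community: Set[str], start: str) -> None:
    """Step (2): recursive depth-first growth of one basic community."""
    for uid in sorted(graph.get(start, ())):
        if uid in community:
            continue
        # Accept uid only if it is related to every member collected so far.
        if all(uid in graph.get(member, ()) for member in community):
            community.add(uid)
            grow(graph, community, uid)


def basic_communities(graph: Graph, min_size: int = 3) -> List[Set[str]]:
    """Steps (1)-(3): one search per table entry, keeping sets larger than min_size."""
    found = set()
    for key in sorted(graph):
        community = {key}
        grow(graph, community, key)
        if len(community) > min_size:
            found.add(frozenset(community))  # several entries may yield the same set
    return [set(c) for c in sorted(found, key=sorted)]


def link_ratio(graph: Graph, a: Set[str], b: Set[str]) -> float:
    """ASSUMED stand-in similarity, NOT the patent's formula."""
    hits = sum(1 for m in a for n in b if m == n or n in graph.get(m, ()))
    return hits / (len(a) * len(b))


def aggregate(
    graph: Graph,
    communities: List[Set[str]],
    similarity: Callable[[Graph, Set[str], Set[str]], float] = link_ratio,
    start: float = 0.99,
    step: float = 0.05,   # assumed to be a decrement per pass
    stop: float = 0.27,
) -> List[Set[str]]:
    """Steps (4)-(4.3): merge similar communities while lowering the threshold."""
    groups = [set(c) for c in communities]
    threshold = start
    while True:
        merged = True
        while merged:                       # step (4.1): repeat until no pair merges
            merged = False
            for i in range(len(groups)):
                for j in range(i + 1, len(groups)):
                    if similarity(graph, groups[i], groups[j]) > threshold:
                        other = groups.pop(j)
                        groups[i] |= other
                        merged = True
                        break
                if merged:
                    break
        threshold -= step                   # step (4.2)
        if threshold <= stop:               # step (4.3)
            return groups


# Toy network: two overlapping 4-cliques (a-d and b-e) and a separate clique w-z.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("c", "d"),
         ("b", "e"), ("c", "e"), ("d", "e"),
         ("w", "x"), ("w", "y"), ("w", "z"), ("x", "y"), ("x", "z"), ("y", "z")]
graph: Graph = {}
for m, n in edges:
    graph.setdefault(m, set()).add(n)
    graph.setdefault(n, set()).add(m)

basic = basic_communities(graph)
groups = aggregate(graph, basic)
print(basic)
print(groups)
```

On this toy network the search finds three basic communities, two of which overlap; aggregation merges the overlapping pair once the threshold drops below their similarity and leaves the unrelated clique alone.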
4. Grouping analysis display module, as shown in fig. 4:
(1) For each community structure mined by the community structure mining module, store the profile data of each member in a vector, each dimension of which represents one item of information;
(2) For each group of vectors, find the dimension on which the largest number of vectors share the same value;
(2.1) If that dimension is school and the shared value covers more than half of the members, the category is resolved as classmates;
(2.2) If that dimension is workplace and the shared value covers more than half of the members, the category is resolved as colleagues;
(2.3) If that dimension is verified ("V") status and it covers more than half of the members, the category is resolved as celebrities;
(2.4) Otherwise, the category is resolved as friends;
(3) After the semantic analysis is finished, display the result.
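The category rule above can be illustrated with a short sketch. The profile layout (a dict with `school`, `work` and `verified` keys) and the choice to ignore empty or false values are assumptions made for this example; the text only says that each dimension of the vector holds one item of profile information.

```python
from collections import Counter
from typing import Dict, List

# Dimension name -> category it resolves to (dimension names are illustrative).
CATEGORY_BY_DIMENSION = {
    "school": "classmates",
    "work": "colleagues",
    "verified": "celebrities",
}


def classify_group(profiles: List[Dict[str, object]]) -> str:
    """Return the category of one friend group from its members' profiles."""
    best_dim, best_count = None, 0
    for dim in CATEGORY_BY_DIMENSION:
        # Assumption: missing, empty or False values do not count as a shared value.
        values = [p.get(dim) for p in profiles if p.get(dim)]
        if not values:
            continue
        count = Counter(values).most_common(1)[0][1]   # size of the largest agreeing subset
        if count > best_count:
            best_dim, best_count = dim, count
    # Steps (2.1)-(2.3): the winning dimension must cover more than half of the group.
    if best_dim is not None and best_count > len(profiles) / 2:
        return CATEGORY_BY_DIMENSION[best_dim]
    return "friends"                                    # step (2.4)


classmates = [{"school": "PKU"}, {"school": "PKU"}, {"school": "PKU"}, {"school": "THU"}]
celebrities = [{"verified": True}, {"verified": True}, {"verified": True}, {"work": "X"}]
mixed = [{"school": "PKU"}, {"school": "THU"}, {"work": "X"}, {"work": "Y"}]

print(classify_group(classmates), classify_group(celebrities), classify_group(mixed))
```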
5. Feedback module
(1) Obtain the user's feedback data, namely the user's score (1 to 5 points);
(2) Store the feedback information in a database.
Portions of the invention not described in detail are well within the skill of the art.
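As a small illustration of the feedback record (user id, grouping result and score stored together), the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The table name, column names and the JSON encoding of the grouping result are assumptions for this example; the text does not specify a storage schema.

```python
import json
import sqlite3
from typing import Dict, List


def save_feedback(conn: sqlite3.Connection, user_id: str,
                  groups: Dict[str, List[str]], score: int) -> None:
    """Store one feedback record: user id, grouping result, score (1 to 5)."""
    if not 1 <= score <= 5:
        raise ValueError("score must be between 1 and 5")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS feedback "
        "(user_id TEXT, grouping TEXT, score INTEGER)"
    )
    conn.execute(
        "INSERT INTO feedback VALUES (?, ?, ?)",
        (user_id, json.dumps(groups, sort_keys=True), score),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
save_feedback(conn, "u1", {"classmates": ["a", "b"]}, 4)
rows = conn.execute("SELECT user_id, grouping, score FROM feedback").fetchall()
print(rows)
```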
Claims (2)
1. A microblog data management system, characterized by comprising a user authorization module, a data capture module, a community structure mining module, a grouping analysis display module and a feedback module, wherein:
the user authorization module: performs authorization using the OAuth protocol to obtain the user's user name on the microblog;
the data capture module: according to the user's user name on the microblog, acquires the relationship data among the user's friends and the user profile data through the API (application programming interface) provided by the microblog service, by first fetching the user's friends and then, for each friend, fetching the friends that this friend has in common with the user, thereby obtaining the relationships among all friends and forming a user social relationship network made up of these friend relationships, and finally storing the resulting user social relationship network in a database and outputting it to the community structure mining module;
the community structure mining module: for the friend relationship network obtained by the data capture module, mines the potential community structures of the friend network from the social relationships among friends according to a community detection technique, as the basis for grouping the user's friends; the community detection technique consists of two parts, basic community structure search and community aggregation, and the user friend groups produced are output to the grouping analysis display module; according to the community detection technique, a depth-first search is first carried out on the network to mine its basic community structures, and the basic community structures are then aggregated hierarchically, so that the potential community structures of the network are mined from the social relationships among friends as the basis for grouping, wherein a community is a set of friends such that friend relationships are dense among friends inside the community and sparse between friends inside and outside it, whereby the user friend groups are obtained;
the grouping analysis display module: analyzes the user friend groups generated by the community structure mining module; according to the semantic information of the user friend groups, the groups are abstracted into four categories: celebrities, friends, classmates and colleagues; the analysis part determines the category of each user friend group generated by the community structure mining module using the profile data of the group's members, and the display part shows the results of the community structure mining module and the analysis part to the user as the grouping analysis result;
the feedback module: provides a feedback entry for each user friend group and collects user evaluations, the user scoring the result produced by the system; the user id, the grouping result and the user's feedback are stored as one record in a database, so as to provide a basis for improving the system and the user experience in the future.
2. A method for managing microblog data using the microblog data management system of claim 1, characterized by comprising the following steps:
(1) user authorization: authorization is performed using the OAuth protocol to obtain the user's user name on the microblog;
(2) data capture: according to the user's user name on the microblog, the relationship data among the user's friends and the user profile data are acquired through the API (application programming interface) provided by the microblog service, by first fetching the user's friends and then, for each friend, fetching the friends that this friend has in common with the user, thereby obtaining the relationships among all friends and forming a user social relationship network made up of these friend relationships; each node in the network represents a friend of the user, an edge between two nodes represents a relationship between two of the user's friends, and the resulting network is finally written to a database;
(3) community structure mining: for the friend relationship network obtained in step (2), according to a community detection technique, a depth-first search is first carried out on the network to mine its basic community structures, and the basic community structures are then aggregated hierarchically, so that the potential community structures of the network are mined from the social relationships among friends as the basis for grouping, wherein a community is a set of friends such that friend relationships are dense among friends inside the community and sparse between friends inside and outside it, whereby the user friend groups are obtained;
(4) grouping analysis and display: the user friend groups generated in step (3) are analyzed in order to mine the semantic information of each group; the groups are abstracted into four categories: celebrities, friends, classmates and colleagues; for each user friend group generated in step (3), the category is determined from the profile information of the group's members, the content of their microblogs and their forwarding relationships, and is shown to the user as the basis of the grouping;
(5) feedback: a feedback entry is provided for each user friend group, and user feedback information is collected so as to provide a basis for improving the system and the user experience in the future.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310367762.0A CN103488683B (en) | 2013-08-21 | 2013-08-21 | Microblog data management system and implementation method thereof |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103488683A (en) | 2014-01-01 |
CN103488683B (en) | 2017-05-10 |
Family
ID=49828909
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310367762.0A Expired - Fee Related CN103488683B (en) | 2013-08-21 | 2013-08-21 | Microblog data management system and implementation method thereof |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103488683B (en) |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104052651B (en) * | 2014-06-03 | 2017-09-12 | 西安交通大学 | A kind of method and apparatus for setting up social groups |
CN104202319B (en) * | 2014-08-28 | 2018-05-29 | 北京淘友天下科技发展有限公司 | A kind of social networks recommend method and device |
CN104965878B (en) * | 2015-06-12 | 2018-11-27 | 微梦创科网络科技(中国)有限公司 | A kind of method and device carrying out the excavation of user job unit based on grouping information |
CN105262822A (en) * | 2015-10-28 | 2016-01-20 | 维沃移动通信有限公司 | Method and apparatus for assisting user to identify identity of friend |
CN105430020A (en) * | 2015-12-31 | 2016-03-23 | 南京邮电大学 | Access group-based privacy protection-supporting access authorization method |
CN106411572B (en) * | 2016-09-06 | 2019-05-07 | 山东大学 | A kind of community discovery method of combination nodal information and network structure |
CN109783715A (en) * | 2019-01-08 | 2019-05-21 | 鑫涌算力信息科技(上海)有限公司 | Network crawler system and method |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2009092222A1 (en) * | 2007-12-27 | 2009-07-30 | Tencent Technology (Shenzhen) Company Limited | A method,a client and a communication system for sharing a communication object |
CN102122291A (en) * | 2011-01-18 | 2011-07-13 | 浙江大学 | Blog friend recommendation method based on tree log pattern analysis |
CN102708176A (en) * | 2012-05-08 | 2012-10-03 | 山东大学 | Microblog data mining method based on active users |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | ||
Granted publication date: 20170510 |