CN103488683B

CN103488683B - Microblog data management system and implementation method thereof

Info

Publication number: CN103488683B
Application number: CN201310367762.0A
Authority: CN
Inventors: 王静远; 高飞; 李超; 欧阳元新; 熊璋
Original assignee: Beihang University
Current assignee: Beihang University
Priority date: 2013-08-21
Filing date: 2013-08-21
Publication date: 2017-05-10
Anticipated expiration: 2033-08-21
Also published as: CN103488683A

Abstract

The invention relates to a microblog data management system and an implementation method thereof, which provide microblog users with a service for managing microblog data through automatic friend grouping. The system consists of five modules, i.e. a user authorization module, a data extraction module, a community structure finding module, a grouping analysis and exhibition module, a feedback module and a microblog data management module. The system and the method have the following advantages: they solve the problems of traditional manual microblog data management, which is time-consuming, labor-intensive and difficult to maintain; the user's friends are grouped intelligently by a community discovery technique, which gives high accuracy and can find overlapping communities; the grouping result is analyzed to provide an intuitive, easy-to-understand basis for the user's friend groups; in addition, the system provides a feedback mechanism that brings user feedback into the system to further improve its reliability.

Description

Microblog data management system and implementation method thereof

Technical Field

The invention relates to a microblog data management system based on a community discovery technology and an implementation method thereof, and belongs to the technical field of data mining.

Background

In social networks such as microblogs, users face a growing volume of information every day as their number of friends increases. For users with many microblog friends, a good data management approach is to establish groups that mirror the user's real-life social circles and to manage friends according to the groups they belong to. Once the groups are established, content filtering, privacy settings and the like can be applied per group. At present, major microblog service providers such as Tencent Weibo and Sina Weibo all provide such a mechanism for managing data. However, the existing approach relies mainly on the user grouping and managing friends manually. This is time-consuming and demands a great deal of manual effort from the user. When the user adds new friends, the groups are difficult to keep up to date. Manual management is also prone to errors.

Disclosure of Invention

The technical problem solved by the invention: to provide a system and a method that can efficiently and accurately mine potential grouping information, so that users can conveniently manage their microblog data.

The technical solution of the invention is as follows: a microblog data management system, as shown in fig. 1, includes:

a user authorization module: authorization is performed using the OAuth protocol. Thanks to the security mechanism provided by OAuth, the system never comes into contact with the user's private information;

a data capture module: acquires the relationship data among the user's friends and the user profile data through the API provided by the microblog. First, the user's friends are captured. Then, for each friend, the friends that this friend has in common with the user are captured, which yields the relationships among all the friends and forms the user's social relationship network. The input of the module is the user's user name on the microblog, and the output is the user's social relationship network. Each node in the network represents a friend of the user, and an edge between two nodes represents a relationship between two friends of the user. The resulting user social relationship network is written to a database to be read by the community structure mining module;

a community structure mining module: takes the graph of the user's friend relationships obtained by the data capture module and, using a community detection technique, mines the potential community structures among the friends from their social relationships as the basis for grouping. A community is a set of friends in which friend relationships among members are dense, while friend relationships with friends outside the community are sparse. The community detection technique used by the module consists of two parts, basic community structure search and community aggregation, and does not require the user to set any parameters. The input of the module is the friend relationship network obtained by the data capture module, and the friend groups it produces are output to the grouping analysis display module;

a grouping analysis display module: analyzes the user friend groups generated by the community structure mining module and mines the semantic information of each group. According to this semantic information, groups are abstracted into four categories: celebrities, friends, classmates and colleagues. For each group generated by the community structure mining module, the analysis module determines the category of the group from the profile information of its members, the content of their microblogs and their forwarding relationships. The display module then shows the results of the community structure mining module and the analysis module to the user as the grouping analysis result;

a feedback module: a feedback option is provided for each user friend group to collect the user's evaluation. The user scores the effect of the system; the system collects this feedback and stores the user id, the grouping result and the user feedback as one record in a database, so as to provide a basis for improving the system and the user experience in the future.

A microblog data management method comprises the following implementation steps:

(1) User authorization: authorization is carried out using the OAuth protocol to obtain the user's user name on the microblog;

(2) Data capture: according to the user's user name on the microblog, the relationship data among the user's friends and the user profile data are acquired through the API (application programming interface) provided by the microblog. First, the user's friends are captured; then, for each friend, the friends that this friend has in common with the user are captured, which yields the relationships among all friends and forms the user's social relationship network. Each node in the network represents a friend of the user, and an edge between two nodes represents a relationship between two friends of the user. Finally, the resulting network is written to a database;

(3) Community structure mining: for the friend relationship network obtained in step (2), a community detection technique is applied. A depth-first search is first carried out on the network to mine its basic community structures, and hierarchical aggregation is then carried out on the basic community structures, so that the potential community structures of the network are mined from the social relationships among friends as the basis for grouping. A community is a set of friends in which friend relationships among members are dense, while friend relationships with friends outside the community are sparse. The user friend groups are thereby obtained;

(4) Grouping analysis and display: the user friend groups generated in step (3) are analyzed to mine the semantic information of each group. Groups are abstracted into four categories: celebrities, friends, classmates and colleagues. For each user friend group generated in step (3), the category of the group is determined from the profile information of its members, their microblog content and their forwarding relationships, and is shown to the user as the basis for the grouping;

(5) Feedback: a feedback option is provided for each user friend group, and user feedback is collected so as to provide a basis for improving the system and the user experience in the future.

Compared with the prior art, the invention has the advantages that:

(1) The invention automatically analyzes the user's friend relationships and discovers the user's potential groups, so that microblog data can be managed by group. The whole process requires no manual participation, sparing the user a large amount of tedious, repetitive work, saving time and improving efficiency.

(2) In automatically discovering the groups, the invention applies community detection theory and techniques and uses only the friend relationship information among users, not user profile data and the like, thereby avoiding grouping errors caused by incomplete or outdated profile data.

(3) According to the invention, the grouping result is analyzed into the category which is easy to understand by the user, and the user can intuitively manage the microblog data according to the category.

Drawings

FIG. 1 is an architectural diagram of the system of the present invention;

FIG. 2 is a flow diagram of a data capture module implementation of the present invention;

FIG. 3 is a flow chart of an implementation of a community structure mining module in the present invention;

FIG. 4 is a flow chart of the implementation of the grouping analysis display module in the present invention.

Detailed Description

As shown in fig. 1, the microblog data management system and method based on the community discovery technology of the invention is composed of a user authorization module, a data capture module, a community structure mining module, a grouping analysis presentation module and a user feedback module.

The specific implementation process of each module is as follows:

1. User authorization module

(1) The user enters account information;

(2) The account information is sent to the microblog server for verification. If verification succeeds, the server returns an access token and authorization is complete; the data capture module then uses the access token to obtain permission to capture data.

2. Data capture module, as shown in FIG. 2

(1) Initialize a hash table H for storing the user's social relationship network data, and obtain the user's following list;

(2) For each uid in the following list obtained in (1), obtain the list of accounts that the uid and the user follow in common (list2), and initialize the hash table entry whose key is uid to an empty set;

(3) Add a uid2 from the common following list list2 obtained in step (2) to the set stored under the key uid in the hash table;

(4) Repeat step (3) until every item in the common following list has been processed;

(5) Repeat step (2) until every item in the user's following list has been processed;

(6) Write the data in the hash table to a database and output it to the community structure mining module, which completes data capture.
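The capture loop in steps (1) to (6) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `build_friend_network`, `get_following` and `get_common_following` are hypothetical names, the two callables stand in for whatever the microblog API actually offers, and the toy `FOLLOWING` data is invented so the sketch runs without network access. Writing the result to a database is left to the caller.

```python
from typing import Callable, Dict, Iterable, Set


def build_friend_network(
    user: str,
    get_following: Callable[[str], Iterable[str]],
    get_common_following: Callable[[str, str], Iterable[str]],
) -> Dict[str, Set[str]]:
    """Build the hash table H: friend uid -> accounts followed in common with `user`."""
    network: Dict[str, Set[str]] = {}
    for uid in get_following(user):                   # steps (1) and (5)
        network[uid] = set()                          # step (2): empty entry for uid
        for uid2 in get_common_following(user, uid):  # steps (3) and (4)
            network[uid].add(uid2)
    return network                                    # step (6): caller persists H


# Hypothetical stand-ins for the microblog API, backed by toy data.
FOLLOWING = {
    "me": ["a", "b", "c"],
    "a": ["b", "c"],
    "b": ["a"],
    "c": ["a"],
}


def fake_following(uid: str) -> Iterable[str]:
    return FOLLOWING.get(uid, [])


def fake_common(user: str, friend: str) -> Iterable[str]:
    mine = set(FOLLOWING.get(user, []))
    return [u for u in FOLLOWING.get(friend, []) if u in mine]


network = build_friend_network("me", fake_following, fake_common)
```

Passing the API accessors in as parameters keeps the loop independent of any particular microblog service and makes it easy to test with canned data.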

3. Community structure mining module, as shown in FIG. 3

(1) Initialize an empty set c, take the hash table obtained by the data capture module, select one entry from it, and execute steps (2) to (3);

(2) This step is a recursive depth-first traversal. Add the key of the hash table entry selected in step (1) to the set. Then take each uid in turn from the value stored under the selected key in the hash table and check whether the uid is already in the set; if it is not, check whether the uid appears in the hash table value of every uid currently in the set. If it appears in all of them, add the uid to the set and continue executing (2) starting from that uid;

(3) If the set now contains more than 3 elements, a community structure c has been found, and it is stored in the community set S. Continue looping from step (1).

(4) Set the threshold value threshold = 0.99. For the community structures obtained in the previous three steps, calculate the similarity between any two community structures c_i and c_j by the following formula:

wherein,

where E is the set of all users in the user social relationship network, and V is the set of following relationships among all users. X_{m,n} indicates whether user m and user n in community structures c_i and c_j have a following relationship: if so, X_{m,n} = 1; otherwise X_{m,n} = 0;

(4.1) If the similarity is greater than threshold, merge the two community structures; after every pair of community structures has been evaluated in this way, execute step (4.2);

(4.2) Reduce the value of threshold, with the decrement set to 0.05;

(4.3) Check the value of threshold: if it is greater than 0.27, go to step (4.1); otherwise, execute step (5);

(5) Output the resulting community structures to the grouping analysis display module.
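The two parts of the mining module, basic community search and threshold-driven aggregation, can be sketched as below. Several points are assumptions rather than the patent's method: the similarity formula is not reproduced in this text, so a plain Jaccard overlap (`jaccard`) is swapped in purely as a stand-in; step (4.2) is read as lowering the threshold by 0.05 per round; and merging within a round is repeated until no pair exceeds the threshold. All function names are hypothetical.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, List, Set

Graph = Dict[str, Set[str]]


def grow(graph: Graph, start: str, members: Set[str]) -> None:
    """Step (2): depth-first growth; a uid joins only if linked to every member."""
    members.add(start)
    for uid in sorted(graph.get(start, ())):
        if uid not in members and all(uid in graph.get(m, ()) for m in members):
            grow(graph, uid, members)


def basic_communities(graph: Graph, min_size: int = 3) -> List[Set[str]]:
    """Steps (1)-(3): keep each grown set with more than `min_size` members."""
    found: Set[FrozenSet[str]] = set()
    for key in sorted(graph):
        members: Set[str] = set()
        grow(graph, key, members)
        if len(members) > min_size:
            found.add(frozenset(members))
    return [set(c) for c in sorted(found, key=sorted)]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Stand-in similarity (assumption): NOT the formula used in the patent."""
    return len(a & b) / len(a | b)


def aggregate(
    communities: List[Set[str]],
    similarity: Callable[[Set[str], Set[str]], float],
    start: float = 0.99,
    step: float = 0.05,
    stop: float = 0.27,
) -> List[Set[str]]:
    """Steps (4)-(5): merge similar communities while lowering the threshold."""
    groups = [set(c) for c in communities]
    threshold = start
    while True:
        merged = True
        while merged:                        # step (4.1)
            merged = False
            for i, j in combinations(range(len(groups)), 2):
                if similarity(groups[i], groups[j]) > threshold:
                    groups[i] |= groups.pop(j)
                    merged = True
                    break                    # indices changed, rescan pairs
        threshold -= step                    # step (4.2)
        if threshold <= stop:                # step (4.3)
            return groups


def undirected(edges) -> Graph:
    """Build a symmetric adjacency table from an edge list (toy helper)."""
    graph: Graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


# Toy network: two overlapping 4-cliques and one separate 4-clique.
edges = (list(combinations("abcd", 2)) + list(combinations("bcde", 2))
         + list(combinations("pqrs", 2)))
graph = undirected(edges)
basic = basic_communities(graph)
groups = aggregate(basic, jaccard)
```

In this toy network the two overlapping cliques share three of five members, so they merge once the threshold falls below 0.6, while the unrelated clique stays separate.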

4. Grouping analysis display module, as shown in FIG. 4

(1) For each community structure mined by the community structure mining module, store the profile data of each member in a vector, where each dimension of the vector represents one item of profile information;

(2) For each group of vectors, find the dimension on which the largest number of vectors share the same value;

(2.1) If that dimension is school and the shared value covers more than half of the members, the category is resolved as classmates;

(2.2) If that dimension is workplace and the shared value covers more than half of the members, the category is resolved as colleagues;

(2.3) If that dimension is verified ("V") status and the shared value covers more than half of the members, the category is resolved as celebrities;

(2.4) Otherwise, the category is resolved as friends.

(3) After the semantic analysis is finished, display the result.
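The majority rule in step (2) can be illustrated with a short Python sketch. It assumes each member profile is a dictionary with hypothetical keys `school`, `employer` and `verified`, and that missing or falsy values are ignored when counting shared values (otherwise a group of unverified accounts would all "share" the value False). These choices, and the name `classify_group`, are illustrative assumptions and are not taken from the patent.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical mapping from profile dimension to group category.
CATEGORY_BY_DIMENSION = {
    "school": "classmates",
    "employer": "colleagues",
    "verified": "celebrities",
}


def classify_group(profiles: List[Dict[str, object]]) -> str:
    """Return the category whose dimension has the most widely shared value."""
    best_dim, best_count = None, 0
    for dim in CATEGORY_BY_DIMENSION:
        # Count how many members share each non-empty value on this dimension.
        values = Counter(p.get(dim) for p in profiles if p.get(dim))
        count = values.most_common(1)[0][1] if values else 0
        if count > best_count:
            best_dim, best_count = dim, count
    # Steps (2.1)-(2.3): the shared value must cover more than half the group.
    if best_dim is not None and best_count > len(profiles) / 2:
        return CATEGORY_BY_DIMENSION[best_dim]
    return "friends"                         # step (2.4)


classmates = classify_group([
    {"school": "Beihang University", "employer": "X"},
    {"school": "Beihang University", "employer": "Y"},
    {"school": "Beihang University"},
    {"school": "Other"},
])
celebrities = classify_group([
    {"verified": True},
    {"verified": True},
    {"verified": True, "school": "A"},
])
friends = classify_group([{"school": "A"}, {"school": "B"}, {"employer": "C"}])
```

When two dimensions tie, this sketch keeps the one listed first in `CATEGORY_BY_DIMENSION`; the source text does not say how ties are resolved.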

5. Feedback module

(1) Obtaining user feedback data, namely scoring information (1-5 points) of a user;

(2) storing the feedback information into a database;
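As one possible illustration of storing a feedback record (user id, grouping result, score), the sketch below uses Python's standard `sqlite3` module with an in-memory database. The table layout, the column names and the JSON encoding of the grouping result are assumptions made for this example; the patent only states that the record is stored in a database.

```python
import json
import sqlite3
from typing import Dict, Set


def open_feedback_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a feedback table; the schema is a hypothetical choice."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS feedback ("
        " user_id TEXT NOT NULL,"
        " grouping_result TEXT NOT NULL,"
        " score INTEGER NOT NULL CHECK (score BETWEEN 1 AND 5))"
    )
    return conn


def save_feedback(conn: sqlite3.Connection, user_id: str,
                  grouping: Dict[str, Set[str]], score: int) -> None:
    """Store one record: user id, grouping result and a 1-5 score."""
    if not 1 <= score <= 5:
        raise ValueError("score must be between 1 and 5")
    payload = json.dumps(
        {name: sorted(members) for name, members in grouping.items()},
        sort_keys=True,
    )
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO feedback VALUES (?, ?, ?)",
                     (user_id, payload, score))


conn = open_feedback_store()
save_feedback(conn, "u1", {"classmates": {"a", "b"}}, 4)
rows = conn.execute(
    "SELECT user_id, grouping_result, score FROM feedback"
).fetchall()
```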

Portions of the invention not described in detail are well known to those skilled in the art.

Claims

1. A microblog data management system, characterized by comprising a user authorization module, a data capture module, a community structure mining module, a grouping analysis display module and a feedback module, wherein:

a user authorization module: authorization is carried out using the OAuth protocol to obtain the user's user name on the microblog;

a data capture module: according to the user's user name on the microblog, the relationship data among the user's friends and the user profile data are acquired through the API (application programming interface) provided by the microblog; first, the user's friends are captured; then, for each friend, the friends that this friend has in common with the user are captured, thereby obtaining the relationships among all friends and forming the user's social relationship network; finally, the resulting user social relationship network is stored in a database and output to the community structure mining module;

a community structure mining module: for the friend relationship network obtained by the data capture module, the potential community structures of the friend network are mined from the social relationships among friends according to a community detection technique and used as the basis for grouping the user's friends; the community detection technique consists of two parts, basic community structure search and community aggregation, and the user friend groups produced are output to the grouping analysis display module; according to the community detection technique, a depth-first search is first carried out on the network to mine its basic community structures, and hierarchical aggregation is then carried out on the basic community structures, so that the potential community structures of the network are mined from the social relationships among friends as the basis for grouping, wherein a community is a set of friends in which friend relationships among members are dense, while friend relationships with friends outside the community are sparse, whereby the user friend groups are obtained;

a grouping analysis display module: the user friend groups generated by the community structure mining module are analyzed; according to the semantic information of the user friend groups, the groups are abstracted into four categories: celebrities, friends, classmates and colleagues; for each user friend group generated by the community structure mining module, the grouping analysis determines the category of the group from the profile data of its members; and the display module shows the results of the community structure mining module and the analysis module to the user as the grouping analysis result;

a feedback module: a feedback option is provided for each user friend group to collect the user's evaluation; the user scores the effect of the system, the user feedback is collected, and the user id, the grouping result and the user feedback are stored as one record in a database, so as to provide a basis for improving the system and the user experience in the future.

2. A method for managing microblog data using the microblog data management system of claim 1, characterized by comprising the following steps:

(4) Grouping analysis and display: the user friend groups generated in step (3) are analyzed to mine the semantic information of each group; the groups are abstracted into four categories: celebrities, friends, classmates and colleagues; for each user friend group generated in step (3), the category of the group is determined from the profile information of its members, their microblog content and their forwarding relationships, and is shown to the user as the basis for the grouping;