CN103488683B - Microblog data management system and implementation method thereof - Google Patents

Info

Publication number
CN103488683B
CN103488683B CN201310367762.0A CN201310367762A CN103488683B CN 103488683 B CN103488683 B CN 103488683B CN 201310367762 A CN201310367762 A CN 201310367762A CN 103488683 B CN103488683 B CN 103488683B
Authority
CN
China
Prior art keywords
user
friend
module
friends
community
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201310367762.0A
Other languages
Chinese (zh)
Other versions
CN103488683A (en)
Inventor
王静远
高飞
李超
欧阳元新
熊璋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University
Priority to CN201310367762.0A
Publication of CN103488683A
Application granted
Publication of CN103488683B
Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to a microblog data management system and an implementation method thereof, which provide microblog users with a service for managing microblog data through automatic friend grouping. The system comprises a user authorization module, a data capture module, a community structure mining module, a grouping analysis display module, a feedback module and a microblog data management module. The system and method avoid the time-consuming, labor-intensive and hard-to-maintain manual management of microblog data. The user's friends are grouped intelligently by a community discovery technique, which gives high accuracy and can find overlapping communities. The grouping result is analyzed to give the user a basis for the grouping that is intuitive and easy to understand. In addition, the system provides a feedback mechanism that brings user feedback into the system to further improve its reliability.

Description

Microblog data management system and implementation method thereof
Technical Field
The invention relates to a microblog data management system based on a community discovery technology and an implementation method thereof, and belongs to the technical field of data mining.
Background
In social networks such as microblogs, users face a large amount of information every day as their number of friends grows. For users with many microblog friends, a good way to manage data is to create groups that mirror the user's real-life social circles and to manage friends according to the groups they belong to. Once the groups are established, content filtering, privacy settings and the like can be applied per group. At present, major microblog service providers such as Tencent Weibo and Sina Weibo offer this mechanism for managing data. However, the existing approach relies mainly on the user grouping and managing friends manually. This is time-consuming and requires a great deal of manual work. When the user adds new friends, the groups are difficult to keep up to date. Manual management is also prone to errors.
Disclosure of Invention
The technical problem solved by the invention: to provide a system and method that can efficiently and accurately mine potential grouping information, so that users can conveniently manage their microblog data.
The technical solution of the invention is as follows: a microblog data management system, as shown in FIG. 1, includes:
A user authorization module: authorization is performed using the OAuth protocol. Thanks to the security mechanism provided by OAuth, the system does not come into contact with the user's private information.
A data capture module: uses the API provided by the microblog service to acquire the relationships between the user's friends and the users' profile information. First, the user's friends are fetched. Then, for each friend, the friends that this friend and the user have in common are fetched, which yields the relationships among all friends and forms a user social relationship network composed of the friend relationships. The module takes the user's microblog user name as input and outputs the user's social relationship network. Each node in the network represents a friend of the user, and an edge between two nodes represents a relationship between two of the user's friends. The resulting network is written to a database for use by the community structure mining module.
A community structure mining module: takes the graph of the user's friend relationships obtained by the data capture module and, using a community detection technique, mines the potential community structure of the friends from the social relationships among them as the basis for grouping. A community is a set of friends such that friend relationships are dense among friends inside the community and sparse between friends inside the community and friends outside it. The community detection technique used by the module consists of two parts, basic community structure search and community aggregation. The user does not need to set any parameters. The input of the module is the friend relationship network obtained by the data capture module, and the friend groups it produces are output to the grouping analysis display module.
A grouping analysis display module: analyzes the user friend groups generated by the community structure mining module and mines the semantic information of each group. According to this semantic information, groups are abstracted into four categories: celebrities, friends, classmates and colleagues. For each group generated by the community structure mining module, the analysis part determines the category of the group using the profile information of the group members, the content of their microblogs and their forwarding relationships. The display part then shows the results of the community structure mining module and of the analysis to the user as the grouping analysis result.
A feedback module: provides a feedback option for each user friend group and collects the user's evaluation. The user scores the effect of the system, and the user id, the grouping result and the user feedback are stored as one record in a database, to provide a basis for improving the system and the user experience in the future.
A microblog data management method comprises the following implementation steps:
(1) User authorization: authorization is performed using the OAuth protocol to obtain the user's microblog user name;
(2) Data capture: according to the user's microblog user name, the API (application programming interface) provided by the microblog service is used to acquire the relationships between the user's friends and the users' profile information. The user's friends are fetched first; then, for each friend, the friends that this friend and the user have in common are fetched, yielding the relationships among all friends and forming a user social relationship network composed of the friend relationships; each node in the network represents a friend of the user, an edge between two nodes represents a relationship between two of the user's friends, and the resulting network is finally written to a database;
(3) Community structure mining: for the friend relationship network obtained in step (2), a community detection technique is applied: a depth-first search over the network first finds its basic community structures, which are then aggregated hierarchically, so that the potential community structure of the network is mined from the social relationships among friends as the basis for grouping; a community is a set of friends such that friend relationships are dense among friends inside the community and sparse between friends inside the community and friends outside it; the user friend groups are thereby obtained;
(4) Grouping analysis and display: the user friend groups generated in step (3) are analyzed to mine the semantic information of each group. Groups are abstracted into four categories: celebrities (stars), friends, classmates and colleagues. For each user friend group generated in step (3), the category of the group is determined using the profile information of the group members, the content of their microblogs and their forwarding relationships, and is shown to the user as the basis for the grouping;
(5) Feedback: a feedback option is provided for each user friend group, and user feedback is collected to provide a basis for improving the system and the user experience in the future.
Compared with the prior art, the invention has the advantages that:
(1) The invention automatically analyzes the user's friend relationships and discovers the user's potential groups, so that microblog data can be managed by group. The whole process requires no manual participation, sparing the user a large amount of tedious, repetitive work, saving time and improving efficiency.
(2) In automatically discovering the groups, the invention applies community detection theory and techniques and uses only the friend relationships among users, not user profile data, thereby avoiding grouping errors caused by incomplete or outdated profile data.
(3) The invention resolves each grouping result into a category that is easy for the user to understand, so that the user can manage microblog data intuitively by category.
Drawings
FIG. 1 is an architectural diagram of the system of the present invention;
FIG. 2 is a flow diagram of a data capture module implementation of the present invention;
FIG. 3 is a flow chart of an implementation of a community structure mining module in the present invention;
FIG. 4 is a flow chart of the implementation of the grouping analysis display module in the present invention.
Detailed Description
As shown in FIG. 1, the microblog data management system based on the community discovery technology of the invention is composed of a user authorization module, a data capture module, a community structure mining module, a grouping analysis display module and a user feedback module.
The specific implementation process of each module is as follows:
1. User authorization module
(1) The user enters account information;
(2) The account information is sent to the microblog server for verification. If verification succeeds, an access token is returned and authorization is complete; the data capture module then uses the access token to obtain permission to fetch data.
2. Data capture module, as shown in FIG. 2
(1) Initialize a hash table H for storing the user's social relationship network data, and obtain the user's following list;
(2) For each uid in the following list obtained in (1), fetch the list of accounts that this uid and the user both follow (list2), and initialize the hash table entry with key uid to the empty set;
(3) Add a uid2 from the common-following list list2 obtained in step (2) to the set stored under key uid in the hash table;
(4) Repeat step (3) until every item in the common-following list has been processed;
(5) Repeat step (2) until every item in the user's following list has been processed;
(6) Write the data in the hash table to the database, output it to the community structure mining module, and finish data capture.
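The capture steps above can be sketched as follows. This is a minimal illustration, not the patented implementation: the two callables `get_following` and `get_common_following` are hypothetical stand-ins for the microblog API calls, the sample data is invented, and the database write of step (6) is left out.

```python
from typing import Callable, Dict, Iterable, List, Set


def build_friend_network(
    user: str,
    get_following: Callable[[str], Iterable[str]],
    get_common_following: Callable[[str, str], Iterable[str]],
) -> Dict[str, Set[str]]:
    """Build hash table H: friend uid -> set of accounts the friend and the user both follow."""
    network: Dict[str, Set[str]] = {}            # step (1): empty hash table H
    for uid in get_following(user):              # steps (2) and (5): loop over the following list
        network[uid] = set()                     # step (2): entry for key uid starts as the empty set
        for uid2 in get_common_following(user, uid):  # steps (3) and (4): loop over list2
            network[uid].add(uid2)
    return network


# Hypothetical in-memory data standing in for the microblog service.
FOLLOWING: Dict[str, List[str]] = {
    "me": ["a", "b", "c", "d"],
    "a": ["b", "c", "x"],
    "b": ["a", "c"],
    "c": ["a", "b"],
    "d": [],
}


def fake_following(uid: str) -> List[str]:
    return FOLLOWING.get(uid, [])


def fake_common_following(user: str, friend: str) -> List[str]:
    mine = set(FOLLOWING.get(user, []))
    return [u for u in FOLLOWING.get(friend, []) if u in mine]


network = build_friend_network("me", fake_following, fake_common_following)
```

Passing the API calls in as parameters keeps the sketch independent of any particular microblog service and makes the hash table construction easy to check against fixed data.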
3. Community structure mining module, as shown in FIG. 3
(1) Initialize an empty set c, take the hash table produced by the data capture module, select one of its entries, and execute steps (2) to (3);
(2) This step is a recursive depth-first traversal. Add the key of the hash table entry selected in step (1) to the set c. Then take each uid in turn from the value stored under that key and check whether it is already in c. If it is not, check whether the uid appears in the hash table value of every member of c, and whether every member of c appears in the hash table value of the uid. If all of these hold, add the uid to c and continue executing step (2) starting from that uid;
(3) If the set now contains more than 3 elements, a community structure c has been found and is stored in the community set S. Return to step (1) and continue with the next hash table entry;
(4) Set threshold = 0.99. For the community structures obtained in the previous three steps, compute the similarity between any two community structures c_i and c_j by the following formula:
[formula not reproduced in the source text]
where E is the set of all users in the user's social relationship network and V is the set of following relationships among all users. X_{m,n} indicates whether user m and user n in the community structures c_i and c_j have a following relationship: if so, X_{m,n} = 1; otherwise X_{m,n} = 0;
(4.1) If the similarity is greater than threshold, merge the two community structures. After every pair of community structures has been evaluated, execute step (4.2);
(4.2) Reduce the value of threshold; the decrement is set to 0.05;
(4.3) Check the value of threshold: if it is greater than 0.27, go to step (4.1); otherwise execute step (5);
(5) Output the resulting community structures to the grouping analysis display module.
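The two phases of the mining module, basic community search and threshold-driven aggregation, can be sketched as below. Several points are assumptions rather than the patent's own definitions. The similarity formula is not available in this text, so `cross_density` (the share of cross pairs that are linked) is only a stand-in. Repeating the merge pass until no pair exceeds the threshold is one reading of step (4.1). Reading step (4.2) as a fixed decrement of 0.05 is likewise an interpretation. All function names and the sample network are hypothetical.

```python
from itertools import combinations
from typing import Callable, Dict, Iterable, List, Set, Tuple

Network = Dict[str, Set[str]]


def linked(net: Network, a: str, b: str) -> bool:
    """True if a and b appear in each other's hash table values."""
    return b in net.get(a, set()) and a in net.get(b, set())


def grow(net: Network, start: str, community: Set[str]) -> None:
    """Steps (1)-(2): recursive depth-first growth of one candidate set c."""
    community.add(start)
    for uid in sorted(net.get(start, set())):
        # Add uid only if it is linked to every current member of c.
        if uid not in community and all(linked(net, uid, m) for m in community):
            grow(net, uid, community)


def base_communities(net: Network, min_size: int = 3) -> List[Set[str]]:
    """Step (3): keep every grown set with more than min_size members."""
    found = set()
    for key in sorted(net):
        c: Set[str] = set()
        grow(net, key, c)
        if len(c) > min_size:
            found.add(frozenset(c))
    return [set(c) for c in sorted(found, key=sorted)]


def cross_density(net: Network, ci: Set[str], cj: Set[str]) -> float:
    """ASSUMED similarity (not the patent's formula): share of linked cross pairs."""
    pairs = [(m, n) for m in ci for n in cj if m != n]
    if not pairs:
        return 0.0
    return sum(linked(net, m, n) for m, n in pairs) / len(pairs)


def aggregate(
    net: Network,
    communities: List[Set[str]],
    similarity: Callable[[Network, Set[str], Set[str]], float] = cross_density,
    start: float = 0.99,
    step: float = 0.05,
    stop: float = 0.27,
) -> List[Set[str]]:
    """Steps (4)-(4.3): merge similar communities while lowering the threshold."""
    groups = [set(c) for c in communities]
    threshold = start
    while True:
        merged = True
        while merged:                      # step (4.1): merge until no pair qualifies
            merged = False
            for i, j in combinations(range(len(groups)), 2):
                if similarity(net, groups[i], groups[j]) > threshold:
                    groups[i] |= groups[j]
                    del groups[j]
                    merged = True
                    break                  # indices are stale after a merge; rescan
        threshold -= step                  # step (4.2)
        if threshold <= stop:              # step (4.3)
            return groups


def make_network(edges: Iterable[Tuple[str, str]]) -> Network:
    """Build a symmetric hash table from undirected edges (sample data helper)."""
    net: Network = {}
    for a, b in edges:
        net.setdefault(a, set()).add(b)
        net.setdefault(b, set()).add(a)
    return net


# Two overlapping 4-cliques (a-d and b-e) plus one separate 4-clique (p-s).
edges = (
    list(combinations("abcd", 2))
    + list(combinations("bcde", 2))
    + list(combinations("pqrs", 2))
)
net = make_network(edges)
base = base_communities(net)
groups = aggregate(net, base)
```

In this sample the two overlapping cliques are merged once the threshold drops below their cross density, while the unconnected clique stays a separate group.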
4. Grouping analysis display module, as shown in FIG. 4
(1) For each community structure mined by the community structure mining module, store each member's profile in a vector, where each dimension of the vector represents one item of profile information;
(2) For each group of vectors, find the dimension in which the largest number of vectors share the same value;
(2.1) If that dimension is school and the shared value covers more than half of the members, the category is resolved as classmates;
(2.2) If that dimension is workplace and the shared value covers more than half of the members, the category is resolved as colleagues;
(2.3) If that dimension is verified status ("plus V") and it covers more than half of the members, the category is resolved as celebrities;
(2.4) Otherwise, the category is resolved as friends.
(3) After the semantic analysis is finished, display the result.
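The category rules above can be sketched as a small majority vote over profile dimensions. The dictionary keys `school`, `work` and `verified`, the sample profiles, and the choice to count only members who actually carry a value (so that unverified accounts do not form a majority for the verified dimension) are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List, Optional

Profile = Dict[str, Optional[str]]

# Dimension name -> category assigned when that dimension dominates the group.
LABELS = {"school": "classmates", "work": "colleagues", "verified": "celebrities"}


def classify_group(profiles: List[Profile]) -> str:
    """Return the category of one friend group from its members' profile vectors."""
    best_dim, best_count = None, 0
    for dim in LABELS:
        # Ignore missing or empty values; only real values can form a majority.
        values = [p[dim] for p in profiles if p.get(dim)]
        if not values:
            continue
        count = Counter(values).most_common(1)[0][1]   # size of the largest same-value block
        if count > best_count:
            best_dim, best_count = dim, count
    if best_dim is not None and best_count > len(profiles) / 2:
        return LABELS[best_dim]
    return "friends"


classmates = [
    {"school": "School A", "work": "Firm X"},
    {"school": "School A", "work": "Firm Y"},
    {"school": "School A"},
    {"school": "School B"},
]
celebrities = [{"verified": "yes"}, {"verified": "yes"}, {"verified": "yes"}, {}]
mixed = [
    {"school": "School A"},
    {"school": "School B"},
    {"work": "Firm X"},
    {"work": "Firm Y"},
]

labels = [classify_group(g) for g in (classmates, celebrities, mixed)]
```

Ties between dimensions are resolved here by the order of `LABELS`, which the source does not specify.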
5. Feedback module
(1) Obtain the user's feedback data, namely the user's score (1 to 5 points);
(2) Store the feedback information in the database.
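Storing one feedback record could look like the sketch below. The patent does not name a database, so SQLite, the table name `feedback` and its column names are assumptions chosen only to make the example self-contained; the grouping result is serialized as JSON for the same reason.

```python
import json
import sqlite3
from typing import List


def save_feedback(
    conn: sqlite3.Connection, user_id: str, groups: List[List[str]], score: int
) -> None:
    """Store user id, grouping result and score (1 to 5) as one record."""
    if not 1 <= score <= 5:
        raise ValueError("score must be between 1 and 5")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS feedback "
        "(user_id TEXT, grouping_result TEXT, score INTEGER)"
    )
    conn.execute(
        "INSERT INTO feedback (user_id, grouping_result, score) VALUES (?, ?, ?)",
        (user_id, json.dumps(groups), score),
    )
    conn.commit()


# In-memory database and invented values, for illustration only.
conn = sqlite3.connect(":memory:")
save_feedback(conn, "u1", [["a", "b"], ["c"]], 4)
rows = conn.execute("SELECT user_id, grouping_result, score FROM feedback").fetchall()
```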
Portions of the invention not described in detail are well known to those skilled in the art.

Claims (2)

1. A microblog data management system, characterized by comprising a user authorization module, a data capture module, a community structure mining module, a grouping analysis display module and a feedback module, wherein:
a user authorization module: performs authorization using the OAuth protocol to obtain the user's microblog user name;
a data capture module: according to the user's microblog user name, uses the API (application programming interface) provided by the microblog service to acquire the relationships between the user's friends and the users' profile information; the user's friends are fetched first; then, for each friend, the friends that this friend and the user have in common are fetched, thereby obtaining the relationships among all friends and forming a user social relationship network composed of the friend relationships; finally, the resulting user social relationship network is stored in a database and output to the community structure mining module;
the community structure mining module: for the friend relationship network obtained by the data capture module, mines the potential community structure of the friend network from the social relationships among friends, according to a community detection technique, as the basis for grouping the user's friends; the community detection technique consists of two parts, basic community structure search and community aggregation, and the user friend groups it produces are output to the grouping analysis display module; according to the community detection technique, a depth-first search over the network first finds its basic community structures, which are then aggregated hierarchically, so that the potential community structure of the network is mined from the social relationships among friends as the basis for grouping; a community is a set of friends such that friend relationships are dense among friends inside the community and sparse between friends inside the community and friends outside it; the user friend groups are thereby obtained;
a grouping analysis display module: analyzes the user friend groups generated by the community structure mining module; according to the semantic information of the user friend groups, the groups are abstracted into four categories: celebrities, friends, classmates and colleagues; for each user friend group generated by the community structure mining module, the analysis part determines the category of the group using the profile data of the group members, and the display part shows the results of the community structure mining module and of the analysis to the user as the grouping analysis result;
a feedback module: provides a feedback option for each user friend group and collects the user's evaluation; the user scores the effect of the system, and the user id, the grouping result and the user feedback are stored as one record in a database, to provide a basis for improving the system and the user experience in the future.
2. A method for managing microblog data using the microblog data management system of claim 1, characterized by comprising the following steps:
(1) User authorization: authorization is performed using the OAuth protocol to obtain the user's microblog user name;
(2) Data capture: according to the user's microblog user name, the API (application programming interface) provided by the microblog service is used to acquire the relationships between the user's friends and the users' profile information; the user's friends are fetched first; then, for each friend, the friends that this friend and the user have in common are fetched, yielding the relationships among all friends and forming a user social relationship network composed of the friend relationships; each node in the network represents a friend of the user, an edge between two nodes represents a relationship between two of the user's friends, and the resulting network is finally written to a database;
(3) Community structure mining: for the friend relationship network obtained in step (2), a community detection technique is applied: a depth-first search over the network first finds its basic community structures, which are then aggregated hierarchically, so that the potential community structure of the network is mined from the social relationships among friends as the basis for grouping; a community is a set of friends such that friend relationships are dense among friends inside the community and sparse between friends inside the community and friends outside it; the user friend groups are thereby obtained;
(4) Grouping analysis and display: the user friend groups generated in step (3) are analyzed to mine the semantic information of each group; groups are abstracted into four categories: celebrities, friends, classmates and colleagues; for each user friend group generated in step (3), the category of the group is determined using the profile information of the group members, the content of their microblogs and their forwarding relationships, and is shown to the user as the basis for the grouping;
(5) Feedback: a feedback option is provided for each user friend group, and user feedback is collected to provide a basis for improving the system and the user experience in the future.
CN201310367762.0A 2013-08-21 2013-08-21 Microblog data management system and implementation method thereof Expired - Fee Related CN103488683B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310367762.0A CN103488683B (en) 2013-08-21 2013-08-21 Microblog data management system and implementation method thereof

Publications (2)

Publication Number Publication Date
CN103488683A CN103488683A (en) 2014-01-01
CN103488683B true CN103488683B (en) 2017-05-10

Family ID=49828909

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310367762.0A Expired - Fee Related CN103488683B (en) 2013-08-21 2013-08-21 Microblog data management system and implementation method thereof

Country Status (1)

Country Link
CN (1) CN103488683B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104052651B (en) * 2014-06-03 2017-09-12 西安交通大学 A kind of method and apparatus for setting up social groups
CN104202319B (en) * 2014-08-28 2018-05-29 北京淘友天下科技发展有限公司 A kind of social networks recommend method and device
CN104965878B (en) * 2015-06-12 2018-11-27 微梦创科网络科技(中国)有限公司 A kind of method and device carrying out the excavation of user job unit based on grouping information
CN105262822A (en) * 2015-10-28 2016-01-20 维沃移动通信有限公司 Method and apparatus for assisting user to identify identity of friend
CN105430020A (en) * 2015-12-31 2016-03-23 南京邮电大学 Access group-based privacy protection-supporting access authorization method
CN106411572B (en) * 2016-09-06 2019-05-07 山东大学 A kind of community discovery method of combination nodal information and network structure
CN109783715A (en) * 2019-01-08 2019-05-21 鑫涌算力信息科技(上海)有限公司 Network crawler system and method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2009092222A1 (en) * 2007-12-27 2009-07-30 Tencent Technology (Shenzhen) Company Limited A method,a client and a communication system for sharing a communication object
CN102122291A (en) * 2011-01-18 2011-07-13 浙江大学 Blog friend recommendation method based on tree log pattern analysis
CN102708176A (en) * 2012-05-08 2012-10-03 山东大学 Microblog data mining method based on active users

Also Published As

Publication number Publication date
CN103488683A (en) 2014-01-01

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20170510