CN106875278A - Social network user portrait method based on random forest - Google Patents
Social network user portrait method based on random forest
- Publication number
- CN106875278A (application CN201710038836.4A)
- Authority
- CN
- China
- Prior art keywords
- attribute
- label
- social network
- random forest
- similarity
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/01—Social networking
Landscapes
- Business, Economics & Management (AREA)
- Engineering & Computer Science (AREA)
- Primary Health Care (AREA)
- Strategic Management (AREA)
- Economics (AREA)
- General Health & Medical Sciences (AREA)
- Human Resources & Organizations (AREA)
- Marketing (AREA)
- Computing Systems (AREA)
- Health & Medical Sciences (AREA)
- Tourism & Hospitality (AREA)
- Physics & Mathematics (AREA)
- General Business, Economics & Management (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Image Analysis (AREA)
Abstract
The invention provides a social network user portrait method based on random forest, comprising the following steps: obtaining multi-source attribute data from online social networking sites; labeling the data attribute sets of the original multi-source attributes with original attribute labels, and calling a similarity function to traverse the data attribute sets of different attributes for similarity detection; according to the decision tree of the original single-layer multi-source attributes, merging the data attribute sets whose similarity falls within a threshold range to generate merged attribute labels, and then training on the samples with the random forest algorithm; obtaining the voting modes, assigning weights to them, and sorting by weight in descending order to obtain all label weight values; and retaining the labels within a preset threshold to form a new label attribute set used to portray user attributes in the social network. The invention uses a random forest model to partition users' attribute labels, which effectively mitigates the insufficiency and complexity of traditional attribute partitioning based on small-sample sampling.
Description
Technical field
The present invention relates to the technical field of online social networks, and in particular to a social network user portrait method based on random forest.
Background
Research on online social networks has been a major field of academic research in recent years. China has the largest population of Internet users in the world, so a large amount of data was generated during the early promotion stage of the Internet and continues to be generated in its current use. The vast majority of these data resources lie idle and cannot be processed well or applied commercially, which causes huge losses and also hinders the further development of social networks. Major Internet companies have invested heavily in funding and manpower to carry out research in the field of online social relationships, and the rational development and use of Internet data resources is of great significance.
Summary of the invention
The present invention provides a social network user portrait method based on random forest. Its purpose is to use a random forest model to partition users' attribute labels, effectively mitigating the insufficiency and complexity of traditional attribute partitioning based on small-sample sampling.
To solve the above problems, an embodiment of the present invention provides a social network user portrait method based on random forest, comprising the following steps:
Obtaining multi-source attribute data from online social networking sites;
Labeling the data attribute sets of the original multi-source attributes with original attribute labels, and calling a similarity function to traverse the data attribute sets of different attributes for similarity detection;
According to the decision tree of the original single-layer multi-source attributes, merging the data attribute sets whose similarity falls within a threshold range to generate merged attribute labels, and then training on the samples with the random forest algorithm;
Obtaining the voting modes, assigning weights to the obtained voting modes, and sorting by weight in descending order to obtain all label weight values;
Retaining the labels within a preset threshold to form a new label attribute set used to portray user attributes in the social network.
In one embodiment, the method further comprises the following step:
Setting a minimum detection termination threshold; when the similarity is below the minimum detection termination threshold, similarity detection for that set is terminated.
In one embodiment, the minimum detection termination threshold is 0.15.
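The termination rule can be sketched as follows. This is only an illustration under stated assumptions: the patent's similarity function is not reproduced in this text, so a Jaccard index is used as a stand-in, and the rule is read as "stop traversing the current set at the first comparison that scores below 0.15". The names `jaccard` and `scan_candidates` are hypothetical.

```python
MIN_DETECTION_THRESHOLD = 0.15  # minimum detection termination threshold from the text


def jaccard(a, b):
    """Assumed stand-in similarity: |a ∩ b| / |a ∪ b| for two collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def scan_candidates(target, candidates, threshold=MIN_DETECTION_THRESHOLD):
    """Compare `target` with each (name, values) candidate in order.

    Traversal stops at the first score below `threshold`; the scores
    collected up to that point are returned.
    """
    scores = []
    for name, values in candidates:
        score = jaccard(target, values)
        if score < threshold:
            break  # terminate similarity detection for this set
        scores.append((name, score))
    return scores
```

Other readings of the rule are possible, for example skipping only the low-scoring pair and continuing; the sketch picks the simplest one.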
In one embodiment, the similarity function is:
[formula image missing from this text]
where α is the similarity adjustment parameter, α ∈ [0,1], and ω(x) denotes the attribute function of the two attributes with the higher label similarity.
In one embodiment, α is 0.001.
In one embodiment, the step of retaining the labels within the preset threshold to form a new label attribute set used to portray user attributes in the social network comprises the following steps:
Setting a label mode threshold; when the voting mode obtained by the random forest algorithm is below the label mode threshold, the label is considered unrepresentative and is discarded;
Sorting the retained labels by label weight value in descending order to form the new label attribute set.
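The two steps above can be sketched as a filter followed by a sort. This is a minimal illustration, assuming each label's voting mode is available as a vote count and its weight as a number; the function name `select_labels`, the dictionary layout, and the example values are hypothetical and not taken from the patent.

```python
def select_labels(vote_modes, weights, mode_threshold):
    """Drop labels whose voting mode is below `mode_threshold`,
    then sort the remaining labels by weight in descending order.

    vote_modes: dict mapping label -> vote count (assumed representation)
    weights:    dict mapping label -> weight value
    """
    kept = [label for label, mode in vote_modes.items() if mode >= mode_threshold]
    return sorted(kept, key=lambda label: weights[label], reverse=True)


# Illustrative values only.
vote_modes = {"sports": 40, "music": 5, "travel": 25}
weights = {"sports": 0.5, "music": 0.1, "travel": 0.7}
new_label_set = select_labels(vote_modes, weights, mode_threshold=10)
```

Here `music` is discarded because its vote count is below the threshold, and the remaining labels are ordered by weight.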
In one embodiment, the similarity threshold range is [0.9, 1].
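Merging attribute sets whose similarity falls in [0.9, 1] might look like the sketch below. It assumes a Jaccard index in place of the patent's similarity function (which is not reproduced in this text) and groups attributes with a small union-find structure; `merge_similar` and the sample attribute names are hypothetical.

```python
from itertools import combinations

SIMILARITY_RANGE = (0.9, 1.0)  # similarity threshold range from the text


def jaccard(a, b):
    """Assumed stand-in similarity: |a ∩ b| / |a ∪ b|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def merge_similar(attribute_sets, sim=jaccard, bounds=SIMILARITY_RANGE):
    """Group attribute names whose pairwise similarity lies within `bounds`."""
    parent = {name: name for name in attribute_sets}

    def find(x):
        # Follow parent links to the group representative (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    low, high = bounds
    for a, b in combinations(attribute_sets, 2):
        if low <= sim(attribute_sets[a], attribute_sets[b]) <= high:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for name in attribute_sets:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())
```

Each returned group would correspond to one merged attribute label; attributes that match nothing remain in singleton groups.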
Compared with the prior art, the beneficial effect of the present invention is that it uses a random forest model to partition users' attribute labels, effectively mitigating the insufficiency and complexity of traditional attribute partitioning based on small-sample sampling.
Brief description of the drawings
Fig. 1 is the flow chart of the social network user portrait method based on random forest of the invention.
Detailed description
The above and other technical features and advantages of the present invention are described clearly and completely below with reference to the accompanying drawing. Obviously, the described embodiments are only some of the embodiments of the invention, not all of them.
As shown in the figure, a social network user portrait method based on random forest comprises the following steps:
S100: Obtain multi-source attribute data from online social networking sites and import it into a data storage system;
S101: Label the data attribute sets of the original multi-source attributes with original attribute labels, and call a similarity function to traverse the sets of different attributes for similarity detection. The similarity function is:
[formula image missing from this text]
where α is the similarity adjustment parameter, α ∈ [0,1], and ω(x) denotes the attribute function of the two attributes with the higher label similarity. In practice, however, α is generally very small and is repeatedly corrected against test values from the samples. Experimental results show that when α is raised by an order of magnitude, very few features are selected, whereas when α is lowered by an order of magnitude the resulting values are almost unchanged. Therefore α is set to 0.001 in this embodiment;
S102: Set a minimum detection termination threshold; when the similarity is below the minimum detection termination threshold, similarity detection for that set is terminated. The minimum detection termination threshold is 0.15;
S103: According to the decision tree of the original single-layer multi-source attributes, merge the sets whose similarity falls within the threshold range to generate merged attribute labels, and then train on the samples with the random forest algorithm. The similarity threshold range is [0.9, 1];
S104: Obtain the voting modes, assign weights to the obtained voting modes, and sort by weight in descending order to obtain all label weight values;
S105: Retain the labels within the preset threshold to form a new label attribute set used to portray user attributes in the social network. Specifically: set a label mode threshold; when the voting mode obtained by the random forest algorithm is below the label mode threshold, the label is considered unrepresentative and is discarded; the retained labels are sorted by label weight value in descending order to form the new label attribute set, which is used for the user portrait in the social network.
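Step S104 can be illustrated with a small vote tally. The patent does not state how weights are derived from the voting modes, so the sketch below assumes, purely for illustration, that a label's weight is its share of the votes cast by the trees. The function name `label_weights` and the sample votes are hypothetical.

```python
from collections import Counter


def label_weights(tree_votes):
    """Return (label, weight) pairs sorted by weight in descending order.

    `tree_votes` holds one predicted label per tree. The weight of a label
    is assumed here to be its share of all votes; the first pair is the
    mode of the votes.
    """
    counts = Counter(tree_votes)
    total = sum(counts.values())
    return [(label, count / total) for label, count in counts.most_common()]


# Illustrative votes from a hypothetical ten-tree forest.
votes = ["sports"] * 5 + ["music"] * 3 + ["travel"] * 2
ranked = label_weights(votes)
```

In a real pipeline the per-tree votes would come from the trained forest; they are hard-coded here to keep the example self-contained.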
Compared with the prior art, the beneficial effect of the present invention is that it uses a random forest model to partition users' attribute labels, effectively mitigating the insufficiency and complexity of traditional attribute partitioning based on small-sample sampling.
The specific embodiments described above further explain the purpose, technical solution, and beneficial effects of the present invention in detail. It should be understood that the foregoing is only a specific embodiment of the invention and is not intended to limit its scope of protection. In particular, for those skilled in the art, any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.
Claims (7)
1. A social network user portrait method based on random forest, characterized by comprising the following steps:
Obtaining multi-source attribute data from online social networking sites;
Labeling the data attribute sets of the original multi-source attributes with original attribute labels, and calling a similarity function to traverse the data attribute sets of different attributes for similarity detection;
According to the decision tree of the original single-layer multi-source attributes, merging the data attribute sets whose similarity falls within a threshold range to generate merged attribute labels, and then training on the samples with the random forest algorithm;
Obtaining the voting modes, assigning weights to the obtained voting modes, and sorting by weight in descending order to obtain all label weight values;
Retaining the labels within a preset threshold to form a new label attribute set used to portray user attributes in the social network.
2. The social network user portrait method based on random forest according to claim 1, characterized by further comprising the following step:
Setting a minimum detection termination threshold; when the similarity is below the minimum detection termination threshold, similarity detection for that set is terminated.
3. The social network user portrait method based on random forest according to claim 2, characterized in that the minimum detection termination threshold is 0.15.
4. The social network user portrait method based on random forest according to claim 1, characterized in that the similarity function is:
[formula image missing from this text]
where α is the similarity adjustment parameter, α ∈ [0,1], and ω(x) denotes the attribute function of the two attributes with the higher label similarity.
5. The social network user portrait method based on random forest according to claim 4, characterized in that α is 0.001.
6. The social network user portrait method based on random forest according to claim 1, characterized in that the step of retaining the labels within the preset threshold to form a new label attribute set used to portray user attributes in the social network comprises the following steps:
Setting a label mode threshold; when the voting mode obtained by the random forest algorithm is below the label mode threshold, the label is considered unrepresentative and is discarded;
Sorting the retained labels by label weight value in descending order to form the new label attribute set.
7. The social network user portrait method based on random forest according to claim 1, characterized in that the similarity threshold range is [0.9, 1].
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710038836.4A CN106875278B (en) | 2017-01-19 | 2017-01-19 | Social network user image drawing method based on random forest |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710038836.4A CN106875278B (en) | 2017-01-19 | 2017-01-19 | Social network user image drawing method based on random forest |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106875278A | 2017-06-20 |
CN106875278B | 2020-11-03 |
Family
ID=59157771
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710038836.4A Active CN106875278B (en) | 2017-01-19 | 2017-01-19 | Social network user image drawing method based on random forest |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106875278B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103678659A (en) * | 2013-12-24 | 2014-03-26 | 焦点科技股份有限公司 | E-commerce website cheat user identification method and system based on random forest algorithm |
CN105824912A (en) * | 2016-03-15 | 2016-08-03 | 平安科技(深圳)有限公司 | Personalized recommending method and device based on user portrait |
CN105868773A (en) * | 2016-03-23 | 2016-08-17 | 华南理工大学 | Hierarchical random forest based multi-tag classification method |
US20160328837A1 (en) * | 2015-05-08 | 2016-11-10 | Kla-Tencor Corporation | Method and System for Defect Classification |
Non-Patent Citations (2)
Title |
---|
FENG LIU et al.: "MLRF: Multi-label Classification Through Random Forest with Label-Set Partition", 《ADVANCED INTELLIGENT COMPUTING THEORIES AND APPLICATIONS (ICIC 2015)》 * |
刘勘 等: "基于随机森林分类的微博机器用户识别研究", 《北京大学学报(自然科学版)》 * |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108596444A (en) * | 2018-04-02 | 2018-09-28 | 清华大学 | The method and device of large scale community network user sampling based on diversification strategy |
CN108596444B (en) * | 2018-04-02 | 2021-06-29 | 清华大学 | Method and device for sampling large-scale social network users based on diversified strategies |
CN110659921A (en) * | 2018-06-28 | 2020-01-07 | 上海传漾广告有限公司 | Method and system for analyzing correlation between network advertisement audience behaviors and audience interests |
CN108876470A (en) * | 2018-06-29 | 2018-11-23 | 腾讯科技(深圳)有限公司 | Tagging user extended method, computer equipment and storage medium |
CN109785034A (en) * | 2018-11-13 | 2019-05-21 | 北京码牛科技有限公司 | User's portrait generation method, device, electronic equipment and computer-readable medium |
CN109635190A (en) * | 2018-11-28 | 2019-04-16 | 四川亨通网智科技有限公司 | User characteristics method for digging based on position and behavior Conjoint Analysis |
CN112307831A (en) * | 2019-07-31 | 2021-02-02 | 广州弘度信息科技有限公司 | Violent movement detection method based on human body key point detection and tracking |
CN112307831B (en) * | 2019-07-31 | 2023-04-14 | 广州弘度信息科技有限公司 | Violent movement detection method based on human body key point detection and tracking |
CN113076476A (en) * | 2021-04-01 | 2021-07-06 | 重庆邮电大学 | User portrait construction method of microblog heterogeneous information |
CN113076476B (en) * | 2021-04-01 | 2021-11-30 | 重庆邮电大学 | User portrait construction method of microblog heterogeneous information |
Also Published As
Publication number | Publication date |
---|---|
CN106875278B (en) | 2020-11-03 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||