CN110197389A - User identification method and device
- Publication number: CN110197389A
- Application number: CN201910161169.8A
- Authority: CN (China)
- Prior art keywords: user, information, vector, social, propagation
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0201—Market modelling; Market analysis; Collecting market data
- G06Q30/0202—Market predictions or forecasting for commercial activities
Abstract
The present invention relates to a user identification method and device. The method comprises: obtaining social behavior information of a user, wherein the social behavior information includes user corpus information, user social relationship information and user operation information; obtaining a target text vector corresponding to the current field; determining a text feature of the user according to the user corpus information and the target text vector; inputting the user social relationship information into a preset propagation model to obtain a group propagation feature of the user; inputting the user operation information into a preset prediction model to obtain a behavioral feature of the user; and fusing the text feature, the group propagation feature and the behavioral feature to obtain a user identification result. In different fields, the invention can combine multi-dimensional user features derived from the user's social behavior information to identify, before the user carries out a specific activity, whether the user is involved in operations related to that field.
Description
Technical field
The present invention relates to the field of deep learning technology, and in particular to a user identification method and device.
Background art
As scalpers ("oxen") on e-commerce platforms become increasingly rampant, e-commerce platforms and brand owners suffer growing losses. Existing anti-scalper solutions in e-commerce mostly identify scalpers by order aggregation: the platform detects whether a large number of orders for the same goods converge on the same logistics area. This approach has two drawbacks. First, it lags behind the event: the platform can aggregate the problematic orders only after the scalpers have placed them successfully, and cannot identify a scalper attack in advance, at the time the customer places the order, so the best opportunity to prevent losses is missed. Second, the judging dimension is single: a typical e-commerce platform lacks user profile features, so it can identify scalper orders only by geographic area and cannot portray scalper-related profile features from the user's perspective. Identifying scalpers on the single dimension of the same goods converging on the same logistics area easily misjudges normal orders as scalper orders, so accuracy is low and the false-positive rate is high.
Summary of the invention
The technical problem to be solved by the present invention is to provide a user identification method and device that can, in different fields, combine multi-dimensional user features derived from the user's social behavior information to identify, before the user carries out a specific activity, whether the user is involved in operations related to that field.
To solve the above technical problem, in one aspect, the present invention provides a user identification method, the method comprising:
obtaining social behavior information of a user, wherein the social behavior information includes user corpus information, user social relationship information and user operation information;
obtaining a target text vector corresponding to the current field;
determining a text feature of the user according to the user corpus information and the target text vector;
inputting the user social relationship information into a preset propagation model to obtain a group propagation feature of the user;
inputting the user operation information into a preset prediction model to obtain a behavioral feature of the user;
fusing the text feature, the group propagation feature and the behavioral feature to obtain a user identification result.
In another aspect, the present invention provides a user identification device, the device comprising:
a user information acquisition module, configured to obtain social behavior information of a user, wherein the social behavior information includes user corpus information, user social relationship information and user operation information;
a target vector acquisition module, configured to obtain a target text vector corresponding to the current field;
a text feature determination module, configured to determine a text feature of the user according to the user corpus information and the target text vector;
a group propagation feature determination module, configured to input the user social relationship information into a preset propagation model to obtain a group propagation feature of the user;
a behavioral feature determination module, configured to input the user operation information into a preset prediction model to obtain a behavioral feature of the user;
a feature fusion module, configured to fuse the text feature, the group propagation feature and the behavioral feature to obtain a user identification result.
Implementing the embodiments of the present invention has the following beneficial effects:
The present invention derives user features from the user's social behavior information, which includes user corpus information, user social relationship information and user operation information. For the current application field, a target text vector corresponding to that field is obtained, and the text feature of the user is determined from the user corpus information and the target text vector. The user social relationship information is input into a preset propagation model to obtain the group propagation feature of the user. The user operation information is input into a preset prediction model to obtain the behavioral feature of the user. The text feature, the group propagation feature and the behavioral feature are fused to obtain the user identification result. The invention can therefore be applied to different fields: based on the user's social behavior information, it identifies whether the user is involved in operations related to the field before the user carries out a concrete operation, so that the relevant personnel can take countermeasures according to the identification result. It solves the prior-art problem of a single judging dimension: by obtaining multi-dimensional features based on the user's social behavior information, it builds a user profile and achieves high identification accuracy.
Brief description of the drawings
Fig. 1 is a schematic diagram of an application scenario provided by an embodiment of the present invention;
Fig. 2 is a flowchart of a user identification method provided by an embodiment of the present invention;
Fig. 3 is a flowchart of a method for generating a target text vector provided by an embodiment of the present invention;
Fig. 4 is a flowchart of a method for calculating a user's text feature provided by an embodiment of the present invention;
Fig. 5 is a flowchart of a method for acquiring a group propagation feature provided by an embodiment of the present invention;
Fig. 6 is a flowchart of a method for acquiring a user behavioral feature provided by an embodiment of the present invention;
Fig. 7 is a schematic diagram of a multi-modal information fusion neural network model provided by an embodiment of the present invention;
Fig. 8 is a schematic diagram of an LSTM network model provided by an embodiment of the present invention;
Fig. 9 is a schematic diagram of an LSTM-based text classification model provided by an embodiment of the present invention;
Fig. 10 is a schematic diagram of a user identification device provided by an embodiment of the present invention;
Fig. 11 is a schematic diagram of a text feature determination module provided by an embodiment of the present invention;
Fig. 12 is a schematic diagram of a target vector generation module provided by an embodiment of the present invention;
Fig. 13 is a schematic diagram of a group propagation feature determination module provided by an embodiment of the present invention;
Fig. 14 is a schematic diagram of a behavioral feature determination module provided by an embodiment of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the invention without creative effort fall within the protection scope of the invention.
In the description of the invention, it should be understood that the terms "first" and "second" are used for description only and are not to be understood as indicating or implying relative importance or implicitly indicating the number of technical features referred to. A feature qualified by "first" or "second" may therefore explicitly or implicitly include one or more such features. Moreover, the terms "first", "second", etc. serve to distinguish similar objects and are not used to describe a particular order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments described herein can be implemented in orders other than those illustrated or described herein.
The terms used in the embodiments of the present invention are first explained as follows:
Scalper (literally "ox"): an illegal intermediary, specifically one who corners and resells limited participation rights or goods outside legitimate sales channels for profit.
Scalper leader (literally "ox head"): the organizer who initiates crowdsourced stockpiling by scalpers.
Order-placing scalper (literally "beef cattle"): the scalper attacker who actually places the purchase orders.
Order report (literally "declaration form"): the report used by the scalper leader and the order-placing scalpers to settle cash; in effect a record of cash going in and out.
RNN: Recurrent Neural Network, a kind of artificial neural network whose nodes are connected directionally to form a cycle. Its internal state can exhibit dynamic temporal behavior, and it can use its internal memory to process input sequences of arbitrary timing.
LSTM: Long Short-Term Memory, a kind of recurrent neural network over time that is suited to processing and predicting important events separated by relatively long intervals and delays in a time series.
Attention: also known as the attention mechanism, a technique that lets a model focus on the important information and learn it thoroughly.
Referring to Fig. 1, which shows a schematic diagram of an application scenario provided by an embodiment of the present invention, the scenario includes several user terminals 110 and a server 120. The user terminals 110 include but are not limited to smartphones, tablet computers, laptops, desktop computers, etc. A user can log in to a relevant application (APP) or website through a user terminal 110 to carry out network activities. When the user sends a network service request to the server 120 through the user terminal 110, the server 120 responds to the request and, at the same time, obtains the user's historical social behavior information according to the user's login account, analyzes it, and finally obtains a behavior identification result for the user. When the behavior identification result meets the network service request condition, the server 120 issues the corresponding service information to the user's terminal 110 so that the user can complete the network activity. When the result does not meet the condition, the server 120 refuses to provide the network service to the user's terminal 110, so that the user cannot carry out the network activity.
Referring to Fig. 2, which shows a user identification method that can be applied on the server side, the method comprises:
S210. Obtain the social behavior information of the user, wherein the social behavior information includes user corpus information, user social relationship information and user operation information.
In the embodiments of the present invention, obtaining the user's social behavior information is triggered by the user's network request. Specifically, when the user sends a service request to the server, the server obtains the current user's social behavior information according to the current user's account information.
The social behavior information here includes user corpus information, user social relationship information and user operation information. The user corpus information may include text related to the user, such as the names of the social groups the user has joined, the text of the user's chat content, and articles the user has posted. The user social relationship information may include the types of social groups the user has joined, the user's activity level and group member level in those groups, the user's social friend relationships, etc. The user operation information may include the links the user clicks, the page information the user browses, the number of times the user clicks to browse related content, etc. The social behavior information acquired above serves as the basis for identifying the user.
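As a minimal sketch of how the three kinds of information in S210 might be grouped on the server side, the following container can be used. The class name `SocialBehaviorInfo`, its field names, and the dictionary-backed `store` are all hypothetical and are not taken from the patent; a real system would query its own account backend.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SocialBehaviorInfo:
    """Hypothetical container for the three kinds of information in S210."""
    # User corpus information: group names, chat text, posted articles.
    corpus: List[str] = field(default_factory=list)
    # User social relationship information: (user, friend-or-group) pairs.
    social_relations: List[Tuple[str, str]] = field(default_factory=list)
    # User operation information: page contents reached by click-jumps.
    clicked_pages: List[str] = field(default_factory=list)


def get_social_behavior_info(account_id: str, store: dict) -> SocialBehaviorInfo:
    """Look up the information by account; `store` stands in for a real backend."""
    return store.get(account_id, SocialBehaviorInfo())
```

An unknown account simply yields an empty record, which keeps the later feature steps from having to special-case missing users.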
S220. Obtain the target text vector corresponding to the current field.
For different fields, the embodiments of the present invention can use prior data from each field as a basis. For each field, corpus information related to that field can be collected and a corresponding text vector generated for later reference. The method for generating the target text vector is shown in Fig. 3 and comprises:
S310. Obtain the source corpus information corresponding to the current field.
Once the application field currently involved has been determined, the source corpus information corresponding to that field is obtained. The source corpus information for a field refers to representative characters, words or sentences that specifically characterize the field, obtained by the relevant personnel from previously accumulated experience.
S320. Segment the source corpus information and generate a word vector for each word in the source corpus information.
Since the obtained source corpus information consists of whole passages of text, it must be segmented into words. Any existing method for segmenting a corpus can be applied in this embodiment, such as Trie-tree storage with the longest-match principle, HMM-based segmentation, probabilistic segmentation models, etc.
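The longest-match principle mentioned above can be sketched as forward maximum matching. This is an illustrative simplification rather than the patent's implementation: the dictionary is held in a plain Python set instead of a Trie, and the function name and the `max_word_len` parameter are assumptions.

```python
def segment_longest_match(text, vocabulary, max_word_len=4):
    """Forward maximum matching: at each position take the longest dictionary word."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a dictionary hit.
        for length in range(min(max_word_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            # A single character is always accepted as a fallback.
            if length == 1 or candidate in vocabulary:
                words.append(candidate)
                i += length
                break
    return words
```

A Trie would make the per-position lookup proportional to the matched word length; the set-based version is enough to show the matching order.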
After the source corpus is segmented, each resulting word is represented as a vector in embedding-encoded form, i.e., a word vector. This can be understood as mapping, or embedding, a word from the text space into another numerical vector space by some method. The word vectors in this embodiment can be produced with word2vec.
S330. Superimpose the word vectors of the words in the source corpus information to obtain the target text vector.
The word vectors of the words obtained in step S320 are superimposed (added together) to obtain the target text vector embedding_target corresponding to the field.
Since the source corpus information of each field is obtained from accumulated experience, it needs to be updated over time according to the latest situation, to ensure that the current corpus information portrays the characteristics of the current field comprehensively and accurately.
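Step S330 can be sketched as an element-wise sum of word vectors. The sketch assumes that "superimposing" means plain addition and that word vectors are available in a lookup table; `embedding_table` is a hypothetical stand-in for a trained word2vec model, and words missing from it are skipped.

```python
def superpose(word_vectors):
    """Element-wise sum of equally sized word vectors (step S330)."""
    if not word_vectors:
        raise ValueError("need at least one word vector")
    return [sum(components) for components in zip(*word_vectors)]


def target_text_vector(words, embedding_table):
    """Sum the vectors of all words that appear in the embedding table."""
    vectors = [embedding_table[w] for w in words if w in embedding_table]
    return superpose(vectors)
```

Because the source corpus is refreshed over time, the target vector would be recomputed whenever the corpus or the embedding table changes.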
S230. Determine the text feature of the user according to the user corpus information and the target text vector.
Referring to Fig. 4, which shows a method for calculating a user's text feature, the method comprises:
S410. Segment the user corpus information and generate a word vector for each word in the user corpus information.
The acquired user corpus information is segmented and the corresponding word vectors are generated; for the implementation, see step S320.
S420. Calculate the similarity between the word vector of each word in the user corpus information and the target text vector.
In principle, any method that can calculate the similarity of two vectors can be used for the similarity calculation in this embodiment, for example:
1. to reduce the number of parameters to be trained, the cosine formula can be used directly to calculate the similarity between the target text vector and the word vector of each word in the user corpus;
2. a simple neural network whose inputs are a and b and whose output is the similarity c;
3. obtaining the similarity through a matrix transformation.
This embodiment may use the cosine formula to calculate the similarity, as follows:
$$\alpha_i = \frac{w_i \cdot \mathrm{embedding}_{target}}{\lVert w_i \rVert \, \lVert \mathrm{embedding}_{target} \rVert}$$
With this formula, the similarity α_i between the word vector w_i of each word in the user corpus and the target text vector embedding_target is calculated.
S430. Use each similarity as the weight of the corresponding word vector and calculate the text feature of the user.
The similarity between each word vector and the target text vector calculated in step S420 is used as the weight of that word vector, and all word vectors in the user corpus are weighted as follows:
$$\mathrm{embedding}_{user} = \sum_{i=1}^{n} \alpha_i \, w_i$$
where n is the number of word vectors obtained from the user corpus, and embedding_user is the user text feature finally obtained from the user corpus information.
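Steps S420 and S430 can be sketched in a few lines, assuming the weighting is a plain similarity-weighted sum of the word vectors. The function names are illustrative, and returning 0.0 for a zero-length vector is a choice made for this sketch, not something the patent specifies.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equally sized vectors; 0.0 if either is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def user_text_feature(word_vectors, target_vector):
    """Weight each word vector by its similarity to the target, then sum (S430)."""
    weights = [cosine(w, target_vector) for w in word_vectors]
    dim = len(target_vector)
    return [sum(a * w[k] for a, w in zip(weights, word_vectors)) for k in range(dim)]
```

Words unrelated to the field get weights near zero, so the resulting vector is dominated by the field-relevant part of the user's corpus.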
S240. Input the user social relationship information into the preset propagation model to obtain the group propagation feature of the user.
Referring to Fig. 5, which shows a method for acquiring a group propagation feature, the method comprises:
S510. Obtain user information that carries a label and user information that does not carry the label.
S520. Using a label propagation algorithm, train the propagation model on the labeled user information and the unlabeled user information.
S530. Input the user social relationship into the propagation model, which generates the vector corresponding to the user social relationship.
The similarity between the current user and the target group is judged through label propagation. Since only a small number of users carry labels, these labeled users must be used to diffuse to more users who are potentially related to the target group; semi-supervised learning is used here to propagate the labels.
Semi-supervised learning, as the name suggests, works with only a small amount of labeled data and tries to learn useful information from that small amount of labeled data together with a large amount of unlabeled data. It rests on three assumptions: 1) the smoothness assumption: similar data have the same label; 2) the cluster assumption: data in the same cluster have the same label; 3) the manifold assumption: data on the same manifold structure have the same label.
The core idea of the label propagation (LP) algorithm is simple. LP is graph-based, so a graph must be built first. A graph is constructed over all the data: each node is a data point, covering both labeled and unlabeled data. The edge between node i and node j represents their similarity. The algorithm propagates labels along the edges between nodes; the larger the weight of an edge, the more similar the two nodes and the more easily a label propagates across it. When the class of a node is determined, the class with the highest probability is taken. In brief, the steps are: 1) perform propagation; 2) reset the labels of the labeled samples; 3) repeat steps 1) and 2) until the label matrix F converges. As the labeled data keep spreading their labels, the final class boundaries pass over the high-density regions and come to rest in the low-density gaps, which amounts to the labeled samples of each class dividing up spheres of influence.
The user social relationship includes the social groups the user participates in, the user's friend relationships and the user's level of social activity. From the user's basic information, it is crawled whether the user participates in the target group and how active the user is in that group; the activity level in group chats can be judged from the group member level. A user relationship graph is built from the social relationships, and the label propagation algorithm then yields the current user's vector representation in the propagation model.
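The three steps of label propagation described above (propagate, reset labeled samples, repeat until convergence) can be sketched as follows. This is a minimal illustration under stated assumptions: the graph is a nested dictionary of symmetric edge weights, each node takes the weighted average of its neighbours' label distributions, and the parameter names `tol` and `max_iter` are choices made for this sketch.

```python
def label_propagation(weights, seeds, n_classes=2, tol=1e-6, max_iter=1000):
    """weights[i][j]: similarity of nodes i and j; seeds: {node: class index}."""
    nodes = list(weights)
    uniform = [1.0 / n_classes] * n_classes

    def clamp(dist):
        # Step 2: reset labeled nodes to their known one-hot labels.
        for node, cls in seeds.items():
            dist[node] = [1.0 if c == cls else 0.0 for c in range(n_classes)]
        return dist

    F = clamp({node: list(uniform) for node in nodes})
    for _ in range(max_iter):
        new_F = {}
        for i in nodes:
            total = sum(weights[i].values())
            if total == 0:
                new_F[i] = list(F[i])  # isolated node keeps its distribution
                continue
            # Step 1: each node takes the weighted average of its neighbours.
            new_F[i] = [
                sum(w * F[j][c] for j, w in weights[i].items()) / total
                for c in range(n_classes)
            ]
        new_F = clamp(new_F)
        change = max(abs(new_F[i][c] - F[i][c]) for i in nodes for c in range(n_classes))
        F = new_F
        if change < tol:  # Step 3: stop once F has converged.
            break
    return F
```

On a chain of users a, b, c, d where a is labeled as belonging to the target group and d is labeled as not belonging, b ends up closer to the target label than c, which matches the intuition that labels spread more strongly to nearer neighbours.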
S250. Input the user operation information into the preset prediction model to obtain the behavioral feature of the user.
Referring to Fig. 6, which shows a method for acquiring a user behavioral feature, the method comprises:
S610. Within a preset period, when a click-and-jump operation by the user is detected, obtain the page information after the jump.
S620. Input the page information into the prediction model and output the prediction result for the page information.
The prediction result here is a value between 0 and 1; specifically, it can be the probability that the input page information is not target page information. A threshold can be set to determine the final prediction. In this embodiment the threshold is set to 0.5: when the predicted probability is less than 0.5, the input page information is judged to be target information; when the predicted probability is greater than or equal to 0.5, the input page information is judged not to be target information.
S630. Record the number of times, within the preset period, that the page information reached by the user's click-and-jump operations is predicted to be target information.
The probabilities and the number of times pages are predicted to be target pages within the preset period are combined to finally determine the user's behavioral feature information.
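The thresholding and counting in S620 and S630 can be sketched as below. The classifier is passed in as a callable `predict_not_target` that returns the probability that a page is not target content, mirroring the convention described above. How the probabilities and the count are finally combined is not specified in the text, so returning the count together with the mean target probability is an assumption made for this sketch.

```python
def behavioral_feature(pages, predict_not_target, threshold=0.5):
    """Return (count of pages judged target, mean target probability)."""
    if not pages:
        return 0, 0.0
    # predict_not_target(page) -> probability that the page is NOT target content.
    not_target_probs = [predict_not_target(page) for page in pages]
    # A page counts as target when its not-target probability is below the threshold.
    target_count = sum(1 for p in not_target_probs if p < threshold)
    mean_target_prob = sum(1.0 - p for p in not_target_probs) / len(pages)
    return target_count, mean_target_prob
```

A probability exactly equal to the threshold is treated as not target, consistent with the "greater than or equal to 0.5" rule.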
The user behavioral feature here mainly refers to whether the user performs certain specific operations. The page information reached by the user's click-and-jump operations is compared with specific target information by similarity prediction. This embodiment uses the open-source tool fastText as the classifier: the fastText model takes a word sequence as input and outputs the probabilities that the sequence belongs to the different classes. The fastText architecture is very similar to the CBOW model in word2vec; the difference is that fastText predicts a label, whereas CBOW predicts the middle word. fastText also adds N-gram features. In the sentence "I love her", the bag-of-words features are "I", "love" and "her", which (in the original Chinese) are exactly the features of the sentence "she loves me". If 2-gram features are added, the first sentence also has the features "I-love" and "love-her", so the two sentences "I love her" and "she loves me" can be distinguished. Because fastText uses vectors to represent word N-grams and thereby takes local word order into account, it is better suited to the application scenario of this embodiment.
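The effect of adding 2-gram features can be shown with a small sketch. It does not call fastText itself; it only builds the two feature sets by hand. The English tokens stand in for the Chinese tokens of the original example, in which both sentences consist of the same three tokens in reverse order; that token choice is an assumption of this sketch.

```python
def bag_of_words(tokens):
    """Order-insensitive features: the set of individual tokens."""
    return set(tokens)


def with_bigrams(tokens):
    """Add adjacent-pair features so that local word order is captured."""
    bigrams = {f"{a}-{b}" for a, b in zip(tokens, tokens[1:])}
    return set(tokens) | bigrams


# Stand-ins for the two example sentences: same tokens, reversed order.
sentence_a = ["I", "love", "her"]
sentence_b = ["her", "love", "I"]
```

With bag-of-words alone the two sentences are indistinguishable, while the bigram features "I-love" and "love-her" appear only for the first sentence.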
S260. Fuse the text feature, the group propagation feature and the behavioral feature to obtain the user identification result.
Once the user's text feature, group propagation feature and behavioral feature have been obtained, these features need to be fused to obtain the final identification result.
This embodiment provides a neural network model that can fuse multi-modal information; see Fig. 7. The user profile features in the figure include the user behavioral features of this embodiment and, in addition, statistical features of the user, such as gender, age, region, login frequency, and traffic-inflating ("brushing") behavior of the user's device.
If the above feature information were merely superimposed linearly to obtain the identification result, a great deal of the original information would be lost. In Fig. 7, deep neural networks extract each modal feature of the user and fuse them into a single feature vector; the nonlinearity of the neural network is used to extract higher-order information across the multiple modal features, and the output layer finally outputs the identification result.
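The fusion idea can be sketched as concatenating the feature vectors and passing them through a small nonlinear network. This is a much-reduced illustration, not the model in Fig. 7: it uses a single hidden layer instead of per-modality deep networks, and the parameter dictionary `params` with keys `W1`, `b1`, `W2`, `b2` is hypothetical and would come from training.

```python
import math


def dense(x, weights, bias, activation):
    """One fully connected layer: activation(W x + b)."""
    return [
        activation(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, bias)
    ]


def sigmoid(z):
    """Squash a real value into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fuse_and_score(text_feat, group_feat, behavior_feat, params):
    """Concatenate the three feature vectors and pass them through a small MLP."""
    fused = list(text_feat) + list(group_feat) + list(behavior_feat)
    hidden = dense(fused, params["W1"], params["b1"], math.tanh)
    (score,) = dense(hidden, params["W2"], params["b2"], sigmoid)
    return score
```

The tanh hidden layer is what lets the score depend on interactions between modalities, which a purely linear superposition of the features cannot express.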
The present invention can be applied to anti-scalper ("anti-ox") protection in e-commerce. The input of the model in Fig. 7 is the user's multi-dimensional features, and the output of the model is a scalper fraud score. The scalper fraud score indicates how strongly the current user is suspected of being a scalper: the higher the score, the stronger the suspicion.
In the e-commerce anti-scalper service, the scalper fraud score service is offered to callers in SaaS mode. The caller only needs to provide the relevant user information, and the SaaS service returns the corresponding scalper fraud score for assessing the degree to which the user is a scalper.
First, corpus information related to the scalper domain is collected to obtain the target text vector corresponding to the scalper domain, which is used for similarity calculation against the acquired user information.
Based on the user's basic information, crawling is used to find out whether the user participates in scalper groups. Scalper groups can be identified by crawling and analyzing group chat content and group names. Other text information of the user can likewise be obtained by crawling. The user's text feature is determined from the acquired user corpus information and the corpus information of the scalper domain.
More potential scalper users are found by diffusion from users who already carry the scalper label. Specifically, the label can be diffused over social relationship information to determine which users the scalper label is finally propagated to. Scalpers generally cluster together in social groups, so device diffusion and label propagation are used to judge how closely a user is tied to scalper groups.
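Label diffusion over a social graph can be sketched as below. This is a simplified illustration, not the patent's propagation model: the graph, the node names, and the use of a clamped "clean" seed alongside the scalper seed are assumptions, and each unlabeled node simply takes the mean score of its neighbours.

```python
def propagate_labels(graph, seeds, iterations=100):
    """Iteratively average neighbour scores; seed nodes keep their known label.

    graph: dict mapping node -> list of neighbouring nodes (undirected).
    seeds: dict mapping node -> known score (1.0 = scalper, 0.0 = clean).
    """
    scores = {node: seeds.get(node, 0.0) for node in graph}
    for _ in range(iterations):
        updated = {}
        for node, neighbours in graph.items():
            if node in seeds:
                updated[node] = seeds[node]          # clamp labeled users
            elif neighbours:
                updated[node] = sum(scores[n] for n in neighbours) / len(neighbours)
            else:
                updated[node] = 0.0                  # isolated user, no evidence
        scores = updated
    return scores


# Hypothetical chain of friendships: scalper - a - b - c - clean
graph = {
    "scalper": ["a"],
    "a": ["scalper", "b"],
    "b": ["a", "c"],
    "c": ["b", "clean"],
    "clean": ["c"],
}
scores = propagate_labels(graph, seeds={"scalper": 1.0, "clean": 0.0})
```

Users closer to the labeled scalper end up with higher propagated scores, which can serve as a group propagation feature.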
User operation information here mainly refers to whether the user actually takes part in specific steps of scalper activity, such as performing a scalper order-reporting operation. When the user clicks a related link, the page information after the jump is obtained and input into a trained order-report classifier, which outputs the probability that the input page information is scalper order-report information. This is supplemented by the number of times the user is judged to have performed a scalper order-reporting operation, which together determine whether the user performs such operations. For example, it may be assumed that the more often a user is judged to have performed a scalper order-reporting operation within a predetermined period, the more likely the user is to be a scalper.
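Counting classified clicks within a period can be sketched as follows. The keyword-based `is_target_page` function is a hypothetical stand-in for the trained classifier, and the keywords, event format and period bounds are assumptions for illustration.

```python
def is_target_page(page_text, keywords=("bulk order", "report order")):
    """Stand-in for a trained page classifier (hypothetical keyword rule)."""
    text = page_text.lower()
    return any(keyword in text for keyword in keywords)


def count_target_clicks(click_events, period_start, period_end, classifier=is_target_page):
    """Count click-and-jump events in [period_start, period_end) whose
    landing page is classified as target information."""
    return sum(
        1
        for timestamp, page_text in click_events
        if period_start <= timestamp < period_end and classifier(page_text)
    )


# Hypothetical (timestamp, landing page text) pairs.
events = [
    (10, "Report order here: bulk order form for limited sneakers"),
    (20, "Weather forecast for tomorrow"),
    (30, "Please report order numbers after checkout"),
    (99, "bulk order sheet"),  # outside the period below
]
count = count_target_clicks(events, period_start=0, period_end=60)
```

The resulting count can be used as a behavioral feature: a larger count within the period suggests a higher likelihood that the user is a scalper.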
The present invention judges whether a user is a scalper along multiple social-network dimensions: whether the user participates in scalper communities in social networks and how actively, whether scalper order-reporting behavior occurs in social networks and how frequently, whether the user uses order-brushing software, whether there is suspicion of account farming, and so on.
The user features calculated above, together with other acquired basic features of the user, are input into the model shown in Fig. 7 to obtain the scalper fraud score of the current user.
In the above embodiment, user corpus information is processed by calculating the similarity between the user's word vectors and the target text vector, using each similarity as the weight of the corresponding word vector, and computing the weighted sum of the word vectors, which is input into the identification model as the user's text feature. There is another way to process user corpus information, namely performing text classification on the user's corpus information with an LSTM model.
In natural language processing, an RNN (recurrent neural network) model is generally used to analyze and classify text. However, chat content is usually long, and an RNN has difficulty compressing the gist of a whole chat. Therefore a long short-term memory (LSTM) model, an improvement on the RNN, is trained instead. The LSTM uses the concept of "gates" to control the influence of earlier outputs on later ones, so it links the words of a sentence well, extracts the gist of long text, and improves classifier accuracy. See Fig. 8 for the LSTM network model.
A text classification model based on LSTM is shown in Fig. 9. Text classification of the user corpus is implemented as follows:
A large amount of corpus information collected in advance is manually labeled: 0 means the dialogue is unrelated to scalping, and 1 means the dialogue is related to scalping. Deep learning handles text classification problems using word vector representations. The distributed representation of word vectors both reduces dimensionality and captures semantic information. The most common distributed word vector representation is Word2vec, which is trained without supervision and produces word vectors that are dense and carry semantic information. The acquired user corpus information is segmented into words, and the word vector of each word is fed into the LSTM in order. The output of the LSTM is the representation of the passage and contains the sequential information of the sentence. Finally, this representation is input into the text classification model, which classifies the current sentence to judge whether it is related to the scalper domain.
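The flow of feeding word vectors into an LSTM in order and classifying the final state can be sketched as below. This is a minimal forward pass only, under stated assumptions: the dimensions, the random untrained weights, the logistic output head and the example word vectors are hypothetical, and a practical system would use a deep learning framework and trained parameters.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(weights, bias, vec):
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weights, bias)]


def init_params(input_dim, hidden_dim, seed=0):
    """Random, untrained parameters for the four LSTM gates and the output head."""
    rng = random.Random(seed)
    params = {}
    for gate in ("i", "f", "o", "g"):
        params["W" + gate] = [
            [rng.uniform(-0.5, 0.5) for _ in range(input_dim + hidden_dim)]
            for _ in range(hidden_dim)
        ]
        params["b" + gate] = [0.0] * hidden_dim
    params["w_out"] = [rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)]
    params["b_out"] = 0.0
    return params


def lstm_step(x, h, c, p):
    z = list(x) + list(h)                                        # input joined with previous output
    i = [sigmoid(v) for v in affine(p["Wi"], p["bi"], z)]        # input gate
    f = [sigmoid(v) for v in affine(p["Wf"], p["bf"], z)]        # forget gate
    o = [sigmoid(v) for v in affine(p["Wo"], p["bo"], z)]        # output gate
    g = [math.tanh(v) for v in affine(p["Wg"], p["bg"], z)]      # candidate cell state
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new


def classify(word_vectors, p):
    """Return the probability that the passage belongs to class 1."""
    hidden_dim = len(p["bi"])
    h, c = [0.0] * hidden_dim, [0.0] * hidden_dim
    for x in word_vectors:                                       # words are fed in order
        h, c = lstm_step(x, h, c, p)
    logit = sum(w * hk for w, hk in zip(p["w_out"], h)) + p["b_out"]
    return sigmoid(logit)


params = init_params(input_dim=3, hidden_dim=4)
sentence = [[0.1, 0.3, -0.2], [0.5, -0.1, 0.0], [0.2, 0.2, 0.4]]  # hypothetical word vectors
prob = classify(sentence, params)
```

The gates decide how much of the earlier state is kept at each word, which is what lets the final state summarize a long passage.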
In the present invention, when a user sends a service request to the server, the server obtains the user's social behavior information, derives multi-dimensional user features from the social behavior information, and finally judges through feature fusion whether the user meets the service request conditions. When the user does not meet the service request conditions, the server refuses to provide the corresponding service. Specifically, in e-commerce anti-scalper protection, when a user sends an order request to the server, the server obtains the user's social behavior information and finally obtains a scalper fraud score. When the scalper fraud score is greater than a threshold, the current user is judged to be a scalper and the order request is refused, so that the scalper cannot place the order. Because the present invention uses social data to capture the user's scalper-related behavior in social networks, it can determine whether the user is suspected of scalper fraud before the user places an order, identifying scalper attacks in advance and seizing the best moment to prevent losses.
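The gating decision described above amounts to a threshold check on the fraud score. The sketch below assumes a score in [0, 1] and a threshold of 0.8; both the value and the function name are hypothetical, since the source only speaks of "some threshold".

```python
def handle_order_request(fraud_score, threshold=0.8):
    """Refuse the order when the fraud score exceeds the threshold."""
    if fraud_score > threshold:
        return "rejected"
    return "accepted"


decision_high = handle_order_request(0.93)
decision_low = handle_order_request(0.12)
```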
Secondly, the present invention analyzes multiple social-dimension features of the user from the user's social behavior information in order to build the user profile features, so the identification accuracy is high.
In addition, user corpus information is generally processed by directly superimposing the word vectors of the user corpus to express the user's text feature. In a specific domain, however, it is preferable to extract the information relevant to that domain. The present invention therefore provides an improved attention mechanism: similarity is first calculated against the target vector of the domain, and the user's word vectors are then summed with these similarities as weights to obtain the user text feature, which extracts more domain-relevant information. This differs from the attention of sequence models. General attention focuses on context information and calculates the similarity of each embedding from the state of the context. Its drawback is that once the user corpus is fixed, the calculated similarity is the same whether the domain is scalping or any other domain. With the method provided by the present invention, the similarity differs from domain to domain even for the same user corpus. In effect, the present invention adds prior knowledge of the scalper domain, so that the calculated similarity focuses more on the scalper domain.
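The domain-weighted text feature can be sketched as follows. Cosine similarity is assumed here as the similarity measure (the source does not name one), and the two-dimensional word vectors and the two domains are made-up illustrations.

```python
import math


def superimpose(vectors):
    """Element-wise sum of word vectors (used to build the domain target vector)."""
    return [sum(column) for column in zip(*vectors)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def text_feature(user_word_vectors, target_vector):
    """Weight each user word vector by its similarity to the domain target vector."""
    weights = [cosine(v, target_vector) for v in user_word_vectors]
    return [
        sum(w * v[k] for w, v in zip(weights, user_word_vectors))
        for k in range(len(target_vector))
    ]


# Hypothetical source corpora for two domains, already mapped to word vectors.
target_a = superimpose([[1.0, 0.0], [0.9, 0.1]])
target_b = superimpose([[0.0, 1.0], [0.1, 0.9]])

# The same user corpus yields different features for different domains.
user_words = [[1.0, 0.0], [0.0, 1.0]]
feature_a = text_feature(user_words, target_a)
feature_b = text_feature(user_words, target_b)
```

Because the weights come from the domain target vector rather than from the sentence context, the same user corpus produces a different feature for each domain.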
An embodiment of the present invention further provides a user identification device. Referring to Figure 10, the device includes:
A user information acquisition module 1010, configured to obtain the social behavior information of a user, where the social behavior information includes user corpus information, user social relationship information and user operation information.
A target vector acquisition module 1020, configured to obtain the target text vector corresponding to the current domain.
A text feature determination module 1030, configured to determine the text feature of the user according to the user corpus information and the target text vector.
A group propagation feature determination module 1040, configured to input the user social relationship information into a preset propagation model to obtain the group propagation feature of the user.
A behavioral feature determination module 1050, configured to input the user operation information into a preset prediction model to obtain the behavioral feature of the user.
A feature fusion module 1060, configured to fuse the text feature, the group propagation feature and the behavioral feature to obtain the user identification result.
Referring to Figure 11, the text feature determination module 1030 includes:
A first word segmentation module 1110, configured to segment the user corpus information and generate the word vector of each word in the user corpus information.
A similarity calculation module 1120, configured to calculate the similarity between the word vector of each word in the user corpus information and the target text vector.
A text feature calculation module 1130, configured to calculate the text feature of the user using each similarity as the weight of the corresponding word vector.
Referring to Figure 12, the device further includes a target vector generation module, which includes:
A source corpus acquisition module 1210, configured to obtain the source corpus information corresponding to the current domain.
A second word segmentation module 1220, configured to segment the source corpus information and generate the word vector of each word in the source corpus information.
A vector superposition module 1230, configured to superimpose the word vectors of the words in the source corpus information to obtain the target text vector.
Referring to Figure 13, the group propagation feature determination module 1040 includes:
A first acquisition module 1310, configured to obtain user information that carries a label and user information that does not carry the label.
A propagation model training module 1320, configured to train the propagation model with a label propagation algorithm, according to the user information that carries the label and the user information that does not carry the label.
A propagation vector calculation module 1330, configured to input the user social relationship into the propagation model and generate, through the propagation model, the vector corresponding to the user social relationship.
The user social relationship includes the social groups the user participates in, the user's friend relationships, and the user's degree of social activity.
Referring to Figure 14, the behavioral feature determination module 1050 includes:
An operation detection module 1410, configured to obtain, within a predetermined period, the page information after the jump when a click-and-jump operation of the user is detected.
A prediction module 1420, configured to input the page information into the prediction model and output the prediction result for the page information.
A count recording module 1430, configured to record the number of times within the predetermined period that the page information after a user's click-and-jump is predicted to be target information.
The device provided in the above embodiment can execute the method provided by any embodiment of the present invention and has the functional modules and beneficial effects corresponding to executing the method. For technical details not described in detail in the above embodiment, refer to the method provided by any embodiment of the present invention.
This embodiment further provides a computer-readable storage medium in which computer-executable instructions are stored; the computer-executable instructions are loaded by a processor to execute any of the above methods of this embodiment.
This embodiment further provides a device that includes a processor and a memory, where the memory stores a computer program suitable for being loaded by the processor to execute any of the above methods of this embodiment.
This specification provides the method operating steps as described in the embodiments or flowcharts, but more or fewer operating steps may be included on the basis of routine or non-inventive effort. The steps and order enumerated in the embodiments are only one of many possible execution orders and do not represent the only execution order. When an actual system or terminal product executes the method, it may execute sequentially in the order shown in the embodiments or drawings, or in parallel (for example in a parallel-processor or multi-threaded environment).
The structure shown in this embodiment is only the part of the structure relevant to the solution of this application and does not limit the equipment to which the solution is applied. A specific device may include more or fewer components than shown, combine certain components, or have a different arrangement of components. It should be understood that the methods, devices and the like disclosed in this embodiment may be implemented in other ways. For example, the device embodiments described above are merely illustrative: the division into modules is only a division by logical function, and other divisions are possible in actual implementation; for instance, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, devices or unit modules.
Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk, or an optical disk.
Those skilled in the art will further appreciate that the example units and algorithm steps described in connection with the embodiments disclosed in this specification can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above in general terms according to function. Whether these functions are implemented in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled practitioners may use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of the present invention.
The above embodiments are merely intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (10)
1. A user identification method, characterized by comprising:
obtaining social behavior information of a user, wherein the social behavior information comprises user corpus information, user social relationship information and user operation information;
obtaining a target text vector corresponding to a current domain;
determining a text feature of the user according to the user corpus information and the target text vector;
inputting the user social relationship information into a preset propagation model to obtain a group propagation feature of the user;
inputting the user operation information into a preset prediction model to obtain a behavioral feature of the user;
fusing the text feature, the group propagation feature and the behavioral feature to obtain a user identification result.
2. The user identification method according to claim 1, characterized in that determining the text feature of the user according to the user corpus information and the target text vector comprises:
segmenting the user corpus information to generate a word vector of each word in the user corpus information;
calculating the similarity between the word vector of each word in the user corpus information and the target text vector;
calculating the text feature of the user using the similarity as the weight of the corresponding word vector.
3. The user identification method according to claim 2, characterized in that the method of generating the target text vector comprises:
obtaining source corpus information corresponding to the current domain;
segmenting the source corpus information to generate a word vector of each word in the source corpus information;
superimposing the word vectors of the words in the source corpus information to obtain the target text vector.
4. The user identification method according to claim 1, characterized in that inputting the user social relationship information into the preset propagation model to obtain the group propagation feature of the user comprises:
obtaining user information that carries a label and user information that does not carry the label;
training the propagation model with a label propagation algorithm according to the user information that carries the label and the user information that does not carry the label;
inputting the user social relationship into the propagation model, and generating, through the propagation model, a vector corresponding to the user social relationship;
wherein the user social relationship comprises the social groups the user participates in, the user's friend relationships, and the user's degree of social activity.
5. The user identification method according to claim 1, characterized in that inputting the user operation information into the preset prediction model to obtain the behavioral feature of the user comprises:
within a predetermined period, obtaining the page information after a jump when a click-and-jump operation of the user is detected;
inputting the page information into the prediction model, and outputting a prediction result for the page information;
recording the number of times within the predetermined period that the page information after the user's click-and-jump is predicted to be target information.
6. A user identification device, characterized by comprising:
a user information acquisition module, configured to obtain social behavior information of a user, wherein the social behavior information comprises user corpus information, user social relationship information and user operation information;
a target vector acquisition module, configured to obtain a target text vector corresponding to a current domain;
a text feature determination module, configured to determine a text feature of the user according to the user corpus information and the target text vector;
a group propagation feature determination module, configured to input the user social relationship information into a preset propagation model to obtain a group propagation feature of the user;
a behavioral feature determination module, configured to input the user operation information into a preset prediction model to obtain a behavioral feature of the user;
a feature fusion module, configured to fuse the text feature, the group propagation feature and the behavioral feature to obtain a user identification result.
7. The user identification device according to claim 6, characterized in that the text feature determination module comprises:
a first word segmentation module, configured to segment the user corpus information and generate a word vector of each word in the user corpus information;
a similarity calculation module, configured to calculate the similarity between the word vector of each word in the user corpus information and the target text vector;
a text feature calculation module, configured to calculate the text feature of the user using the similarity as the weight of the corresponding word vector.
8. The user identification device according to claim 7, characterized in that the device further comprises a target vector generation module, comprising:
a source corpus acquisition module, configured to obtain source corpus information corresponding to the current domain;
a second word segmentation module, configured to segment the source corpus information and generate a word vector of each word in the source corpus information;
a vector superposition module, configured to superimpose the word vectors of the words in the source corpus information to obtain the target text vector.
9. The user identification device according to claim 6, characterized in that the group propagation feature determination module comprises:
a first acquisition module, configured to obtain user information that carries a label and user information that does not carry the label;
a propagation model training module, configured to train the propagation model with a label propagation algorithm according to the user information that carries the label and the user information that does not carry the label;
a propagation vector calculation module, configured to input the user social relationship into the propagation model and generate, through the propagation model, a vector corresponding to the user social relationship;
wherein the user social relationship comprises the social groups the user participates in, the user's friend relationships, and the user's degree of social activity.
10. The user identification device according to claim 6, characterized in that the behavioral feature determination module comprises:
an operation detection module, configured to obtain, within a predetermined period, the page information after a jump when a click-and-jump operation of the user is detected;
a prediction module, configured to input the page information into the prediction model and output a prediction result for the page information;
a count recording module, configured to record the number of times within the predetermined period that the page information after the user's click-and-jump is predicted to be target information.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910161169.8A CN110197389A (en) | 2019-03-04 | 2019-03-04 | A kind of user identification method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110197389A true CN110197389A (en) | 2019-09-03 |
Family
ID=67751725
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910161169.8A Pending CN110197389A (en) | 2019-03-04 | 2019-03-04 | A kind of user identification method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110197389A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103136226A (en) * | 2011-11-25 | 2013-06-05 | 深圳市腾讯计算机系统有限公司 | Method and device capable of searching user |
CN103577549A (en) * | 2013-10-16 | 2014-02-12 | 复旦大学 | Crowd portrayal system and method based on microblog label |
CN106484764A (en) * | 2016-08-30 | 2017-03-08 | 江苏名通信息科技有限公司 | User's similarity calculating method based on crowd portrayal technology |
CN107330709A (en) * | 2016-04-29 | 2017-11-07 | 阿里巴巴集团控股有限公司 | Determine the method and device of destination object |
CN108932669A (en) * | 2018-06-27 | 2018-12-04 | 北京工业大学 | A kind of abnormal account detection method based on supervised analytic hierarchy process (AHP) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20190903 |