CN110046981B - Credit evaluation method, device and storage medium - Google Patents
- Publication number: CN110046981B
- Application number: CN201810036839.9A
- Authority: CN (China)
- Prior art keywords
- information
- vector
- user
- category information
- space
- Legal status: Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/03—Credit; Loans; Processing thereof
Landscapes
- Business, Economics & Management (AREA)
- Accounting & Taxation (AREA)
- Finance (AREA)
- Engineering & Computer Science (AREA)
- Development Economics (AREA)
- Economics (AREA)
- Marketing (AREA)
- Strategic Management (AREA)
- Technology Law (AREA)
- Physics & Mathematics (AREA)
- General Business, Economics & Management (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The embodiments of the invention disclose a credit evaluation method, a device, and a storage medium. Heterogeneous data related to a user is acquired and classified to obtain multiple pieces of category information; vector information corresponding to each piece of category information is acquired, and a feature space corresponding to each piece of category information is obtained from that vector information; the feature space of each piece of category information is mapped into a kernel space, yielding multiple kernel spaces; multi-kernel linear combination is performed on the kernel spaces to obtain a synthetic kernel space; and a credit evaluation result of the user is obtained from the synthetic kernel space. Because the user's credit is evaluated from multiple categories of information obtained by classifying the user's heterogeneous data, the scheme avoids the drawbacks of processing all user information uniformly and of evaluating credit from a feature matrix formed by directly splicing all features, which makes the evaluation result more reliable.
Description
Technical Field
The invention relates to the field of Internet technology, and in particular to a credit assessment method, a credit assessment device, and a storage medium.
Background
With rapid economic development and the continuing expansion of credit lending, credit has become a focus of attention. Because the risk of credit lending is gradually increasing, evaluating user credit is highly significant for effectively identifying credit risk, avoiding adverse effects such as financial crises, maintaining the normal operation of credit lending and the financial market, and even sustaining the steady growth of the national economy.
At present, when a user's credit is evaluated, different types of information about the user, such as transaction history, education, and the amount of property owned, are generally mined and then processed uniformly: all types of information are trained with the same model, structured in a unified way, and used to extract features; the extracted features are directly spliced into a feature matrix, from which the user's credit score is finally calculated.
Because the user's information is processed uniformly, structuring all types of information in the same way easily leads to information loss and information errors. Moreover, training all types of information with the same model may cause errors to accumulate or cancel out, so the credit score calculated from a feature matrix of directly spliced features is highly inaccurate.
Disclosure of Invention
The embodiments of the invention provide a credit assessment method, a device, and a storage medium, aiming to improve the accuracy of credit assessment.
In order to solve the above technical problems, embodiments of the present invention provide the following technical solutions:
a credit assessment method, comprising:
acquiring heterogeneous data related to a user, and classifying the heterogeneous data to obtain a plurality of category information;
obtaining vector information corresponding to each category information in the plurality of category information, and obtaining a feature space corresponding to each category information according to the vector information;
mapping the feature space corresponding to each category information into a kernel space respectively to obtain a plurality of kernel spaces;
performing multi-kernel linear combination processing on the plurality of kernel spaces to obtain a synthetic kernel space;
and acquiring a credit evaluation result of the user according to the synthetic kernel space.
A credit evaluation device, comprising:
the information acquisition unit is used for acquiring heterogeneous data related to a user and classifying the heterogeneous data to obtain a plurality of category information;
the characteristic obtaining unit is used for obtaining vector information corresponding to each category information in the plurality of category information and obtaining a characteristic space corresponding to each category information according to the vector information;
the first mapping unit is used for respectively mapping the feature space corresponding to each category information into a kernel space to obtain a plurality of kernel spaces;
the synthesis unit is used for performing multi-kernel linear combination processing on the plurality of kernel spaces to obtain a synthetic kernel space;
and the evaluation unit is used for acquiring a credit evaluation result of the user according to the synthetic kernel space.
Optionally, the feature obtaining unit includes:
an extraction subunit, configured to extract candidate category information from the plurality of category information;
the acquisition subunit is used for constructing a directed weighted network corresponding to each candidate category information and acquiring a node vector of the directed weighted network;
and the first generation subunit is used for generating a feature space corresponding to the category information according to the node vector.
Optionally, the candidate category information includes transaction information, and the obtaining subunit includes:
the record acquisition module is used for acquiring the transfer record and the mobile payment record of the transaction information;
the construction module is used for constructing a directed weighted network corresponding to the transfer record;
a node vector acquisition module for acquiring the node vector of the directed weighted network;
the characteristic vector acquisition module is used for acquiring the characteristic vector of the mobile payment record;
the first generating subunit is specifically configured to generate a feature space corresponding to the transaction information according to the node vector and the feature vector.
Optionally, the node vector obtaining module includes:
the first calculation submodule is used for calculating the estimated connection probability and the empirical connection probability between every two nodes in the directed weighted network;
the second calculation submodule is used for calculating the distribution difference between the estimated connection probability and the empirical connection probability to obtain a first objective function;
the third calculation submodule is used for calculating the estimated context probability and the empirical context probability between every two nodes in the directed weighted network;
the fourth calculation submodule is used for calculating the distribution difference between the estimated context probability and the empirical context probability to obtain a second objective function;
and the obtaining submodule is used for obtaining the node vector of the directed weighted network according to the first objective function and the second objective function.
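For reference, the four probabilities and two objective functions named above can be written in the form used by the LINE algorithm discussed in the detailed description. Here $w_{ij}$ is the weight of edge $(i,j)$, $\mathbf{u}_i$ is the vector of node $v_i$, $\mathbf{u}'_j$ is the separate context vector of node $v_j$, and $N(i)$ is the set of out-neighbors of $v_i$. Taking the "distribution difference" to be the KL divergence and dropping constant terms is an assumption borrowed from LINE; the claims do not fix the measure. Note also that LINE defines the first-order model for undirected edges, so applying it to a directed network is a modeling choice.

$$p_1(v_i, v_j) = \frac{1}{1 + \exp\left(-\mathbf{u}_i^{\top}\mathbf{u}_j\right)}, \qquad \hat{p}_1(v_i, v_j) = \frac{w_{ij}}{W}, \qquad W = \sum_{(i,j)\in E} w_{ij}$$

$$O_1 = -\sum_{(i,j)\in E} w_{ij}\,\log p_1(v_i, v_j)$$

$$p_2(v_j \mid v_i) = \frac{\exp\left((\mathbf{u}'_j)^{\top}\mathbf{u}_i\right)}{\sum_{k=1}^{|V|}\exp\left((\mathbf{u}'_k)^{\top}\mathbf{u}_i\right)}, \qquad \hat{p}_2(v_j \mid v_i) = \frac{w_{ij}}{d_i}, \qquad d_i = \sum_{k\in N(i)} w_{ik}$$

$$O_2 = -\sum_{(i,j)\in E} w_{ij}\,\log p_2(v_j \mid v_i)$$

$O_2$ follows from weighting each node's KL divergence by its out-degree $d_i$, so that nodes with heavier outgoing weight count more.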
Optionally, the obtaining sub-module is specifically configured to:
optimizing the first objective function through a stochastic gradient descent algorithm to obtain low-dimensional node vectors under the first objective function;
optimizing the second objective function through a stochastic gradient descent algorithm to obtain low-dimensional node vectors under the second objective function;
and concatenating the low-dimensional node vectors under the first objective function with those under the second objective function to obtain the node vectors of the directed weighted network.
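The optimize-then-concatenate procedure of the obtaining submodule can be sketched as below. This is a simplified, single-threaded NumPy version of LINE-style training, assuming the techniques of the published LINE method: edges are sampled in proportion to their weights, and each objective is approximated with negative sampling from a noise distribution proportional to out-degree to the power 0.75. The function names, vector dimension, learning rate, and sample counts are hypothetical choices, not values from this patent.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def train_line(edges, weights, n_nodes, dim=8, order=1, n_samples=5000,
               n_neg=5, lr=0.025, seed=0):
    """Learn node vectors for one proximity order with stochastic gradient ascent.

    order=1: source and target share one vector table (first-order proximity).
    order=2: targets are scored with a separate context table (second-order proximity).
    """
    rng = np.random.default_rng(seed)
    w = np.asarray(weights, dtype=float)
    src = np.array([u for u, _ in edges])
    dst = np.array([v for _, v in edges])
    emb = (rng.random((n_nodes, dim)) - 0.5) / dim
    ctx = emb if order == 1 else np.zeros((n_nodes, dim))
    # Noise distribution for negative sampling: weighted out-degree ** 0.75.
    out_deg = np.zeros(n_nodes)
    np.add.at(out_deg, src, w)
    noise = out_deg ** 0.75
    noise /= noise.sum()
    # Edge sampling: draw edges in proportion to their weights.
    picks = rng.choice(len(edges), size=n_samples, p=w / w.sum())
    negs = rng.choice(n_nodes, size=(n_samples, n_neg), p=noise)
    for e, neg in zip(picks, negs):
        i = src[e]
        grad_i = np.zeros(dim)
        # One observed target (label 1) and n_neg noise targets (label 0).
        for t, label in [(dst[e], 1.0)] + [(n, 0.0) for n in neg]:
            g = lr * (label - sigmoid(emb[i] @ ctx[t]))
            grad_i += g * ctx[t]
            ctx[t] += g * emb[i]
        emb[i] += grad_i
    return emb


def line_node_vectors(edges, weights, n_nodes, dim=8):
    """Concatenate first-order and second-order node vectors (2 * dim columns)."""
    first = train_line(edges, weights, n_nodes, dim, order=1, seed=0)
    second = train_line(edges, weights, n_nodes, dim, order=2, seed=1)
    return np.hstack([first, second])


edges = [(0, 1), (1, 2), (2, 0), (3, 4)]
vectors = line_node_vectors(edges, [1.0, 1.0, 1.0, 2.0], n_nodes=5)
print(vectors.shape)  # (5, 16)
```

The published LINE method additionally decays the learning rate and trains asynchronously across threads; both are omitted here for brevity.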
Optionally, the feature vector obtaining module includes:
the characteristic obtaining submodule is used for obtaining the multidimensional payment characteristics of the mobile payment record;
the encoding submodule is used for encoding the multi-dimensional payment features to obtain payment encoding information;
and the generating submodule is used for generating the feature vector of the mobile payment record according to the payment encoding information.
Optionally, the encoding submodule is specifically configured to:
converting non-numeric payment features among the multi-dimensional payment features into numeric payment features;
and discretizing both the converted numeric payment features and the originally numeric payment features among the multi-dimensional payment features to obtain payment encoding information.
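The two encoding steps above can be illustrated with the following sketch. It assumes that "converting" means mapping each distinct non-numeric value (for example a merchant type) to an integer id, and that "discretizing" means binning numeric values (for example a payment amount) at quantile cut points. Because the integer ids are already discrete, only the originally numeric fields are binned here; that is our reading of the claim, not something it states. The function and field names are hypothetical.

```python
import numpy as np


def encode_payments(rows, categorical, numeric, n_bins=4):
    """Encode multi-dimensional payment features as one integer code per feature.

    rows: list of dicts, one per mobile payment record.
    categorical: names of non-numeric fields, converted to integer ids.
    numeric: names of numeric fields, discretized into quantile bins 0..n_bins-1.
    """
    columns = []
    for name in categorical:
        # Non-numeric -> numeric: map each distinct value to a stable integer id.
        vocab = {v: k for k, v in enumerate(sorted({r[name] for r in rows}))}
        columns.append(np.array([vocab[r[name]] for r in rows]))
    for name in numeric:
        values = np.array([r[name] for r in rows], dtype=float)
        # Interior quantile cut points, so each bin holds roughly the same number of records.
        cuts = np.quantile(values, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
        columns.append(np.digitize(values, cuts))
    return np.column_stack(columns)


rows = [
    {"merchant": "food", "amount": 12.0},
    {"merchant": "travel", "amount": 300.0},
    {"merchant": "food", "amount": 45.0},
    {"merchant": "retail", "amount": 80.0},
]
codes = encode_payments(rows, categorical=["merchant"], numeric=["amount"], n_bins=2)
print(codes.tolist())  # [[0, 0], [2, 1], [0, 0], [1, 1]]
```

Quantile bins keep the codes robust to a few very large payments; fixed-width bins would be an equally valid reading of "discretization".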
Optionally, the candidate category information includes behavior information, and the obtaining subunit is specifically configured to:
acquiring multidimensional behavior characteristics of the behavior information;
constructing a directed weighted network corresponding to each dimension of behavior characteristics;
acquiring a node vector of each directed weighted network;
the first generating subunit is specifically configured to generate a feature space corresponding to the behavior information according to the node vector of each directed weighted network.
Optionally, the category information includes attribute information, and the feature obtaining unit includes:
the feature obtaining subunit is used for obtaining the multi-dimensional attribute features of the attribute information among the plurality of category information;
the encoding subunit is used for encoding the multi-dimensional attribute features to obtain attribute encoding information;
and the second generating subunit is used for generating a feature space corresponding to the attribute information according to the attribute encoding information.
Optionally, the encoding subunit is specifically configured to:
converting non-numeric attribute features among the multi-dimensional attribute features into numeric attribute features;
and generating attribute encoding information from the converted numeric attribute features and the originally numeric attribute features among the multi-dimensional attribute features.
Optionally, the synthesis unit is specifically configured to:
normalizing the plurality of kernel spaces to obtain a plurality of normalized kernel spaces;
acquiring a weight value corresponding to each category information in the plurality of category information;
and performing multi-kernel linear combination processing according to the weight value corresponding to each category information and each normalized kernel space to obtain a synthetic kernel space.
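The normalize-weight-combine steps of the synthesis unit can be sketched as follows. The patent does not state which kernel function, normalization, or weighting scheme is used, so this sketch assumes a linear kernel per category, cosine normalization (which gives every kernel a unit diagonal so that no category dominates by scale), and hand-supplied non-negative weights rescaled to sum to 1. All names are hypothetical.

```python
import numpy as np


def linear_kernel(X):
    """Kernel matrix of one category's feature space: K[i, j] = <x_i, x_j>."""
    return X @ X.T


def normalize_kernel(K, eps=1e-12):
    """Cosine normalization K[i, j] / sqrt(K[i, i] * K[j, j]), giving a unit diagonal."""
    d = np.sqrt(np.maximum(np.diag(K), eps))
    return K / np.outer(d, d)


def combine_kernels(feature_spaces, weights):
    """Weighted linear combination of the normalized per-category kernels."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # assumption: non-negative weights rescaled to sum to 1
    kernels = [normalize_kernel(linear_kernel(X)) for X in feature_spaces]
    return sum(wk * K for wk, K in zip(w, kernels))


transaction = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])  # 3 users, 2-d features
attribute = np.array([[3.0], [1.0], [2.0]])                    # same 3 users, 1-d features
K = combine_kernels([transaction, attribute], weights=[1, 3])
print(K[0, 1])  # 0.25 * 0.0 + 0.75 * 1.0 = 0.75
```

Each feature space must list the same users in the same row order. With a Gaussian (RBF) kernel the diagonal is already 1, so cosine normalization would leave it unchanged.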
Optionally, the evaluation unit is specifically configured to:
calculating the credit score of the user from the synthetic kernel space through a preset regression model.
Optionally, the credit evaluation device further comprises:
the information set acquisition unit is used for acquiring a training sample set and dividing the training sample set into a plurality of category information sets;
a second mapping unit, configured to map the plurality of category information sets to a kernel space;
the objective function generating unit is used for generating an objective function according to the kernel space and a preset regression function;
and the model generating unit is used for solving the objective function through its Lagrangian dual to generate a regression model.
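The patent does not specify the preset regression function, so the sketch below substitutes kernel ridge regression, chosen because its Lagrangian dual has a closed form. Minimizing $\sum_i (y_i - \mathbf{w}^{\top}\phi(x_i))^2 + \lambda\lVert\mathbf{w}\rVert^2$ gives dual coefficients $\alpha = (K + \lambda I)^{-1}y$, and a user's score is $f(x) = \sum_i \alpha_i k(x_i, x)$. A support vector regression dual, another common choice, would instead need a quadratic-programming solver. Function names and the toy numbers are hypothetical.

```python
import numpy as np


def fit_dual_ridge(K_train, y, lam=1e-2):
    """Dual solution of kernel ridge regression: alpha = (K + lam * I)^-1 y."""
    n = K_train.shape[0]
    return np.linalg.solve(K_train + lam * np.eye(n), y)


def predict_scores(K_new_train, alpha):
    """Credit scores f(x) = sum_i alpha_i * k(x_i, x) for each new user (row)."""
    return K_new_train @ alpha


# Toy synthetic kernel matrix among three training users and their known credit scores.
K = np.array([[1.0, 0.5, 0.2],
              [0.5, 1.0, 0.3],
              [0.2, 0.3, 1.0]])
y = np.array([600.0, 650.0, 700.0])
alpha = fit_dual_ridge(K, y, lam=1e-2)
print(predict_scores(K, alpha))  # close to y; larger lam trades fit for smoothness
```

At prediction time, `K_new_train` must be built with the same kernels and category weights that produced the training kernel.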
A storage medium storing a plurality of instructions suitable for being loaded by a processor to perform the steps of the above credit assessment method.
The embodiments of the invention classify heterogeneous data related to a user into multiple categories of information, acquire a feature space for each piece of category information, map each feature space into a kernel space, and perform multi-kernel linear combination on the resulting kernel spaces to obtain a synthetic kernel space, from which the credit evaluation result of the user is obtained. Because the user's credit is evaluated from multiple categories of information obtained by classifying the user's heterogeneous data, the scheme avoids the drawbacks of processing all user information uniformly and of evaluating credit from a feature matrix formed by directly splicing all features, which makes the evaluation result more reliable.
Drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings used in describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person skilled in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram of a scenario of a credit evaluation system according to an embodiment of the present invention;
FIG. 2 is a flowchart of a credit evaluation method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of constructing a directed weighted network for transfer records according to an embodiment of the present invention;
FIG. 4 is a schematic flowchart of a credit evaluation method according to an embodiment of the present invention;
FIG. 5 is another flowchart of a credit evaluation method according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a feature space of transaction information according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a feature space of behavior information according to an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of a feature space of attribute information according to an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of a credit evaluation device according to an embodiment of the present invention;
FIG. 10 is a schematic diagram of another embodiment of a credit evaluation device;
FIG. 11 is a schematic diagram of another embodiment of a credit evaluation device;
FIG. 12 is a schematic diagram of another embodiment of a credit evaluation device;
FIG. 13 is a schematic structural diagram of a server according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the description that follows, unless otherwise indicated, specific embodiments of the present invention are described with reference to steps and symbols of operations executed by one or more computers. These steps and operations are therefore at times referred to as being computer-executed, meaning that a processing unit of the computer operates on electronic signals that represent data in a structured form. Such operations transform the data or maintain it at locations in the computer's memory system, which may reconfigure or otherwise alter the operation of the computer in a manner well known to those skilled in the art. The data structures in which the data is maintained are physical locations of the memory that have particular properties defined by the data format. However, although the principles of the invention are described in the foregoing terms, this is not meant as a limitation; those skilled in the art will appreciate that the various steps and operations described below may also be implemented in hardware.
The embodiment of the invention provides a credit evaluation method, a credit evaluation device and a storage medium.
Referring to FIG. 1, FIG. 1 is a schematic diagram of a scenario of a credit evaluation system according to an embodiment of the present invention. The credit evaluation system may include a credit evaluation device, which may be integrated in a server and is mainly used to obtain heterogeneous data of a user. The heterogeneous data may be reported by a terminal and received by the server in real time or at preset intervals, or it may be stored in a database. It may include information such as the user's transfer records, mobile payment records, frequency of likes, messaging frequency, communication frequency, gender, age, and city of residence, as well as the transfer records, communication frequency, and credit of the user's friends.
The credit evaluation device then classifies the heterogeneous data to obtain multiple pieces of category information, for example category information A, B, and C. Next, it acquires vector information corresponding to each piece of category information and obtains a feature space for each from that vector information, for example feature spaces A, B, and C. It then maps the feature space of each piece of category information into a kernel space, obtaining multiple kernel spaces such as kernel spaces A, B, and C. Finally, it performs multi-kernel linear combination on these kernel spaces to obtain a synthetic kernel space, and acquires from the synthetic kernel space the credit evaluation result of the user, which may be a credit score, a credit rating, or the like.
In addition, the credit evaluation system may further include a terminal, such as a tablet computer, mobile phone, notebook computer, desktop computer, or other device that has a storage unit and a microprocessor and therefore has computing capability. The terminal is mainly used to report the user's heterogeneous data to the server, and the server may store the received heterogeneous data in a database.
It should be noted that the scenario of the credit evaluation system shown in FIG. 1 is merely an example. The credit evaluation system and scenario described in the embodiments of the present invention are intended to illustrate the technical solutions of the embodiments more clearly and do not limit them.
Detailed descriptions are given below.
In this embodiment, a description will be made from the perspective of a credit evaluation apparatus, which may be specifically integrated in a network device such as a server or a gateway.
A credit assessment method, comprising: acquiring heterogeneous data related to a user, and classifying the heterogeneous data to obtain a plurality of category information; acquiring vector information corresponding to each category information in the plurality of category information, and acquiring a feature space corresponding to each category information according to the vector information; respectively mapping the feature space corresponding to each category information into a kernel space to obtain a plurality of kernel spaces; performing multi-kernel linear combination processing on the plurality of kernel spaces to obtain a synthetic kernel space; and acquiring a credit evaluation result of the user according to the synthetic kernel space.
Referring to FIG. 2, FIG. 2 is a flowchart of a credit evaluation method according to an embodiment of the invention. The credit evaluation method may include the following steps.
In step S101, heterogeneous data related to the user is acquired and classified to obtain multiple pieces of category information.
The user may be an individual or an enterprise. When the user is an individual, the heterogeneous data may include the user's gender, account identifier, age, and city; the frequency with which the user posts comments, sends messages, and makes calls; the user's transfer records and mobile payment records; and the gender, call frequency, credit, age, and transfer records of the user's friends. When the user is an enterprise, the heterogeneous data may include years in business, place of business, revenue records, transfer records, mobile payment records, and the like. The following description takes an individual user as an example.
Optionally, in one embodiment, the credit evaluation device may obtain the user's heterogeneous data from the Internet by web crawling; in another embodiment, it may obtain the data through an application programming interface (API) provided by a social platform (e.g., WeChat, Weibo, or QQ).
It should be noted that, to protect user privacy, the embodiments of the present invention may transcode, desensitize, or otherwise process the user's heterogeneous data; for example, the user's account identifier may be hashed into a long character string that represents it. The heterogeneous data used in the embodiments of the invention is therefore data that has already been transcoded and desensitized, which protects the user's privacy.
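The hashing step can be sketched as below. The patent only says the account identifier is hashed; this sketch assumes a keyed hash (HMAC-SHA-256) because an unkeyed hash of a short identifier can be reversed by enumerating candidate identifiers. The secret key and the function name are hypothetical. The same identifier always yields the same token, so records from different sources can still be joined on it.

```python
import hashlib
import hmac

# Hypothetical secret held by the evaluation service (e.g. loaded from a key manager).
SECRET_KEY = b"replace-with-a-managed-secret"


def desensitize_account_id(account_id: str, key: bytes = SECRET_KEY) -> str:
    """Replace a raw account identifier with a keyed SHA-256 digest (64 hex characters)."""
    return hmac.new(key, account_id.encode("utf-8"), hashlib.sha256).hexdigest()


token = desensitize_account_id("user-12345")
print(len(token))  # 64
```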
After the heterogeneous data of the user is obtained, it may be classified into multiple pieces of category information. The classification may be based on the attributes, uses, sources, acquisition methods, or platforms of the heterogeneous data; the specific classification manner is not limited here. For example, the user's heterogeneous data may be divided into transaction information, behavior information, and attribute information. The transaction information may include the transfer records and mobile payment records of the user and of the user's friends. The behavior information may include the user's frequency of likes, comment posting, messaging, voice calls, and video calls, as well as the messaging and voice-call frequencies of the user's friends. The attribute information may include the gender, hometown, age, and city of residence of the user, and the gender, city of residence, and age of the user's friends. In other words, each piece of category information corresponds to one type of heterogeneous data, and multiple pieces of category information correspond to multiple different types of heterogeneous data.
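One simple way to realize the classification is a field-to-category lookup, sketched below under the assumption that each raw field has a known name. The field names and the mapping are hypothetical examples drawn from the categories listed above; the patent leaves the classification rule open. Fields without a mapping are collected under "other" rather than dropped.

```python
from collections import defaultdict

# Hypothetical mapping from raw field names to category information.
FIELD_CATEGORY = {
    "transfer_records": "transaction",
    "mobile_payment_records": "transaction",
    "like_frequency": "behavior",
    "message_frequency": "behavior",
    "voice_call_frequency": "behavior",
    "gender": "attribute",
    "age": "attribute",
    "city": "attribute",
}


def classify(record: dict) -> dict:
    """Group a user's raw fields into category information; unmapped fields go to 'other'."""
    categories = defaultdict(dict)
    for field, value in record.items():
        categories[FIELD_CATEGORY.get(field, "other")][field] = value
    return dict(categories)


user = {"gender": "F", "age": 31, "like_frequency": 12, "transfer_records": [("u2", 50.0)]}
print(classify(user))
```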
Note that missing heterogeneous data may be completed using the information of similar users. For example, when the gender of user A is missing, the similarity between users can be measured by the Euclidean distance, the user most similar to user A is determined, and that user's gender is used as the gender of user A.
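The completion rule can be sketched as a one-nearest-neighbor lookup. The sketch assumes each user is described by a numeric feature vector with no gaps in the features used for the distance; which features those are is not specified in the patent, and the names and toy values are hypothetical. In practice the features would be scaled first, since otherwise a large-valued feature such as age dominates the Euclidean distance.

```python
import numpy as np


def impute_from_nearest(features, labels, target):
    """Return the label of the user closest to `target` in Euclidean distance.

    features: (n_users, d) array of numeric features known for every user.
    labels: per-user values of the attribute to complete; None marks a missing value.
    """
    features = np.asarray(features, dtype=float)
    dists = np.linalg.norm(features - features[target], axis=1)
    dists[target] = np.inf  # a user cannot be its own neighbor
    dists[np.array([lab is None for lab in labels])] = np.inf  # donors must have the value
    if np.isinf(dists).all():
        return None  # no user can supply the missing value
    return labels[int(np.argmin(dists))]


features = [[25, 3.0], [26, 2.9], [60, 0.5]]  # toy features, e.g. age and a usage rate
labels = [None, "F", "M"]                     # gender of user A (index 0) is missing
print(impute_from_nearest(features, labels, target=0))  # F
```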
In step S102, vector information corresponding to each of the plurality of pieces of category information is acquired, and a feature space corresponding to each of the pieces of category information is acquired according to the vector information.
After the heterogeneous data of the user is divided into a plurality of category information, the credit evaluation device may process each category information to obtain a feature space corresponding to each category information, where the feature space may be a vector space composed of vector information of the category information, and the vector information may include a feature vector, a node vector, and the like.
Because different categories of information are heterogeneous, a different algorithm may be used to obtain the feature space of each category.
In some embodiments, the step of obtaining vector information corresponding to each category information in the plurality of category information, and obtaining the feature space corresponding to each category information according to the vector information may include:
(1) extracting candidate category information from the plurality of category information;
(2) constructing a directed weighted network corresponding to each candidate category information, and acquiring a node vector of the directed weighted network;
(3) and generating a feature space corresponding to the category information according to the node vector.
Specifically, to improve the efficiency of processing the multiple pieces of category information, the credit evaluation device first extracts candidate category information from them. The candidate category information may include one or more pieces of category information whose feature spaces can be calculated through a directed weighted network. For example, if the category information includes A, B, C, D, and E, and the feature spaces of A, B, and C can be calculated by constructing directed weighted networks, then A, B, and C can be extracted as the candidate category information.
Then, the credit evaluation device constructs a directed weighted network for each piece of candidate category information. For example, the directed weighted network constructed for category information A is $G_A = (V_A, E_A, W_A)$, where $V_A$ is the node set of $G_A$ and each node may represent a user; $E_A$ is the edge set of $G_A$ and each edge may represent category information A existing between two users; and $W_A$ denotes the weights of the edges in $E_A$.
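For transaction information, building $G_A = (V_A, E_A, W_A)$ from transfer records might look like the sketch below. The patent does not fix the weighting rule; this sketch assumes the weight of edge $(u, v)$ is the total amount that $u$ transferred to $v$ (the number of transfers would be an equally plausible choice). Users appear as their desensitized identifiers, and the function name is hypothetical.

```python
from collections import defaultdict


def build_transfer_network(transfers):
    """Build a directed weighted network G = (V, E, W) from (payer, payee, amount) records.

    Assumption: the weight of edge (u, v) is the total amount u transferred to v.
    """
    weights = defaultdict(float)
    nodes = set()
    for payer, payee, amount in transfers:
        nodes.update((payer, payee))
        weights[(payer, payee)] += amount  # repeated transfers accumulate on one edge
    edges = sorted(weights)  # each edge is an ordered (source, target) pair
    return nodes, edges, dict(weights)


V, E, W = build_transfer_network([("a", "b", 100.0), ("a", "b", 50.0), ("b", "c", 20.0)])
print(E)  # [('a', 'b'), ('b', 'c')]
print(W)  # {('a', 'b'): 150.0, ('b', 'c'): 20.0}
```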
Next, after obtaining the directed weighted network corresponding to each candidate category information, the credit evaluation device may calculate the node vectors of the directed weighted network with the Large-scale Information Network Embedding (LINE) algorithm, based on first-order approximation and second-order approximation. In other words, each network node of the directed weighted network is represented as a low-dimensional vector. In a directed weighted network, two connected nodes are highly similar (high first-order similarity). Two unconnected nodes that share many common neighbor nodes are also highly similar (high second-order similarity). The LINE algorithm learns both kinds of similarity, so it preserves the information contained in the original directed weighted network well.
Finally, the credit evaluation device may generate the feature space corresponding to the candidate category information according to the node vector of the directed weighted network corresponding to the candidate category information, for example, the node vector may be directly set as the feature space of the candidate category information, or the node vector may be optimized or screened, and the processed node vector may be set as the feature space of the candidate category information.
It should be noted that one or more directed weighted networks may be constructed for a single candidate category information. When several networks are constructed for one candidate category information, the node vector of each network is calculated separately. The resulting node vectors together serve as the node vectors of that candidate category information, and its feature space is generated from them.
In some embodiments, the candidate category information may include transaction information. In this case, the step of constructing a directed weighted network corresponding to each candidate category information, acquiring the node vector of the directed weighted network, and generating the feature space corresponding to the category information according to the node vector may include:
(a) acquiring a transfer record and a mobile payment record of transaction information;
(b) constructing a directed weighted network corresponding to the transfer record;
(c) acquiring a node vector of a directed weighted network;
(d) acquiring a feature vector of a mobile payment record;
(e) generating a feature space corresponding to the transaction information according to the node vector and the feature vector.
Taking transaction information as the candidate category information, for example, the transaction information may include transfer records, mobile payment records, and the like. The credit evaluation device may extract the transfer records and the mobile payment records from the transaction information. There may be one or more transfer records and one or more mobile payment records.
It should be noted that, in order to protect the privacy of the user, the embodiment of the present invention may perform transcoding, desensitization, and other processing on category information such as transaction information, and therefore, the category information counted in the embodiment of the present invention is information subjected to transcoding, desensitization, and other processing, so as to achieve the purpose of protecting the privacy of the user.
Then, the credit evaluation device constructs the directed weighted network corresponding to the transfer records as $G = (V, E, W)$, where:
- $V$ is the set of nodes in $G$, each node representing a user;
- $E$ is the set of edges in $G$, each edge indicating that a transfer record exists between two users;
- $W$ is the set of edge weights, each representing a transfer amount.
For example, as shown in Fig. 3 and taking the transfer records as an example, assume that the directed weighted network G contains users u1, u2, u3, u4, u5, and u6; in practice it may contain many more than six users. In Fig. 3, the node at the tail of an arrow is the user who transfers money out, and the node at the head of the arrow is the user who receives it. The labels a, b, c, d, e, f, g, and h on the edges are the transfer amounts. From Fig. 3 it can be seen that user u1 transferred amount a to user u2, user u5 transferred amount e to user u1, user u3 transferred amount f to user u5, and so on.
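To make the construction concrete, the following Python sketch builds $G = (V, E, W)$ from a list of transfer records. Several details are illustrative assumptions rather than details given in this description:
- the record layout `(payer, payee, amount)`;
- the function name `build_transfer_network`;
- the numeric amounts, which stand in for the symbolic amounts a, e, and f of Fig. 3;
- the rule that repeated transfers between the same ordered pair of users are summed into one edge weight.

```python
from collections import defaultdict

def build_transfer_network(records):
    """Build G = (V, E, W) from (payer, payee, amount) transfer records.

    Repeated transfers between the same ordered pair are summed into a single
    edge weight; this aggregation rule is an assumption, not from the text.
    """
    nodes = set()
    weights = defaultdict(float)  # (payer, payee) -> total amount transferred
    for payer, payee, amount in records:
        nodes.update((payer, payee))
        weights[(payer, payee)] += amount
    edges = sorted(weights)       # directed edges: tail = payer, head = payee
    return nodes, edges, dict(weights)

# Hypothetical amounts standing in for a, e and f in Fig. 3
records = [("u1", "u2", 500.0), ("u5", "u1", 120.0),
           ("u3", "u5", 80.0), ("u1", "u2", 100.0)]
V, E, W = build_transfer_network(records)
```

Summing repeated transfers keeps one edge per ordered user pair, which matches the single-weight-per-edge form of $G$.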
Next, the credit evaluation device obtains the node vectors of the directed weighted network constructed from the transfer records. The network nodes are represented as low-dimensional vectors through the LINE algorithm, based on first-order and second-order approximation, which yields the node vectors.
It should be noted that the similarity between two nodes can represent the similarity of the users' transfer records. Suppose credit levels are known for some sample users at certain initial nodes. That is, after the heterogeneous data of the users is obtained, part of the information is randomly selected as sample information, and the credit levels of the corresponding users are labeled manually. The credit of every other user can then be evaluated by how similar that user's node is to the nodes of the sample users. Once network embedding with the LINE algorithm is complete, every node in the directed weighted network G is represented by a low-dimensional vector. The similarity between users can therefore be measured by the Euclidean distance between their vectors.
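A minimal way to turn the Euclidean-distance idea into a credit estimate is a k-nearest-neighbour vote over the manually labeled sample users. The description itself only states that similarity to the sample users is measured by Euclidean distance. The k-NN rule, the function name `estimate_credit`, the default `k=3`, and the toy two-dimensional vectors below are all assumptions.

```python
from collections import Counter
import numpy as np

def estimate_credit(embeddings, labeled, user, k=3):
    """Predict a credit level for `user` from its k nearest labeled sample users.

    embeddings: dict user -> low-dimensional node vector (e.g. from LINE).
    labeled:    dict sample user -> manually labeled credit level.
    """
    query = embeddings[user]
    dists = sorted(
        ((float(np.linalg.norm(query - embeddings[s])), level)
         for s, level in labeled.items() if s != user),
        key=lambda pair: pair[0],  # Euclidean distance, smaller = more similar
    )
    nearest = [level for _, level in dists[:k]]
    # Majority vote; on a tie, Counter keeps the level of the closer neighbour first
    return Counter(nearest).most_common(1)[0][0]

embeddings = {
    "a": np.array([0.0, 0.0]), "b": np.array([0.1, 0.0]),
    "c": np.array([5.0, 5.0]), "d": np.array([5.1, 5.0]),
    "x": np.array([0.05, 0.1]),
}
labeled = {"a": "good", "b": "good", "c": "poor", "d": "poor"}
print(estimate_credit(embeddings, labeled, "x"))  # good
```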
In addition, a mobile payment record is a record generated when a user makes a purchase through a social platform (such as WeChat) or pays through an online payment platform (such as Alipay or a shopping website). A mobile payment record may include the user identifier, the type of goods paid for, the payment amount, the payment store, the timestamp of the transaction, and the like.
After obtaining the mobile payment records, the credit evaluation device may obtain their feature vector. For example, it may extract the payment features from each mobile payment record: the user identifier, the type of goods paid for, the payment amount, the payment store, and the timestamp of the transaction. It then converts these payment features into numerical values and generates the feature vector from the resulting numerical features.
Finally, the credit evaluation device may generate a feature space corresponding to the transaction information according to the obtained node vector of the transfer record and the feature vector of the mobile payment record, for example, the node vector of the transfer record and the feature vector of the mobile payment record may be spliced to obtain the feature space corresponding to the transaction information.
Optionally, the step of obtaining the node vector of the directed weighted network may include:
calculating the estimated connection probability and the empirical connection probability between every two nodes in the directed weighted network;
calculating the distribution difference between the estimated connection probability and the empirical connection probability to obtain a first objective function;
calculating context pre-estimated probability and context experience probability between every two nodes in the directed weighted network;
calculating the distribution difference between the context pre-estimated probability and the context empirical probability to obtain a second objective function;
acquiring a node vector of the directed weighted network according to the first objective function and the second objective function.
Specifically, the credit evaluation device firstly performs First-order approximation on a directed weighted network corresponding to the transfer records through a LINE algorithm:
calculating the estimated connection probability between every two nodes in the directed weighted network. The estimated connection probability is defined in the low-dimensional embedding space, as shown in the following formula (1):

$$p_1(v_i, v_j) = \frac{1}{1 + \exp\left(-u_i^{T} u_j\right)} \tag{1}$$

The symbols in formula (1) are defined as follows:
- $p_1(v_i, v_j)$ is the estimated connection probability between node $v_i$ and node $v_j$;
- $v_i$ and $v_j$ are two nodes of the directed weighted network joined by an edge $(v_i, v_j)$;
- $u_i$ and $u_j$ are the vector representations of $v_i$ and $v_j$ in the low-dimensional space, obtained by the LINE algorithm;
- $T$ denotes the transpose of a vector;
- $\exp$ is the exponential function with the natural constant $e$ as its base.
The empirical connection probability is the observed probability that two nodes in the directed weighted network are connected. It is computed from the original, high-dimensional network. The empirical connection probability between every two nodes is calculated as shown in the following formula (2):

$$\hat{p}_1(v_i, v_j) = \frac{w_{ij}}{W} \tag{2}$$

where $\hat{p}_1(v_i, v_j)$ is the empirical connection probability and $w_{ij}$ is the weight of the edge between node $v_i$ and node $v_j$. $W$ is the sum of the weights of all edges in the directed weighted network, i.e., $W = \sum_{(i,j) \in E} w_{ij}$.
Then, the distribution difference between the estimated connection probability and the empirical connection probability is calculated to obtain the first objective function, as shown in the following formula (3):

$$O_1 = d\left(\hat{p}_1(\cdot, \cdot),\, p_1(\cdot, \cdot)\right) \tag{3}$$

The symbols in formula (3) are defined as follows:
- $O_1$ is the first objective function;
- $\hat{p}_1(\cdot,\cdot)$ is the empirical connection probability;
- $p_1(\cdot,\cdot)$ is the estimated connection probability;
- $d(\cdot,\cdot)$ is the KL-divergence (also called relative entropy) between the two distributions.

For distributions $P$ and $Q$, the KL-divergence is defined as shown in the following formula (4):

$$D_{KL}(P \,\|\, Q) = \sum_{x} P(x) \log \frac{P(x)}{Q(x)} \tag{4}$$
It should be noted that network embedding must preserve, as far as possible, the structure that the nodes have in the original space of the directed weighted network. The distance distribution between nodes should therefore be maintained after embedding. In particular, if two nodes are connected in the original directed weighted network, their vectors should be close to each other after embedding. The classical KL-divergence is used here to characterize the difference between the two distributions.
The empirical connection probability reflects the original connection information (i.e., the adjacency matrix) of the directed weighted network in the high-dimensional space. The estimated connection probability is computed in the low-dimensional vector space obtained by vectorizing the nodes. Minimizing the first objective function minimizes the distribution difference between the two. The first objective function $O_1$ therefore characterizes first-order similarity: two nodes that are connected in the original directed weighted network should have vector representations that are close to each other in the low-dimensional space.
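As a worked illustration of formulas (1) to (3), the sketch below evaluates the first objective function for given node vectors. It relies on the reduction reported in the original LINE paper: with $d(\cdot,\cdot)$ taken as the KL-divergence and constants omitted, $O_1 = -\sum_{(i,j) \in E} w_{ij} \log p_1(v_i, v_j)$. The function names and the toy vectors are assumptions for illustration.

```python
import numpy as np

def p1(U, i, j):
    """Formula (1): estimated connection probability of edge (v_i, v_j)."""
    return 1.0 / (1.0 + np.exp(-U[i] @ U[j]))

def first_order_objective(U, edges):
    """O1 with KL-divergence and constants dropped: -sum_ij w_ij * log p1(v_i, v_j).

    U:     (num_nodes, dim) array whose row i is u_i.
    edges: dict mapping a directed edge (i, j) to its weight w_ij.
    """
    return -sum(w * np.log(p1(U, i, j)) for (i, j), w in edges.items())

edges = {(0, 1): 1.0}
aligned = np.array([[1.0, 1.0], [1.0, 1.0]])    # u_0 and u_1 point the same way
opposed = np.array([[1.0, 1.0], [-1.0, -1.0]])  # u_0 and u_1 point opposite ways
print(first_order_objective(aligned, edges) < first_order_objective(opposed, edges))  # True
```

The comparison shows the intended behaviour: connected nodes with similar vectors yield a smaller $O_1$.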
Further, the credit evaluation device performs Second-order approximation on the directed weighted network corresponding to the transfer record through a LINE algorithm:
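calculating the context pre-estimated probability between every two nodes in the directed weighted network, as shown in the following formula (5):

$$p_2(v_j \mid v_i) = \frac{\exp\left(u_j'^{T} u_i\right)}{\sum_{k=1}^{|V|} \exp\left(u_k'^{T} u_i\right)} \tag{5}$$

The symbols in formula (5) are defined as follows:
- $p_2(v_j \mid v_i)$ is the probability that node $v_j$ appears as a context of node $v_i$;
- $u_i$ is the vector of $v_i$ when it is treated as a node;
- $u_j'$ is the vector of $v_j$ when it is treated as a context;
- $|V|$ is the number of nodes in the directed weighted network;
- $T$ denotes the transpose of a vector.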
The context empirical probability between every two nodes in the directed weighted network is calculated as shown in the following formula (6):

$$\hat{p}_2(v_j \mid v_i) = \frac{w_{ij}}{d_i} \tag{6}$$

where $w_{ij}$ is the weight of the edge from $v_i$ to $v_j$ in the directed weighted network and $d_i$ is the out-degree of node $v_i$. The out-degree of a node reflects the edges leaving it; the in-degree correspondingly reflects the edges entering it. In the weighted network, the out-degree $d_i$ of node $v_i$ is the total weight of its outgoing edges:

$$d_i = \sum_{k \in N(i)} w_{ik}$$

where $N(i)$ is the set of nodes that $v_i$ points to.
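Then, the distribution difference between the context pre-estimated probability and the context empirical probability is calculated to obtain the second objective function, as shown in the following formula (7):

$$O_2 = \sum_{i \in V} \lambda_i \, d\left(\hat{p}_2(\cdot \mid v_i),\, p_2(\cdot \mid v_i)\right) \tag{7}$$

where $d(\cdot,\cdot)$ is again the KL-divergence and $\lambda_i$ weights the contribution of node $v_i$. $\lambda_i$ can be defined as the degree of the node, counting both in-degree and out-degree, i.e.:

$$\lambda_i = d_i^{\mathrm{in}} + d_i^{\mathrm{out}}$$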
It should be noted that minimizing the second objective function minimizes the distribution difference between the context pre-estimated probability in the low-dimensional space and the context empirical probability in the high-dimensional space. The second objective function $O_2$ therefore characterizes second-order similarity: two nodes that share many common neighbor nodes in the original directed weighted network should have similar vector representations in the low-dimensional space.
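The following sketch evaluates the second objective function of formula (7) directly. It uses the softmax of formula (5), the empirical distribution of formula (6), and the KL-divergence of formula (4). The default choice of $\lambda_i$ as the unweighted in-degree plus out-degree follows one reading of the description and is an assumption, as are the function name and the toy inputs.

```python
import numpy as np

def second_order_objective(U, C, edges, lam=None):
    """Formula (7) with d = KL-divergence.

    U: (n, dim) node vectors u_i;  C: (n, dim) context vectors u'_j.
    edges: dict mapping a directed edge (i, j) to its weight w_ij.
    lam: optional per-node weights lambda_i; by default the unweighted
         in-degree plus out-degree of each node (an assumption).
    """
    n = U.shape[0]
    d_out = np.zeros(n)                      # d_i: total weight of outgoing edges
    for (i, _), w in edges.items():
        d_out[i] += w
    if lam is None:
        lam = np.zeros(n)
        for i, j in edges:
            lam[i] += 1                      # out-degree contribution
            lam[j] += 1                      # in-degree contribution
    o2 = 0.0
    for i in range(n):
        if d_out[i] == 0:
            continue                         # p_hat2(.|v_i) undefined without out-edges
        scores = C @ U[i]                    # u'_k^T u_i for every node k
        p2 = np.exp(scores - scores.max())   # softmax of formula (5), stabilised
        p2 /= p2.sum()
        for (a, j), w in edges.items():
            if a == i:
                p_hat2 = w / d_out[i]        # formula (6)
                o2 += lam[i] * p_hat2 * np.log(p_hat2 / p2[j])  # KL term, formula (4)
    return o2

U = np.zeros((3, 2))
C = np.zeros((3, 2))
print(second_order_objective(U, C, {(0, 1): 1.0, (0, 2): 1.0}))  # ≈ 0.8109 (= 2 * log 1.5)
```

With all-zero vectors, $p_2$ is uniform over the three nodes, so only the mismatch between the uniform and the empirical context distribution contributes.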
After the first objective function and the second objective function are obtained, a node vector is obtained under each of them. The two node vectors are then spliced to obtain the node vector of the directed weighted network, which encodes the information of the transfer records.
Optionally, the step of obtaining the node vector of the directed weighted network according to the first objective function and the second objective function may include:
optimizing the first objective function through a stochastic gradient descent algorithm to obtain a node low-dimensional vector under the first objective function;
optimizing the second objective function through a stochastic gradient descent algorithm to obtain a node low-dimensional vector under the second objective function;
splicing the node low-dimensional vector under the first objective function and the node low-dimensional vector under the second objective function to obtain the node vector of the directed weighted network.
Specifically, to obtain accurate low-dimensional node vectors, the first and second objective functions may be optimized separately once they are obtained. The node vector of the directed weighted network is then obtained from the optimized objective functions. For example, the first objective function may be optimized through a Stochastic Gradient Descent (SGD) algorithm to obtain the node low-dimensional vector under the first objective function. The second objective function may likewise be optimized through SGD to obtain the node low-dimensional vector under the second objective function. Finally, the node vector under the first objective function and the node vector under the second objective function are spliced left-to-right to obtain the node vector of the directed weighted network.
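A compact sketch of the optimization and splicing steps follows. It applies plain stochastic gradient descent, one edge at a time, to the per-edge losses $-w_{ij} \log p_1(v_i, v_j)$ and $-w_{ij} \log p_2(v_j \mid v_i)$. In the LINE paper, $O_1$ and $O_2$ reduce to these losses when $d$ is the KL-divergence and $\lambda_i$ is the weighted out-degree. The learning rate, dimension, epoch count, initialization, and function name are assumptions. Practical LINE implementations also use edge sampling and negative sampling, which are omitted here, so this is only a small-graph illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_line(edges, n, dim=4, lr=0.05, epochs=200, seed=0):
    """Plain SGD on the first- and second-order losses, then left-right splicing."""
    rng = np.random.default_rng(seed)
    U1 = rng.normal(scale=0.1, size=(n, dim))   # first-order node vectors
    U2 = rng.normal(scale=0.1, size=(n, dim))   # second-order node vectors u_i
    C2 = rng.normal(scale=0.1, size=(n, dim))   # second-order context vectors u'_j
    edge_list = list(edges.items())
    for _ in range(epochs):
        for idx in rng.permutation(len(edge_list)):
            (i, j), w = edge_list[idx]
            # First-order step on -w * log sigmoid(u_i . u_j)
            g = w * (1.0 - sigmoid(U1[i] @ U1[j]))
            ui, uj = U1[i].copy(), U1[j].copy()
            U1[i] += lr * g * uj
            U1[j] += lr * g * ui
            # Second-order step on -w * log softmax_j(C2 @ u_i)
            scores = C2 @ U2[i]
            p = np.exp(scores - scores.max())
            p /= p.sum()
            ui2 = U2[i].copy()
            U2[i] += lr * w * (C2[j] - p @ C2)
            target = np.zeros(n)
            target[j] = 1.0
            C2 += lr * w * np.outer(target - p, ui2)
    # Left-right splicing: first-order block on the left, second-order on the right
    return np.hstack([U1, U2])

emb = train_line({(0, 1): 1.0, (1, 0): 1.0, (2, 3): 1.0, (3, 2): 1.0}, n=4)
```

Edge weights such as raw transfer amounts should usually be rescaled before training, because a large $w_{ij}$ multiplies the step size directly.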
Optionally, the step of obtaining the feature vector of the mobile payment record may comprise:
acquiring multi-dimensional payment characteristics of a mobile payment record;
coding the multidimensional payment characteristics to obtain payment coding information;
generating a feature vector of the payment record according to the payment coding information.
Specifically, the credit evaluation device obtains the multidimensional payment features of the mobile payment record. These may include several of the following: the user identifier, the type of the purchased goods, the payment amount, the payment store, the timestamp of the transaction, and the like. For example, one mobile payment record may be represented as (user, category, money, shop_name, time_stamp, ...), where:
- user is the user identifier, of string type;
- category is the type of the purchased goods, of string type;
- money is the payment amount, of floating-point type;
- shop_name is the payment store, of string type;
- time_stamp is the time at which the transaction occurred, of timestamp type.
After the multidimensional payment characteristics of the mobile payment record are obtained, the multidimensional payment characteristics can be coded to obtain payment coding information, wherein the obtained payment coding information can be digitalized information, the coding mode can be flexibly set according to actual needs, and specific contents are not limited here. Finally, a feature vector of the payment record can be generated according to the payment coding information of each payment record, and the feature vector of the payment record can include a feature vector corresponding to one or more payment records.
Optionally, the step of encoding the multidimensional payment characteristic to obtain the payment encoding information may include:
converting non-numerical type payment characteristics in the multi-dimensional payment characteristics into numerical type payment characteristics;
discretizing the converted numerical payment features and the numerical payment features already present in the multidimensional payment features to obtain the payment coding information.
When the credit evaluation device encodes the multidimensional payment feature, the multidimensional payment feature may be digitized to obtain a numerical type corresponding to each multidimensional payment feature, for example, a mapping relationship between a non-numerical type and a numerical type may be preset, different non-numerical types correspond to different numerical types, then a numerical type corresponding to the non-numerical type payment feature in the multidimensional payment feature is obtained according to the mapping relationship, and the non-numerical type payment feature in the multidimensional payment feature is converted into a numerical type payment feature.
Alternatively, the non-numerical payment features may be label-encoded to obtain numerical multidimensional payment features. For example, a mobile payment record may be represented as: user identifier user (string type), type of the purchased goods category (string type), payment amount money (floating-point type), payment store shop_name (string type), transaction timestamp time_stamp (timestamp type), and the like. The string-type fields user, category, and shop_name can be labeled, that is, each string value is mapped to a numerical value, yielding the corresponding numerical payment features.
After the non-numerical payment features are converted into numerical ones, all the multidimensional payment features of the mobile payment record are numerical and can be discretized. For example:
- The floating-point payment amount money can be discretized at equal intervals into 10 levels, by splitting the range between the maximum and the minimum of all payment amounts in the records into 10 equal intervals. A numerical payment feature is assigned to each level.
- The transaction timestamp time_stamp can be divided at a granularity of 10 minutes, and a numerical payment feature is assigned to each 10-minute slot.

Finally, the payment coding information for each dimension of the payment features is obtained from the discretized multidimensional payment features.
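The encoding and discretization steps can be sketched as follows. The equal-width 10-level split of `money` between its minimum and maximum follows the text. The following are assumptions:
- the record format (a dict keyed by the field names above);
- the 0-based integer codes assigned in order of first appearance;
- the reading of the 10-minute granularity as a slot index within the day.

```python
from datetime import datetime

def encode_payments(records, n_bins=10, minutes=10):
    """Encode (user, category, money, shop_name, time_stamp) payment records.

    String fields -> integer codes in order of first appearance;
    money -> one of n_bins equal-width levels between the min and max amount;
    time_stamp -> index of the `minutes`-wide slot within the day (assumption).
    """
    vocab = {"user": {}, "category": {}, "shop_name": {}}
    amounts = [r["money"] for r in records]
    lo, hi = min(amounts), max(amounts)
    width = (hi - lo) / n_bins or 1.0  # avoid division by zero if all amounts match
    encoded = []
    for r in records:
        row = [vocab[f].setdefault(r[f], len(vocab[f]))
               for f in ("user", "category", "shop_name")]
        level = min(int((r["money"] - lo) / width), n_bins - 1)  # max -> top level
        ts = r["time_stamp"]
        slot = (ts.hour * 60 + ts.minute) // minutes
        encoded.append(row + [level, slot])
    return encoded

records = [
    {"user": "u1", "category": "food", "money": 10.0, "shop_name": "s1",
     "time_stamp": datetime(2018, 1, 1, 8, 5)},
    {"user": "u2", "category": "book", "money": 110.0, "shop_name": "s2",
     "time_stamp": datetime(2018, 1, 1, 8, 15)},
    {"user": "u1", "category": "food", "money": 60.0, "shop_name": "s1",
     "time_stamp": datetime(2018, 1, 1, 23, 59)},
]
encoded = encode_payments(records)
```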
It should be noted that, when analyzing a user's mobile payment records, all of the records may first be preprocessed into target payment features. The feature vector of the mobile payment records is then generated from these target payment features, which may include the user's average consumption amount, most frequently occurring consumption amount, and most frequent consumption time period. For example, the following statistics may be calculated over all mobile payment records of a user:
- the average consumption amount avg_num;
- the most frequently occurring consumption amount most_num: the amounts are discretized at equal intervals of 100, and most_num is the average of the amounts in the interval that occurs most often;
- the time period most_time in which the user consumes most frequently.

The feature vector of the payment records may then be formed from these statistics.
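The statistical variant can be sketched as below. It computes `avg_num`, `most_num` (the mean of the amounts in the most frequent 100-unit interval), and `most_time`. Interpreting the consumption time period as the hour of the day, the function name `payment_statistics`, and the sample records are assumptions for illustration.

```python
from collections import Counter
from datetime import datetime
from statistics import mean

def payment_statistics(records, bucket=100):
    """Summarise one user's payment records as [avg_num, most_num, most_time]."""
    amounts = [r["money"] for r in records]
    avg_num = mean(amounts)
    # Equal-interval discretization of amounts in units of `bucket`
    buckets = Counter(int(a // bucket) for a in amounts)
    top_bucket = buckets.most_common(1)[0][0]
    most_num = mean(a for a in amounts if int(a // bucket) == top_bucket)
    # Most frequent consumption period, taken here as the hour of the day
    most_time = Counter(r["time_stamp"].hour for r in records).most_common(1)[0][0]
    return [avg_num, most_num, most_time]

recs = [
    {"money": 30.0, "time_stamp": datetime(2018, 1, 1, 12, 10)},
    {"money": 50.0, "time_stamp": datetime(2018, 1, 2, 12, 40)},
    {"money": 250.0, "time_stamp": datetime(2018, 1, 3, 20, 0)},
    {"money": 80.0, "time_stamp": datetime(2018, 1, 4, 9, 30)},
]
stats = payment_statistics(recs)
```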
In conclusion, once the node vector of the directed weighted network for the transfer records and the feature vector of the mobile payment records are obtained, the two can be spliced left-to-right. This generates the feature space corresponding to the transaction information. Left-right splicing means placing the node vector on the left of the feature space and the feature vector on the right, or, alternatively, placing the node vector on the right and the feature vector on the left.
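Left-right splicing amounts to vector concatenation in a fixed order. The sketch below assumes per-user dictionaries of NumPy vectors and uses hypothetical values. The `node_first` flag selects between the two orderings described above.

```python
import numpy as np

def transaction_feature_space(node_vectors, payment_vectors, users, node_first=True):
    """Left-right splice each user's transfer node vector with the payment vector.

    node_first=True puts the node vector on the left; False puts it on the right.
    """
    rows = []
    for u in users:
        parts = [node_vectors[u], payment_vectors[u]]
        if not node_first:
            parts.reverse()
        rows.append(np.concatenate(parts))
    return np.vstack(rows)  # one row per user

node_vectors = {"u1": np.array([0.1, 0.2]), "u2": np.array([0.3, 0.4])}
payment_vectors = {"u1": np.array([102.5, 53.3, 12.0]),
                   "u2": np.array([40.0, 35.0, 20.0])}
X = transaction_feature_space(node_vectors, payment_vectors, ["u1", "u2"])
```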
In some embodiments, the candidate category information includes behavior information. In this case, the step of constructing a directed weighted network corresponding to each candidate category information, acquiring the node vector of the directed weighted network, and generating the feature space corresponding to the category information according to the node vector may include:
(a) acquiring multidimensional behavior characteristics of behavior information;
(b) constructing a directed weighted network corresponding to each dimension of behavior characteristics;
(c) acquiring a node vector of each directed weighted network;
(d) generating a feature space corresponding to the behavior information according to the node vector of each directed weighted network.
Taking behavior information as the candidate category information, for example, the behavior information may include the frequency with which the user gives likes, the frequency of posting comments, the frequency of sending messages, the frequency of voice calls, the frequency of video calls, and the like. The credit evaluation device may extract multidimensional behavior features from the behavior information. These may include several of the following: the like frequency, the comment frequency, the message-sending frequency, the call frequency, and the like, where the call frequency covers both voice calls and video calls.
It should be noted that, in order to protect the privacy of the user, the embodiment of the present invention may perform transcoding, desensitization, and other processing on category information such as behavior information. The category information counted in the embodiment of the present invention is therefore information that has undergone transcoding, desensitization, and other processing, so as to protect the privacy of the user.
For each dimension of the behavior features, the credit evaluation device constructs a corresponding directed weighted network. For example, if the behavior features include the like frequency, the comment frequency, the message-sending frequency, and the call frequency, the following networks can be constructed:
- a directed weighted network A for likes, representing users liking each other's posts in the friend circle;
- a directed weighted network B for comments, representing users commenting on each other's posts in the friend circle;
- a directed weighted network C for the message-sending frequency, representing users sending messages to each other on a social platform (such as WeChat or QQ);
- a directed weighted network D for the call frequency, representing how often users make video or voice calls to each other.
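One possible way to obtain the per-dimension networks is to group interaction events by behavior type and use interaction counts as edge weights, as sketched below. The event format `(src_user, dst_user, behavior)`, the behavior labels, and counting each event once are assumptions. The resulting edge dictionaries could then be embedded with LINE in the same way as the transfer network.

```python
from collections import Counter, defaultdict

def build_behavior_networks(events):
    """Build one directed weighted network per behavior dimension.

    events: iterable of (src_user, dst_user, behavior) interactions.
    Returns {behavior: {(src, dst): frequency}}, the frequency being the edge weight.
    """
    networks = defaultdict(Counter)
    for src, dst, behavior in events:
        networks[behavior][(src, dst)] += 1
    return {behavior: dict(edges) for behavior, edges in networks.items()}

events = [("u1", "u2", "like"), ("u1", "u2", "like"), ("u2", "u1", "comment"),
          ("u3", "u1", "message"), ("u1", "u3", "call")]
networks = build_behavior_networks(events)
```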
After the directed weighted network for each dimension of the behavior features is obtained, the credit evaluation device can obtain the node vector of each network through the LINE algorithm, based on first-order and second-order approximation. This is done in the same way as for the directed weighted network of the transfer records, so the details are not repeated here.
After the node vectors of the directed weighted networks corresponding to the behavior features of each dimension are obtained, the credit evaluation device can generate feature spaces corresponding to the behavior information according to the node vectors of each directed weighted network. For example, the obtained node vector a of the directed weighted network a, the obtained node vector B of the directed weighted network B, the obtained node vector C of the directed weighted network C, and the obtained node vector D of the directed weighted network D may be spliced to obtain the feature space corresponding to the behavior information.
In some embodiments, the category information includes attribute information, and the step of obtaining vector information corresponding to each category information in the plurality of category information may include:
(1) acquiring multi-dimensional attribute characteristics of attribute information in a plurality of types of information;
(2) coding the multi-dimensional attribute characteristics to obtain attribute coding information;
(3) generating a feature space corresponding to the attribute information according to the attribute coding information.
Specifically, taking attribute information as the category information, for example, the attribute information may include the user's gender, hometown, city of residence, age, and the like. The credit evaluation device may obtain multi-dimensional attribute features from the attribute information, which may include several of these attributes.
For example, the multidimensional attribute features of one user's attribute information can be expressed as (user, gender, home, domicile, age, ...), where:
- user is the user identifier, of string type;
- gender is the user's gender, of string type;
- home is the user's hometown, of string type;
- domicile is the user's city of residence, of string type;
- age is the user's current age, of integer type.

The gender, hometown, and city of residence may be provided by the user when registering a social account or an account on another website or platform, or may be obtained in other ways. The age may be calculated from the age entered at registration and the registration time, or obtained in other ways.
It should be noted that, in order to protect the privacy of the user, the embodiment of the present invention may perform transcoding, desensitization, and other processing on category information such as attribute information. The category information counted in the embodiment of the present invention is therefore information that has undergone transcoding, desensitization, and other processing, so as to protect the privacy of the user.
After the multidimensional attribute features of the attribute information are obtained, the credit evaluation device can encode the multidimensional attribute features to obtain attribute encoded information, wherein the obtained attribute encoded information can be digitalized information, the encoding mode of the attribute encoded information can be flexibly set according to actual needs, and specific contents are not limited here. Finally, a feature space corresponding to the attribute information may be generated according to the attribute coding information, for example, the attribute coding information may be directly composed into the feature space.
Optionally, the step of encoding the multidimensional attribute feature to obtain the attribute encoding information may include:
converting non-numerical type attribute features in the multi-dimensional attribute features into numerical type attribute features;
generating the attribute coding information according to the numerical attribute features obtained by conversion and the numerical attribute features already present in the multi-dimensional attribute features.
When the credit evaluation device encodes the multidimensional attribute feature, the multidimensional attribute feature may be digitized to obtain a value type corresponding to each multidimensional attribute feature, for example, a mapping relationship between a non-value type and a value type may be preset, different non-value types correspond to different value types, then a value type corresponding to the non-value type attribute feature in the multidimensional attribute feature is obtained according to the mapping relationship, and the non-value type attribute feature in the multidimensional attribute feature is converted into a value type attribute feature.
Alternatively, the non-numerical attribute features may be encoded directly into numerical ones. For each dimension of the non-numerical attribute features, the number n of distinct values is counted and the values are encoded in sequence from 1 to n. Identical values always receive the same code (for example, male may be coded as 1 and female as 0). In this way the non-numerical type is converted into a numerical type. For example, the string-type user identifier user, gender, hometown home, and city of residence domicile may be mapped to numerical values to obtain the corresponding numerical attribute features.
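The count-then-number scheme can be sketched as follows. Sorting the distinct values before numbering them 1 to n is an assumption, made so that the codebook does not depend on record order. Under it, female receives 1 and male receives 2. This differs from the male = 1, female = 0 example above but is equally valid as long as the mapping is applied consistently. Field names follow the example record; the data are hypothetical.

```python
def encode_attributes(rows, fields, categorical):
    """Encode attribute records as numeric vectors.

    Each categorical field with n distinct values gets codes 1..n (values sorted,
    an assumption); identical values always share a code. Other fields pass through.
    """
    codebooks = {
        f: {v: code for code, v in enumerate(sorted({r[f] for r in rows}), start=1)}
        for f in categorical
    }
    vectors = [[codebooks[f][r[f]] if f in codebooks else r[f] for f in fields]
               for r in rows]
    return vectors, codebooks

rows = [
    {"user": "a", "gender": "male", "home": "Shenzhen", "domicile": "Beijing", "age": 30},
    {"user": "b", "gender": "female", "home": "Shenzhen", "domicile": "Shanghai", "age": 25},
]
fields = ["user", "gender", "home", "domicile", "age"]
vectors, codebooks = encode_attributes(
    rows, fields, categorical=["user", "gender", "home", "domicile"])
```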
Once the non-numeric attribute features have been converted, every dimension of the multidimensional attribute features is numeric, which yields the attribute encoding information. The attribute feature vector of the attribute information may then be generated from the attribute encoding information, and these attribute feature vectors constitute the feature space corresponding to the attribute information.
Because each category of information can be processed in its own way to obtain its feature space, the feature spaces can be acquired flexibly.
In step S103, the feature space corresponding to each category information is mapped into a kernel space, so as to obtain a plurality of kernel spaces.
After obtaining the feature space of each category information, the credit evaluation device may map each feature space into a kernel space, obtaining one kernel space per category information. The kernel space may be infinite-dimensional and is expressed through inner products, such as the inner product between two feature vectors or between two node vectors. For example, after the feature spaces of the transaction information, the behavior information, and the attribute information are obtained, they may be mapped into kernel spaces to obtain the kernel space of the transaction information, the kernel space of the behavior information, and the kernel space of the attribute information.
Specifically, the feature space may be mapped to the kernel space by a kernel function, i.e., a vector characterizing the user in the feature space is mapped to the kernel space. The kernel function is defined as follows:
Let $\chi$ be the input space, which may be the Euclidean space $\mathbb{R}^n$, and let $H$ be the kernel space, which may be a Hilbert space. If there exists a mapping from $\chi$ to $H$, $\phi(x): \chi \to H$, such that for all $x, z \in \chi$ the function $k(x, z)$ satisfies the condition:
$$k(x, z) = \phi(x) \cdot \phi(z)$$
then $k(x, z)$ is called the kernel function and $\phi(x)$ the mapping function, where $\phi(x) \cdot \phi(z)$ is the inner product of $\phi(x)$ and $\phi(z)$.
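To make the definition concrete, the following sketch uses an example chosen here rather than taken from the document: the degree-2 polynomial kernel $k(x, z) = (x \cdot z)^2$ on 2-D inputs, for which an explicit mapping $\phi$ is known. It checks numerically that $\phi(x) \cdot \phi(z)$ equals $k(x, z)$, which is exactly the condition above.

```python
import math
from typing import Sequence, Tuple


def phi(x: Sequence[float]) -> Tuple[float, float, float]:
    """Explicit feature map for 2-D inputs: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Ordinary inner product."""
    return sum(ai * bi for ai, bi in zip(a, b))


def k_poly2(x: Sequence[float], z: Sequence[float]) -> float:
    """Degree-2 homogeneous polynomial kernel, computed without phi."""
    return dot(x, z) ** 2
```

With $x = (1, 2)$ and $z = (3, -1)$, both `dot(phi(x), phi(z))` and `k_poly2(x, z)` give 1 (up to floating-point rounding). The point of a kernel function is the second route: it yields the inner product in the mapped space without ever constructing $\phi(x)$, which is what makes infinite-dimensional kernel spaces usable.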
The kernel function may be a linear kernel, a polynomial kernel, a Gaussian kernel, a radial basis function kernel, a Sigmoid kernel, a composite kernel, and the like. Taking the Gaussian kernel as an example, the kernel function may be:
$$k(x, z) = \exp\left(-\frac{\lVert x - z \rVert^2}{2\sigma^2}\right)$$
where $\sigma > 0$ is the kernel width parameter.
The feature space obtained for each category information is then taken as the input space $\chi$ and processed with the Gaussian kernel function to obtain the corresponding kernel space. For example, processing the feature spaces of the transaction information, the behavior information, and the attribute information with the Gaussian kernel function maps them to the kernel space of the transaction information, the kernel space of the behavior information, and the kernel space of the attribute information, respectively.
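A minimal sketch of this step, assuming each category's feature space is given as a list of equal-length numeric vectors (one per user): it evaluates the Gaussian kernel on every pair of users, giving the kernel (Gram) matrix that represents that category's kernel space on the sample. The function names are hypothetical, and the default $\sigma = 1.0$ is an arbitrary placeholder that would normally be tuned.

```python
import math
from typing import List, Sequence


def gaussian_kernel(x: Sequence[float], z: Sequence[float],
                    sigma: float = 1.0) -> float:
    """k(x, z) = exp(-||x - z||^2 / (2 * sigma^2))."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def kernel_matrix(features: List[Sequence[float]],
                  sigma: float = 1.0) -> List[List[float]]:
    """Gram matrix K[i][j] = k(x_i, x_j) over the users of one category."""
    n = len(features)
    return [[gaussian_kernel(features[i], features[j], sigma)
             for j in range(n)] for i in range(n)]
```

One such matrix would be built per category (transaction, behavior, attribute), each from that category's own feature vectors, so the categories never need to share a feature dimension.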
In step S104, a multi-kernel linear combination process is performed on the plurality of kernel spaces to obtain a composite kernel space.
After the kernel space of each category information has been obtained, the resulting plurality of kernel spaces may be subjected to multi-kernel linear combination processing, for example a linear combination calculation, to obtain a synthetic kernel space.
In some embodiments, the step of performing multi-kernel linear combination processing on the plurality of kernel spaces to obtain the synthetic kernel space may include:
(1) normalizing the plurality of kernel spaces to obtain a plurality of normalized kernel spaces;
(2) acquiring a weight value corresponding to each category information in a plurality of category information;
(3) performing multi-kernel linear combination processing according to the weight value of each category information and each normalized kernel space to obtain a synthetic kernel space.
When the feature space of each category of information is obtained, the value range of each feature dimension is not constrained, so some features may span a large range and others a small one, which can distort the resulting synthetic kernel space. The kernel spaces are therefore normalized before they are combined.
Specifically, the kernel space normalization process is as follows:
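Let $S = \{x_1, \ldots, x_n\}$ be a finite subset of the input space $\chi$, let the feature mapping be $\phi: \chi \to H$ with kernel function $k(x, z) = \phi(x) \cdot \phi(z)$, and let $\phi(S) = \{\phi(x_1), \ldots, \phi(x_n)\}$ be the image of $S$ under $\phi$. The elements of the kernel matrix $K$ are $K_{ij} = k(x_i, x_j)$, $i, j = 1, \ldots, n$. The norm of the feature vector $\phi(x)$ is:
$$\lVert \phi(x) \rVert = \sqrt{\phi(x) \cdot \phi(x)} = \sqrt{k(x, x)}$$
The normalized feature vector is:
$$\hat{\phi}(x) = \frac{\phi(x)}{\lVert \phi(x) \rVert}$$
The normalized kernel space is:
$$\hat{k}(x, z) = \hat{\phi}(x) \cdot \hat{\phi}(z) = \frac{k(x, z)}{\sqrt{k(x, x)\, k(z, z)}}$$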
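On a kernel matrix, this normalization divides each entry $K_{ij}$ by $\sqrt{K_{ii} K_{jj}}$. A minimal sketch (the function name is hypothetical):

```python
import math
from typing import List


def normalize_kernel(K: List[List[float]]) -> List[List[float]]:
    """Cosine normalization: K_hat[i][j] = K[i][j] / sqrt(K[i][i] * K[j][j]),
    i.e. the inner product of the unit-length vectors phi(x) / ||phi(x)||."""
    n = len(K)
    diag = [K[i][i] for i in range(n)]
    if any(d <= 0 for d in diag):
        raise ValueError("diagonal entries must be positive (phi(x) != 0)")
    return [[K[i][j] / math.sqrt(diag[i] * diag[j]) for j in range(n)]
            for i in range(n)]
```

For example, the linear-kernel matrix of the vectors (3, 4) and (1, 0) is `[[25.0, 3.0], [3.0, 1.0]]`, and `normalize_kernel` turns it into `[[1.0, 0.6], [0.6, 1.0]]`. For the Gaussian kernel, $k(x, x) = 1$ for every $x$, so the normalization leaves it unchanged; it matters for kernels such as the linear or polynomial kernel, whose diagonal grows with the magnitude of the features.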
Normalizing each kernel space in this way yields its normalized kernel space, so normalizing the plurality of kernel spaces yields a plurality of normalized kernel spaces.
Because the user's heterogeneous data is processed as different categories of information, and these categories are heterogeneous and diverse, the resulting kernel spaces have different characteristics. Linearly combining kernel spaces with different characteristics draws on the strengths of each type of kernel space, which yields better mapping performance.
That is, the plurality of kernel spaces undergo multi-kernel linear combination processing to construct a synthetic kernel space, which may be a linearly weighted sum of kernels. Specifically, the weight value of each category information is obtained, and multi-kernel linear combination processing is performed on the normalized kernel spaces according to these weight values to obtain the synthetic kernel space.
For example, let the kernel spaces of the transaction information, the behavior information, and the attribute information be $k_1(x, z)$, $k_2(x, z)$, and $k_3(x, z)$, and let their normalized kernel spaces be $\hat{k}_1(x, z)$, $\hat{k}_2(x, z)$, and $\hat{k}_3(x, z)$. Performing multi-kernel linear combination processing on the three normalized kernel spaces gives the synthetic kernel space $K(x, z)$:
$$K(x, z) = \sum_{i=1}^{3} \beta_i \hat{k}_i(x, z)$$
where $\beta_i$ represents the importance of the kernel space of the $i$-th category information to the user's credit evaluation, that is, the weight value of the $i$-th category information. Obtaining the synthetic kernel space by multi-kernel linear combination, instead of directly concatenating the features of the heterogeneous data, avoids large errors caused by information loss and similar effects.
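A sketch of the weighted sum on kernel matrices, assuming the normalized kernel matrices are computed on the same users in the same order. The weights are passed in as fixed values for illustration only; in the document, $\beta$ is obtained together with the regression model. The names are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def combine_kernels(kernels: Sequence[Matrix],
                    betas: Sequence[float]) -> Matrix:
    """Synthetic kernel K = sum_i beta_i * K_hat_i over equally sized,
    already-normalized kernel matrices (one per category of information)."""
    if len(kernels) != len(betas):
        raise ValueError("need exactly one weight per kernel")
    n = len(kernels[0])
    combined = [[0.0] * n for _ in range(n)]
    for K, beta in zip(kernels, betas):
        for i in range(n):
            for j in range(n):
                combined[i][j] += beta * K[i][j]
    return combined
```

Keeping every $\beta_i$ non-negative guarantees that the combination is again a valid (positive semidefinite) kernel, since a non-negative sum of positive semidefinite matrices is positive semidefinite.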
In step S105, a credit evaluation result of the user is acquired from the synthetic kernel space.
After the multi-kernel linear combination processing is performed on the multiple kernel spaces to obtain the composite kernel space, a credit evaluation result of the user may be obtained according to the composite kernel space, where the credit evaluation result may be a credit score or a credit rating.
In some embodiments, the step of obtaining the credit evaluation result of the user from the synthetic kernel space may include: calculating the credit score of the user from the synthetic kernel space through a preset regression model.
The credit evaluation device may preset a regression model, which is mainly used to calculate the user's credit evaluation result from the synthetic kernel space. Taking a credit score as the credit evaluation result as an example, the obtained synthetic kernel space may be input into the regression model, which outputs the user's credit score. The higher the credit score, the better the credit; the lower the credit score, the worse the credit.
Optionally, when the credit evaluation result is a credit rating, the user's credit may be further graded according to the obtained credit score. For example, the credit rating corresponding to the credit score may be determined, where the higher the credit score, the higher the credit rating, and the lower the credit score, the lower the credit rating. A higher credit rating may be set to indicate better credit, and a lower credit rating worse credit.
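The score-to-rating step can be illustrated with a threshold table. The cut-off scores and rating labels below are purely hypothetical; the document does not specify them.

```python
import bisect
from typing import Sequence


def score_to_rating(score: float,
                    thresholds: Sequence[float] = (500.0, 600.0, 700.0),
                    ratings: Sequence[str] = ("D", "C", "B", "A")) -> str:
    """Map a credit score to a rating so that a higher score never gives a
    lower rating. `thresholds` must be ascending and
    len(ratings) == len(thresholds) + 1; a score equal to a threshold
    falls into the higher band."""
    if len(ratings) != len(thresholds) + 1:
        raise ValueError("need exactly one more rating than thresholds")
    return ratings[bisect.bisect_right(thresholds, score)]
```

Because `bisect_right` returns the number of thresholds at or below the score, the rating index, and hence the rating, is monotone in the score, matching the rule that a higher credit score gives a higher credit rating.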
In some embodiments, before the step of calculating the credit score of the user from the synthetic kernel space through a preset regression model, the credit evaluation method may further include:
(1) acquiring a training sample set and dividing it into a plurality of category information sets;
(2) mapping each of the category information sets to a kernel space;
(3) generating an objective function according to the kernel spaces and a preset regression function;
(4) processing the objective function through a Lagrange dual algorithm to generate the regression model.
Specifically, the historical data of users may first be collected and used as the training sample set; for example, all of the historical data may be used, or part of it may be selected randomly or according to a preset rule. The credit levels of the users in the training sample set are then labeled manually for training the regression model.
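The support vector regression (SVR) model for a single category of information is described below:
Let the training sample set be $\{(x_i, y_i)\}_{i=1}^{n}$ with $x_i \in \mathbb{R}^d$ and $y_i \in \mathbb{R}$, where $x_i$ is the input value, $y_i$ is the output value, $d$ is the dimension, and $n$ is the number of training samples. In ε-SVR, each input $x_i$ is first mapped into the feature space by a nonlinear mapping $\phi$, so that the output values $y_i$ can be fitted in the feature space with a linear function $f(x) = \omega \cdot \phi(x) + b$ such that $|f(x_i) - y_i| \le \varepsilon$ for all training samples, while $f(x)$ is kept as smooth as possible. This gives the regression function of the ε-SVR algorithm:
$$\min_{\omega, b} \ \frac{1}{2}\lVert \omega \rVert^2 + C \sum_{i=1}^{n} L\big(f(x_i), y_i\big), \qquad L\big(f(x), y\big) = \begin{cases} 0, & |f(x) - y| \le \varepsilon \\ |f(x) - y| - \varepsilon, & \text{otherwise} \end{cases}$$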
The loss function $L$ defined in this formula allows the model a certain error: points within the error range are all regarded as lying on the model, while for points outside the error range the distance to the fitted regression function should be as small as possible. The constant $C$ in the formula is a penalty parameter.
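The ε-insensitive loss and the resulting objective can be written out directly. This sketch assumes an explicit, finite weight vector $\omega$ and already-computed predictions $f(x_i)$, which is enough to show how $\varepsilon$ and $C$ act; the names are illustrative.

```python
from typing import Sequence


def eps_insensitive_loss(pred: float, target: float, eps: float) -> float:
    """L(f(x), y): zero inside the eps-tube, |f(x) - y| - eps outside it."""
    return max(0.0, abs(pred - target) - eps)


def primal_objective(w: Sequence[float], preds: Sequence[float],
                     targets: Sequence[float], C: float, eps: float) -> float:
    """0.5 * ||w||^2 + C * sum_i L(f(x_i), y_i) for an explicit weight vector."""
    regularizer = 0.5 * sum(wi * wi for wi in w)
    penalty = sum(eps_insensitive_loss(p, t, eps)
                  for p, t in zip(preds, targets))
    return regularizer + C * penalty
```

A prediction 0.05 away from its label costs nothing when $\varepsilon = 0.1$, while one 0.5 away costs $0.5 - 0.1 = 0.4$, scaled by $C$. A larger $C$ therefore trades smoothness (small $\lVert \omega \rVert$) for a closer fit.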
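As in the SVM, slack variables $\xi_i$ and $\xi_i^*$ are introduced into the SVR, and the regression function becomes:
$$\min_{\omega, b, \xi, \xi^*} \ \frac{1}{2}\lVert \omega \rVert^2 + C \sum_{i=1}^{n} (\xi_i + \xi_i^*)$$
$$\text{s.t.}\quad y_i - \omega \cdot \phi(x_i) - b \le \varepsilon + \xi_i,\quad \omega \cdot \phi(x_i) + b - y_i \le \varepsilon + \xi_i^*,\quad \xi_i, \xi_i^* \ge 0,\quad i = 1, \ldots, n$$
Solving with the Lagrange dual method converts the original problem into its dual problem, giving the new regression function:
$$\max_{\alpha, \alpha^*} \ -\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*)\, k(x_i, x_j) - \varepsilon \sum_{i=1}^{n} (\alpha_i + \alpha_i^*) + \sum_{i=1}^{n} y_i (\alpha_i - \alpha_i^*)$$
$$\text{s.t.}\quad \sum_{i=1}^{n} (\alpha_i - \alpha_i^*) = 0,\quad 0 \le \alpha_i, \alpha_i^* \le C$$
where $\alpha_i$ and $\alpha_i^*$ are the Lagrange multipliers corresponding to the two constraints of the original problem. Solving this quadratic programming problem gives the optimal $\alpha$ and $\alpha^*$, from which the values of $\omega$ and $b$ of the original problem can be obtained, for example $\omega = \sum_{i=1}^{n} (\alpha_i - \alpha_i^*)\phi(x_i)$.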
The regression function for a single category of information extends readily to multiple categories: the training sample set is divided into a plurality of category information sets, each set is mapped to its kernel space as described above, and the objective function is generated from these kernel spaces and the regression function. The objective function of the regression model for multiple categories of information, with the synthetic kernel $\sum_{m} \beta_m \hat{k}_m(x_i, x_j)$ taking the place of $k(x_i, x_j)$ and subject to the same constraints as above, can be defined as follows:
$$\max_{\alpha, \alpha^*} \ -\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*) \sum_{m} \beta_m \hat{k}_m(x_i, x_j) - \varepsilon \sum_{i=1}^{n} (\alpha_i + \alpha_i^*) + \sum_{i=1}^{n} y_i (\alpha_i - \alpha_i^*)$$
This is again a quadratic programming problem. Solving it with the Lagrange dual algorithm gives the optimal values of $\alpha$, $\alpha^*$, $\beta$, $b$, and so on, and hence the fitting function of the training sample set, which is the regression model:
$$f(x) = \sum_{i=1}^{n} (\alpha_i - \alpha_i^*) \sum_{m} \beta_m \hat{k}_m(x_i, x) + b$$
After the regression model has been trained, when the credit of a new user $x_{new}$ is to be evaluated, it can be calculated as:
$$f(x_{new}) = \sum_{i=1}^{n} (\alpha_i - \alpha_i^*) \sum_{m} \beta_m \hat{k}_m(x_i, x_{new}) + b$$
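Once $\alpha$, $\alpha^*$, $\beta$, and $b$ are known, scoring a new user only needs kernel evaluations against the training samples. The sketch below assumes these parameters have already been learned (solving the quadratic program is not shown), represents each user as one feature vector per category of information, and applies the cosine normalization pairwise. `linear_kernel` is only a placeholder base kernel; the Gaussian kernel could be substituted. All names are hypothetical.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]
User = Sequence[Vector]  # one feature vector per category of information
BaseKernel = Callable[[Vector, Vector], float]


def linear_kernel(a: Vector, c: Vector) -> float:
    """Placeholder base kernel; any valid kernel could be used instead."""
    return sum(ai * ci for ai, ci in zip(a, c))


def synthetic_kernel(kernels: Sequence[BaseKernel],
                     betas: Sequence[float]) -> Callable[[User, User], float]:
    """K(x, z) = sum_m beta_m * k_hat_m(x_m, z_m), where k_hat_m is the
    cosine-normalized m-th kernel applied to the m-th category's vectors."""
    def normalized(k: BaseKernel) -> BaseKernel:
        return lambda a, c: k(a, c) / math.sqrt(k(a, a) * k(c, c))
    norm_kernels = [normalized(k) for k in kernels]

    def K(x: User, z: User) -> float:
        return sum(beta * k(xm, zm)
                   for beta, k, xm, zm in zip(betas, norm_kernels, x, z))
    return K


def predict(x_new: User, train_x: Sequence[User], alpha: Sequence[float],
            alpha_star: Sequence[float], b: float,
            K: Callable[[User, User], float]) -> float:
    """f(x_new) = sum_i (alpha_i - alpha_i^*) * K(x_i, x_new) + b."""
    return sum((a - a_s) * K(x_i, x_new)
               for x_i, a, a_s in zip(train_x, alpha, alpha_star)) + b
```

For example, with two categories, $\beta = (0.5, 0.5)$, training users `([1, 0], [0, 1])` and `([0, 1], [1, 0])`, $\alpha = (1, 0)$, $\alpha^* = (0, 0.5)$, and $b = 0.1$, the new user `([1, 0], [1, 0])` receives a score of about 0.35 (all values made up for illustration). In a trained SVR most $\alpha_i - \alpha_i^*$ are zero, so in practice only the support vectors need to be kept.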
Based on this regression model, the user's credit can therefore be evaluated accurately from multiple categories of information, which overcomes the drawback of the traditional approach of processing all information uniformly as a single category and makes the evaluation result more accurate.
As can be seen from the above, the embodiment of the present invention classifies heterogeneous data related to a user into a plurality of category information; obtains the feature space of each category information, maps each feature space into a kernel space, and performs multi-kernel linear combination processing on the resulting kernel spaces to obtain a synthetic kernel space; and then obtains the user's credit evaluation result from the synthetic kernel space. Because the user's credit is evaluated from multiple categories of information obtained by classifying the user's heterogeneous data, this scheme avoids the drawbacks of processing all user information uniformly and of evaluating credit from a feature matrix formed by directly concatenating all features, making the evaluation result more reliable.
The method described in the above embodiments is further illustrated in detail by way of example.
Take a server as the credit evaluation device and credit scoring of user A as an example. The basic data of user A is divided into three categories, namely transaction information, behavior information, and attribute information; each category is learned separately and mapped from its feature space to a kernel space; the kernel spaces of the categories are combined by multi-kernel linear combination into a synthetic kernel space; and finally user A's credit is scored by a regression model.
Referring to fig. 4, fig. 4 is a flowchart illustrating a credit evaluation method according to an embodiment of the invention. The method flow can comprise the following steps:
S201, the server obtains heterogeneous data of user A.
The heterogeneous data may include user A's gender, age, city of residence, comment frequency, messaging frequency, call frequency, money transfer records, and mobile payment records, as well as the gender and age of user A's friends, the call frequency and money transfer records between user A and user A's friends, and the credit of user A's friends.
The server may obtain the heterogeneous data of user A from the internet through crawler technology, or through an API opened by a social platform (e.g., WeChat, microblog, QQ, etc.).
It is understood that the server may also obtain the heterogeneous data of the user a by other means, and the specific obtaining manner is not limited herein.
It should be noted that the embodiment of the present invention may perform transcoding, desensitization, and other processing on the heterogeneous data of the user, so as to achieve the purpose of protecting the privacy of the user.
S202, the server divides the heterogeneous data of the user A into transaction information, behavior information and attribute information.
After obtaining the heterogeneous data of user A, the server may divide it into three categories of information, namely transaction information, behavior information, and attribute information, as shown in fig. 5.
The transaction information may include a transfer record, a mobile payment record and the like, the behavior information may include a frequency of praise, a frequency of comment, a frequency of message sending, a frequency of voice call, a frequency of video call and the like, and the attribute information may include gender, hometown, age, city of residence and the like.
S203, the server acquires a transaction feature space of the transaction information, a behavior feature space of the behavior information, and an attribute feature space of the attribute information.
It should be noted that the transaction feature space is a feature space of the transaction information, the behavior feature space is a feature space of the behavior information, and the attribute feature space is a feature space of the attribute information, where the naming is only to distinguish feature spaces of different types of information, such as the transaction information, the behavior information, and the attribute information.
After the transaction information, the behavior information, and the attribute information are obtained, the server can obtain the transaction feature space of the transaction information. Specifically, the server extracts the transfer records (one or more) from the transaction information and constructs the corresponding directed weighted network $G = (V, E, W)$, where $V$ denotes the nodes of $G$, each node representing a user; $E$ denotes the edges of $G$, each edge indicating that a transfer record exists between two users; and $W$ denotes the edge weights, representing the transfer amounts.
For example, if user 1 transfers 100 dollars to user 2, the transfer record may have the fields (u1, trans_num, u2), where u1 is user 1, trans_num is the amount transferred from user 1 to user 2, and u2 is user 2.
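Building $G$ from such records can be sketched as follows. Summing the amounts of repeated transfers between the same ordered pair of users is an assumption made here; the document only states that the weight represents the transfer amount. The names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# (u1, trans_num, u2): user u1 transferred trans_num to user u2
TransferRecord = Tuple[str, float, str]


def build_transfer_network(records: Iterable[TransferRecord]
                           ) -> Tuple[Set[str], Dict[Tuple[str, str], float]]:
    """Return (V, W) of the directed weighted network G = (V, E, W).
    E is the key set of W; each weight is the total amount sent on that edge."""
    nodes: Set[str] = set()
    weights: Dict[Tuple[str, str], float] = defaultdict(float)
    for u1, trans_num, u2 in records:
        nodes.update((u1, u2))
        weights[(u1, u2)] += trans_num  # direction matters: u1 -> u2
    return nodes, dict(weights)
```

The same function could build the behavior networks described below, with an interaction frequency in place of the amount as the edge weight.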
After obtaining the transfer records, the server can calculate the node vectors of the users in the transfer network with the LINE algorithm, based on first-order proximity and second-order proximity.
The server extracts a mobile payment record from the transaction information, which may include one or more mobile payment records. Then, extracting multidimensional payment characteristics such as user identification, payment commodity type, payment amount, payment shop and transaction occurrence timestamp in the mobile payment record, then carrying out numerical coding on the multidimensional payment characteristics to obtain payment coding information, and finally forming a feature vector of the payment record according to the payment coding information.
After obtaining the node vectors of the transfer records and the feature vectors of the mobile payment records, the server may concatenate them left to right to obtain the transaction feature space corresponding to the transaction information (i.e., the feature space of the transaction information), as shown in fig. 6.
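Left-to-right splicing amounts to concatenating, for each user, the vectors from each source in a fixed order. A minimal sketch, assuming each source is a dict from a (desensitized) user id to a vector; users missing from any source are simply dropped here, whereas the document completes missing data from similar users. The function name is hypothetical.

```python
from typing import Dict, List, Sequence


def splice_features(*blocks: Dict[str, Sequence[float]]) -> Dict[str, List[float]]:
    """Concatenate, per user, the vectors of every block in argument order
    (e.g. transfer node vector first, payment feature vector second)."""
    if not blocks:
        return {}
    common = set(blocks[0])
    for blk in blocks[1:]:
        common &= set(blk)  # keep only users present in every block
    return {user: [v for blk in blocks for v in blk[user]]
            for user in sorted(common)}
```

The same call with four node-vector dicts would produce the behavior feature space; the fixed argument order keeps each dimension meaning the same thing for every user.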
The server also acquires the behavior feature space of the behavior information. Specifically, the server extracts multidimensional behavior features such as praise frequency, comment frequency, messaging frequency, and call frequency from the behavior information, and then constructs a directed weighted network for each behavior dimension: $Net_{tu}$ for the praise frequency, $Net_{cm}$ for comments, $Net_{msg}$ for the messaging frequency, and $Net_{vv}$ for the call frequency. The frequency of each behavior dimension serves as the corresponding edge weight, so all four networks are directed weighted networks. To exploit users' behavior information and social relations across the different networks and better mine users' social information, the node vectors of the behavior information may be calculated with the LINE algorithm.
After obtaining the multidimensional behavior features, the server can use the LINE algorithm, based on first-order proximity and second-order proximity, to calculate the praise-frequency node vectors of the directed weighted network $Net_{tu}$, the comment node vectors of $Net_{cm}$, the messaging node vectors of $Net_{msg}$, and the call-frequency node vectors of $Net_{vv}$. As shown in fig. 7, the praise-frequency, comment, messaging, and call-frequency node vectors may be concatenated left to right to obtain the behavior feature space corresponding to the behavior information (i.e., the feature space of the behavior information).
The server also acquires the attribute feature space of the attribute information. Specifically, the server extracts multidimensional attribute features such as gender, hometown, city of residence, and age from the attribute information, numerically encodes them to obtain attribute encoding information, and generates the attribute feature vector of the attribute information from the attribute encoding information. For example, as shown in fig. 8, the multidimensional attribute features of user A's attribute information may be expressed as the user identifier, gender, hometown, city of residence, age, and so on; the codes obtained by numerically encoding these features are concatenated left to right to obtain the feature space of the attribute information.
S204, the server maps the transaction feature space into a transaction kernel space, the behavior feature space into a behavior kernel space, and the attribute feature space into an attribute kernel space.
It should be noted that the transaction kernel space is the kernel space of the transaction information, the behavior kernel space is the kernel space of the behavior information, and the attribute kernel space is the kernel space of the attribute information; these names serve only to distinguish the kernel spaces of the different categories of information.
After the transaction feature space, the behavior feature space, and the attribute feature space are obtained, the server may map the transaction feature space, the behavior feature space, and the attribute feature space to kernel spaces, respectively. Specifically, the transaction feature space may be mapped to a transaction kernel space by the kernel function, the behavior feature space may be mapped to a behavior kernel space by the kernel function, and the attribute feature space may be mapped to an attribute kernel space by the kernel function, as shown in fig. 5.
S205, the server performs multi-kernel linear combination processing on the transaction kernel space, the behavior kernel space, and the attribute kernel space to obtain a synthetic kernel space.
After obtaining the transaction kernel space, the behavior kernel space, and the attribute kernel space, the server may normalize each of them and then perform multi-kernel linear combination processing on the normalized kernel spaces to obtain a synthetic kernel space, as shown in fig. 5.
The normalization and the multi-kernel linear combination processing are similar to those described above and are not repeated here.
S206, the server calculates the credit score of user A from the synthetic kernel space through the regression model.
After obtaining the composite kernel space according to the transaction kernel space, the behavior kernel space and the attribute kernel space, the server may input the composite kernel space into the regression model, and output the credit score of the user a from the regression model, as shown in fig. 5.
In this embodiment of the invention, a multi-category description of user A's credit is generated from user A's transaction information, behavior information, attribute information, and so on; a multi-category learning mechanism maps the feature spaces of the transaction, behavior, and attribute information to kernel spaces; the transaction, behavior, and attribute kernel spaces are linearly combined into a synthetic kernel space; and finally user A's credit score is calculated from the synthetic kernel space by the regression model. This addresses the low fault tolerance and low accuracy of prior approaches that process user information as a single category, and improves the accuracy of user A's credit score.
To better implement the credit evaluation method provided by the embodiment of the invention, an embodiment of the invention further provides an apparatus based on the credit evaluation method. Terms have the same meanings as in the credit evaluation method above, and implementation details can be found in the description of the method embodiment.
Referring to fig. 9, fig. 9 is a schematic structural diagram of a credit evaluation apparatus according to an embodiment of the present invention, wherein the credit evaluation apparatus may include an information obtaining unit 301, a feature obtaining unit 302, a first mapping unit 303, a synthesizing unit 304, an evaluating unit 305, and the like.
The information obtaining unit 301 is configured to obtain heterogeneous data related to a user, and classify the heterogeneous data to obtain a plurality of category information.
The user may be an individual or an enterprise. When the user is a person, the heterogeneous data may include the gender of the user, the account identification of the user, the age of the user, the city of the user, the frequency of comments of the user, the frequency of messages sent by the user, the frequency of calls made by the user, the account transfer records of the user, the mobile payment records of the user, the gender of friends of the user, the frequency of calls made by friends of the user, the credit of friends of the user, the age of friends of the user, the account transfer records of friends of the user, and the like. The following description will be made in detail taking an example in which the user is a person.
Optionally, in an embodiment, the information obtaining unit 301 may obtain heterogeneous data of the user from the internet through a crawler technology; in another embodiment, the information obtaining unit 301 may obtain the heterogeneous data of the user through an Application Programming Interface (API) opened by a social platform (e.g., WeChat, microblog, QQ, etc.).
It should be noted that, to protect user privacy, the embodiment of the present invention may transcode, desensitize, or otherwise process the user's heterogeneous data; for example, the user's account id may be hashed into a long character string that represents it. The heterogeneous data used in the embodiment is therefore the data after transcoding, desensitization, and similar processing, which protects the user's privacy.
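Hashing an account id can be sketched with Python's standard `hashlib` module. The choice of SHA-256 and the optional salt are assumptions made here; the document only says the id is hashed. The function name is hypothetical.

```python
import hashlib


def desensitize_account_id(account_id: str, salt: str = "") -> str:
    """Replace an account id with the hex SHA-256 digest of salt + id
    (64 hexadecimal characters); the same input always gives the same output."""
    return hashlib.sha256((salt + account_id).encode("utf-8")).hexdigest()
```

Because the mapping is deterministic, the hashed id can still serve as a node in the directed weighted networks and as a join key between categories. Without a secret salt, however, ids drawn from a small or predictable space can be recovered by hashing every candidate and comparing, so a salt kept out of the dataset strengthens the desensitization.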
After obtaining the heterogeneous data of the user, the information obtaining unit 301 may classify the heterogeneous data of the user to obtain a plurality of category information, where different category information may be obtained by classifying according to an attribute feature, a use, a source of generation, an obtaining manner, or a platform where the heterogeneous data is located, and a specific classification manner is not limited herein. For example, the heterogeneous data of the user may be divided into a plurality of different categories of information, such as transaction information, behavior information, and attribute information. The transaction information may include transfer records and mobile payment records of the user, transfer records and mobile payment records of friends of the user, and the like, the behavior information may include frequency of approval of the user, frequency of published comments of the user, frequency of messaging of the user, frequency of voice calls of the user, frequency of video calls of the user, frequency of messaging of friends of the user, frequency of voice calls of friends of the user, and the like, and the attribute information may include gender of the user, hometown of the user, age of the user, city of residence of the user, gender of friends of the user, city of residence of friends of the user, age of friends of the user, and the like. It can be understood that one category information is one type of heterogeneous data, and multiple category information may respectively correspond to multiple different types of heterogeneous data.
Note that missing heterogeneous data may be completed using the information of similar users. For example, when user A's gender is missing, the similarity between users can be measured by Euclidean distance, the user most similar to user A can be determined, and that user's gender can be used as user A's gender.
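The nearest-neighbor completion can be sketched as follows. Which features the distance is computed on is an assumption (here, numeric features known for every user), and the function name is hypothetical; `math.dist` requires Python 3.8 or later.

```python
import math
from typing import Sequence


def impute_from_nearest(target: Sequence[float],
                        candidates: Sequence[Sequence[float]],
                        candidate_values: Sequence[str]) -> str:
    """Return the attribute value of the candidate user closest to `target`
    in Euclidean distance (1-nearest-neighbor completion)."""
    if not candidates:
        raise ValueError("need at least one candidate user")
    best = min(range(len(candidates)),
               key=lambda i: math.dist(target, candidates[i]))
    return candidate_values[best]
```

Because Euclidean distance is scale-sensitive, features with large ranges (such as age, compared with a 1/2 gender code) dominate the result unless the features are normalized first.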
The feature obtaining unit 302 is configured to obtain vector information corresponding to each of the plurality of category information, and obtain a feature space corresponding to each of the category information according to the vector information.
After dividing the heterogeneous data of the user into a plurality of category information, the feature obtaining unit 302 may process each category information to obtain a feature space corresponding to each category information, where the feature space may be a vector space composed of vector information of the category information, and the vector information may include a feature vector, a node vector, and the like.
Different types of information have heterogeneous characteristics, so that different algorithms can be adopted to obtain the feature space corresponding to each type of information.
In some embodiments, as shown in fig. 10, the feature obtaining unit 302 may include:
an extraction subunit 3021 configured to extract candidate category information from the plurality of category information;
the acquiring subunit 3022 is configured to construct a directed weighted network corresponding to each candidate category information, and acquire a node vector of the directed weighted network;
the first generating subunit 3023 is configured to generate a feature space corresponding to the category information according to the node vector.
Specifically, to improve the efficiency of processing the plurality of pieces of category information, the extraction subunit 3021 first extracts candidate category information from them. The candidate category information may include one or more pieces of category information whose feature spaces can be calculated through a directed weighted network. For example, if the plurality of pieces of category information includes category information A, B, C, D, and E, and the feature spaces of A, B, and C can be calculated by constructing directed weighted networks, then category information A, B, and C can be extracted as candidate category information.
Then, the acquisition subunit 3022 constructs a directed weighted network corresponding to each piece of candidate category information. For example, the directed weighted network $G_A$ corresponding to category information A is $G_A = (V_A, E_A, W_A)$, where $V_A$ is the set of nodes of $G_A$, each node representing a user; $E_A$ is the set of edges of $G_A$, each edge indicating that category information A exists between two users; and $W_A$ denotes the weights of the edges in $E_A$.
Next, after the directed weighted network corresponding to each piece of candidate category information is obtained, the acquisition subunit 3022 may calculate the node vectors of the directed weighted network with the Large-scale Information Network Embedding (LINE) algorithm based on first-order proximity and second-order proximity, that is, represent each network node of the directed weighted network as a low-dimensional vector. In a directed weighted network, two directly connected nodes are highly similar (high first-order proximity), and two unconnected nodes that share many common neighbors are also highly similar (high second-order proximity). The LINE algorithm learns both kinds of similarity, so it preserves the information contained in the original directed weighted network well.
Finally, the first generating sub-unit 3023 may generate the feature space corresponding to the candidate category information according to the node vector of the directed weighted network corresponding to the candidate category information, for example, the node vector may be directly set as the feature space of the candidate category information, or the node vector may be optimized or filtered, and the processed node vector is set as the feature space of the candidate category information.
It should be noted that, for one candidate category information, one or more directed weighted networks may be constructed, when one candidate category information constructs a plurality of directed weighted networks, a node vector of each directed weighted network may be respectively calculated to obtain a plurality of node vectors, the plurality of node vectors are used as node vectors corresponding to the one candidate category information, and a feature space corresponding to the candidate category information may be generated according to the node vectors.
In some embodiments, the candidate category information includes transaction information, and the obtaining subunit 3022 may include:
the record acquisition module is used for acquiring the transfer record and the mobile payment record of the transaction information;
the construction module is used for constructing a directed weighted network corresponding to the transfer record;
the node vector acquisition module is used for acquiring a node vector of the directed weighted network;
the characteristic vector acquisition module is used for acquiring the characteristic vector of the mobile payment record;
the first generating subunit 3023 is specifically configured to generate the feature space corresponding to the transaction information according to the node vector and the feature vector.
Taking transaction information as the candidate category information, for example, the transaction information may include transfer records, mobile payment records, and the like. The record acquisition module may extract the transfer records, mobile payment records, and similar information from the transaction information, where there may be one or more transfer records and one or more mobile payment records.
It should be noted that, to protect user privacy, the embodiment of the present invention may apply transcoding, desensitization, and other processing to category information such as transaction information; the category information used in the statistics of the embodiment of the present invention is therefore information that has undergone such processing.
Then, the construction module constructs the directed weighted network G corresponding to the transfer records as $G = (V, E, W)$, where $V$ denotes the nodes of G, each node representing a user; $E$ denotes the edges of G, each edge indicating that a transfer record exists between two users; and $W$ denotes the edge weights, which represent the transfer amounts.
For example, as shown in fig. 3, taking the transfer records as an example, assume that the directed weighted network G contains users u1, u2, u3, u4, u5, and u6 (the network is of course not limited to six users). In fig. 3, the node at the tail of an arrow is the user who transfers out the amount, the node at the head of the arrow is the user who receives it, an edge between two users indicates that a transfer exists between them, and the edge labels a, b, c, d, e, f, g, and h denote the transfer amounts. As can be seen from fig. 3, user u1 transferred amount a to user u2, user u5 transferred amount e to user u1, user u3 transferred amount f to user u5, and so on.
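As a minimal sketch, the directed weighted network of transfer records can be held as a node set plus an edge-weight dictionary keyed by (payer, payee). The amounts below are hypothetical stand-ins for a, e and f in fig. 3, and summing repeated transfers between the same pair into one edge weight is an assumption the text does not specify.

```python
from collections import defaultdict


def build_transfer_network(transfers):
    """Build G = (V, E, W) from (payer, payee, amount) transfer records.

    Returns the node set V and a dict mapping each directed edge
    (payer, payee) to its weight, i.e. the total amount transferred.
    """
    nodes = set()
    weights = defaultdict(float)
    for payer, payee, amount in transfers:
        nodes.update((payer, payee))
        weights[(payer, payee)] += amount  # edge tail = payer, edge head = payee
    return nodes, dict(weights)


# Hypothetical amounts; the last record repeats the u1 -> u2 transfer.
records = [("u1", "u2", 100.0), ("u5", "u1", 40.0), ("u3", "u5", 25.0), ("u1", "u2", 50.0)]
V, W = build_transfer_network(records)
print(sorted(V))        # ['u1', 'u2', 'u3', 'u5']
print(W[("u1", "u2")])  # 150.0
```

Keying edges by ordered pairs keeps the network directed: a transfer from u2 to u1 would be a separate edge.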
Next, the node vector acquisition module obtains the node vectors of the directed weighted network constructed from the transfer records: using the LINE algorithm based on first-order and second-order proximity, the network nodes are represented as low-dimensional vectors, which are the node vectors.
It should be noted that the similarity between two nodes reflects the similarity of the corresponding users' transfer records. If some initial nodes are given as sample users for credit evaluation (that is, after the heterogeneous data of the users is obtained, part of it is randomly selected as sample information and the credit levels of the corresponding users are labeled manually), the credit of other users can be evaluated by the similarity between their nodes and the sample users. After network embedding with the LINE algorithm, each node of the directed weighted network G is represented by a low-dimensional vector, so the similarity between users can be measured by the Euclidean distance.
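A minimal sketch of this similarity-based evaluation follows, assuming each user already has a node vector and that an unlabeled user simply takes the credit level of the nearest labeled sample user (a 1-nearest-neighbour rule; the text does not fix the decision rule). The function name, users, vectors and labels are hypothetical.

```python
import numpy as np


def credit_by_nearest_sample(node_vectors, sample_labels):
    """Assign each unlabeled user the credit level of its nearest labeled sample.

    node_vectors: dict user -> 1-D numpy array (e.g. LINE node vectors)
    sample_labels: dict user -> manually labeled credit level
    """
    samples = list(sample_labels)
    sample_matrix = np.stack([node_vectors[s] for s in samples])
    result = {}
    for user, vec in node_vectors.items():
        if user in sample_labels:
            result[user] = sample_labels[user]  # keep the manual label
            continue
        dists = np.linalg.norm(sample_matrix - vec, axis=1)  # Euclidean distance to each sample
        result[user] = sample_labels[samples[int(np.argmin(dists))]]
    return result


vectors = {
    "u1": np.array([0.9, 0.1]),
    "u2": np.array([0.1, 0.9]),
    "u3": np.array([0.8, 0.2]),
    "u4": np.array([0.2, 0.7]),
}
labels = {"u1": "good", "u2": "poor"}
print(credit_by_nearest_sample(vectors, labels))
```

A k-nearest-neighbour vote or a distance-weighted score would be natural variations; the 1-nearest rule keeps the sketch short.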
In addition, a mobile payment record is a record generated when the user makes a purchase through a social platform (such as WeChat) or pays through an online payment platform (such as Alipay or a shopping website). A mobile payment record may include the user identifier, the type of goods paid for, the payment amount, the payment store, the timestamp of the transaction, and the like.
After the mobile payment records are obtained, the feature vector acquisition module may obtain their feature vectors. For example, the payment features in each mobile payment record, such as the user identifier, the type of goods paid for, the payment amount, the payment store, and the timestamp of the transaction, may first be extracted; these payment features are then converted to numerical values, and the feature vector is generated from the numerical features.
Finally, the first generating subunit 3023 may generate the feature space corresponding to the transaction information from the node vectors of the transfer records and the feature vectors of the mobile payment records; for example, the two may be concatenated to obtain the feature space corresponding to the transaction information.
Optionally, the node vector obtaining module may include:
the first calculation submodule is used for calculating the estimated connection probability and the empirical connection probability between every two nodes in the directed weighted network;
the second calculation submodule is used for calculating the distribution difference between the estimated connection probability and the empirical connection probability to obtain a first objective function;
the third calculation submodule is used for calculating the estimated context probability and the empirical context probability between every two nodes in the directed weighted network;
the fourth calculation submodule is used for calculating the distribution difference between the estimated context probability and the empirical context probability to obtain a second objective function;
and the obtaining submodule is used for obtaining the node vector of the directed weighted network according to the first objective function and the second objective function.
Specifically, first-order proximity is first modeled on the directed weighted network corresponding to the transfer records through the LINE algorithm:
the first calculation submodule calculates the estimated connection probability between every two nodes of the directed weighted network, which is defined in the low-dimensional embedding space and may be expressed by the following formula (1):

$$p_1(v_i, v_j) = \frac{1}{1 + \exp(-u_i^{T} u_j)} \qquad (1)$$

where $p_1(v_i, v_j)$ denotes the estimated connection probability between nodes $v_i$ and $v_j$, two nodes of the directed weighted network joined by the edge $(v_i, v_j)$; $u_i$ and $u_j$ are the vector representations of $v_i$ and $v_j$ in the low-dimensional space obtained by the LINE algorithm; $T$ denotes vector transpose; and $\exp$ is the exponential function with the natural constant $e$ as its base.
The empirical connection probability is the observed probability that two nodes of the directed weighted network are connected, and it is defined on the original (high-dimensional) space. The first calculation submodule calculates the empirical connection probability between every two nodes by the following formula (2):

$$\hat{p}_1(v_i, v_j) = \frac{w_{ij}}{W}, \qquad W = \sum_{(i,j) \in E} w_{ij} \qquad (2)$$

where $\hat{p}_1(v_i, v_j)$ denotes the empirical connection probability, $w_{ij}$ is the weight of the edge between nodes $v_i$ and $v_j$, and $W$ is the sum of the weights of all edges in the directed weighted network.
Then, the second calculation submodule calculates the distribution difference between the estimated connection probability and the empirical connection probability to obtain the first objective function, as in formula (3):

$$O_1 = d\big(\hat{p}_1(\cdot,\cdot),\, p_1(\cdot,\cdot)\big) \qquad (3)$$

where $O_1$ denotes the first objective function, $\hat{p}_1(\cdot,\cdot)$ the empirical connection probability, $p_1(\cdot,\cdot)$ the estimated connection probability, and $d(\cdot,\cdot)$ the KL divergence (also called relative entropy) between the two distributions. For distributions $P$ and $Q$, the KL divergence is defined as in formula (4):

$$D_{KL}(P \,\|\, Q) = \sum_{x} P(x) \log \frac{P(x)}{Q(x)} \qquad (4)$$
It should be noted that a requirement of network embedding is that the structure of the nodes in the original space of the directed weighted network be preserved as far as possible after embedding, that is, the distance distribution between nodes should be maintained. Therefore, if two nodes are connected in the original directed weighted network, their vectors should be close to each other after embedding. The classical KL divergence is used here to characterize the difference between the two distributions.
The empirical connection probability in the high-dimensional space reflects the original connection information of the directed weighted network (i.e., its adjacency matrix), while the estimated connection probability in the low-dimensional space reflects the vector space obtained after the nodes are vectorized. Minimizing the first objective function minimizes the distribution difference between the two. The first objective function $O_1$ therefore characterizes first-order similarity: two nodes that are connected in the original directed weighted network should have vector representations that are close in the low-dimensional space.
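To make formulas (1) to (3) concrete, the sketch below evaluates $p_1$, $\hat{p}_1$ and a first-order objective on a toy network. It assumes the simplification from the original LINE paper, in which replacing $d$ with the KL divergence and dropping constants gives $O_1 = -\sum_{(i,j)\in E} w_{ij}\log p_1(v_i, v_j)$; the function name and toy values are illustrative.

```python
import numpy as np


def first_order_objective(U, edges):
    """Return p1, the empirical p1_hat and O1 = -sum w_ij * log p1(v_i, v_j).

    U: (num_nodes, dim) array whose row i is the low-dimensional vector u_i.
    edges: list of (i, j, w_ij) with integer node indices.
    """
    W = sum(w for _, _, w in edges)                      # total edge weight, formula (2)
    p1, p1_hat, objective = {}, {}, 0.0
    for i, j, w in edges:
        p1[(i, j)] = 1.0 / (1.0 + np.exp(-U[i] @ U[j]))  # formula (1): sigmoid of u_i^T u_j
        p1_hat[(i, j)] = w / W                           # formula (2)
        objective -= w * np.log(p1[(i, j)])              # KL form of formula (3), constants dropped
    return p1, p1_hat, objective


U_toy = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
toy_edges = [(0, 1, 3.0), (1, 2, 1.0)]
p1, p1_hat, o1 = first_order_objective(U_toy, toy_edges)
print(p1[(1, 2)], p1_hat[(0, 1)])  # 0.5 0.75
print(o1)
```

Because $O_1$ only rewards connected pairs, minimizing it alone would push all vectors together; practical LINE training adds negative samples to counter this.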
Further, second-order proximity is modeled on the directed weighted network corresponding to the transfer records through the LINE algorithm:
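the third calculation submodule calculates the estimated context probability between every two nodes of the directed weighted network, as in formula (5):

$$p_2(v_j \mid v_i) = \frac{\exp({u'_j}^{T} u_i)}{\sum_{k=1}^{|V|} \exp({u'_k}^{T} u_i)} \qquad (5)$$

where $p_2(v_j \mid v_i)$ denotes the probability that node $v_j$ appears as the context of node $v_i$, $u_i$ is the vector of $v_i$, $u'_j$ is the vector of $v_j$ when it is treated as a context, $|V|$ is the number of nodes in the directed weighted network, and $T$ denotes vector transpose.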
The third calculation submodule also calculates the empirical context probability between every two nodes of the directed weighted network, as in formula (6):

$$\hat{p}_2(v_j \mid v_i) = \frac{w_{ij}}{d_i} \qquad (6)$$

where $w_{ij}$ is the weight of edge $(v_i, v_j)$ and $d_i$ is the out-degree of node $v_i$. The out-degree of a node counts the edges leaving it, and the in-degree counts the edges entering it. In a weighted network, the out-degree of $v_i$ is the total weight of its outgoing edges:

$$d_i = \sum_{k \in N(v_i)} w_{ik}$$

where $N(v_i)$ is the set of nodes that $v_i$ points to.
Then, the fourth calculation submodule calculates the distribution difference between the estimated context probability and the empirical context probability to obtain the second objective function, as in formula (7):

$$O_2 = \sum_{i \in V} \lambda_i \, d\big(\hat{p}_2(\cdot \mid v_i),\, p_2(\cdot \mid v_i)\big) \qquad (7)$$

where $\lambda_i$ is a weight reflecting the importance of node $v_i$, which may be defined as the degree of the node (including in-degree and out-degree).
It should be noted that minimizing the second objective function minimizes the distribution difference between the estimated context probability in the low-dimensional space and the empirical context probability in the high-dimensional space. The second objective function $O_2$ therefore characterizes second-order similarity: two nodes that share many common neighbors in the original directed weighted network should have similar vector representations in the low-dimensional space.
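The sketch below evaluates formulas (5) to (7) directly on a toy network: a softmax over all $|V|$ context vectors for $p_2$, the ratio $w_{ij}/d_i$ for $\hat{p}_2$, and a $\lambda_i$-weighted sum of per-node KL divergences for $O_2$. Defaulting $\lambda_i$ to the weighted out-degree $d_i$ follows the original LINE paper and is an assumption here, since the text above also allows a degree that includes in-degree. The edge list is assumed to contain each directed pair at most once, and the names and random vectors are illustrative.

```python
import numpy as np


def second_order_objective(U, U_ctx, edges, lam=None):
    """Return p2 and O2 = sum_i lambda_i * KL(p2_hat(.|v_i) || p2(.|v_i)).

    U: vertex vectors u_i; U_ctx: context vectors u'_j (same shape).
    lam: optional dict node -> lambda_i; defaults to the weighted out-degree d_i.
    """
    d = np.zeros(U.shape[0])
    for i, _, w in edges:
        d[i] += w                                      # weighted out-degree d_i
    scores = U @ U_ctx.T                               # scores[i, j] = u'_j^T u_i
    scores -= scores.max(axis=1, keepdims=True)        # stabilize the softmax
    p2 = np.exp(scores)
    p2 /= p2.sum(axis=1, keepdims=True)                # formula (5): softmax over all |V| nodes
    objective = 0.0
    for i, j, w in edges:
        p2_hat = w / d[i]                              # formula (6)
        weight = d[i] if lam is None else lam[i]
        objective += weight * p2_hat * np.log(p2_hat / p2[i, j])  # KL term of formula (7)
    return p2, objective


rng = np.random.default_rng(0)
U = rng.normal(size=(4, 3))
U_ctx = rng.normal(size=(4, 3))
toy_edges = [(0, 1, 2.0), (0, 2, 1.0), (1, 2, 1.0), (3, 0, 0.5)]
p2, o2 = second_order_objective(U, U_ctx, toy_edges)
print(round(float(o2), 4))
```

Nodes without outgoing edges (node 2 here) contribute nothing, because their empirical context distribution is empty.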
After the first and second objective functions are obtained, the obtaining submodule obtains the node vectors under the first objective function and the node vectors under the second objective function, and concatenates them to obtain the node vectors of the directed weighted network, in which the information of the transfer records is encoded.
Optionally, the obtaining sub-module is specifically configured to:
optimizing the first objective function through a stochastic gradient descent algorithm to obtain the low-dimensional node vectors under the first objective function;
optimizing the second objective function through a stochastic gradient descent algorithm to obtain the low-dimensional node vectors under the second objective function;
and concatenating the low-dimensional node vectors under the first objective function with those under the second objective function to obtain the node vectors of the directed weighted network.
Specifically, to obtain accurate low-dimensional node vectors, after obtaining the first and second objective functions, the obtaining submodule may optimize each of them and then obtain the node vectors of the directed weighted network from the optimized objective functions. For example, the first objective function may be optimized by a Stochastic Gradient Descent (SGD) algorithm to obtain the low-dimensional node vectors under the first objective function, the second objective function may likewise be optimized by SGD to obtain the low-dimensional node vectors under the second objective function, and finally the two sets of node vectors are concatenated side by side (left-right splicing) to obtain the node vectors of the directed weighted network.
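The following Python sketch illustrates edge-sampling SGD for the two objectives and the side-by-side concatenation. It is not the patent's implementation: negative sampling (as in the original LINE paper) is used to approximate the objectives, negatives are drawn uniformly instead of from a degree-based distribution, and the function name `train_line`, the learning rate, and the sample counts are illustrative assumptions.

```python
import numpy as np


def train_line(edges, num_nodes, dim=4, order=1, samples=5000, lr=0.025, neg=3, seed=0):
    """Edge-sampling SGD for one LINE objective (order=1 or order=2).

    edges: list of (i, j, w_ij). Returns a (num_nodes, dim) array of node vectors.
    """
    rng = np.random.default_rng(seed)
    emb = (rng.random((num_nodes, dim)) - 0.5) / dim  # vertex vectors u_i
    ctx = np.zeros((num_nodes, dim))                  # context vectors u'_j (second order only)
    target = emb if order == 1 else ctx               # first order has no separate context table
    src = np.array([e[0] for e in edges])
    dst = np.array([e[1] for e in edges])
    w = np.array([e[2] for e in edges], dtype=float)
    picks = rng.choice(len(edges), size=samples, p=w / w.sum())  # edges drawn in proportion to w_ij
    for k in picks:
        i, j = src[k], dst[k]
        grad_i = np.zeros(dim)
        negatives = rng.integers(0, num_nodes, size=neg)
        for t, label in [(j, 1.0)] + [(int(n), 0.0) for n in negatives]:
            score = 1.0 / (1.0 + np.exp(-emb[i] @ target[t]))
            g = lr * (label - score)          # ascent step on log sigmoid(+/- u_t^T u_i)
            grad_i += g * target[t]
            target[t] += g * emb[i]
        emb[i] += grad_i
    return emb


toy_edges = [(0, 1, 3.0), (1, 2, 1.0), (2, 0, 2.0), (3, 4, 1.0), (4, 3, 1.0)]
first = train_line(toy_edges, num_nodes=5, order=1)
second = train_line(toy_edges, num_nodes=5, order=2)
node_vectors = np.hstack([first, second])  # left: first-order, right: second-order
print(node_vectors.shape)  # (5, 8)
```

Training the two orders separately and concatenating afterwards mirrors the splicing step described above; each half can be normalized before concatenation if their scales differ.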
Optionally, the feature vector obtaining module may include:
the characteristic obtaining submodule is used for obtaining the multidimensional payment characteristics of the mobile payment record;
the coding submodule is used for coding the multidimensional payment characteristics to obtain payment coding information;
and the generating submodule is used for generating the characteristic vector of the payment record according to the payment coding information.
Specifically, the feature obtaining submodule obtains the multidimensional payment features of a mobile payment record, which may include any several of the user identifier, the type of goods paid for, the payment amount, the payment store, the timestamp of the transaction, and the like. For example, one mobile payment record may be represented as (user, category, money, shop_name, time_stamp), where user is the user identifier, which may be of string type; category is the type of goods paid for, which may be of string type; money is the payment amount, which may be of floating-point type; shop_name is the payment store, which may be of string type; and time_stamp is the timestamp of the transaction, which may be of timestamp type.
After the multidimensional payment features of the mobile payment record are obtained, the encoding submodule may encode them to obtain payment encoding information, which may be numerical information; the encoding method can be set flexibly according to actual needs and is not limited here. Finally, the generating submodule may generate the feature vector of the payment records from the payment encoding information of each record, and this feature vector may include the feature vectors corresponding to one or more payment records.
Optionally, the encoding submodule is specifically configured to:
converting non-numerical type payment characteristics in the multi-dimensional payment characteristics into numerical type payment characteristics;
and carrying out discretization processing on the converted numerical type payment characteristics and the numerical type payment characteristics in the multidimensional payment characteristics to obtain payment coding information.
When the coding sub-module codes the multidimensional payment features, the multidimensional payment features may be digitized to obtain a numerical type corresponding to each multidimensional payment feature, for example, a mapping relationship between a non-numerical type and a numerical type may be preset, different non-numerical types correspond to different numerical types, then a numerical type corresponding to the non-numerical type payment features in the multidimensional payment features is obtained according to the mapping relationship, and the non-numerical type payment features in the multidimensional payment features are converted into numerical type payment features.
Alternatively, the non-numerical payment features may be label-encoded. For example, a mobile payment record may be represented by the user identifier user (string type), the type of goods paid for category (string type), the payment amount money (floating-point type), the payment store shop_name (string type), the timestamp of the transaction time_stamp (timestamp type), and the like. The string-typed user, category, and shop_name can be label-encoded, that is, each string value is mapped to a numerical value, to obtain the corresponding numerical payment features.
After the non-numerical payment features are converted, all dimensions of the multidimensional payment features are numerical, and the encoding submodule may discretize them. For example, the floating-point payment amount money may be discretized into 10 equal-width levels (the range between the maximum and minimum of all payment amounts in the records is divided equally into 10 levels), and the numerical payment feature corresponding to each level is determined; the timestamp-typed time_stamp of the transaction may be divided at a granularity of 10 minutes, and the numerical payment feature corresponding to each interval is determined. Finally, the payment encoding information corresponding to each dimension of the payment features is obtained from the discretized multidimensional payment features.
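The encoding and discretization described above can be sketched as follows. Label-encoding string fields in order of first appearance, computing the 10-minute slot from the time of day (ignoring the date), and the dictionary field names are assumptions made for illustration; the records are hypothetical.

```python
from datetime import datetime


def encode_payments(records, amount_levels=10, minutes_per_slot=10):
    """Encode (user, category, money, shop_name, time_stamp) payment records.

    Strings are label-encoded (first-seen value -> 0, next new value -> 1, ...),
    money is mapped to one of `amount_levels` equal-width levels between the
    minimum and maximum amount, and time_stamp to a slot of the day.
    """
    vocab = {"user": {}, "category": {}, "shop_name": {}}
    amounts = [r["money"] for r in records]
    lo, hi = min(amounts), max(amounts)
    width = (hi - lo) / amount_levels or 1.0  # all amounts equal -> everything in level 0
    encoded = []
    for r in records:
        row = [vocab[f].setdefault(r[f], len(vocab[f])) for f in ("user", "category", "shop_name")]
        level = min(int((r["money"] - lo) / width), amount_levels - 1)  # max amount -> top level
        ts = r["time_stamp"]
        slot = (ts.hour * 60 + ts.minute) // minutes_per_slot
        encoded.append(row + [level, slot])
    return encoded


records = [
    {"user": "a", "category": "food", "money": 12.0, "shop_name": "s1", "time_stamp": datetime(2018, 1, 5, 12, 3)},
    {"user": "a", "category": "taxi", "money": 30.0, "shop_name": "s2", "time_stamp": datetime(2018, 1, 5, 18, 47)},
    {"user": "b", "category": "food", "money": 102.0, "shop_name": "s1", "time_stamp": datetime(2018, 1, 6, 12, 15)},
]
encoded = encode_payments(records)
print(encoded)  # [[0, 0, 0, 0, 72], [0, 1, 1, 2, 112], [1, 0, 0, 9, 73]]
```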
It should be noted that, when analyzing the user's mobile payment records, all the records may alternatively be preprocessed to obtain target payment features, and the feature vector of the mobile payment records is generated from these target payment features. The target payment features may include the user's average consumption amount, most frequent consumption amount, and most frequent consumption time period. For example, over all of a user's mobile payment records, one may calculate statistics such as the average consumption amount avg_num; the most frequent consumption amount most_num, obtained by discretizing the amounts into equal-width intervals of 100 and taking the mean of the amounts in the most frequent interval; and the time period most_time in which the user consumes most frequently. The feature vector of the payment records may then be formed from these statistics.
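The statistics-based alternative can be sketched as below. Using the hour of day as the consumption "time period" and breaking ties by first occurrence are assumptions, since the text does not specify them; the function name and sample values are hypothetical.

```python
from collections import Counter
from datetime import datetime


def payment_statistics(amounts, timestamps, bin_size=100):
    """Return [avg_num, most_num, most_time] for one user's payment records.

    most_num: mean of the amounts in the most frequent interval of width bin_size.
    most_time: most frequent hour of day (assumed granularity).
    Counter.most_common breaks ties by first occurrence.
    """
    avg_num = sum(amounts) / len(amounts)
    bins = [int(a // bin_size) for a in amounts]          # equal-width intervals of 100
    top_bin = Counter(bins).most_common(1)[0][0]
    in_top = [a for a, b in zip(amounts, bins) if b == top_bin]
    most_num = sum(in_top) / len(in_top)
    most_time = Counter(t.hour for t in timestamps).most_common(1)[0][0]
    return [avg_num, most_num, most_time]


amounts = [30.0, 80.0, 250.0, 60.0]
times = [datetime(2018, 1, 5, 12, 3), datetime(2018, 1, 6, 12, 40),
         datetime(2018, 1, 6, 19, 5), datetime(2018, 1, 7, 8, 30)]
stats = payment_statistics(amounts, times)
print(stats)
```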
In summary, after obtaining the node vectors of the directed weighted network corresponding to the transfer records and the feature vectors of the mobile payment records, the first generating subunit 3023 may concatenate them side by side (left-right splicing) to generate the feature space corresponding to the transaction information. In left-right splicing, the node vectors may be placed on the left and the feature vectors on the right, or the node vectors on the right and the feature vectors on the left.
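Left-right splicing amounts to column-wise concatenation. In this minimal sketch the rows of both arrays are assumed to be aligned to the same users in the same order; the function name and values are hypothetical.

```python
import numpy as np


def transaction_feature_space(node_vectors, payment_vectors, node_on_left=True):
    """Concatenate each user's transfer node vector with its payment feature vector.

    Both inputs are (num_users, k) arrays whose rows refer to the same users.
    """
    parts = [node_vectors, payment_vectors] if node_on_left else [payment_vectors, node_vectors]
    return np.hstack(parts)


nodes = np.array([[0.1, 0.2], [0.3, 0.4]])
payments = np.array([[105.0, 56.7, 12.0], [20.0, 20.0, 9.0]])
print(transaction_feature_space(nodes, payments).shape)  # (2, 5)
```

Either order gives the same information; what matters is that the order is fixed for all users so that columns stay comparable.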
In some embodiments, the candidate category information includes behavior information, and the obtaining subunit 3022 is specifically configured to:
acquiring multidimensional behavior characteristics of behavior information;
constructing a directed weighted network corresponding to each dimension of behavior characteristics;
acquiring a node vector of each directed weighted network;
the first generating subunit 3023 is specifically configured to generate the feature space corresponding to the behavior information according to the node vector of each directed weighted network.
Taking behavior information as the candidate category information, for example, the behavior information may include the frequencies with which the user gives likes, posts comments, sends messages, makes voice calls, makes video calls, and the like. The acquisition subunit 3022 may extract multidimensional behavior features from the behavior information, which may include any several of the like frequency, comment frequency, messaging frequency, call frequency, and the like, where the call frequency includes the voice call frequency, the video call frequency, and the like.
It should be noted that, to protect user privacy, the embodiment of the present invention may apply transcoding, desensitization, and other processing to category information such as behavior information; the category information used in the statistics of the embodiment of the present invention is therefore information that has undergone such processing.
For each dimension of the behavior features, the acquisition subunit 3022 constructs a corresponding directed weighted network. For example, if the behavior features include the like frequency, comment frequency, messaging frequency, and call frequency, then a directed weighted network A corresponding to the like frequency may be constructed, which is the network of users liking each other's posts in the friend circle; a directed weighted network B corresponding to comments may be constructed, which is the network of users commenting on each other's posts in the friend circle; a directed weighted network C corresponding to the messaging frequency may be constructed, which is the network of users sending messages to each other on a social platform (such as WeChat or QQ); and a directed weighted network D corresponding to the call frequency may be constructed, which is the network of video or voice call frequencies between users.
After the directed weighted network corresponding to each dimension of behavior features is obtained, the acquisition subunit 3022 may obtain the node vectors of each directed weighted network through the LINE algorithm based on first-order and second-order proximity. This is done in the same way as for the directed weighted network corresponding to the transfer records and is not described again here.
After obtaining the node vector of the directed weighted network corresponding to each dimensional behavior feature, the first generating subunit 3023 may generate a feature space corresponding to the behavior information according to the node vector of each directed weighted network. For example, the obtained node vector a of the directed weighted network a, the obtained node vector B of the directed weighted network B, the obtained node vector C of the directed weighted network C, and the obtained node vector D of the directed weighted network D may be spliced to obtain the feature space corresponding to the behavior information.
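The per-behavior construction and concatenation can be sketched as below. So that the example runs on its own, each network is embedded with a truncated singular value decomposition of its adjacency matrix; this is a plainly swapped-in stand-in for LINE, not the method described above. Behavior names, counts and the function name are hypothetical.

```python
import numpy as np


def behavior_feature_space(interactions, users, dim=2):
    """Build one directed weighted network per behavior and concatenate node vectors.

    interactions: dict behavior -> list of (from_user, to_user, count); the edge
    weight is the interaction count. Embedding: truncated SVD (stand-in for LINE).
    """
    index = {u: k for k, u in enumerate(users)}
    blocks = []
    for behavior in sorted(interactions):             # fixed order keeps columns stable
        A = np.zeros((len(users), len(users)))        # adjacency matrix of one network
        for src, dst, count in interactions[behavior]:
            A[index[src], index[dst]] += count
        U_svd, s, _ = np.linalg.svd(A)
        blocks.append(U_svd[:, :dim] * s[:dim])       # one dim-sized vector per user
    return np.hstack(blocks)                          # splice the per-network vectors


users = ["u1", "u2", "u3", "u4"]
interactions = {
    "like": [("u1", "u2", 5), ("u2", "u1", 3), ("u3", "u4", 1)],     # network A
    "comment": [("u1", "u3", 2)],                                    # network B
    "message": [("u2", "u3", 7), ("u4", "u1", 1)],                   # network C
    "call": [("u3", "u2", 4)],                                       # network D
}
space = behavior_feature_space(interactions, users)
print(space.shape)  # (4, 8)
```

Sorting the behavior names is a design choice so that the same behavior always occupies the same columns across runs.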
In some embodiments, as shown in fig. 11, the category information includes attribute information, and the feature obtaining unit 302 includes:
a feature obtaining subunit 3024, configured to obtain a multidimensional attribute feature of the attribute information in the plurality of pieces of category information;
the encoding subunit 3025 is configured to encode the multi-dimensional attribute features to obtain attribute encoding information;
a second generating subunit 3026, configured to generate a feature space corresponding to the attribute information according to the attribute coding information.
Specifically, taking attribute information as the category information, for example, the attribute information may include the user's gender, hometown, city of residence, age, and the like, and the feature obtaining subunit 3024 may obtain multidimensional attribute features from it, which may include any several of gender, hometown, city of residence, age, and the like.
For example, the multidimensional attribute features of one user's attribute information may be represented as (user, gender, home, domicile, age), where user is the user identifier, which may be of string type; gender is the user's gender, which may be of string type; home is the user's hometown, which may be of string type; domicile is the user's city of residence, which may be of string type; and age is the user's current age, which may be of integer type. The user's gender, hometown, city of residence, and the like may be provided when the user registers a social account or an account on another website or platform, or may be obtained in other ways; the user's age may be calculated from the age and time entered when the account was registered, or obtained in other ways.
It should be noted that, to protect user privacy, the embodiment of the present invention may apply transcoding, desensitization, and other processing to category information such as attribute information; the category information used in the statistics of the embodiment of the present invention is therefore information that has undergone such processing.
After obtaining the multidimensional attribute features, the encoding subunit 3025 may encode them to obtain attribute encoding information, which may be numerical information; the encoding method can be set flexibly according to actual needs and is not limited here. Finally, the second generating subunit 3026 may generate the feature space corresponding to the attribute information from the attribute encoding information; for example, the attribute encoding information may directly form the feature space.
Optionally, the encoding subunit 3025 is specifically configured to:
convert the non-numerical attribute features among the multi-dimensional attribute features into numerical attribute features; and
generate the attribute encoding information from the converted numerical attribute features together with the attribute features that were already numerical.
When encoding the multi-dimensional attribute features, the encoding subunit 3025 may digitize them so that every feature takes a numerical value. For example, a mapping between non-numerical values and numerical values may be preset, with different non-numerical values mapped to different numbers. The numerical value of each non-numerical attribute feature is then looked up in this mapping, which converts the non-numerical attribute features into numerical ones.
Alternatively, the non-numerical attribute features may be encoded directly: for each non-numerical dimension, the number n of distinct values is counted, the values are encoded in order from 1 to n, and identical values receive the same code (for example, male may be encoded as 1 and female as 0), so that the non-numerical features become numerical. For example, the string-type user identifier, gender, hometown, and city of residence may be mapped to numbers to obtain the corresponding numerical attribute features.
Once the non-numerical attribute features have been converted, every dimension of the attribute information is numerical, which yields the attribute encoding information. An attribute feature vector can then be generated for each user from the attribute encoding information, and these attribute feature vectors constitute the feature space corresponding to the attribute information.
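The encoding described above can be sketched as follows. This is a minimal illustration that assumes the second encoding scheme (codes 1 to n assigned per dimension, in order of first appearance); the helper names `fit_label_maps` and `encode` and the sample records are hypothetical. In this sketch the user identifier is kept only as a record key and is not placed in the feature vector, which is a choice of the sketch rather than of the embodiment.

```python
from typing import Dict, List, Sequence

Record = Dict[str, object]


def fit_label_maps(records: Sequence[Record], columns: Sequence[str]) -> Dict[str, Dict[object, int]]:
    """Assign codes 1..n to the n distinct values of each non-numerical column, in order of first appearance."""
    maps: Dict[str, Dict[object, int]] = {}
    for col in columns:
        codes: Dict[object, int] = {}
        for rec in records:
            value = rec[col]
            if value not in codes:  # identical values share one code
                codes[value] = len(codes) + 1
        maps[col] = codes
    return maps


def encode(
    records: Sequence[Record],
    string_cols: Sequence[str],
    numeric_cols: Sequence[str],
    maps: Dict[str, Dict[object, int]],
) -> List[List[float]]:
    """Build one attribute feature vector per user: encoded string columns first, then numeric columns."""
    rows: List[List[float]] = []
    for rec in records:
        row = [float(maps[c][rec[c]]) for c in string_cols]
        row += [float(rec[c]) for c in numeric_cols]
        rows.append(row)
    return rows


# Hypothetical, already-desensitized example records.
records = [
    {"user": "u1", "gender": "male", "home": "A", "domicile": "B", "age": 30},
    {"user": "u2", "gender": "female", "home": "A", "domicile": "C", "age": 25},
]
string_cols = ["gender", "home", "domicile"]
maps = fit_label_maps(records, string_cols)
X = encode(records, string_cols, ["age"], maps)  # rows of the attribute feature space
```

Each row of `X` is one user's attribute feature vector, so the list of rows plays the role of the attribute feature space. Using a preset mapping table (the first scheme) would amount to replacing `fit_label_maps` with a fixed dictionary.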
Because the feature space of each category of information can be obtained with a processing method suited to that category, the flexibility of feature-space construction is improved.
The first mapping unit 303 is configured to map the feature space corresponding to each piece of category information into a kernel space, respectively, to obtain a plurality of kernel spaces.
After the feature space of each category of information is obtained, the first mapping unit 303 may map each feature space into a kernel space, obtaining one kernel space per category. The kernel space may be infinite-dimensional, and it may contain the inner product between every pair of feature vectors, or between every pair of node vectors, and so on. For example, the feature spaces of the transaction information, the behavior information, and the attribute information may each be mapped into a kernel space, yielding the kernel space of the transaction information, the kernel space of the behavior information, and the kernel space of the attribute information.
Specifically, the first mapping unit 303 may map a feature space into a kernel space through a kernel function, that is, map each vector characterizing a user in the feature space into the kernel space. The kernel function is defined as follows:
Let $\chi$ be the input space, which may be the Euclidean space $\mathbb{R}^n$, and let $H$ be the kernel space, which may be a Hilbert space. If there exists a mapping from $\chi$ to $H$, $\phi(x): \chi \to H$, such that for all $x, z \in \chi$ the function $k(x, z)$ satisfies
$$k(x, z) = \phi(x) \cdot \phi(z)$$
then $k(x, z)$ is called a kernel function and $\phi(x)$ the mapping function, where $\phi(x) \cdot \phi(z)$ is the inner product of $\phi(x)$ and $\phi(z)$.
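The defining property $k(x, z) = \phi(x) \cdot \phi(z)$ can be checked numerically for a kernel whose mapping is small enough to write out. The sketch below uses the degree-2 polynomial kernel $k(x, z) = (x \cdot z)^2$ on two-dimensional inputs, whose explicit mapping is $\phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$. This kernel is chosen only for illustration, since the Gaussian kernel's mapping is infinite-dimensional and cannot be written out this way; the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Ordinary inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def phi(x: Tuple[float, float]) -> Tuple[float, float, float]:
    """Explicit feature map of the degree-2 homogeneous polynomial kernel for 2-D inputs."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def poly2_kernel(x: Tuple[float, float], z: Tuple[float, float]) -> float:
    """k(x, z) = (x . z)^2, computed in the input space without building phi."""
    return dot(x, z) ** 2


x, z = (1.0, 2.0), (3.0, -1.0)
via_kernel = poly2_kernel(x, z)        # inner product computed implicitly
via_mapping = dot(phi(x), phi(z))      # inner product computed in the mapped space
```

Both values agree up to floating-point rounding, which is the point of the kernel function: the inner product in the mapped space is obtained without ever constructing $\phi(x)$.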
The kernel function may be a linear kernel, a polynomial kernel, a Gaussian kernel, a radial basis function kernel, a Sigmoid kernel, a composite kernel, and so on. Taking the Gaussian kernel as an example, it may be expressed as follows:
The feature space obtained for each category of information is then taken as the input space $\chi$ and processed by the Gaussian kernel function to obtain the corresponding kernel space. For example, the feature spaces of the transaction information, the behavior information, and the attribute information are processed by the Gaussian kernel and mapped to the kernel space of the transaction information, the kernel space of the behavior information, and the kernel space of the attribute information, respectively.
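The per-category mapping can be sketched with NumPy as below. The sketch assumes the common form of the Gaussian kernel, $k(x, z) = \exp(-\|x - z\|^2 / (2\sigma^2))$, with a hypothetical bandwidth `sigma`; the exact parameterization used in the embodiment is not stated in this text. The three random matrices are placeholders standing in for the transaction, behavior, and attribute feature spaces of the same five users, one row per user.

```python
import numpy as np


def gaussian_kernel_matrix(X: np.ndarray, Z: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """K[i, j] = exp(-||X[i] - Z[j]||^2 / (2 * sigma^2)) for row vectors X[i], Z[j]."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Z**2, axis=1)[None, :]
        - 2.0 * X @ Z.T
    )
    sq_dists = np.maximum(sq_dists, 0.0)  # clip tiny negative values caused by rounding
    return np.exp(-sq_dists / (2.0 * sigma**2))


# Placeholder feature spaces: one row per user, a different dimension per category.
rng = np.random.default_rng(0)
n_users = 5
feature_spaces = {
    "transaction": rng.normal(size=(n_users, 8)),
    "behavior": rng.normal(size=(n_users, 6)),
    "attribute": rng.normal(size=(n_users, 4)),
}
kernels = {name: gaussian_kernel_matrix(X, X, sigma=1.0) for name, X in feature_spaces.items()}
```

Because each kernel matrix is indexed by users rather than by features, the three categories can have different feature dimensions and still yield matrices of the same size, which is what allows them to be combined later.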
The synthesizing unit 304 is configured to perform multi-kernel linear combination processing on the plurality of kernel spaces to obtain a synthetic kernel space.
After obtaining the plurality of kernel spaces, one per category of information, the synthesizing unit 304 may perform multi-kernel linear combination processing on them, for example by computing a linear combination of the kernel spaces, to obtain the synthetic kernel space.
In some embodiments, the synthesizing unit 304 is specifically configured to:
normalize the plurality of kernel spaces to obtain a plurality of normalized kernel spaces;
acquire a weight value corresponding to each of the plurality of category information; and
perform multi-kernel linear combination processing according to the weight value of each category information and each normalized kernel space to obtain the synthetic kernel space.
Because the value range of each feature dimension is not necessarily constrained when the feature space of each category of information is obtained, some features may span a large range and others a small one, which can distort the resulting composite kernel space. To reduce this effect, the synthesizing unit 304 may normalize the kernel spaces.
Specifically, the kernel space normalization process is as follows:
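Let $S = \{x_1, \dots, x_n\}$ be a finite subset of the input space $\chi$, let the feature map be $\phi(x): \chi \to H$ with kernel function $k(x, z) = \phi(x) \cdot \phi(z)$, and let $\phi(S) = \{\phi(x_1), \dots, \phi(x_n)\}$ be the image of $S$ under $\phi$. The elements of the kernel matrix $K$ are $K_{ij} = k(x_i, x_j)$, $i, j = 1, \dots, n$. The norm of the feature vector $\phi(x)$ is:
$$\|\phi(x)\| = \sqrt{\phi(x) \cdot \phi(x)} = \sqrt{k(x, x)}$$
the normalized feature vectors are:
$$\hat{\phi}(x) = \frac{\phi(x)}{\|\phi(x)\|}$$
the normalized kernel space is:
$$\hat{k}(x, z) = \hat{\phi}(x) \cdot \hat{\phi}(z) = \frac{k(x, z)}{\sqrt{k(x, x)\, k(z, z)}}$$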
After each kernel space is normalized, a corresponding normalized kernel space is obtained; normalizing all of the kernel spaces therefore yields a plurality of normalized kernel spaces.
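On a finite sample, this normalization amounts to dividing each kernel matrix entry by the square roots of the two corresponding diagonal entries, $\hat{K}_{ij} = K_{ij} / \sqrt{K_{ii} K_{jj}}$, which is the standard normalization obtained from $\hat{\phi}(x) = \phi(x) / \|\phi(x)\|$ with $\|\phi(x)\| = \sqrt{k(x, x)}$. A minimal NumPy sketch, assuming strictly positive diagonal entries (the function name is hypothetical):

```python
import numpy as np


def normalize_kernel(K: np.ndarray) -> np.ndarray:
    """Return K_hat with K_hat[i, j] = K[i, j] / sqrt(K[i, i] * K[j, j])."""
    diag = np.diag(K)
    if np.any(diag <= 0):
        raise ValueError("kernel matrix needs strictly positive diagonal entries")
    d = np.sqrt(diag)
    return K / np.outer(d, d)


# Linear kernel on two hypothetical feature vectors with very different scales.
X = np.array([[3.0, 4.0], [1.0, 0.0]])
K_lin = X @ X.T  # [[25, 3], [3, 1]]
K_hat = normalize_kernel(K_lin)
```

For a Gaussian kernel, $k(x, x) = 1$ for every $x$, so this step leaves the matrix unchanged; it has an effect for kernels such as the linear or polynomial kernel, whose diagonal depends on the scale of the features.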
Because the user's heterogeneous data is split into different categories of information, and these categories are heterogeneous and diverse, the resulting kernel spaces have different characteristics. Linearly combining these kernel spaces can exploit the strengths of each kind of kernel space and thus achieve better mapping performance.
That is, multi-kernel linear combination processing is performed on the plurality of kernel spaces to construct a synthetic kernel space, which may be a linearly weighted sum of kernels. Specifically, the synthesizing unit 304 acquires the weight value corresponding to each category of information and performs multi-kernel linear combination according to these weight values and the normalized kernel spaces to obtain the synthetic kernel space.
For example, let the kernel spaces corresponding to the transaction information, the behavior information, and the attribute information be $k_1(x, z)$, $k_2(x, z)$, and $k_3(x, z)$, and let the corresponding normalized kernel spaces be $\hat{k}_1(x, z)$, $\hat{k}_2(x, z)$, and $\hat{k}_3(x, z)$. Performing multi-kernel linear combination on the three normalized kernel spaces yields the synthetic kernel space $K(x, z)$:
$$K(x, z) = \sum_{i=1}^{3} \beta_i \hat{k}_i(x, z)$$
where $\beta_i$ represents the importance of the kernel space of the $i$-th category of information to the user's credit evaluation, that is, the weight value of the $i$-th category. Obtaining the synthetic kernel space by multi-kernel linear combination avoids the large errors, caused by information loss and the like, that arise when the features of heterogeneous data are simply concatenated.
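The weighted sum $K(x, z) = \sum_i \beta_i \hat{k}_i(x, z)$ over normalized kernel matrices can be sketched as below. The weights are hypothetical placeholders; in the embodiment they are obtained together with the regression model. The sketch imposes no constraints on the weights, although nonnegative weights guarantee that the combination is itself a valid (positive semidefinite) kernel.

```python
from typing import Sequence

import numpy as np


def combine_kernels(normalized: Sequence[np.ndarray], betas: Sequence[float]) -> np.ndarray:
    """Return K = sum_i beta_i * K_hat_i for same-shaped normalized kernel matrices."""
    if len(normalized) != len(betas):
        raise ValueError("need exactly one weight per kernel matrix")
    K = np.zeros_like(normalized[0], dtype=float)
    for K_i, beta_i in zip(normalized, betas):
        K += beta_i * K_i
    return K


# Hypothetical normalized kernels for transaction, behavior, and attribute information.
K_transaction = np.eye(2)
K_behavior = np.ones((2, 2))
K_attribute = np.array([[1.0, 0.5], [0.5, 1.0]])
betas = [0.5, 0.3, 0.2]  # hypothetical importance weights beta_1..beta_3
K_synthetic = combine_kernels([K_transaction, K_behavior, K_attribute], betas)
```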
The evaluation unit 305 is configured to acquire a credit evaluation result of the user according to the synthetic kernel space.
After the synthetic kernel space is obtained by multi-kernel linear combination of the plurality of kernel spaces, the evaluation unit 305 may acquire the user's credit evaluation result from it; the credit evaluation result may be a credit score or a credit rating.
In some embodiments, the evaluation unit 305 is specifically configured to calculate the credit score of the user from the synthetic kernel space through a preset regression model.
The evaluation unit 305 may set up a regression model in advance, which is mainly used to calculate the user's credit evaluation result from the synthetic kernel space. The evaluation unit 305 inputs the obtained synthetic kernel space into the regression model, which outputs the user's credit score. The higher the credit score, the better the credit; the lower the credit score, the worse the credit.
Alternatively, taking a credit rating as the credit evaluation result, the evaluation unit 305 may further grade the user's credit according to the obtained credit score, for example by determining the credit rating that corresponds to the score: the higher the credit score, the higher the credit rating, and the lower the credit score, the lower the credit rating. Correspondingly, the higher the credit rating, the better the credit, and the lower the credit rating, the worse the credit.
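Mapping a score to a rating can be sketched as a lookup over sorted cut-off values. The cut-offs and rating labels below are hypothetical, since the text does not define the rating bands; higher scores map to higher ratings, consistent with the description above.

```python
from bisect import bisect_right

# Hypothetical rating bands: the text does not specify cut-offs or labels.
CUTOFFS = [500.0, 600.0, 700.0, 800.0]  # ascending score boundaries
RATINGS = ["E", "D", "C", "B", "A"]     # lowest to highest rating


def credit_rating(score: float) -> str:
    """Return the rating whose band contains `score`; a score equal to a cut-off falls in the higher band."""
    return RATINGS[bisect_right(CUTOFFS, score)]
```

Keeping the bands in two parallel lists makes them easy to adjust without touching the lookup logic.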
In some embodiments, as shown in fig. 12, the credit evaluation device further comprises:
an information set obtaining unit 306, configured to obtain a training sample set, and divide the training sample set into a plurality of category information sets;
a second mapping unit 307, configured to map the plurality of category information sets to a kernel space;
an objective function generating unit 308, configured to generate an objective function according to the kernel space and a preset regression function;
a model generating unit 309, configured to process the objective function through a Lagrange dual algorithm to generate the regression model.
Specifically, the information set obtaining unit 306 may first collect historical user data and use it as the training sample set; for example, all of the historical data may be used, or part of it may be selected randomly or according to a preset rule. The credit level of each user in the training sample set is then labeled manually for training the regression model.
The support vector regression (SVR) model for a single category of information is described below:
Let the training sample set be $\{(x_i, y_i)\}_{i=1}^{n}$, where $x_i \in \mathbb{R}^d$ is an input value, $y_i$ is the output value, $d$ is the dimension, and $n$ is the number of training samples. In ε-SVR, each input $x_i$ is first mapped into a feature space by a nonlinear mapping $\phi$, so that in the feature space the output $y_i$ can be fitted by a linear function $f(x) = \omega \cdot \phi(x) + b$ satisfying $|f(x_i) - y_i| \le \varepsilon$ for all training samples, with $f(x)$ as smooth as possible. This gives the regression function of the ε-SVR algorithm:
The loss function $L$ defined in this formula means that the model tolerates a certain error: points within the error band are all treated as lying on the model, while for points outside the band the distance to the fitted regression function should be as small as possible. The constant $C$ in the formula is a penalty parameter.
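The ε-insensitive loss and a standard ε-SVR primal objective, $\frac{1}{2}\|\omega\|^2 + C \sum_i L_\varepsilon(y_i, f(x_i))$ with $L_\varepsilon = \max(0, |y - f(x)| - \varepsilon)$, can be written as below. This is the textbook formulation, assumed here to correspond to the regression function referred to above; for simplicity the sketch uses the identity mapping $\phi(x) = x$, so $f(x) = \omega \cdot x + b$. The function names are hypothetical.

```python
import numpy as np


def epsilon_insensitive_loss(y_true, y_pred, epsilon: float) -> np.ndarray:
    """Zero inside the epsilon band, linear in the excess distance outside it."""
    residual = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    return np.maximum(0.0, residual - epsilon)


def svr_primal_objective(
    w: np.ndarray, b: float, X: np.ndarray, y: np.ndarray, C: float, epsilon: float
) -> float:
    """0.5 * ||w||^2 (smoothness term) + C * total epsilon-insensitive loss, with f(x) = w . x + b."""
    f = X @ w + b
    return 0.5 * float(w @ w) + C * float(np.sum(epsilon_insensitive_loss(y, f, epsilon)))
```

A larger $C$ penalizes points outside the band more heavily relative to the smoothness term $\frac{1}{2}\|\omega\|^2$.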
As in the SVM, slack variables $\xi$ and $\xi^*$ are introduced into the SVR, which transforms the above regression function into:
Solving by the Lagrange dual method converts the original problem into its dual problem, giving the new regression function:
where $\alpha_i$ and $\alpha_i^*$ are the Lagrange multipliers corresponding to the two constraints of the original problem. Solving this quadratic programming problem yields the optimal $\alpha$ and $\alpha^*$, from which the values of $\omega$ and $b$ of the original problem can be obtained.
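In the standard ε-SVR dual, the optimal multipliers give $\omega = \sum_i (\alpha_i - \alpha_i^*) \phi(x_i)$, so a prediction needs only kernel values: $f(x) = \sum_i (\alpha_i - \alpha_i^*) k(x_i, x) + b$. The sketch below checks this relationship with a linear kernel, where $\omega$ can be recovered explicitly. The multipliers and $b$ are hypothetical numbers chosen for illustration, not the solution of an actual quadratic program, and the function names are hypothetical.

```python
import numpy as np


def dual_predict(K_new_train: np.ndarray, alpha: np.ndarray, alpha_star: np.ndarray, b: float) -> np.ndarray:
    """f(x) = sum_i (alpha_i - alpha_i*) k(x_i, x) + b, one output per row of K_new_train."""
    return K_new_train @ (alpha - alpha_star) + b


def recover_w_linear(X_train: np.ndarray, alpha: np.ndarray, alpha_star: np.ndarray) -> np.ndarray:
    """For the linear kernel phi(x) = x, so w = sum_i (alpha_i - alpha_i*) x_i."""
    return X_train.T @ (alpha - alpha_star)


# Hypothetical training inputs, multipliers, and bias (not an actual QP solution).
X_train = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
alpha = np.array([0.5, 0.0, 0.2])
alpha_star = np.array([0.0, 0.3, 0.0])
b = 0.1
X_new = np.array([[2.0, 1.0]])

f_dual = dual_predict(X_new @ X_train.T, alpha, alpha_star, b)      # kernel form
f_primal = X_new @ recover_w_linear(X_train, alpha, alpha_star) + b  # explicit w form
```

The two forms agree; the kernel form is the one that remains usable when $\phi$ is infinite-dimensional, as with the Gaussian kernel.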
The regression function for a single category of information extends readily to multiple categories: the information set obtaining unit 306 first divides the training sample set into a plurality of category information sets, the second mapping unit 307 maps these sets to kernel spaces in the manner described above, and the objective function generating unit 308 generates an objective function from the kernel spaces and the regression function. The objective function of the regression model for multiple categories of information can thus be defined as follows:
This is also a quadratic programming problem. The model generating unit 309 may solve it through the Lagrange dual algorithm to obtain the optimal values of $\alpha$, $\alpha^*$, $\beta$, $b$, and so on, and thereby the fitting function of the training sample set. This fitting function is the regression model and can be written as:
After the regression model is trained, when the credit of a new user $x_{new}$ needs to be evaluated, it can be calculated according to the following formula:
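Assuming the trained model takes the usual multiple-kernel SVR form, the credit of a new user would be computed as $f(x_{new}) = \sum_i (\alpha_i - \alpha_i^*) \sum_m \beta_m \hat{k}_m(x_i, x_{new}) + b$, that is, the single-kernel dual prediction with the synthetic kernel in place of $k$. A minimal sketch under that assumption, using one Gaussian kernel per category; the names, weights, and dual coefficients are hypothetical.

```python
from typing import Dict

import numpy as np


def gaussian_cross_kernel(A: np.ndarray, B: np.ndarray, sigma: float) -> np.ndarray:
    """Gaussian kernel values between rows of A (new users) and rows of B (training users)."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))


def predict_credit(
    train_feats: Dict[str, np.ndarray],
    new_feats: Dict[str, np.ndarray],
    betas: Dict[str, float],
    dual_coef: np.ndarray,
    b: float,
    sigma: float = 1.0,
) -> np.ndarray:
    """f(x_new) = sum_i dual_coef[i] * sum_m betas[m] * k_m(x_i, x_new) + b.

    dual_coef[i] stands for alpha_i - alpha_i*. Gaussian kernels already have
    k(x, x) = 1, so no further normalization is applied here.
    """
    n_new = next(iter(new_feats.values())).shape[0]
    K = np.zeros((n_new, dual_coef.shape[0]))
    for name, beta in betas.items():
        K += beta * gaussian_cross_kernel(new_feats[name], train_feats[name], sigma)
    return K @ dual_coef + b


# One hypothetical training user and two categories of information.
train = {"transaction": np.array([[0.0, 0.0]]), "attribute": np.array([[1.0]])}
new_user = {"transaction": np.array([[0.0, 0.0]]), "attribute": np.array([[1.0]])}
weights = {"transaction": 0.6, "attribute": 0.4}
score = predict_credit(train, new_user, weights, np.array([2.0]), b=0.5)
```

In practice the dual coefficients $\alpha_i - \alpha_i^*$, the weights $\beta_m$, and $b$ would come from solving the multi-kernel objective above; here they are supplied by the caller.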
Therefore, based on this regression model, a user's credit can be evaluated accurately from multiple categories of information, which effectively overcomes the defect of traditional methods that process all categories uniformly and makes the evaluation result more accurate.
As can be seen from the above, in the embodiment of the present invention the information obtaining unit 301 may classify heterogeneous data related to a user into a plurality of categories of information; the feature obtaining unit 302 then obtains the feature space of each category, and the first mapping unit 303 maps each feature space into a kernel space; the synthesizing unit 304 performs multi-kernel linear combination on the resulting kernel spaces to obtain a synthetic kernel space; and the evaluation unit 305 can then obtain the user's credit evaluation result from the synthetic kernel space. This scheme evaluates the user's credit from multiple categories of information obtained by classifying the user's heterogeneous data, which effectively overcomes defects such as processing all of the user's information uniformly, or evaluating credit from a feature matrix formed by directly concatenating all features, and makes the evaluation result more reliable.
An embodiment of the present invention further provides a server, as shown in fig. 13, which shows a schematic structural diagram of the server according to the embodiment of the present invention, specifically:
the server may include components such as a processor 401 of one or more processing cores, memory 402 of one or more computer-readable storage media, a power supply 403, and an input unit 404. Those skilled in the art will appreciate that the server architecture shown in FIG. 13 is not meant to be limiting, and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components. Wherein:
the processor 401 is a control center of the server, connects various parts of the entire server using various interfaces and lines, and performs various functions of the server and processes data by running or executing software programs and/or modules stored in the memory 402 and calling data stored in the memory 402, thereby performing overall monitoring of the server. Optionally, processor 401 may include one or more processing cores; preferably, the processor 401 may integrate an application processor, which mainly handles operating systems, user interfaces, application programs, etc., and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 401.
The memory 402 may be used to store software programs and modules, and the processor 401 executes various functional applications and data processing by running the software programs and modules stored in the memory 402. The memory 402 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the data storage area may store data created according to the use of the server, and the like. Further, the memory 402 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. Accordingly, the memory 402 may also include a memory controller to provide the processor 401 access to the memory 402.
The server further includes a power supply 403 for supplying power to each component. Preferably, the power supply 403 may be logically connected to the processor 401 through a power management system, so that charging, discharging, and power consumption are managed through the power management system. The power supply 403 may also include any components such as one or more DC or AC power sources, a recharging system, a power failure detection circuit, a power converter or inverter, and a power status indicator.
The server may also include an input unit 404, the input unit 404 being operable to receive input numeric or character information and to generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control.
Although not shown, the server may further include a display unit and the like, which will not be described in detail herein. Specifically, in this embodiment, the processor 401 in the server loads the executable file corresponding to the process of one or more application programs into the memory 402 according to the following instructions, and the processor 401 runs the application program stored in the memory 402, thereby implementing various functions as follows:
acquiring heterogeneous data related to a user, and classifying the heterogeneous data to obtain a plurality of category information; acquiring vector information corresponding to each category information in the plurality of category information, and acquiring a feature space corresponding to each category information according to the vector information; respectively mapping the feature space corresponding to each category information into a kernel space to obtain a plurality of kernel spaces; performing multi-kernel linear combination processing on the plurality of kernel spaces to obtain a synthetic kernel space; and acquiring a credit evaluation result of the user according to the synthetic kernel space.
Optionally, the step of obtaining vector information corresponding to each of the plurality of category information, and obtaining a feature space corresponding to each of the category information according to the vector information may include: extracting candidate category information from the plurality of category information; constructing a directed weighted network corresponding to each candidate category information, and acquiring a node vector of the directed weighted network; and generating a feature space corresponding to the category information according to the node vector.
Optionally, the step of generating the feature space corresponding to the category information according to the node vector may include:
acquiring a transfer record and a mobile payment record of transaction information; constructing a directed weighted network corresponding to the transfer record; acquiring a node vector of the directed weighted network; acquiring a feature vector of the mobile payment record; and generating a feature space corresponding to the transaction information according to the node vector and the feature vector.
Optionally, the step of generating the feature space corresponding to the category information according to the node vector may include:
acquiring multidimensional behavior characteristics of behavior information; constructing a directed weighted network corresponding to each dimension of behavior characteristics; acquiring a node vector of each directed weighted network; and generating a feature space corresponding to the behavior information according to the node vector of each directed weighted network.
Optionally, the category information includes attribute information, and the step of obtaining vector information corresponding to each of the plurality of category information, and obtaining a feature space corresponding to each of the category information according to the vector information may include:
acquiring multi-dimensional attribute characteristics of attribute information in a plurality of types of information; coding the multi-dimensional attribute characteristics to obtain attribute coding information; and generating a feature space corresponding to the attribute information according to the attribute coding information.
Optionally, the step of performing multi-kernel linear combination processing on the plurality of kernel spaces to obtain the synthetic kernel space may include: normalizing the plurality of kernel spaces to obtain a plurality of normalized kernel spaces; acquiring a weight value corresponding to each category information in the plurality of category information; and performing multi-kernel linear combination processing according to the weight value corresponding to each category information and each normalized kernel space to obtain the synthetic kernel space.
Optionally, the step of obtaining the credit evaluation result of the user according to the synthetic kernel space may include: calculating the credit score of the user according to the synthetic kernel space through a preset regression model.
The above operations can be implemented in the foregoing embodiments, and are not described in detail herein.
As can be seen from the above, the embodiment of the present invention can classify heterogeneous data related to a user into a plurality of category information, obtain the feature space corresponding to each category information, map each feature space into a kernel space, and perform multi-kernel linear combination on the resulting kernel spaces to obtain a synthetic kernel space, from which the credit evaluation result of the user is obtained. This scheme evaluates the user's credit from multiple categories of information obtained by classifying the user's heterogeneous data, which effectively overcomes defects such as processing all of the user's information uniformly, or evaluating credit from a feature matrix formed by directly concatenating all features, and makes the evaluation result more reliable.
It will be understood by those skilled in the art that all or part of the steps of the methods of the above embodiments may be performed by instructions or by associated hardware controlled by the instructions, which may be stored in a computer readable storage medium and loaded and executed by a processor.
To this end, embodiments of the present invention provide a storage medium having stored therein a plurality of instructions, which can be loaded by a processor to perform the steps of any one of the credit evaluation methods provided by the embodiments of the present invention. For example, the instructions may perform the steps of:
acquiring heterogeneous data related to a user, and classifying the heterogeneous data to obtain a plurality of category information; acquiring vector information corresponding to each category information in the plurality of category information, and acquiring a feature space corresponding to each category information according to the vector information; respectively mapping the feature space corresponding to each category information into a kernel space to obtain a plurality of kernel spaces; performing multi-kernel linear combination processing on the plurality of kernel spaces to obtain a synthetic kernel space; and acquiring a credit evaluation result of the user according to the synthetic kernel space.
Optionally, the step of obtaining vector information corresponding to each of the plurality of category information, and obtaining a feature space corresponding to each of the category information according to the vector information may include: extracting candidate category information from the plurality of category information; constructing a directed weighted network corresponding to each candidate category information, and acquiring a node vector of the directed weighted network; and generating a feature space corresponding to the category information according to the node vector.
Optionally, the step of generating the feature space corresponding to the category information according to the node vector may include:
acquiring a transfer record and a mobile payment record of transaction information; constructing a directed weighted network corresponding to the transfer record; acquiring a node vector of the directed weighted network; acquiring a feature vector of the mobile payment record; and generating a feature space corresponding to the transaction information according to the node vector and the feature vector.
Optionally, the step of generating the feature space corresponding to the category information according to the node vector may include:
acquiring multidimensional behavior characteristics of behavior information; constructing a directed weighted network corresponding to each dimension of behavior characteristics; acquiring a node vector of each directed weighted network; and generating a feature space corresponding to the behavior information according to the node vector of each directed weighted network.
Optionally, the category information includes attribute information, and the step of obtaining vector information corresponding to each of the plurality of category information, and obtaining a feature space corresponding to each of the category information according to the vector information may include:
acquiring multi-dimensional attribute characteristics of attribute information in a plurality of types of information; coding the multi-dimensional attribute characteristics to obtain attribute coding information; and generating a feature space corresponding to the attribute information according to the attribute coding information.
Optionally, the step of performing multi-kernel linear combination processing on the plurality of kernel spaces to obtain the synthetic kernel space may include: normalizing the plurality of kernel spaces to obtain a plurality of normalized kernel spaces; acquiring a weight value corresponding to each category information in the plurality of category information; and performing multi-kernel linear combination processing according to the weight value corresponding to each category information and each normalized kernel space to obtain the synthetic kernel space.
Optionally, the step of obtaining the credit evaluation result of the user according to the synthetic kernel space may include: calculating the credit score of the user according to the synthetic kernel space through a preset regression model.
The above operations can be implemented in the foregoing embodiments, and are not described in detail herein.
Wherein the storage medium may include: read Only Memory (ROM), Random Access Memory (RAM), magnetic or optical disks, and the like.
Since the instructions stored in the storage medium can execute the steps in any of the credit evaluation methods provided in the embodiments of the present invention, the beneficial effects that can be achieved by any of the credit evaluation methods provided in the embodiments of the present invention can be achieved, which are detailed in the foregoing embodiments and will not be described again here.
The credit evaluation method, device, and storage medium according to the embodiments of the present invention have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present invention, and the description of the above embodiments is only intended to help understand the method of the present invention and its core ideas. Meanwhile, those skilled in the art may make changes to the specific implementations and the application scope according to the ideas of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.
Claims (12)
1. A credit evaluation method, comprising:
acquiring heterogeneous data related to a user, and classifying the heterogeneous data to obtain a plurality of types of information, wherein the plurality of types of information comprise transaction information, behavior information and attribute information;
acquiring vector information corresponding to each category information in the plurality of category information, and acquiring a feature space corresponding to each category information according to the vector information, wherein the feature space comprises a vector space formed by the vector information of the category information, and the vector information comprises a feature vector and a node vector;
respectively mapping the feature space corresponding to each category information into a kernel space to obtain a plurality of kernel spaces, wherein the kernel spaces comprise inner products between every two feature vectors or inner products between every two node vectors;
normalizing the plurality of kernel spaces to obtain a plurality of normalized kernel spaces;
acquiring a weight value corresponding to each category information in the plurality of category information;
performing multi-kernel linear combination processing according to the weight value corresponding to each category information and each normalized kernel space to obtain a synthesized kernel space;
acquiring a training sample set, and dividing the training sample set into a plurality of category information sets;
mapping the plurality of sets of class information to a kernel space;
generating a target function according to the kernel space and a preset regression function;
processing the target function through a Lagrange dual algorithm to generate a regression model;
and calculating the credit score of the user according to the synthetic kernel space through the regression model.
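The kernel synthesis and regression steps of claim 1 can be illustrated with a short sketch. It is not the patented implementation and fills unspecified details with assumptions: each kernel space is a linear Gram matrix of inner products (as the claim states), normalization is cosine normalization K'(i, j) = K(i, j) / sqrt(K(i, i) K(j, j)), the per-category weights are supplied by the caller, and kernel ridge regression stands in for the unspecified "preset regression function" because its Lagrangian dual has the closed-form solution alpha = (K + λI)^(-1) y. All function names are hypothetical, and only the Python standard library is used.

```python
import math
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def linear_kernel(vectors: List[Vector]) -> Matrix:
    """Kernel space of one category: Gram matrix of pairwise inner products."""
    return [[sum(a * b for a, b in zip(u, v)) for v in vectors] for u in vectors]


def normalize_kernel(k: Matrix) -> Matrix:
    """Cosine normalization (assumed): K'[i][j] = K[i][j] / sqrt(K[i][i] * K[j][j])."""
    n = len(k)
    return [[k[i][j] / math.sqrt(k[i][i] * k[j][j]) if k[i][i] > 0 and k[j][j] > 0 else 0.0
             for j in range(n)] for i in range(n)]


def combine_kernels(kernels: Dict[str, Matrix], weights: Dict[str, float]) -> Matrix:
    """Multi-kernel linear combination: K = sum over categories c of w_c * K_c."""
    names = list(kernels)
    n = len(kernels[names[0]])
    return [[sum(weights[c] * kernels[c][i][j] for c in names) for j in range(n)]
            for i in range(n)]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_dual(k: Matrix, y: Vector, lam: float = 0.1) -> Vector:
    """Kernel ridge regression dual coefficients (stand-in model): alpha = (K + lam*I)^-1 y."""
    n = len(y)
    reg = [[k[i][j] + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]
    return solve(reg, y)


def predict(k_row: Vector, alpha: Vector) -> float:
    """Credit score f(x) = sum_i alpha_i * K(x, x_i) from one user's synthesized kernel row."""
    return sum(a * kv for a, kv in zip(alpha, k_row))
```

For a new user, `predict` takes the row of synthesized kernel values between that user and each training user. The claim only says a weight value is "acquired" for each category; learning the weights jointly, as in standard multiple kernel learning, would be an alternative to fixing them by hand.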
2. The method according to claim 1, wherein the step of acquiring vector information corresponding to each piece of category information and acquiring a feature space corresponding to each piece of category information according to the vector information comprises:
extracting candidate category information from the plurality of pieces of category information;
constructing a directed weighted network corresponding to each piece of candidate category information, and acquiring a node vector of the directed weighted network;
and generating the feature space corresponding to the category information according to the node vector.
3. The credit evaluation method of claim 2, wherein the candidate category information includes transaction information, and the steps of constructing a directed weighted network corresponding to each piece of candidate category information, acquiring a node vector of the directed weighted network, and generating a feature space corresponding to the category information according to the node vector comprise:
acquiring a transfer record and a mobile payment record from the transaction information;
constructing a directed weighted network corresponding to the transfer record;
acquiring a node vector of the directed weighted network;
acquiring a feature vector of the mobile payment record;
and generating a feature space corresponding to the transaction information according to the node vector and the feature vector.
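The construction in claim 3 might look like the sketch below. It assumes each transfer record is a (payer, payee, amount) tuple and that an edge's weight is the total amount transferred along it; the patent fixes neither choice. The node vector is assumed to come from a network embedding such as the one in claims 4 and 5, and a user's row in the transaction feature space is taken to be that node vector concatenated with the payment feature vector. Function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Transfer = Tuple[str, str, float]  # (payer, payee, amount); assumed record layout


def build_transfer_network(records: List[Transfer]) -> Dict[str, Dict[str, float]]:
    """Directed weighted network: edge payer -> payee weighted by total amount (assumption)."""
    graph: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for payer, payee, amount in records:
        graph[payer][payee] += amount
    # Convert nested defaultdicts to plain dicts.
    return {u: dict(nbrs) for u, nbrs in graph.items()}


def transaction_feature(node_vec: List[float], payment_vec: List[float]) -> List[float]:
    """One user's row in the transaction feature space: node vector then payment feature vector."""
    return list(node_vec) + list(payment_vec)
```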
4. The credit evaluation method of claim 3, wherein the step of obtaining the node vector of the directed weighted network comprises:
calculating the estimated connection probability and the empirical connection probability between every two nodes in the directed weighted network;
calculating the distribution difference between the estimated connection probability and the empirical connection probability to obtain a first objective function;
calculating the context estimated probability and the context empirical probability between every two nodes in the directed weighted network;
calculating the distribution difference between the context estimated probability and the context empirical probability to obtain a second objective function;
and acquiring the node vector of the directed weighted network according to the first objective function and the second objective function.
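The two objectives in claim 4 correspond closely to the first-order and second-order proximity objectives of LINE (Tang et al., listed under Non-Patent Citations). The sketch below assumes that formulation: the estimated connection probability is sigmoid(u_i · u_j) against the empirical w_ij / W, and the estimated context probability is a softmax of ctx_k · u_i over all nodes k against the empirical w_ij / d_i, where d_i is the total out-weight of node i. Taking KL divergence as the "distribution difference" and dropping constants gives O1 = -Σ w_ij log sigmoid(u_i · u_j) and O2 = -Σ w_ij log p2(j | i). Names are hypothetical.

```python
import math
from typing import List, Tuple

Edge = Tuple[int, int, float]  # (source i, target j, weight w_ij); assumed layout


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def first_order_objective(edges: List[Edge], u: List[List[float]]) -> float:
    """O1 = -sum w_ij * log sigmoid(u_i . u_j).

    Equals KL(empirical w_ij / W || estimated p1) up to constants.
    """
    return -sum(w * math.log(1.0 / (1.0 + math.exp(-dot(u[i], u[j])))) for i, j, w in edges)


def second_order_objective(edges: List[Edge], u: List[List[float]],
                           ctx: List[List[float]]) -> float:
    """O2 = -sum w_ij * log p2(j | i), p2(j | i) = softmax over k of ctx_k . u_i.

    Equals the degree-weighted KL(empirical w_ij / d_i || p2(. | i)) up to constants.
    """
    total = 0.0
    for i, j, w in edges:
        scores = [dot(c, u[i]) for c in ctx]
        m = max(scores)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        total -= w * (scores[j] - log_z)
    return total
```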
5. The method according to claim 4, wherein the step of obtaining the node vector of the directed weighted network according to the first objective function and the second objective function comprises:
optimizing the first objective function through a stochastic gradient descent algorithm to obtain a low-dimensional node vector under the first objective function;
optimizing the second objective function through a stochastic gradient descent algorithm to obtain a low-dimensional node vector under the second objective function;
and concatenating the low-dimensional node vector under the first objective function and the low-dimensional node vector under the second objective function to obtain the node vector of the directed weighted network.
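Claim 5 can be sketched as plain stochastic gradient descent on the two objectives of claim 4 (in the LINE form assumed earlier), sampling one edge per update, followed by concatenation. Assumptions: random initialization, a fixed learning rate and step count, and an exact softmax in the second-order objective. LINE itself uses negative sampling and edge sampling to scale to large graphs; this small-graph sketch omits both. Names and hyperparameters are hypothetical.

```python
import math
import random
from typing import List, Tuple

Edge = Tuple[int, int, float]  # (source i, target j, weight w_ij); assumed layout
Vecs = List[List[float]]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def init(n_nodes: int, dim: int, rng: random.Random) -> Vecs:
    return [[rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for _ in range(n_nodes)]


def train_first_order(edges: List[Edge], n_nodes: int, dim: int,
                      lr: float = 0.05, steps: int = 2000, seed: int = 0) -> Vecs:
    """SGD on O1 = -sum w_ij log sigmoid(u_i . u_j), one sampled edge per step."""
    rng = random.Random(seed)
    u = init(n_nodes, dim, rng)
    for _ in range(steps):
        i, j, w = rng.choice(edges)
        g = w * (1.0 - sigmoid(dot(u[i], u[j])))  # minus dLoss/d(u_i . u_j)
        ui, uj = u[i][:], u[j][:]
        for d in range(dim):
            u[i][d] += lr * g * uj[d]
            u[j][d] += lr * g * ui[d]
    return u


def train_second_order(edges: List[Edge], n_nodes: int, dim: int,
                       lr: float = 0.05, steps: int = 2000, seed: int = 1) -> Vecs:
    """SGD on O2 = -sum w_ij log softmax_j(ctx . u_i), exact softmax (small graphs only)."""
    rng = random.Random(seed)
    u = init(n_nodes, dim, rng)
    ctx = [[0.0] * dim for _ in range(n_nodes)]
    for _ in range(steps):
        i, j, w = rng.choice(edges)
        scores = [dot(c, u[i]) for c in ctx]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        p = [e / z for e in exps]
        ui = u[i][:]
        # dLoss/du_i = -w * (ctx_j - sum_k p_k * ctx_k)
        grad_u = [-w * (ctx[j][d] - sum(p[k] * ctx[k][d] for k in range(n_nodes)))
                  for d in range(dim)]
        # dLoss/dctx_k = -w * (1[k == j] - p_k) * u_i
        for k in range(n_nodes):
            coef = -w * ((1.0 if k == j else 0.0) - p[k])
            for d in range(dim):
                ctx[k][d] -= lr * coef * ui[d]
        for d in range(dim):
            u[i][d] -= lr * grad_u[d]
    return u


def node_vectors(edges: List[Edge], n_nodes: int, dim: int = 4) -> Vecs:
    """Concatenate each node's first-order and second-order low-dimensional vectors."""
    first = train_first_order(edges, n_nodes, dim)
    second = train_second_order(edges, n_nodes, dim)
    return [a + b for a, b in zip(first, second)]
```

Because no negative edges are sampled here, the first-order update only pulls connected nodes together, so vector norms keep growing slowly as training continues; negative sampling is what counteracts this in LINE.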
6. The credit evaluation method of claim 3, wherein the step of obtaining the feature vector of the mobile payment record comprises:
acquiring multi-dimensional payment features of the mobile payment record;
encoding the multi-dimensional payment features to obtain payment encoding information;
and generating a feature vector of the mobile payment record according to the payment encoding information.
7. The credit evaluation method of claim 6, wherein the step of encoding the multi-dimensional payment features to obtain payment encoding information comprises:
converting non-numeric payment features in the multi-dimensional payment features into numeric payment features;
and discretizing both the converted numeric payment features and the payment features that were already numeric to obtain the payment encoding information.
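A minimal sketch of the encoding in claims 6 and 7, assuming index (label) encoding for non-numeric features and equal-width binning for discretization; the patent specifies neither scheme, and the record layout and function names are hypothetical. As claim 7 states, both the converted and the originally numeric features are discretized.

```python
from typing import Dict, List, Sequence, Union

Value = Union[str, float, int]


def to_numeric(column: Sequence[Value]) -> List[float]:
    """Pass numeric columns through; map non-numeric values to integer codes in sorted order."""
    if all(isinstance(v, (int, float)) for v in column):
        return [float(v) for v in column]
    codes: Dict[str, int] = {s: i for i, s in enumerate(sorted({str(v) for v in column}))}
    return [float(codes[str(v)]) for v in column]


def discretize(column: Sequence[float], n_bins: int = 4) -> List[int]:
    """Equal-width binning into bin indices 0..n_bins-1 (assumed discretization scheme)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0] * len(column)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in column]


def encode_payments(records: List[Dict[str, Value]], n_bins: int = 4) -> List[List[int]]:
    """Encode each payment record as a vector of per-feature bin indices (features in sorted key order)."""
    keys = sorted(records[0])
    columns = {k: discretize(to_numeric([r[k] for r in records]), n_bins) for k in keys}
    return [[columns[k][i] for k in keys] for i in range(len(records))]
```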
8. The method according to claim 2, wherein the candidate category information includes behavior information, and the steps of constructing a directed weighted network corresponding to each piece of candidate category information, acquiring a node vector of the directed weighted network, and generating a feature space corresponding to the category information according to the node vector comprise:
acquiring multi-dimensional behavior features of the behavior information;
constructing a directed weighted network corresponding to each dimension of the behavior features;
acquiring a node vector of each directed weighted network;
and generating a feature space corresponding to the behavior information according to the node vector of each directed weighted network.
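For claim 8, one directed weighted network is built per behavior dimension. The sketch assumes behavior events arrive as (dimension, user, target) tuples, weights each edge by its occurrence count, and forms a user's behavior feature row by concatenating that user's node vectors across dimensions in a fixed order; these choices and all names are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

BehaviorEvent = Tuple[str, str, str]  # (dimension, user, target); assumed layout
Network = Dict[str, Dict[str, float]]  # source -> {target: weight}


def build_behavior_networks(events: List[BehaviorEvent]) -> Dict[str, Network]:
    """One directed weighted network per behavior dimension; edge weight = event count."""
    nets = defaultdict(lambda: defaultdict(lambda: defaultdict(float)))
    for dimension, user, target in events:
        nets[dimension][user][target] += 1.0
    # Convert nested defaultdicts to plain dicts.
    return {d: {u: dict(t) for u, t in g.items()} for d, g in nets.items()}


def behavior_feature(user: str, vectors_by_dim: Dict[str, Dict[str, List[float]]],
                     dims: List[str]) -> List[float]:
    """User's behavior feature row: per-dimension node vectors concatenated in `dims` order."""
    row: List[float] = []
    for d in dims:
        row.extend(vectors_by_dim[d][user])
    return row
```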
9. The method according to claim 1, wherein the category information includes attribute information, and the step of acquiring vector information corresponding to each piece of category information and acquiring a feature space corresponding to each piece of category information according to the vector information comprises:
acquiring multi-dimensional attribute features of the attribute information in the plurality of pieces of category information;
encoding the multi-dimensional attribute features to obtain attribute encoding information;
and generating a feature space corresponding to the attribute information according to the attribute encoding information.
10. The credit evaluation method of claim 9, wherein the step of encoding the multi-dimensional attribute features to obtain attribute encoding information comprises:
converting non-numeric attribute features in the multi-dimensional attribute features into numeric attribute features;
and generating the attribute encoding information according to the converted numeric attribute features and the attribute features that were already numeric.
11. A credit evaluation apparatus, comprising:
an information acquisition unit, configured to acquire heterogeneous data related to a user and classify the heterogeneous data to obtain a plurality of pieces of category information, wherein the plurality of pieces of category information comprise transaction information, behavior information and attribute information;
a feature obtaining unit, configured to obtain vector information corresponding to each piece of category information, and obtain a feature space corresponding to each piece of category information according to the vector information, wherein the feature space comprises a vector space formed by the vector information of the category information, and the vector information comprises a feature vector and a node vector;
a first mapping unit, configured to respectively map the feature space corresponding to each piece of category information into a kernel space to obtain a plurality of kernel spaces, wherein each kernel space comprises the inner products between every two feature vectors or the inner products between every two node vectors;
a synthesis unit, configured to perform multi-kernel linear combination processing on the plurality of kernel spaces to obtain a synthesized kernel space;
an information set acquisition unit, configured to acquire a training sample set and divide the training sample set into a plurality of category information sets;
a second mapping unit, configured to map the plurality of category information sets to a kernel space;
an objective function generating unit, configured to generate an objective function according to the kernel space and a preset regression function;
a model generating unit, configured to process the objective function through a Lagrangian dual algorithm to generate a regression model;
an evaluation unit, configured to calculate the credit score of the user through the regression model according to the synthesized kernel space;
wherein the synthesis unit is specifically configured to:
normalize the plurality of kernel spaces to obtain a plurality of normalized kernel spaces;
acquire a weight value corresponding to each piece of category information;
and perform multi-kernel linear combination processing according to the weight value corresponding to each piece of category information and each normalized kernel space to obtain the synthesized kernel space.
12. A storage medium storing a plurality of instructions adapted to be loaded by a processor to perform the steps of the credit evaluation method of any one of claims 1 to 10.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810036839.9A CN110046981B (en) | 2018-01-15 | 2018-01-15 | Credit evaluation method, device and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110046981A CN110046981A (en) | 2019-07-23 |
CN110046981B true CN110046981B (en) | 2022-03-08 |
Family ID: 67272847
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810036839.9A Active CN110046981B (en) | 2018-01-15 | 2018-01-15 | Credit evaluation method, device and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110046981B (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110717377B (en) * | 2019-08-26 | 2021-01-12 | 平安科技(深圳)有限公司 | Face driving risk prediction model training and prediction method thereof and related equipment |
CN112446777B (en) * | 2019-09-03 | 2023-11-17 | 腾讯科技(深圳)有限公司 | Credit evaluation method, device, equipment and storage medium |
CN110796269B (en) * | 2019-09-30 | 2023-04-18 | 北京明略软件系统有限公司 | Method and device for generating model, and method and device for processing information |
CN111553800B (en) * | 2020-04-30 | 2023-08-25 | 上海商汤智能科技有限公司 | Data processing method and device, electronic equipment and storage medium |
CN113362162A (en) * | 2021-06-29 | 2021-09-07 | 深圳壹账通智能科技有限公司 | Risk control identification method and device based on network behavior data, electronic equipment and medium |
CN114039868B (en) * | 2021-11-09 | 2023-08-18 | 广东电网有限责任公司江门供电局 | Value added service management method and device |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2008126998A1 (en) * | 2007-04-17 | 2008-10-23 | Hyun Uk Shin | Interpersonal loan brokerage system and method |
CN104850939A (en) * | 2015-04-28 | 2015-08-19 | 信而量数据科技(上海)有限公司 | Information management system and method based on personal credit data |
CN106952052A (en) * | 2017-04-06 | 2017-07-14 | 东北林业大学 | Based on hybrid weight core principle component analysis enterprise supplier evaluation method |
CN107133865A (en) * | 2016-02-29 | 2017-09-05 | 阿里巴巴集团控股有限公司 | A kind of acquisition of credit score, the output intent and its device of characteristic vector value |
CN107292463A (en) * | 2016-03-30 | 2017-10-24 | 阿里巴巴集团控股有限公司 | A kind of method and system that the project evaluation is carried out to application program |
CN107481132A (en) * | 2017-08-02 | 2017-12-15 | 上海前隆信息科技有限公司 | A kind of credit estimation method and system, storage medium and terminal device |
Non-Patent Citations (1)
Title |
---|
LINE: Large-scale information network embedding; Tang J et al.; Proceedings of the 24th International Conference on World Wide Web; 2015-03-31; pp. 1067-1077 * |
Also Published As
Publication number | Publication date |
---|---|
CN110046981A (en) | 2019-07-23 |
Similar Documents
Publication | Title |
---|---|
CN110046981B (en) | Credit evaluation method, device and storage medium | |
Bilal et al. | Big Data in the construction industry: A review of present status, opportunities, and future trends | |
US11886955B2 (en) | Self-supervised data obfuscation in foundation models | |
Lenz et al. | Measuring the diffusion of innovations with paragraph vector topic models | |
CN112215604B (en) | Method and device for identifying transaction mutual-party relationship information | |
TW201822098A (en) | Computer device and method for predicting market demand of commodities | |
Fu et al. | A sentiment-aware trading volume prediction model for P2P market using LSTM | |
Bazionis et al. | A review of short‐term wind power probabilistic forecasting and a taxonomy focused on input data | |
Zhang et al. | Describe the house and I will tell you the price: House price prediction with textual description data | |
Stødle et al. | Data‐driven predictive modeling in risk assessment: Challenges and directions for proper uncertainty representation | |
Li et al. | Stock market analysis using social networks | |
Feng | Data Analysis and Prediction Modeling Based on Deep Learning in E‐Commerce | |
CN107644042B (en) | Software program click rate pre-estimation sorting method and server | |
Huynh et al. | Causal inference in econometrics | |
CN110213239B (en) | Suspicious transaction message generation method and device and server | |
Wang et al. | Improving the Method of Short-term Forecasting of Electric Load in Distribution Networks using Wavelet transform combined with Ridgelet Neural Network Optimized by Self-adapted Kho-Kho Optimization Algorithm | |
CN117911079A (en) | Personalized merchant marketing intelligent recommendation method and system | |
Wei | e‐Commerce Online Intelligent Customer Service System Based on Fuzzy Control | |
CN116862658A (en) | Credit evaluation method, apparatus, electronic device, medium and program product | |
WO2024091291A1 (en) | Self-supervised data obfuscation in foundation models | |
CN116304352A (en) | Message pushing method, device, equipment and storage medium | |
CN116975622A (en) | Training method and device of target detection model, and target detection method and device | |
Tang et al. | Stock Price Prediction Based on Natural Language Processing1 | |
CN116910341A (en) | Label prediction method and device and electronic equipment | |
CN117009883B (en) | Object classification model construction method, object classification method, device and equipment |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant ||