CN110019996A - Family relationship identification method and system
- Publication number
- CN110019996A (application CN201711309415.7A)
- Authority
- CN
- China
- Prior art keywords
- index
- family
- numbers
- model
- indexes
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/903—Querying
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/50—Network service management, e.g. ensuring proper service fulfilment according to agreements
- H04L41/5061—Network service management, e.g. ensuring proper service fulfilment according to agreements characterised by the interaction between service providers and their network customers, e.g. customer relationship management
- H04L41/5064—Customer relationship management
Abstract
The present invention provides a family relationship identification method and system. The method comprises: S1, acquiring two numbers that share a call record as family members to be identified, and extracting from each of the two numbers the indexes used to evaluate the family relationship between them, where the indexes include signaling location data indexes, which characterize the common living location of family members, and indexes characterizing the association between the two numbers; S2, identifying the family relationship of the two numbers based on a trained family relationship identification model, where the model is a logistic regression model in which the index model coefficients of the signaling location data indexes are larger than those of the other indexes. Because human social relationships and living habits are relatively stable, the importance of the signaling location data is adaptively enhanced, which makes up for the deficiencies of conventional models and divides users into family member relationships more accurately and reasonably.
Description
Technical Field
The present invention relates to the field of communications technologies, and in particular, to a family relationship identification method and system.
Background
With the popularization of smartphones and the surge in smart wearables and smart home devices, demand has shifted from personal needs to household needs, and communication operators, mobile phone manufacturers, home appliance manufacturers, security equipment manufacturers, software vendors and others are targeting the home application market. For China Mobile, the home market offers broad room for growth: besides services such as mobile phone SIM cards and the family short-number network, it covers home broadband and the development and layout of industry chains built on broadband, such as IPTV and smart home devices.
Identifying home users is one of the key requirements for developing the home market. Existing home-user identification models are usually social network models built from data such as users' call records, in which closely connected groups are mined by a community discovery algorithm and treated as suspected family customers. The general procedure is as follows: call records are used as the basis for constructing connections between users; once the connections are determined, a community partitioning algorithm or similar divides the graph into closely connected communities, which are taken as suspected family customers.
Because it relies on call records as the basis for connecting two numbers, the traditional family member identification model has the following defects. First, the established family member relationships are easily disturbed by intermediate nodes with large out-degree and in-degree; for example, real-estate agents and food-delivery couriers maintain customer relationships through calls, so two groups that are not family members are easily merged into the same family during community partitioning because of such intermediate nodes. Second, occasional calls also disturb otherwise stable family relationships; the traditional model does not identify and remove these occasional nodes during construction, so the family memberships obtained from data of different months differ greatly. Third, the traditional model ignores the geographical relationship between users, although the common living location of family members is an important index for identifying a family relationship; its basis for identification is therefore incomplete, and the stability and accuracy of its results are low.
Disclosure of Invention
The invention provides a family relationship identification method and system that overcome, or at least partially solve, the above problems, namely that family relationship identification in the prior art is easily disturbed by intermediate nodes and cannot make effective use of geographical location.
According to an aspect of the present invention, there is provided a family relationship identification method, including:
S1, acquiring two numbers that share a call record as family members to be identified, and extracting from each of the two numbers the indexes used to evaluate the family relationship between them; the indexes comprise signaling location data indexes and indexes representing the association between the two numbers, wherein the signaling location data indexes represent the common living location of family members;
S2, identifying the family relationship of the two numbers based on the trained family relationship identification model;
wherein the family relationship identification model is a logistic regression model in which the index model coefficients of the signaling location data indexes are larger than the index model coefficients of the other indexes.
Preferably, the signaling location data indexes include the number of identical night-time signaling location cells, the number of identical top-10 resident signaling location cells, and the number of identical top-10 weekend resident signaling location cells.
Preferably, step S1 is preceded by:
constructing a multi-dimensional index for evaluating the family relation of the two numbers, and carrying out logistic regression model training on sample data;
and adjusting the index model coefficient of each index to enable the index model coefficient of the signaling position data index to be larger than the index model coefficients of other indexes, and establishing a logistic regression model based on the signaling position data.
Preferably, the constructing of the multidimensional index for evaluating the relationship between two numbers includes:
constructing a multi-dimensional index for evaluating the family relationship between the two numbers;
binning the indexes, calculating the weight of evidence (WOE) value of each index, and calculating the information value (IV) of each index from the WOE values;
and sorting the indexes in descending order of IV and selecting the top 20 percent as the indexes with strong predictive power.
Preferably, the sample data includes positive samples and negative samples; a positive sample is two numbers in the same family short-number network, and a negative sample is two numbers that have a call record but are not in the same family short-number network.
Preferably, the two numbers of a positive sample simultaneously satisfy: belonging to the same family short-number network, paying bills on each other's behalf, and having the same resident cell.
Preferably, adjusting the index model coefficient of each index so that the index model coefficients of the signaling location data indexes are larger than those of the other indexes includes:
denoting the signaling location data indexes as [symbols not preserved in this text] and the corresponding index model coefficients as [symbols not preserved in this text];
establishing a penalty term [formula not preserved in this text] based on the index model coefficients, where λ is the penalty coefficient and s is the total number of indexes;
and constraining the index model coefficient of each non-signaling-location data index through the penalty term, so that the index model coefficients of the signaling location data indexes are larger than those of the non-signaling-location data indexes.
A family relationship identification system comprising:
the number pair extraction module is used for extracting two numbers with call records as family members to be identified;
the data extraction module is used for extracting indexes used for evaluating the family relation of the two numbers from the two numbers;
the family relation recognition and calculation module is used for recognizing the family relation of the two numbers based on the trained family relation recognition model;
the family relation identification model is a logistic regression model based on a signaling position, in the logistic regression model based on the signaling position, index model coefficients of signaling position data indexes are larger than index model coefficients of other indexes, and the signaling position data indexes are indexes representing common living positions of family members.
A family relationship recognition device, comprising:
at least one processor, at least one memory, a communication interface, and a bus; wherein,
the processor, the memory and the communication interface complete mutual communication through the bus;
the communication interface is used for information transmission between the test equipment and the communication equipment of the display device;
the memory stores program instructions executable by the processor, the processor invoking the program instructions to perform the family relationship identification method as described above.
A computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to perform the family relationship recognition method as described above.
The invention provides a family relationship identification method and system. Two numbers with a confirmed family relationship are selected as positive samples, and two numbers that merely share a call record are extracted as negative samples; important variables are screened by IV value, and an adaptive logistic regression model based on signaling location is then constructed so that the signaling location data carry higher importance and the family relationships among users are established effectively. Because human social relationships and living habits are relatively stable, the importance of the signaling location data is adaptively enhanced during model training. This overcomes the defects of the traditional model, improves the stability of identification, reduces the misjudgment rate of the algorithm, and divides users into family memberships more accurately and reasonably.
Drawings
Fig. 1 is a flow chart of a family relationship identification method according to an embodiment of the present invention;
Fig. 2 is a detailed flow chart of a family relationship identification method according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of the application of a family relationship identification model according to an embodiment of the present invention.
Detailed Description
The following detailed description of embodiments of the present invention is provided in connection with the accompanying drawings and examples. The following examples are intended to illustrate the invention but are not intended to limit the scope of the invention.
Fig. 1 and Fig. 2 show a family relationship identification method, which includes:
S1, acquiring two numbers that share a call record (called a number pair) as family members to be identified, and extracting from each of the two numbers the indexes used to evaluate the family relationship between them; the indexes comprise signaling location data indexes and indexes representing the association between the two numbers, wherein the signaling location data indexes represent the common living location of family members;
S2, identifying the family relationship of the two numbers based on the trained family relationship identification model;
wherein the family relationship identification model is a logistic regression model in which the index model coefficients of the signaling location data indexes are larger than the index model coefficients of the other indexes.
Specifically, in this embodiment, the signaling location data indexes include the number of identical night-time signaling location cells, the number of identical top-10 resident signaling location cells, and the number of identical top-10 weekend resident signaling location cells.
In this embodiment, step S1 is preceded by:
constructing a multi-dimensional index for evaluating the family relation of the two numbers, and carrying out logistic regression model training on sample data;
and adjusting the index model coefficient of each index to enable the index model coefficient of the signaling position data index to be larger than the index model coefficients of other indexes, and establishing a logistic regression model based on the signaling position data.
Specifically, constructing a multidimensional index for evaluating the relationship between two numbers includes:
constructing a multi-dimensional index for evaluating the family relationship between the two numbers;
In this embodiment, the sample data includes positive samples and negative samples: a positive sample is a number pair in the same family short-number network, and a negative sample is a number pair that has a call record but is not in the same family short-number network.
To establish a family relationship identification model between numbers, a set of standard positive samples is first selected as a reference. This reference helps distinguish which number pairs with call records are more likely to be family and which are less likely, and the subsequent identification model is built on this data.
To select number pairs whose family relationship is reasonably certain as positive samples, membership in the same family short-number network is taken as the main criterion. To make the positive samples more accurate and avoid biasing the trained model, the pairs in the same family short-number network are further filtered by payment relationship and geographical location. That is, a number pair extracted as a positive sample must satisfy the following three conditions simultaneously:
1) the two numbers belong to the same family short-number network;
2) the two numbers pay bills on each other's behalf;
3) the two numbers have the same resident cell (the same high-frequency base station at night).
Number pairs that have call records but are not in the same family short-number network are extracted as negative samples, with a positive-to-negative ratio of 1:10. The positive and negative samples together form the standard sample data used for the subsequent training of the adaptive, signaling-location-based family relationship identification model.
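The sample selection described above can be sketched as follows. This is only an illustration under stated assumptions: the dictionary field names (`same_short_network`, `pays_for_each_other`, `same_resident_cell`, `has_call_record`) and the function `build_samples` are hypothetical, and random sampling is one possible way to reach the 1:10 ratio, which the text does not specify.

```python
import random

def build_samples(pairs, neg_per_pos=10, seed=0):
    """Label number pairs as positive (1) or negative (0) samples.

    pairs: list of dicts with hypothetical boolean fields
    same_short_network, pays_for_each_other, same_resident_cell, has_call_record.
    """
    # Positive samples must satisfy all three conditions at once.
    positives = [p for p in pairs
                 if p["same_short_network"]
                 and p["pays_for_each_other"]
                 and p["same_resident_cell"]]
    # Negative candidates: a call record exists, but not the same short-number network.
    candidates = [p for p in pairs
                  if p["has_call_record"] and not p["same_short_network"]]
    # Assumption: down-sample candidates at random to a 1:10 positive-to-negative ratio.
    rng = random.Random(seed)
    n_neg = min(len(candidates), neg_per_pos * len(positives))
    negatives = rng.sample(candidates, n_neg)
    return [(p, 1) for p in positives] + [(p, 0) for p in negatives]
```

Pairs that share a short-number network but fail the payment or resident-cell condition end up in neither class in this sketch, which keeps ambiguous pairs out of the training data.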
The indexes are then binned, the weight of evidence (WOE) value of each index is calculated, and the information value (IV) of each index is calculated from the WOE values.
To evaluate the family membership between two numbers, a multidimensional index system is constructed, covering real-name registration information similarity, payment account association, common contact circle information, and other information.
If too many variables are included in the modeling process, multicollinearity makes the statistical tests of some variables non-significant, which reduces the interpretability of the model and affects its accuracy; variable selection is therefore necessary.
To select indexes that have a significant effect on the model, the information value (IV) is calculated from the weight of evidence (WOE), and indexes are selected according to their IV. The IV measures the difference between the distribution of index values for family member number pairs and that for non-family member number pairs.
To calculate the WOE and IV values of the indexes, the indexes need to be binned. For a continuous index, a reasonable binning keeps the amount of data in each bin balanced, neither too much nor too little, and the proportion of negative samples across the bins should rise or fall monotonically. The WOE value is used here; it measures the trend of each bin and also serves as the variable input to the subsequent regression model. It is calculated as follows:
$$\mathrm{WOE}_i = \ln\left(\frac{\text{positive sample fraction in bin } i}{\text{negative sample fraction in bin } i}\right) \times 100\%$$
IV stands for information value, or amount of information, and is used to measure the predictive power of a variable. The larger the information value, the stronger the discriminating ability of the index. The IV of each index is calculated as follows:
where n is the number of bins of the index. For a discrete index with few distinct values, each value can be used directly as a bin to obtain the WOE and IV values; when there are many values, some can be merged before the corresponding WOE and IV values are computed.
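The IV formula itself is not reproduced in this text. A commonly used definition that is consistent with the WOE expression and with n as the number of bins is sketched below. The symbols $p_i^{+}$ and $p_i^{-}$, meaning the fractions of all positive and all negative samples that fall in bin i, are notation assumed here and not taken from the source, and the constant 100% factor is dropped.

```latex
\mathrm{WOE}_i = \ln\frac{p_i^{+}}{p_i^{-}}, \qquad
\mathrm{IV} = \sum_{i=1}^{n} \left(p_i^{+} - p_i^{-}\right)\,\mathrm{WOE}_i
```

In this form every term of the sum is non-negative, because the difference and the logarithm always share the same sign, so a larger IV means the positive and negative distributions of the index differ more.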
The indexes are sorted in descending order of IV; the top 20 percent, which have a significant effect on the model, are selected for model training, and indexes with weak predictive power are removed.
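The binning statistics and the top-20-percent screening can be sketched in a few lines. This is a minimal illustration, assuming the commonly used IV definition (the sum over bins of the difference in sample fractions times the WOE) and omitting the constant 100% factor; the function names `woe_iv` and `select_top_indexes` are hypothetical, and every bin is assumed to contain at least one positive and one negative sample.

```python
import math

def woe_iv(bins):
    """bins: list of (n_positive, n_negative) counts, one tuple per bin of an index.

    Returns (list of WOE values per bin, IV of the index).
    Assumes every bin has non-zero positive and negative counts.
    """
    total_pos = sum(pos for pos, _ in bins)
    total_neg = sum(neg for _, neg in bins)
    woes, iv = [], 0.0
    for pos, neg in bins:
        pos_share = pos / total_pos      # fraction of all positives in this bin
        neg_share = neg / total_neg      # fraction of all negatives in this bin
        woe = math.log(pos_share / neg_share)
        woes.append(woe)
        iv += (pos_share - neg_share) * woe
    return woes, iv

def select_top_indexes(iv_by_index, fraction=0.2):
    """Rank index names by IV (descending) and keep the top fraction, at least one."""
    ranked = sorted(iv_by_index, key=iv_by_index.get, reverse=True)
    keep = max(1, math.ceil(len(ranked) * fraction))
    return ranked[:keep]
```

For example, `select_top_indexes({"a": 0.5, "b": 0.1, "c": 0.9, "d": 0.05, "e": 0.3})` keeps only `"c"`, the single highest-IV index out of five.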
Multiple logistic regression is widely used in discriminant models: its structure is simple, and the role of each coefficient is easy to explain in business terms. The dependent variable of the extracted positive and negative samples is labeled 1 and 0 respectively, and all indexes screened by IV value enter the logistic regression model.
In this embodiment, the probability that a relationship is a positive sample is denoted P, and the logistic regression model can be expressed as:
where $x_i$ ($i = 1, 2, \ldots, s$) are the indexes and s is the number of indexes. Since P lies between 0 and 1, the logit transformation maps it to any real value. The vector $\beta = (\beta_0, \beta_1, \ldots, \beta_s)^T$ needs to be solved, and the model training solving formula is as follows:
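The model expression and the training formula referred to above are not reproduced in this text. Under the standard logistic regression formulation, which matches the description of a logit transformation of P, they would take roughly the following form. The sample index j, the sample count m, and the label $y_j \in \{0, 1\}$ are notation assumed for this sketch.

```latex
\ln\frac{P}{1-P} = \beta_0 + \sum_{i=1}^{s} \beta_i x_i

\hat{\beta} = \arg\max_{\beta} \sum_{j=1}^{m}
  \left[\, y_j \ln P_j + (1 - y_j) \ln\left(1 - P_j\right) \right]
```

Here $P_j$ is the model probability for sample j, so the second expression is ordinary maximum likelihood estimation without any emphasis on particular indexes.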
In addition, given human social relationships and living habits, the night rest period and the weekend rest period are when family members are most likely to be in the same place, so location information indexes need to be introduced to reflect family member relationships in terms of geographical location and time. The location information indexes include the number of identical night-time signaling location cells, the number of identical top-10 resident signaling location cells, the number of identical top-10 weekend resident signaling location cells, and the like.
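As an illustration of how these three location indexes might be computed for one number pair, the sketch below counts shared cells from raw signaling events. The event layout `(timestamp, cell_id)`, the 22:00 to 06:00 night window, and all function names are assumptions made for this example; the text does not define them.

```python
from collections import Counter
from datetime import datetime

def is_night(ts):
    # Assumed night window: 22:00 to 06:00.
    return ts.hour >= 22 or ts.hour < 6

def is_weekend(ts):
    return ts.weekday() >= 5  # Saturday is 5, Sunday is 6

def top_cells(events, k=10, keep=lambda ts: True):
    """Set of the k most frequently seen cells among the events that pass `keep`."""
    counts = Counter(cell for ts, cell in events if keep(ts))
    return {cell for cell, _ in counts.most_common(k)}

def location_indicators(events_a, events_b):
    """events_a, events_b: lists of (datetime, cell_id) for the two numbers."""
    night_a = {cell for ts, cell in events_a if is_night(ts)}
    night_b = {cell for ts, cell in events_b if is_night(ts)}
    return {
        "same_night_cells": len(night_a & night_b),
        "same_top10_cells": len(top_cells(events_a) & top_cells(events_b)),
        "same_weekend_top10_cells": len(
            top_cells(events_a, keep=is_weekend)
            & top_cells(events_b, keep=is_weekend)
        ),
    }
```

Each indicator is a simple set-intersection count, so the values stay small integers that are easy to bin for the WOE step.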
In a general logistic regression model, no variable entering the model is given special emphasis. In practice, however, the signaling location data indexes are important variables for judging whether a family membership holds, so they should be weighted accordingly.
Therefore, in this embodiment, adjusting the index model coefficient of each index so that the index model coefficients of the signaling location data indexes are greater than those of the other indexes specifically includes:
denoting the signaling location data indexes as [symbols not preserved in this text] and the corresponding index model coefficients as [symbols not preserved in this text].
Thus, during model training, to ensure that the signaling location data indexes contribute higher weight in the model, a penalty term is added. The penalty term [formula not preserved in this text] is established based on the index model coefficients, where λ is the penalty coefficient, a constant, and s is the total number of indexes.
The index model coefficient of each non-signaling-location data index is constrained by the penalty term, so that the index model coefficients of the signaling location data indexes are larger than those of the non-signaling-location data indexes; that is, the coefficients of the signaling location data indexes must be larger than those of the other indexes.
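The idea of penalizing only the non-location coefficients can be sketched as below. The exact penalty term is not reproduced in this text, so this example assumes a squared (ridge-style) penalty, λ times the sum of squared coefficients of the non-signaling-location indexes, fitted by plain gradient descent. The function name, the learning rate, and the step count are all hypothetical, and this is not presented as the patent's actual formula.

```python
import math

def fit_location_weighted_lr(X, y, location_cols, lam=1.0, lr=0.1, steps=2000):
    """Logistic regression with an ASSUMED penalty lam * sum(beta_j ** 2)
    applied only to coefficients of non-location indexes.

    X: list of feature rows, y: list of 0/1 labels,
    location_cols: column positions of the signaling location indexes.
    Returns [beta_0, beta_1, ..., beta_s].
    """
    n, s = len(X), len(X[0])
    # The intercept and the location coefficients are left unpenalized.
    free = {0} | {c + 1 for c in location_cols}
    beta = [0.0] * (s + 1)
    for _ in range(steps):
        grad = [0.0] * (s + 1)
        for row, label in zip(X, y):
            xb = [1.0] + list(row)                    # prepend the intercept term
            z = sum(b * v for b, v in zip(beta, xb))
            p = 1.0 / (1.0 + math.exp(-z))
            for j, v in enumerate(xb):
                grad[j] += (p - label) * v / n        # mean log-loss gradient
        for j in range(s + 1):
            penalty_grad = 0.0 if j in free else 2.0 * lam * beta[j]
            beta[j] -= lr * (grad[j] + penalty_grad)
    return beta
```

Because only the non-location coefficients are shrunk toward zero, a location index that is as informative as another index ends up with the larger coefficient. This matches the stated goal, although a soft penalty of this kind encourages the ordering rather than strictly guaranteeing it for every data set.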
In summary, the estimate of $\beta = (\beta_0, \beta_1, \ldots, \beta_s)^T$ for the logistic regression model based on signaling location is defined as:
Solving for $\beta = (\beta_0, \beta_1, \ldots, \beta_s)^T$ using the positive and negative sample data yields an adaptive logistic regression family relationship identification model for evaluating whether two numbers form a stable family relationship. The model expression obtained by solving is as follows:
Each connecting line in the original family relationship graph, that is, each number pair with a call record, is then evaluated to determine whether the family member relationship holds. If the model judges that the family relationship holds, the connecting line is kept; otherwise it is deleted, and this adaptive adjustment yields the final stable family member relationships. Specifically, as shown in fig. 2, after identification by the method of this embodiment, the family relationship between A and B and that between B and C are judged not to hold, so the corresponding connecting lines are deleted.
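The edge-pruning step can be illustrated with a short sketch. It assumes the fitted coefficients are available as a list `beta` and that a pair counts as family when the predicted probability reaches 0.5; the threshold and both function names are assumptions, since the text does not state the decision rule.

```python
import math

def predict_probability(beta, x):
    """beta = [b0, b1, ..., bs]; x = [x1, ..., xs] for one number pair."""
    z = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
    return 1.0 / (1.0 + math.exp(-z))

def prune_edges(edges, beta, threshold=0.5):
    """edges: dict mapping (number_a, number_b) to that pair's index values.

    Returns the set of pairs the model judges to be family; all other
    connecting lines are dropped. The 0.5 cut-off is an assumption.
    """
    return {pair for pair, x in edges.items()
            if predict_probability(beta, x) >= threshold}
```

With `beta = [-1.0, 2.0]` and edges A-B and B-C both having index value 0.0 while C-D has 1.0, only the C-D line survives, mirroring the deletion of the A-B and B-C lines described above.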
The embodiment further provides a family relationship identification system, including:
the number pair extraction module is used for extracting the number pair with the call record as a family member to be identified;
the data extraction module is used for extracting, from the two numbers in the number pair, the indexes used for evaluating their family relationship;
the family relation recognition and calculation module is used for recognizing the family relation of the number pair based on the trained family relation recognition model;
the family relation identification model is a logistic regression model based on a signaling position, in the logistic regression model based on the signaling position, index model coefficients of signaling position data indexes are larger than index model coefficients of other indexes, and the signaling position data indexes are indexes representing common living positions of family members.
The embodiment further provides a family relationship identification device, including: a processor, a memory, a communications interface, and a bus; wherein,
the processor, the memory and the communication interface complete mutual communication through the bus;
the communication interface is used for information transmission between the test equipment and the communication equipment of the display device;
the processor is configured to call program instructions in the memory to perform the family relationship identification method provided by the above embodiments of the method, for example, including:
S1, acquiring two numbers that share a call record as family members to be identified, and extracting from each of the two numbers the indexes used to evaluate the family relationship between them; the indexes comprise signaling location data indexes and indexes representing the association between the two numbers, wherein the signaling location data indexes represent the common living location of family members;
S2, identifying the family relationship of the two numbers based on the trained family relationship identification model;
wherein the family relationship identification model is a logistic regression model in which the index model coefficients of the signaling location data indexes are larger than the index model coefficients of the other indexes.
The embodiment further provides a family relationship identifying device, including:
at least one processor, at least one memory, a communication interface, and a bus; wherein,
the processor, the memory and the communication interface complete mutual communication through the bus;
the communication interface is used for information transmission between the test equipment and the communication equipment of the display device;
the memory stores program instructions executable by the processor, and the processor calls the program instructions to execute the method provided by the method embodiments, for example, the method includes:
S1, acquiring two numbers that share a call record as family members to be identified, and extracting from each of the two numbers the indexes used to evaluate the family relationship between them; the indexes comprise signaling location data indexes and indexes representing the association between the two numbers, wherein the signaling location data indexes represent the common living location of family members;
S2, identifying the family relationship of the two numbers based on the trained family relationship identification model;
wherein the family relationship identification model is a logistic regression model in which the index model coefficients of the signaling location data indexes are larger than the index model coefficients of the other indexes.
The present embodiments also disclose a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform the methods provided by the above-mentioned method embodiments, for example, comprising:
S1, acquiring two numbers that share a call record as family members to be identified, and extracting from each of the two numbers the indexes used to evaluate the family relationship between them; the indexes comprise signaling location data indexes and indexes representing the association between the two numbers, wherein the signaling location data indexes represent the common living location of family members;
S2, identifying the family relationship of the two numbers based on the trained family relationship identification model;
wherein the family relationship identification model is a logistic regression model in which the index model coefficients of the signaling location data indexes are larger than the index model coefficients of the other indexes.
The present embodiments also provide a non-transitory computer-readable storage medium storing computer instructions that cause the computer to perform the methods provided by the above method embodiments, for example, including:
S1, acquiring two numbers with call records as family members to be identified, and extracting, for the two numbers, indexes for evaluating the family relationship between them; the indexes comprise a signaling position data index and an index representing the correlation between the two numbers, wherein the signaling position data index is an index representing the common living position of family members;
S2, identifying the family relation of the two numbers based on the trained family relation identification model;
the family relation identification model is a logistic regression model, and in the logistic regression model, index model coefficients of the signaling position data indexes are larger than index model coefficients of other indexes.
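As an illustration of steps S1 and S2, the following Python sketch counts the signaling cells that two numbers share among their most frequently visited cells and then scores the pair with a logistic function. It is a minimal sketch under stated assumptions, not the implementation of the embodiments: the function names, the feature names, and every coefficient value are hypothetical, and the cell visits of each number are assumed to be available as plain lists of cell IDs.

```python
import math
from collections import Counter


def top_cells(visits, k=10):
    """Return the set of the k most frequently visited cell IDs."""
    return {cell for cell, _ in Counter(visits).most_common(k)}


def shared_cell_count(visits_a, visits_b, k=10):
    """Count the cells that appear in both numbers' top-k cells."""
    return len(top_cells(visits_a, k) & top_cells(visits_b, k))


def family_probability(features, coefficients, intercept):
    """Logistic regression score: sigmoid of the weighted sum of the indexes."""
    z = intercept + sum(coefficients[name] * value
                        for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical data: cell IDs observed for each of the two numbers.
visits_a = ["c1", "c1", "c2", "c3"]
visits_b = ["c1", "c3", "c3", "c9"]

features = {
    "shared_top_cells": shared_cell_count(visits_a, visits_b),  # signaling index
    "call_count": 5,                                            # correlation index
}
# Illustrative coefficients only; the signaling coefficient is the larger one.
coefficients = {"shared_top_cells": 1.0, "call_count": 0.1}
probability = family_probability(features, coefficients, intercept=-2.5)
```

In this sketch the signaling index is given the larger coefficient by hand; in the described method that ordering is obtained during model training.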
Those of ordinary skill in the art will understand that all or part of the steps of the method embodiments may be implemented by hardware controlled by program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the method embodiments. The storage medium includes various media that can store program code, such as ROM, RAM, and magnetic or optical disks.
The embodiments described above, such as the test equipment of the display device, are merely illustrative. Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; they may be located in one place or distributed across a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
In summary, the invention provides a family relationship recognition method and system. Selected number pairs with a relatively certain family relationship are used as positive samples, and number pairs that have call records but do not belong to the same family short number network are extracted as negative samples. Important variables are screened by IV value, and an adaptive logistic regression model based on the signaling position is then constructed, so that the signaling position data carries higher importance and the family relationships among users are established effectively. Because human social relationships and living habits are relatively stable, the importance of the signaling position data is adaptively enhanced during model training. This makes up for the defects of the traditional model, improves the stability of its recognition, reduces the misjudgment rate of the algorithm, and assigns users to families more accurately and reasonably.
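The IV-based variable screening mentioned in this summary can be sketched as follows. The sketch uses the commonly used definitions of weight of evidence and information value; the function names, the treatment of empty bins, and the rounding rule for the top 20 percent are assumptions made for illustration and are not taken from the embodiments.

```python
import math


def woe_iv(bins):
    """Compute WOE per bin and the total IV for one binned index.

    bins: list of (positive_count, negative_count) pairs, one per bin.
    Every count must be non-zero in this sketch, so that the logarithm
    is defined without any smoothing.
    """
    total_pos = sum(pos for pos, _ in bins)
    total_neg = sum(neg for _, neg in bins)
    woes, iv = [], 0.0
    for pos, neg in bins:
        if pos == 0 or neg == 0:
            raise ValueError("empty class in a bin; merge bins first")
        pos_rate = pos / total_pos   # share of all positives in this bin
        neg_rate = neg / total_neg   # share of all negatives in this bin
        woe = math.log(pos_rate / neg_rate)
        woes.append(woe)
        iv += (pos_rate - neg_rate) * woe
    return woes, iv


def select_top_by_iv(iv_by_index, fraction=0.2):
    """Sort indexes by IV in descending order and keep the top fraction."""
    ranked = sorted(iv_by_index, key=iv_by_index.get, reverse=True)
    keep = max(1, math.ceil(len(ranked) * fraction))  # assumed rounding rule
    return ranked[:keep]


# Hypothetical index with two bins: (positives, negatives) per bin.
woes, iv = woe_iv([(80, 20), (20, 80)])

# Hypothetical IV values for five candidate indexes.
selected = select_top_by_iv(
    {"a": 0.1, "b": 0.9, "c": 0.3, "d": 0.2, "e": 0.05}
)
```

Because IV is symmetric in the two classes, the choice of which class counts as positive changes the sign of each WOE value but not the ranking of the indexes.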
Finally, the above is only a preferred embodiment of the present invention and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (10)
1. A family relationship identification method is characterized by comprising the following steps:
S1, acquiring two numbers with call records as family members to be identified, and extracting, for the two numbers, indexes for evaluating the family relationship between them; the indexes comprise a signaling position data index and an index representing the correlation between the two numbers, wherein the signaling position data index is an index representing the common living position of family members;
S2, identifying the family relation of the two numbers based on the trained family relation identification model;
the family relation identification model is a logistic regression model, and in the logistic regression model, index model coefficients of the signaling position data indexes are larger than index model coefficients of other indexes.
2. The family relationship identification method of claim 1, wherein the signaling location data indexes comprise the number of identical nighttime signaling location cells, the number of identical resident top-10 signaling location cells, and the number of identical weekend resident top-10 signaling location cells.
3. The family relationship identification method according to claim 1, wherein the step S1 is preceded by:
constructing a multi-dimensional index for evaluating the family relation of the two numbers, and carrying out logistic regression model training on sample data;
and adjusting the index model coefficient of each index to enable the index model coefficient of the signaling position data index to be larger than the index model coefficients of other indexes, and establishing a logistic regression model based on the signaling position data.
4. The family relationship identification method according to claim 3, wherein constructing a multi-dimensional index for evaluating the family relationship of two numbers specifically comprises:
constructing a multi-dimensional index for evaluating the family relationship between the two numbers;
binning the indexes, calculating the weight of evidence (WOE) value of each index, and calculating the information value (IV) of each index from the WOE values;
and sorting the indexes in descending order of IV, and selecting the top 20 percent of indexes as the indexes with strong predictive capability.
5. The family relationship identification method according to claim 3, wherein the sample data comprises a positive sample and a negative sample, the positive sample is two numbers in the same family short number network, and the negative sample is two numbers that have call records but are not in the same family short number network.
6. The family relationship identification method as claimed in claim 5, wherein the two numbers of the positive sample simultaneously satisfy: belonging to the same family short number network, having a mutual payment relationship, and having the same resident cell.
7. The family relationship identification method according to claim 1, wherein adjusting the index model coefficient of each index so that the index model coefficient of the signaling location data index is greater than the index model coefficients of other indexes specifically comprises:
the signaling location data indexes are [formula missing in the source], and the corresponding index model coefficients are [formula missing in the source];
establishing a penalty term based on the index model coefficients, [formula missing in the source], where λ is the penalty coefficient and s is the total number of indexes;
and constraining the index model coefficient of each non-signaling position data index through the penalty term, so that the index model coefficients of the signaling position data indexes are larger than the index model coefficients of the non-signaling position data indexes.
8. A family relationship identification system, comprising:
the number pair extraction module is used for extracting two numbers with call records as family members to be identified;
the data extraction module is used for extracting indexes used for evaluating the family relation of the two numbers from the two numbers;
the family relation recognition and calculation module is used for recognizing the family relation of the two numbers based on the trained family relation recognition model;
the family relation identification model is a logistic regression model based on a signaling position, in the logistic regression model based on the signaling position, index model coefficients of signaling position data indexes are larger than index model coefficients of other indexes, and the signaling position data indexes are indexes representing common living positions of family members.
9. A family relationship recognition device, comprising:
at least one processor, at least one memory, a communication interface, and a bus; wherein,
the processor, the memory and the communication interface complete mutual communication through the bus;
the communication interface is used for information transmission between the test equipment and the communication equipment of the display device;
the memory stores program instructions executable by the processor, the processor invoking the program instructions to perform the method of any of claims 1 to 7.
10. A computer program product, characterized in that the computer program product comprises a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to carry out the method according to any one of claims 1 to 7.
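The coefficient constraint of claim 7 can be illustrated with a small Python sketch. The penalty formula itself is not legible in this text, so the sketch assumes an L2 (ridge-style) penalty applied only to the coefficients of the non-signaling indexes; that choice, the plain gradient-descent training loop, and all names and hyperparameters are assumptions and should not be read as the claimed formula.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_penalized_lr(rows, labels, penalized, lam=1.0, lr=0.1, epochs=200):
    """Fit a logistic regression by batch gradient descent.

    rows:      list of feature lists, one per number pair
    labels:    list of 0/1 labels (1 = family relationship)
    penalized: list of bools, True for coefficients shrunk by the penalty
               (here: the non-signaling indexes)
    lam:       penalty coefficient (lambda)
    """
    n, s = len(rows), len(rows[0])
    weights, bias = [0.0] * s, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * s, 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(bias + sum(w * v for w, v in zip(weights, x))) - y
            grad_b += err / n
            for j in range(s):
                grad_w[j] += err * x[j] / n
        for j in range(s):
            if penalized[j]:
                # Gradient of the assumed penalty (lam / 2) * w_j ** 2.
                grad_w[j] += lam * weights[j]
            weights[j] -= lr * grad_w[j]
        bias -= lr * grad_b
    return weights, bias


# Hypothetical toy data: column 0 is a signaling index, column 1 another
# index. Both columns carry identical information, so any difference in
# the fitted coefficients comes from the penalty alone.
rows = [[1.0, 1.0], [1.0, 1.0], [-1.0, -1.0], [-1.0, -1.0]]
labels = [1, 1, 0, 0]
weights, bias = train_penalized_lr(rows, labels, penalized=[False, True])
```

On this toy data the unpenalized signaling coefficient ends up larger than the penalized one. On real data a penalty of this kind only encourages that ordering, so the penalty coefficient would still have to be tuned and the resulting coefficients checked.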
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711309415.7A CN110019996A (en) | 2017-12-11 | 2017-12-11 | A kind of family relationship recognition methods and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110019996A (en) | 2019-07-16 |
Family
ID=67186887
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711309415.7A (Pending) CN110019996A (en) | 2017-12-11 | 2017-12-11 | A kind of family relationship recognition methods and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110019996A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111581531A (en) * | 2020-05-08 | 2020-08-25 | 北京思特奇信息技术股份有限公司 | Family member structure identification method and device, storage medium and electronic equipment |
CN113065058A (en) * | 2020-01-02 | 2021-07-02 | 中国移动通信集团广东有限公司 | Family member identification method and device, electronic equipment and readable storage medium |
CN113378073A (en) * | 2020-03-10 | 2021-09-10 | 中国移动通信集团湖南有限公司 | User relationship identification method and device |
CN114143207A (en) * | 2020-08-14 | 2022-03-04 | 中国移动通信集团广东有限公司 | Home user identification method and electronic equipment |
CN115379051A (en) * | 2021-05-17 | 2022-11-22 | 中国联合网络通信集团有限公司 | Household user identification method, device and equipment |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106228371A (en) * | 2016-07-18 | 2016-12-14 | 南京坦道信息科技有限公司 | A kind of social network analysis based on the ultra-large user associating frequency and associate index and family relation recognizer |
CN106570014A (en) * | 2015-10-09 | 2017-04-19 | 阿里巴巴集团控股有限公司 | Method and device for determining home attribute information of user |
Non-Patent Citations (3)
Title |
---|
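刘荣辉: "Identification and Application of Family Groups in Complex Telecom Social Networks" (复杂电信社交网络中家庭群体的识别与应用), Industrial Engineering and Management (《工业工程与管理》) * |
沈志远: "Clutter Suppression Methods in Color Ultrasound Blood Flow Imaging" (《彩色超声血流成像中杂波抑制方法》), 31 March 2017 * |
王金珠: "Credit Risk Assessment of P2P Companies Based on a Weight-of-Evidence Logistic Regression Model" (基于证据权重逻辑回归模型的P2P公司信用风险评估), China Master's Theses Full-text Database, Economics and Management Sciences (《中国优秀硕士学位论文全文数据库 经济与管理科学辑》) * |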
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113065058A (en) * | 2020-01-02 | 2021-07-02 | 中国移动通信集团广东有限公司 | Family member identification method and device, electronic equipment and readable storage medium |
CN113378073A (en) * | 2020-03-10 | 2021-09-10 | 中国移动通信集团湖南有限公司 | User relationship identification method and device |
CN113378073B (en) * | 2020-03-10 | 2023-04-07 | 中国移动通信集团湖南有限公司 | User relationship identification method and device |
CN111581531A (en) * | 2020-05-08 | 2020-08-25 | 北京思特奇信息技术股份有限公司 | Family member structure identification method and device, storage medium and electronic equipment |
CN111581531B (en) * | 2020-05-08 | 2023-06-09 | 北京思特奇信息技术股份有限公司 | Family member structure identification method and device, storage medium and electronic equipment |
CN114143207A (en) * | 2020-08-14 | 2022-03-04 | 中国移动通信集团广东有限公司 | Home user identification method and electronic equipment |
CN115379051A (en) * | 2021-05-17 | 2022-11-22 | 中国联合网络通信集团有限公司 | Household user identification method, device and equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110019996A (en) | A kind of family relationship recognition methods and system | |
CN110245213A (en) | Questionnaire generation method, device, equipment and storage medium | |
CN105281925B (en) | The method and apparatus that network service groups of users divides | |
CN106952159B (en) | Real estate collateral risk control method, system and storage medium | |
CN104040963A (en) | System and methods for spam detection using frequency spectra of character strings | |
CN112488716B (en) | Abnormal event detection system | |
CN112104642B (en) | Abnormal account number determination method and related device | |
CN104067567A (en) | Systems and methods for spam detection using character histograms | |
CN110008977B (en) | Clustering model construction method and device | |
CN110166344B (en) | Identity identification method, device and related equipment | |
CN110232405A (en) | Method and device for personal credit file | |
CN108319672A (en) | Mobile terminal malicious information filtering method and system based on cloud computing | |
CN109639478A (en) | There are the method, apparatus of family relationship client, equipment and media for identification | |
CN107358346A (en) | It is directed to the evaluation information treating method and apparatus of communication quality | |
CN115222303A (en) | Industry risk data analysis method and system based on big data and storage medium | |
CN107368499A (en) | A kind of client's tag modeling and recommendation method and device | |
CN113660687B (en) | Network difference cell processing method, device, equipment and storage medium | |
CN110457601A (en) | The recognition methods and device of social account, storage medium and electronic device | |
CN113850669A (en) | User grouping method and device, computer equipment and computer readable storage medium | |
CN107659982B (en) | Wireless network access point classification method and device | |
CN112269937A (en) | Method, system and device for calculating user similarity | |
WO2024001102A1 (en) | Method and apparatus for intelligently identifying family circle in communication industry, and device | |
CN110210884A (en) | Determine the method, apparatus, computer equipment and storage medium of user characteristic data | |
CN110400160B (en) | Method and device for identifying competitive product user, electronic equipment and storage medium | |
CN114861163A (en) | Abnormal account identification method, device, equipment, storage medium and program product |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||