CN113506174A - Method, device and equipment for training risk early warning model of medium and small enterprises - Google Patents
- Publication number
- CN113506174A (application CN202110952873.2A)
- Authority
- CN
- China
- Legal status
- Pending
Classifications
- G06Q40/03: Credit; Loans; Processing thereof (under G06Q40/00: Finance; Insurance; Tax strategies; Processing of corporate or income taxes)
- G06F18/211: Selection of the most significant subset of features (under G06F18/00: Pattern recognition)
- G06N20/00: Machine learning
- G06Q10/0635: Risk analysis of enterprise or organisation activities
- G06Q10/067: Enterprise or organisation modelling
Abstract
The application relates to a method, a device and equipment for training a risk early warning model for small and medium-sized enterprises (SMEs). The method comprises: obtaining enterprise data of SMEs; selecting positive and negative samples according to a predefined sample division criterion; extracting features from the samples according to predefined basic attribute features, internal attribute features and external attribute features of the enterprises to obtain a plurality of feature variables; and training on the feature variables with a machine learning algorithm to obtain the SME risk early warning model. With this arrangement, information can be gathered from multiple sources and multiple risks can be evaluated, so that the credit risk of an enterprise is assessed in a multidimensional and comprehensive manner and risks can be predicted proactively, in advance and efficiently. This helps loan providers such as banks avoid heavy losses and reduces the difficulty and cost of risk investigation.
Description
Technical Field
The application relates to the technical field of enterprise risk early warning, and in particular to a method, a device and equipment for training a risk early warning model for small and medium-sized enterprises.
Background
With economic development, commercial banks offer an increasing range of diversified and differentiated credit products. While this makes it easier for many enterprises to obtain financing on better terms, it also puts heavy pressure on banks' risk control. In particular, the operating risk of small and medium-sized enterprises has risen markedly, banks lack the corresponding data channels and accumulated risk control technology, and enterprise risk manifests itself in increasingly diverse forms, all of which make risk identification harder for banks. Finding an effective and timely risk early warning method is therefore an urgent problem for loan providers such as commercial banks.
At present, traditional risk early warning for commercial bank clients relies mainly on an enterprise's basic profile or on traditional financial indicators such as the asset-liability ratio, and some banks try to improve timeliness by collecting and analyzing internet information such as public sentiment. However, in due diligence, credit review and post-loan management, the investigation, review and inspection methods used for risk early warning still resemble the four diagnostic methods of traditional Chinese medicine (observing, listening and smelling, asking, and pulse-taking) applied to the borrower. They remain limited to single-point information gathering and single-risk assessment, lack a multidimensional and comprehensive method of credit risk assessment, and find it hard to predict risks proactively, prospectively and efficiently. Risk items are often discovered only when the enterprise is close to default, and once a risk event occurs it is too late. In addition, early warning information is redundant and has a high false alarm rate, which increases the difficulty and cost of risk investigation.
Disclosure of Invention
The application provides a method, a device and equipment for training a risk early warning model for small and medium-sized enterprises, intended to solve, at least to some extent, the problems of existing risk early warning methods: insufficient timeliness, redundant early warning information, a high false alarm rate, and the high difficulty and cost of risk investigation.
The above object of the present application is achieved by the following technical solutions:
In a first aspect, an embodiment of the present application provides a method for training a risk early warning model for small and medium-sized enterprises, including:
acquiring enterprise data of small and medium-sized enterprises;
selecting positive samples and negative samples from the enterprise data according to a predefined sample division criterion, where a positive sample is data relating to an enterprise with no overdue loans and a negative sample is data relating to an enterprise with overdue loans;
performing feature extraction on the selected positive and negative samples according to predefined basic attribute features, internal attribute features and external attribute features of the enterprise to obtain a plurality of feature variables, where the internal attribute features include, but are not limited to, features generated internally by the loan provider from the basic data of the small and medium-sized enterprises, and the external attribute features include custom features generated by a specific third-party enterprise from the basic data and the internal attribute features;
and training with a machine learning algorithm on the plurality of feature variables to obtain the risk early warning model for small and medium-sized enterprises.
Optionally, the predefined sample division criterion includes: classifying an enterprise whose debt is overdue by n days or more, or whose loan has been extended, as a default enterprise, whose related data are negative samples; and classifying an enterprise that repays normally when its loan matures within a certain period as a normal enterprise, whose related data are positive samples; where n is a positive integer.
Optionally, selecting positive and negative samples from the enterprise data includes:
according to a preset performance period condition, including in the samples the states of the enterprise at different time points in different stages of its business life cycle, thereby expanding the samples and increasing the sample size; where the performance period condition is: within a predefined time window, the enterprise meets the criterion for a default enterprise or the criterion for a normal enterprise.
Optionally, when a positive sample is selected, the data m months before the loan matures normally are taken as the positive sample; when a negative sample is selected, the data m months before the first time point at which the loan is more than n days overdue, or m months before the first loan extension, are taken as the negative sample, where m is a positive integer.
Optionally, performing feature extraction on the selected positive and negative samples to obtain a plurality of feature variables includes:
dividing the selected small and medium-sized enterprises into small and micro enterprises and general enterprises, where the factors considered in the division include enterprise scale and industry;
and performing feature extraction on the selected positive and negative samples to obtain feature variables characterizing the distinctive features of small and micro enterprises and feature variables characterizing the common features of general enterprises.
Optionally, in performing feature extraction on the selected positive and negative samples according to the predefined basic attribute features, internal attribute features and external attribute features to obtain a plurality of feature variables:
for the same enterprise, a plurality of features based on loan product types and guarantee methods are predefined.
Optionally, training with a machine learning algorithm on the plurality of feature variables to obtain the risk early warning model for small and medium-sized enterprises includes:
screening the feature variables with a preset selection algorithm and determining the p feature variables with the highest importance, where p is a positive integer;
and training on the p feature variables with a machine learning algorithm to obtain the risk early warning model for small and medium-sized enterprises.
Optionally, the internal attribute features include decision-maker conditions, operating conditions, fund conditions, risk features, loan information and credit features; the external attribute features include enterprise profile, ecological environment, genetic traits, soft power, associated enterprises, contact information, operating conditions, development history, risk features, innovation capability and public attributes.
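The feature taxonomy listed above is essentially a two-level data structure. The following Python sketch records it as a plain dictionary; the snake_case identifiers and the `qualified_name` helper are hypothetical names chosen for illustration, not part of the application, and the basic attribute features are omitted because no fixed dimensions are listed for them here.

```python
from typing import Dict, List

# Hypothetical encoding of the internal/external feature dimensions named in
# the text; identifiers are illustrative paraphrases, not the patent's own.
FEATURE_TAXONOMY: Dict[str, List[str]] = {
    "internal": [
        "decision_maker", "operations", "funds",
        "risk", "loan_info", "credit",
    ],
    "external": [
        "enterprise_profile", "ecosystem", "genetic_traits", "soft_power",
        "associated_enterprises", "contact_info", "operations",
        "development_history", "risk", "innovation", "public_attributes",
    ],
}


def qualified_name(group: str, dimension: str, feature: str) -> str:
    """Build a namespaced feature name such as 'internal.funds.inflow_3m',
    rejecting dimensions that are not part of the taxonomy."""
    if dimension not in FEATURE_TAXONOMY.get(group, []):
        raise KeyError(f"unknown dimension {dimension!r} in group {group!r}")
    return f"{group}.{dimension}.{feature}"
```

Namespacing by group and dimension keeps same-named dimensions (such as "risk" or "operations", which appear in both groups) distinct.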
In a second aspect, an embodiment of the present application further provides a device for training a risk early warning model for small and medium-sized enterprises, including:
an acquisition module, configured to acquire enterprise data of small and medium-sized enterprises;
a selection module, configured to select positive samples and negative samples from the enterprise data according to a predefined sample division criterion, where a positive sample is data relating to an enterprise with no overdue loans and a negative sample is data relating to an enterprise with overdue loans;
an extraction module, configured to perform feature extraction on the selected positive and negative samples according to predefined basic attribute features, internal attribute features and external attribute features of the enterprise to obtain a plurality of feature variables, where the internal attribute features include, but are not limited to, features generated internally by the loan provider from the basic data of the small and medium-sized enterprises, and the external attribute features include custom features generated by a specific third-party enterprise from the basic data and the internal attribute features;
and a training module, configured to train with a machine learning algorithm on the feature variables to obtain the risk early warning model for small and medium-sized enterprises.
In a third aspect, an embodiment of the present application further provides an intelligent device, which includes:
a memory and a processor coupled to the memory;
the memory is configured to store a program, the program being at least used to implement the method for training the risk early warning model for small and medium-sized enterprises according to any implementation of the first aspect;
the processor is configured to call and execute the program stored in the memory.
The technical solution provided by the embodiments of the present application can have the following beneficial effects:
enterprise data of small and medium-sized enterprises are obtained, positive and negative samples are selected according to a predefined sample division criterion, features are extracted from the samples according to the predefined basic attribute features, internal attribute features and external attribute features to obtain a plurality of feature variables, and a machine learning algorithm is then trained on the feature variables to obtain a risk early warning model for small and medium-sized enterprises. This arrangement enables information to be gathered from multiple sources and multiple risks to be evaluated, so that the credit risk of an enterprise is assessed in a multidimensional and comprehensive manner and risks can be predicted proactively, in advance and efficiently. This helps loan providers such as banks avoid heavy losses and reduces the difficulty and cost of risk investigation.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
Fig. 1 is a schematic flow chart of a method for training a risk early warning model for small and medium-sized enterprises according to an embodiment of the present application;
Fig. 2 is a schematic flow chart of the process of predefining features in an embodiment of the present application;
Fig. 3 is a schematic structural diagram of a device for training a risk early warning model for small and medium-sized enterprises according to an embodiment of the present application;
Fig. 4 is a schematic structural diagram of an intelligent device according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
To solve the above problems, the application provides a method, a device and equipment for training a risk early warning model for small and medium-sized enterprises. The aim is to construct feature labels from the data of small and medium-sized enterprises, the internal customer data of loan providers such as banks, and data processed by a specific third-party enterprise; to train an enterprise risk early warning model from these feature labels and the sample data using machine learning; to predict the probability that an enterprise granted credit will default in the future; to build a default risk prediction mechanism suited to such enterprises that characterizes and quantifies in depth how they come to default; and to strike the best balance between risk management and its cost. Using the early warning model, loan providers such as commercial banks can identify enterprises likely to repay late and take risk control measures in advance, such as urging repayment and monitoring fund movements, thereby reducing credit risk and losses. The embodiment is described in detail below by way of examples.
Examples
Referring to fig. 1, fig. 1 is a schematic flow chart of a method for training a risk early warning model of a medium-sized and small enterprise according to an embodiment of the present application. In this embodiment, a bank is taken as an example of a loan provider. It should be understood that the loan provider is not limited to just a bank, but may be other financial institutions. As shown in fig. 1, the method mainly comprises the following steps:
S101: acquiring enterprise data of medium and small enterprises;
Commercial banks hold a large amount of data on defaulted customers, and the number of defaults grows every year; these data are crucial for accurately assessing the probability of default. Therefore, in this embodiment, the required enterprise data may come from a given bank, and as many feature dimensions as possible of the bank's full enterprise-customer data are collected and organized. These include, but are not limited to: enterprise basic data; bank internal data, that is, data generated by the selected bank (the loan provider) by processing its internal basic data on the enterprise; and external tag data, that is, data obtained by a specific third-party enterprise through custom processing of the enterprise's basic data and the bank's internal data, where a specific third-party enterprise is one that specializes in analyzing and evaluating the attributes of other enterprises.
Of course, it should be understood that other ways and approaches to obtaining enterprise data are possible, and are not limiting.
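Step S101 draws on three sources: enterprise basic data, bank internal data and third-party external tag data. As a minimal sketch, assuming each source has already been reduced to a dictionary keyed by a shared enterprise identifier, the sources could be joined as below; the identifier, the field names and the `base`/`internal`/`external` prefixes are hypothetical.

```python
from typing import Any, Dict

Record = Dict[str, Any]


def assemble_enterprise_data(
    base: Dict[str, Record],
    bank_internal: Dict[str, Record],
    third_party: Dict[str, Record],
) -> Dict[str, Record]:
    """Join three data sources on a (hypothetical) enterprise ID.

    Field names are prefixed with their source so that identically named
    fields from different sources do not overwrite each other. An enterprise
    missing from a source simply lacks that source's fields.
    """
    merged: Dict[str, Record] = {}
    sources = (("base", base), ("internal", bank_internal), ("external", third_party))
    for prefix, source in sources:
        for ent_id, fields in source.items():
            row = merged.setdefault(ent_id, {})
            for name, value in fields.items():
                row[f"{prefix}.{name}"] = value
    return merged
```

Prefixing by source also preserves the traceability of each feature back to its origin, which matters for the interpretability and compliance considerations raised in S103.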
S102: selecting a positive sample and a negative sample based on the enterprise data according to a predefined sample division standard; the positive sample is related data of an enterprise without loan overdue, and the negative sample is related data of an enterprise with loan overdue;
In some specific embodiments, the predefined sample division criterion includes: classifying an enterprise whose debt is overdue by n days or more, or whose loan has been extended, as a default enterprise, whose related data are negative samples; and classifying an enterprise that repays normally when its loan matures within a certain period as a normal enterprise, whose related data are positive samples; where n is a positive integer, for example 3.
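The division criterion above amounts to a simple labeling rule. The sketch below assumes the per-enterprise inputs (maximum days overdue, whether an extension occurred, whether a loan matured and was repaid normally) have already been computed; the function name and parameters are hypothetical, and `n` defaults to the example value 3.

```python
from typing import Optional


def label_enterprise(max_days_overdue: int, has_extension: bool,
                     matured_and_repaid: bool, n: int = 3) -> Optional[str]:
    """Apply the sample division criterion.

    Returns "negative" for a default enterprise (overdue >= n days or loan
    extended), "positive" for a normal enterprise (loan matured and was repaid
    normally), and None when neither holds yet, so the record is not used.
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    if max_days_overdue >= n or has_extension:
        return "negative"
    if matured_and_repaid:
        return "positive"
    return None
```

Checking the default condition first means an enterprise that both defaulted and later settled is still treated as a default enterprise.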
It should be noted that, in a specific implementation, taking 2021 (the year in which the present application was filed) as reference, positive samples may be drawn from enterprises whose loans matured after 2017 and were repaid normally (that is, enterprises never overdue by more than n days and never granted an extension). 2017 is chosen as the time division point because the samples must be traced back, with the state at each historical time slice taken as a sample, and in light of practical conditions a look-back of about 3 years is generally appropriate. Likewise, 2017 is used as the earliest traceable date for the external tag data of the specific third-party enterprise selected in this embodiment, since from a business perspective the risk conditions of the last 3 years are closer to an enterprise's current risk conditions.
In addition, in practice, when the loan provider is a small or medium-sized bank, an insufficient sample size (especially of bad samples) is common. Therefore, when positive and negative samples are selected from the enterprise data, the states of an enterprise at different time points in different stages of its business life cycle can be included in the samples according to a preset performance period condition, expanding the samples and increasing the sample size; where the performance period condition is: within a predefined time window, the enterprise meets the criterion for a default enterprise or the criterion for a normal enterprise.
More specifically, because enterprises mainly repay principal at maturity, interest is generally paid on schedule until the loan falls due. A situation may therefore arise in which an enterprise can still pay small amounts of interest while its condition is poor and it has not settled the loan normally. Such an enterprise is regarded as not yet being in the performance period; that is, it cannot be verified whether it is truly a positive sample. Therefore, in practice, positive samples are drawn only from loans that matured normally and were settled. This sample expansion approach is applicable to large, medium and small banks alike and compensates for insufficient sample sizes.
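The sample expansion with a performance period can be sketched as follows, assuming each loan is summarized by its maturity date, whether it was settled, and the date of its first default event (first overdue of n days or more, or first extension). Each candidate snapshot date becomes a sample only if a default event or a normal settlement falls inside the window after it; other snapshots are treated as outside the performance period and skipped. The `Loan` type, its fields and the window length are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional, Tuple


@dataclass
class Loan:
    maturity: date                 # contractual maturity date
    settled: bool                  # True if repaid and closed at maturity
    first_default: Optional[date]  # first overdue >= n days or first extension


def expand_samples(loans: List[Loan], snapshots: List[date],
                   window: timedelta) -> List[Tuple[date, str]]:
    """Label each snapshot date t by what happens in (t, t + window]."""
    samples: List[Tuple[date, str]] = []
    for t in snapshots:
        end = t + window
        defaulted = any(
            loan.first_default is not None and t < loan.first_default <= end
            for loan in loans
        )
        repaid = any(
            loan.first_default is None and loan.settled and t < loan.maturity <= end
            for loan in loans
        )
        if defaulted:            # default criterion met inside the window
            samples.append((t, "negative"))
        elif repaid:             # normal settlement inside the window
            samples.append((t, "positive"))
        # otherwise: not in a performance period, so the snapshot is skipped
    return samples
```

For example, with a 180-day window, a snapshot shortly before a normally settled maturity yields a positive sample, a snapshot before a first default yields a negative one, and a snapshot followed by neither is dropped.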
In addition, because the ultimate purpose of the solution is prediction, namely early warning of the risks of small and medium-sized enterprises, when a positive sample is selected the data m months before the loan matures normally are taken as the positive sample; when a negative sample is selected, the data m months before the first time point at which the loan is more than n days overdue, or m months before the first extension, are taken as the negative sample. A feasible value of m is, for example, a positive integer such as 1, 2 or 3.
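Placing the observation point m months before the anchoring event requires calendar-month arithmetic. A minimal sketch, assuming the anchor is the normal maturity date for a positive sample and the first default-event date (first overdue beyond n days or first extension) for a negative sample; the function names are hypothetical, and a day that does not exist in the target month is clamped to that month's last day, which is one reasonable convention among several.

```python
import calendar
from datetime import date
from typing import Optional


def months_before(d: date, m: int) -> date:
    """Return the date m calendar months before d, clamping the day so that,
    e.g., 31 May minus 3 months gives the last day of February."""
    if m <= 0:
        raise ValueError("m must be a positive integer")
    total = d.year * 12 + (d.month - 1) - m
    year, month0 = divmod(total, 12)
    month = month0 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def observation_date(maturity: date, first_default: Optional[date], m: int) -> date:
    """Negative sample: anchor on the first default event; positive sample:
    anchor on the normal maturity date."""
    anchor = first_default if first_default is not None else maturity
    return months_before(anchor, m)
```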
S103: according to predefined enterprise basic attribute characteristics, internal attribute characteristics and external attribute characteristics, performing characteristic extraction on the selected positive sample and negative sample to obtain a plurality of characteristic variables; wherein the internal attribute features include, but are not limited to, features generated internally by the loan provider for the base data of the small and medium sized businesses, and the external attribute features include custom features generated by a particular third party business for the base data and the internal attribute features;
Specifically, the basic attribute features, internal attribute features and external attribute features correspond respectively to the enterprise basic data, the bank internal data and the external tag data. When these features are predefined, the features of the machine learning model can be defined with regard to data applicability, comprehensiveness, interpretability, compliance, timeliness, accuracy and similar considerations.
For enterprises in the credit observation period, basic attribute features such as company history, historical operating conditions, historical and current financial conditions, fund conditions, rates of change and major changes need to be considered. In particular, since the outbreak of the epidemic, additional feature variables have been added (such as recent changes in fund flows and in payroll disbursed through the bank on the enterprise's behalf), and both the preceding 1-3 years and the preceding 1-3 months are observed from the sample time point, so that the long-term and short-term conditions of the enterprise are combined and its state is profiled more scientifically.
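Combining long-term and short-term conditions can be illustrated with look-back windows ending at the sample time point. The sketch below assumes a monthly series (for example, payroll disbursed through the bank) ordered oldest first; the 3-month and 36-month window lengths follow the 1-3 month and 1-3 year ranges in the text, while the feature names and the short-to-long ratio are hypothetical choices.

```python
from statistics import mean
from typing import Dict, List


def window_features(monthly: List[float], name: str,
                    short: int = 3, long: int = 36) -> Dict[str, float]:
    """Summarise a monthly series (oldest first, ending at the sample time
    point) over a short and a long look-back window."""
    if len(monthly) < long:
        raise ValueError("series shorter than the long window")
    short_avg = mean(monthly[-short:])
    long_avg = mean(monthly[-long:])
    return {
        f"{name}_avg_{short}m": short_avg,
        f"{name}_avg_{long}m": long_avg,
        # > 1: recent level above the long-run level; NaN if long-run avg is 0
        f"{name}_short_long_ratio": short_avg / long_avg if long_avg else float("nan"),
    }
```

A ratio well below 1, as when payroll drops sharply in the last quarter, is the kind of short-term deterioration the long-term average alone would hide.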
In addition, in some embodiments, when features are extracted from the selected positive and negative samples to obtain a plurality of feature variables, the selected small and medium-sized enterprises may be divided into small and micro enterprises and general enterprises, where the factors considered in the division include enterprise scale and industry. Feature extraction is then performed on the selected positive and negative samples to obtain feature variables characterizing the distinctive features of small and micro enterprises and feature variables characterizing the common features of general enterprises. In this way, the states of enterprises of different scales and industries are better reflected.
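The split into small and micro enterprises and general enterprises depends on scale and industry. The application does not give thresholds, so the values below are placeholders for illustration only, as is the rule that both limits must be undershot; in practice they would come from the applicable enterprise size classification standard or the bank's own policy.

```python
from typing import Dict, Tuple

# HYPOTHETICAL per-industry limits: (max employees, max annual revenue in
# 10k CNY) below which an enterprise is treated as small/micro. These are
# placeholders, not values from the application or any official standard.
SMALL_MICRO_LIMITS: Dict[str, Tuple[int, float]] = {
    "manufacturing": (300, 2000.0),
    "retail": (50, 500.0),
}
DEFAULT_LIMIT: Tuple[int, float] = (100, 1000.0)


def segment(industry: str, employees: int, revenue_10k_cny: float) -> str:
    """Return 'small_micro' or 'general' using industry-specific limits."""
    max_emp, max_rev = SMALL_MICRO_LIMITS.get(industry, DEFAULT_LIMIT)
    if employees < max_emp and revenue_10k_cny < max_rev:
        return "small_micro"
    return "general"
```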
In addition, loan products are diversified to different degrees. Across different loan products and guarantee methods, when samples are selected, several loans with definite outcomes may be selected for one enterprise and all included as samples. Therefore, to make feature extraction more effective, before step S103 the default situation across loan product types is analyzed, and a plurality of features based on loan product type and guarantee method are predefined for the same enterprise. Practical experiments showed that these features performed well among the features that entered the model.
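Per-enterprise features keyed by loan product type and guarantee method could be built as below. This is a sketch under assumptions: each loan is reduced to (product type, guarantee method, outstanding balance), and only loans known at the observation date should be passed in, to avoid leaking the outcome into the features. The function name and the product and guarantee names in the example are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def product_guarantee_features(
    loans: Iterable[Tuple[str, str, float]],
) -> Dict[str, float]:
    """loans: (product_type, guarantee_method, outstanding_balance) for one
    enterprise, restricted to loans known at the observation date.
    Produces a loan count and a total balance per combination."""
    counts: Dict[str, float] = defaultdict(float)
    balances: Dict[str, float] = defaultdict(float)
    for product, guarantee, balance in loans:
        key = f"{product}__{guarantee}"
        counts[key] += 1
        balances[key] += balance
    features: Dict[str, float] = {}
    for key in counts:
        features[f"n_loans__{key}"] = counts[key]
        features[f"balance__{key}"] = balances[key]
    return features
```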
S104: and training by using a machine learning algorithm based on the plurality of characteristic variables to obtain a risk early warning model of the medium and small enterprises.
The machine learning algorithm used to train the model can be chosen according to actual needs and is not specifically limited. Original categorical features are combined according to their intrinsic relationships, which enriches the feature dimensions and improves the accuracy of the predictions. In addition, a more accurate result can ultimately be obtained through parameter tuning and comparative experiments during training.
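Combining categorical features according to their relationships is commonly done with crossed features. A minimal sketch, assuming rows are dictionaries of categorical values; the function name is hypothetical, and which pairs to cross (industry and region in the example) is an illustrative choice not specified by the application.

```python
from typing import Dict, List


def cross_categorical(rows: List[Dict[str, str]], a: str, b: str) -> List[Dict[str, str]]:
    """Return new rows with an added crossed feature 'a_x_b' whose value joins
    the two categories, e.g. 'retail|east'. Input rows are not modified."""
    out: List[Dict[str, str]] = []
    for row in rows:
        new_row = dict(row)
        new_row[f"{a}_x_{b}"] = f"{row[a]}|{row[b]}"
        out.append(new_row)
    return out
```

The crossed column is itself categorical, so it can be binned or encoded like any other categorical feature before training.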
Further, in some embodiments, although a large number of feature variables (usually several hundred) are extracted in the preceding step, in practice many of them may contribute little to training the model. Therefore, in step S104, the feature variables obtained in the preceding step may be screened with a preset selection algorithm to determine the p feature variables with the highest importance, and a machine learning algorithm is then trained on the selected p feature variables to obtain the risk early warning model for small and medium-sized enterprises. Here p is a positive integer whose value can be chosen according to the actual situation, for example 200.
The preset selection algorithm may be, for example, the information value (IV) method. The selection algorithm computes an importance value for each feature variable over the samples, which also makes the model interpretable.
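The information value (IV) screening in step S104 can be sketched in pure Python. For a binned feature, IV sums (g_i - b_i) * ln(g_i / b_i) over the bins, where g_i and b_i are the shares of positive (normal) and negative (default) samples falling in bin i. The additive smoothing `eps`, the label encoding (1 = default, 0 = normal) and the function names are assumptions; continuous variables would need to be binned first.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def information_value(bins: Sequence[Hashable], labels: Sequence[int],
                      eps: float = 0.5) -> float:
    """IV of one binned feature. labels: 1 = negative (default) sample,
    0 = positive (normal) sample. eps smooths empty cells to avoid log(0)."""
    good: Dict[Hashable, float] = defaultdict(float)
    bad: Dict[Hashable, float] = defaultdict(float)
    for b, y in zip(bins, labels):
        if y == 1:
            bad[b] += 1
        else:
            good[b] += 1
    keys = set(good) | set(bad)
    total_good = sum(good.values()) + eps * len(keys)
    total_bad = sum(bad.values()) + eps * len(keys)
    iv = 0.0
    for k in keys:
        pg = (good[k] + eps) / total_good
        pb = (bad[k] + eps) / total_bad
        iv += (pg - pb) * math.log(pg / pb)
    return iv


def select_top_p(features: Dict[str, Sequence[Hashable]],
                 labels: Sequence[int], p: int) -> List[str]:
    """Rank features by IV and keep the p most informative ones."""
    scores = {name: information_value(col, labels) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:p]
```

Because each feature's IV is a single interpretable number, the ranked list doubles as the per-feature importance report mentioned above.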
In the technical solution provided by this embodiment, enterprise data of small and medium-sized enterprises are obtained, positive and negative samples are selected according to a predefined sample division criterion, features are extracted from the samples according to the predefined basic attribute features, internal attribute features and external attribute features to obtain a plurality of feature variables, and a machine learning algorithm is then trained on the feature variables to obtain the risk early warning model for small and medium-sized enterprises. This arrangement enables information to be gathered from multiple sources and multiple risks to be evaluated, so that the credit risk of an enterprise is assessed in a multidimensional and comprehensive manner, risks can be predicted proactively, prospectively and efficiently, heavy losses to banks are avoided, and the difficulty and cost of risk investigation are reduced.
On this basis, the present application further provides a specific implementation to aid understanding and implementation, which is explained below with reference to fig. 2.
Referring to fig. 2, fig. 2 is a schematic flow chart of predefining the various features in an embodiment of the present application. As shown in fig. 2, in this embodiment, default is first defined for the business. The default scope includes overdue principal, overdue interest, loan extension, and a change of the contract due date, where "overdue" means overdue for n days or more. Sample selection is then performed, and positive and negative samples are selected according to the following rules: positive and negative samples do not overlap; samples are selected according to the performance period; time slices are chosen for the positive and negative samples; business outside the scope is excluded; and the weights and the balance of the historical samples are adjusted. Next, internal labels (i.e., internal attribute features) are processed by combining the lender's existing data with existing traceable external data sources; the resulting internal attribute features cover six dimensions: decision maker status, business operation status, fund status, risk features, loan information and credit features. Finally, external attribute features are derived by combining the lender's internal label data with the labels of a specific third-party company; they include enterprise profile, ecological environment, genetic traits, soft strength, associated enterprises, contact information, business operation status, development history, risk features, innovation capability and public attributes.
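The default definition and the snapshot rule (set out in claims 2 and 4) can be sketched as follows, as a minimal illustration only. It assumes that each loan's history is available as a list of overdue spells plus the date of its first extension or due-date change, reads "overdue" as overdue for n days or more, and takes feature data as of m months before the relevant event. `LoanHistory` and the helper functions are hypothetical names; n and m are the thresholds the text leaves open.

```python
import calendar
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import List, Optional, Tuple


@dataclass
class LoanHistory:
    """Hypothetical per-loan repayment record of one enterprise."""
    enterprise_id: str
    maturity: date                                    # contractual due date
    overdue_spells: List[Tuple[date, date]] = field(default_factory=list)  # (start, end) of arrears
    first_extension: Optional[date] = None            # first extension or change of the due date
    repaid_normally: bool = False                     # repaid on schedule at maturity


def months_before(d: date, m: int) -> date:
    """Move d back by m calendar months, clamping the day to the length of the target month."""
    year, month0 = divmod(d.year * 12 + (d.month - 1) - m, 12)
    month = month0 + 1
    return date(year, month, min(d.day, calendar.monthrange(year, month)[1]))


def first_default_date(h: LoanHistory, n: int) -> Optional[date]:
    """Earliest date on which the loan meets the default definition (>= n days overdue, or extended)."""
    candidates = [start + timedelta(days=n)
                  for start, end in h.overdue_spells if (end - start).days >= n]
    if h.first_extension is not None:
        candidates.append(h.first_extension)
    return min(candidates) if candidates else None


def label_and_snapshot(h: LoanHistory, n: int, m: int) -> Optional[Tuple[int, date]]:
    """Return (label, snapshot date): 1 = positive (normal), 0 = negative (default).

    Feature data for the sample is taken as of the snapshot date, m months before the
    first default event or m months before normal maturity. Loans that have neither
    defaulted nor been repaid normally yet are left out (None).
    """
    event = first_default_date(h, n)
    if event is not None:
        # Default takes precedence, so a loan is never both a positive and a negative sample.
        return 0, months_before(event, m)
    if h.repaid_normally:
        return 1, months_before(h.maturity, m)
    return None
```

Expanding the samples over several time slices of an enterprise's life cycle (claim 3) would apply the same labeling at multiple snapshot points per enterprise; that step is not shown.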
In this way, a large amount of feature data can be extracted and derived from enterprise customer basic data, enterprise decision maker data, credit business data, credit guarantee data, enterprise financial data, enterprise fund flow data, external credit investigation data, social security, electricity consumption and wage data, related party risk data, judicial risk data and the like, and derived variables such as change values and change rates can be constructed along the time dimension.
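The derivation along the time dimension can be illustrated with a small sketch that turns one monthly base variable (for example electricity consumption or total wages) into change values and change rates over several look-back windows. The window lengths, the naming scheme and the choice to divide by the absolute past value are assumptions for illustration, not details given in the application.

```python
from typing import Dict, Iterable, List, Optional


def derive_time_features(name: str, series: List[float],
                         windows: Iterable[int] = (3, 6, 12)) -> Dict[str, Optional[float]]:
    """Derive change values and change rates of one monthly base variable.

    series  -- monthly values ordered oldest to newest; the last element is the snapshot month
    windows -- look-back lengths k in months
    For each k: change = x[t] - x[t-k] and rate = change / |x[t-k]|;
    a value is None when the history is too short or the base value is zero.
    """
    if not series:
        raise ValueError("series must contain at least the snapshot month")
    latest = series[-1]
    out: Dict[str, Optional[float]] = {}
    for k in windows:
        chg_key, rate_key = f"{name}_chg_{k}m", f"{name}_rate_{k}m"
        if len(series) <= k:
            out[chg_key] = out[rate_key] = None  # not enough history for this window
            continue
        past = series[-1 - k]
        change = latest - past
        out[chg_key] = change
        out[rate_key] = change / abs(past) if past != 0 else None
    return out
```

Applied to every base variable listed above, a handful of windows quickly multiplies the feature count, which is one reason the subsequent IV screening keeps only the top p variables.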
In addition, based on the same inventive concept, the embodiment of the application also provides a device for training the risk early warning model of the medium and small enterprises, which corresponds to the method for training the risk early warning model of the medium and small enterprises.
Referring to fig. 3, fig. 3 is a schematic structural diagram of a risk early warning model training device for medium and small enterprises according to an embodiment of the present application. As shown in fig. 3, the apparatus includes at least the following structure:
the acquiring module 31 is used for acquiring enterprise data of medium and small enterprises;
a selecting module 32, configured to select a positive sample and a negative sample based on the enterprise data according to a predefined sample division standard; the positive sample is related data of an enterprise without loan overdue, and the negative sample is related data of an enterprise with loan overdue;
the extraction module 33 is configured to perform feature extraction on the selected positive and negative samples according to predefined enterprise basic attribute features, internal attribute features and external attribute features to obtain a plurality of feature variables; wherein the internal attribute features include, but are not limited to, features generated internally by the loan provider for the basic data of the medium and small enterprises, and the external attribute features include custom features generated by a specific third-party company for the basic data and the internal attribute features;
and the training module 34 is configured to perform training by using a machine learning algorithm based on the plurality of characteristic variables to obtain a risk early warning model for the medium-sized and small-sized enterprises.
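The division of the apparatus into modules 31 to 34 can be pictured as a simple chain of pluggable stages. The sketch below is an illustrative assumption about how such modules might be wired together in software; the class name, type aliases and callable signatures are hypothetical, and the machine learning algorithm is left as a caller-supplied function because the application does not fix one.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]        # raw data of one enterprise (hypothetical layout)
Sample = Tuple[Record, int]    # (record, label): 1 = positive, 0 = negative
Features = Dict[str, float]    # feature variables of one sample


@dataclass
class RiskModelTrainer:
    """Chains the four modules of fig. 3; each module is supplied as a callable."""
    acquire: Callable[[], List[Record]]                  # acquiring module 31
    select: Callable[[List[Record]], List[Sample]]       # selecting module 32
    extract: Callable[[Record], Features]                # extraction module 33
    train: Callable[[List[Features], List[int]], Any]    # training module 34

    def run(self) -> Any:
        records = self.acquire()
        samples = self.select(records)                        # positive/negative division
        X = [self.extract(record) for record, _ in samples]  # feature variables per sample
        y = [label for _, label in samples]
        return self.train(X, y)                               # returns whatever model train builds
```

Injecting each module as a callable keeps them independent, in line with the later remark that the modules may be integrated into one processing module or exist separately.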
The specific implementation process of the steps executed by the functional modules may refer to the foregoing method embodiment, and details are not described here.
In addition, the embodiment of the application also provides intelligent equipment for executing the risk early warning model training method for the medium and small enterprises. As shown in fig. 4, the smart device includes at least:
a memory 41 and a processor 42 connected to the memory 41;
the memory 41 is used for storing a program, and the program is at least used for implementing the method for training the risk early warning model of the medium and small enterprises described in the foregoing embodiments;
the processor 42 is used to call and execute the program stored in the memory 41.
For the specific implementation process of the method executed by the program, reference may be made to the foregoing method embodiment, which is not described herein again.
In addition, the embodiment of the present application further provides a storage medium, where a program is stored on the storage medium, and the program is called and executed by a processor, and is used to implement the method for training the risk early warning model of the medium and small-sized enterprises described in the foregoing embodiment.
With the above technical solution, training the early warning model draws on information from multiple sources and evaluates multiple types of risk, so that an enterprise's credit risk can be assessed comprehensively across multiple dimensions. Risks can thus be predicted proactively and efficiently, large losses to loan providers such as banks are avoided, and the difficulty and cost of risk investigation are reduced.
It can be understood that identical or similar parts of the above embodiments may refer to one another, and content not described in detail in some embodiments may refer to the identical or similar parts of other embodiments.
It should be noted that, in the description of the present application, the terms "first", "second", etc. are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. Further, in the description of the present application, the meaning of "a plurality" means at least two unless otherwise specified.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and the scope of the preferred embodiments of the present application includes other implementations in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present application.
It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps of the methods in the above embodiments may be carried out by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one of or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc.
In the description herein, reference to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the present application have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present application, and that variations, modifications, substitutions and alterations may be made to the above embodiments by those of ordinary skill in the art within the scope of the present application.
Claims (10)
1. A method for training a risk early warning model of medium and small enterprises, characterized by comprising the following steps:
acquiring enterprise data of medium and small enterprises;
selecting a positive sample and a negative sample based on the enterprise data according to a predefined sample division standard; the positive sample is related data of an enterprise without loan overdue, and the negative sample is related data of an enterprise with loan overdue;
according to predefined enterprise basic attribute characteristics, internal attribute characteristics and external attribute characteristics, performing characteristic extraction on the selected positive sample and negative sample to obtain a plurality of characteristic variables; wherein the internal attribute features include, but are not limited to, features generated by a loan provider internally for the base data of small and medium sized businesses, and the external attribute features include custom features generated by a particular third party business for the base data and the internal attribute features;
and training by using a machine learning algorithm based on the plurality of characteristic variables to obtain a risk early warning model of the medium and small enterprises.
2. The method of claim 1, wherein the predefined sample division standard comprises: dividing enterprises whose debts are overdue for n days or more, or whose loans have been extended, into default enterprises, the relevant data of the default enterprises being negative samples; and dividing enterprises that repay normally when their loans mature within a certain period into normal enterprises, the relevant data of the normal enterprises being positive samples; wherein n is a positive integer.
3. The method of claim 2, wherein the selecting positive and negative examples based on the enterprise data comprises:
according to a preset performance period condition, including in the samples the states of the enterprise at different time points in different stages of its business life cycle, so as to expand the samples and increase the sample size; wherein the performance period condition includes: within a predefined time window, meeting the determination condition of a default enterprise or meeting the determination condition of a normal enterprise.
4. The method according to claim 2 or 3, wherein, when a positive sample is selected, the data from m months before the normal maturity of the loan is taken as the positive sample; when a negative sample is selected, the data from m months before the time point at which the loan first becomes overdue by more than n days, or from m months before the time point of the first extension, is taken as the negative sample; wherein m is a positive integer.
5. The method of claim 1, wherein the extracting features from the selected positive and negative samples to obtain a plurality of feature variables comprises:
dividing the selected medium and small enterprises into small and micro enterprises and general enterprises, wherein the factors considered in the division include enterprise scale and industry;
and performing feature extraction on the selected positive and negative samples to obtain feature variables reflecting the distinctive characteristics of the small and micro enterprises and feature variables reflecting the common characteristics of the general enterprises.
6. The method according to claim 1, wherein the performing feature extraction on the selected positive and negative samples according to predefined enterprise basic attribute features, internal attribute features and external attribute features to obtain a plurality of feature variables further comprises:
predefining, for the same enterprise, a plurality of features based on loan product types and guarantee methods.
7. The method of claim 1, wherein the training by using a machine learning algorithm based on the plurality of characteristic variables to obtain a risk early warning model of the medium-sized and small-sized enterprises comprises:
screening the characteristic variables with a preset selection algorithm to determine the p characteristic variables with the highest importance, wherein p is a positive integer;
and training on the p characteristic variables by using a machine learning algorithm to obtain a risk early warning model of the medium and small enterprises.
8. The method of claim 1, wherein the internal attribute features include decision maker status, business operation status, fund status, risk features, loan information and credit features; the external attribute features include enterprise profile, ecological environment, genetic traits, soft strength, associated enterprises, contact information, business operation status, development history, risk features, innovation capability and public attributes.
9. A device for training a risk early warning model of medium and small enterprises, characterized by comprising:
the acquisition module is used for acquiring enterprise data of medium and small enterprises;
the selection module is used for selecting a positive sample and a negative sample based on the enterprise data according to a predefined sample division standard; the positive sample is related data of an enterprise without loan overdue, and the negative sample is related data of an enterprise with loan overdue;
the extraction module is used for extracting the characteristics of the selected positive sample and the negative sample according to the predefined enterprise basic attribute characteristics, internal attribute characteristics and external attribute characteristics to obtain a plurality of characteristic variables; wherein the internal attribute features include, but are not limited to, features generated internally by the loan provider for the base data of the small and medium sized businesses, and the external attribute features include custom features generated by a particular third party business for the base data and the internal attribute features;
and the training module is used for training by using a machine learning algorithm based on the characteristic variables to obtain a risk early warning model of the medium and small enterprises.
10. A smart device, comprising:
a memory and a processor coupled to the memory;
the memory is used for storing a program, and the program is at least used for realizing the method for training the risk early warning model of the medium and small-sized enterprises as claimed in any one of claims 1 to 8;
the processor is used for calling and executing the program stored in the memory.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110952873.2A CN113506174A (en) | 2021-08-19 | 2021-08-19 | Method, device and equipment for training risk early warning model of medium and small enterprises |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113506174A (en) | 2021-10-15 |
Family
ID=78015875
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110952873.2A Pending CN113506174A (en) | 2021-08-19 | 2021-08-19 | Method, device and equipment for training risk early warning model of medium and small enterprises |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113506174A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114066242A (en) * | 2021-11-11 | 2022-02-18 | 北京道口金科科技有限公司 | Enterprise risk early warning method and device |
CN114066603A (en) * | 2021-11-11 | 2022-02-18 | 中国建设银行股份有限公司 | Post-loan risk early warning method and device, electronic equipment and computer readable medium |
CN114091902A (en) * | 2021-11-22 | 2022-02-25 | 支付宝(杭州)信息技术有限公司 | Risk prediction model training method and device, and risk prediction method and device |
CN116703562A (en) * | 2023-06-01 | 2023-09-05 | 中科柏诚科技(北京)股份有限公司 | Financial risk control credit assessment method and device based on enterprise financial tax stamp data |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2002163449A (en) * | 2000-11-29 | 2002-06-07 | World Business Management Kk | Method and system for financing and evaluating method for technology-secured credit |
US20130132269A1 (en) * | 2010-08-06 | 2013-05-23 | The Dun And Bradstreet Corporation | Method and system for quantifying and rating default risk of business enterprises |
CN107730283A (en) * | 2017-11-03 | 2018-02-23 | 中国银行股份有限公司 | Credit reference method and device for medium-sized and small enterprises |
CN108876134A (en) * | 2018-06-08 | 2018-11-23 | 山东汇贸电子口岸有限公司 | Credit system for medium, small and micro enterprises |
CN110930248A (en) * | 2020-01-22 | 2020-03-27 | 成都数联铭品科技有限公司 | Credit risk prediction model construction method and system, storage medium and electronic equipment |
CN111401798A (en) * | 2020-06-02 | 2020-07-10 | 南京百敖软件有限公司 | Enterprise debt evasion risk early warning system and construction method |
CN111913994A (en) * | 2020-08-12 | 2020-11-10 | 武汉众邦银行股份有限公司 | Client risk data monitoring method based on inline data and external data |
WO2021000678A1 (en) * | 2019-07-04 | 2021-01-07 | 平安科技(深圳)有限公司 | Business credit review method, apparatus, and device, and computer-readable storage medium |
CN112330441A (en) * | 2020-11-12 | 2021-02-05 | 北京宸信征信有限公司 | Method for evaluating business value credit loan of medium and small enterprises |
CN112365339A (en) * | 2020-11-12 | 2021-02-12 | 北京宸信征信有限公司 | Method for judging commercial value credit loan amount of small and medium-sized enterprises |
CN112801773A (en) * | 2021-01-20 | 2021-05-14 | 招商银行股份有限公司 | Enterprise risk early warning method, device, equipment and storage medium |
CN113177839A (en) * | 2021-05-20 | 2021-07-27 | 中国建设银行股份有限公司 | Credit risk assessment method, device, storage medium and equipment |
Non-Patent Citations (4)
Title |
---|
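乔亚男 (Qiao Yanan): "Research on early warning of the loan repayment capacity of small and medium-sized enterprises" (中小企业贷款偿还能力预警研究), Wanfang Data (dissertation), no. 01, pages 1 - 62 * |
唐春阳 (Tang Chunyang) et al.: "Construction of a Bayes model for predicting enterprise short-term loan default" (企业短期贷款违约预测Bayes模型构建), 当代经济科学 (Contemporary Economic Science), vol. 28, no. 01, pages 41 - 45 * |
孙宛青 (Sun Wanqing): "Construction of a credit risk early warning model for commercial bank customers" (商业银行客户信贷风险预警模型构建), 合肥工业大学学报:社会科学版 (Journal of Hefei University of Technology, Social Sciences), vol. 16, no. 02, pages 115 - 120 * |
方敏 (Fang Min) et al.: "Development and design of a credit management system for agricultural production" (面向农业生产的信贷管理系统开发与设计), 中国新技术新产品 (China New Technologies and Products), vol. 06, no. 12, pages 29 - 32 * |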
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20211015 |