CN112734352A - Document auditing method and device based on data dimensionality - Google Patents
- Publication number
- CN112734352A; CN201911029683.2A
- Authority
- CN
- China
- Prior art keywords
- data
- document
- training
- dimension
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/10—Office automation; Time management
- G06Q10/103—Workflow collaboration or project management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/01—Customer relationship services
- G06Q30/015—Providing customer assistance, e.g. assisting a customer within a business location or via helpdesk
- G06Q30/016—After-sales
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
Landscapes
- Business, Economics & Management (AREA)
- Strategic Management (AREA)
- Engineering & Computer Science (AREA)
- Human Resources & Organizations (AREA)
- Tourism & Hospitality (AREA)
- Economics (AREA)
- Physics & Mathematics (AREA)
- Marketing (AREA)
- General Business, Economics & Management (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Entrepreneurship & Innovation (AREA)
- Development Economics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Accounting & Taxation (AREA)
- Primary Health Care (AREA)
- Data Mining & Analysis (AREA)
- Operations Research (AREA)
- Quality & Reliability (AREA)
- Finance (AREA)
Abstract
The invention discloses a document auditing method and device based on data dimensionality, and relates to the technical field of computers. One embodiment of the method comprises: determining data dimensions, extracting feature data corresponding to the data dimensions from documents, and generating a data set by combining the feature data with the audit results of the documents; training and testing a data model on the data set to obtain a trained data model; and receiving a document to be audited, extracting the feature data corresponding to the data dimensions from the document to be audited, and inputting the extracted feature data into the trained data model to obtain the audit result of the document to be audited. In this embodiment, a model is built and evaluated on a large amount of existing document data, and the expected value of the classification result can be dynamically configured according to the training accuracy. This addresses the subjectivity and uncertainty of manual auditing, and the method can automatically audit service orders in place of customer service staff.
Description
Technical Field
The invention relates to the technical field of computers, in particular to a document auditing method and device based on data dimensionality.
Background
A general approval flow, such as credit card approval, directly verifies whether the collected customer information meets the approval conditions; no transaction with the customer is generated. The after-sales process, by contrast, is the entry point of a reverse flow (selling goods and delivering them to the customer being the forward flow) and is a product of the brand economy. High-quality after-sales service comprises the various service activities provided to resolve goods or customer problems after the goods are sold.
Auditing the service order is a crucial step in after-sales service. If the price of a commodity drops after the customer has purchased it, or if goods are lost or damaged in logistics, the customer needs to fill in an application, which customer service then audits. Because different customers have different requirements and different types of commodities are handled differently, auditing service orders becomes a very complex problem. Under the principle of putting customer satisfaction first, customer service may need to communicate with the customer several times to solve the problem, which is costly.
In addition, different audit results affect both customer satisfaction and cost; how to achieve high satisfaction at lower cost is the main problem this scheme addresses. Current e-commerce platforms handle such after-sales return and repair services in two main ways:
1) no service order auditing: the user communicates directly with the merchant by telephone, IM, and similar means, and the matter is handled offline;
2) manual review plus rule configuration: compared with the first way, several parties can take turns processing the same service order, so data is shared.
In the process of implementing the invention, the inventor finds that the prior art has at least the following problems:
1) for the first mode, as the after-sales volume grows, processing becomes harder for the merchant, the cost is high, and the user cannot follow or check the processing status in real time;
2) for the second mode, more human resources are consumed, and timeliness and quality are difficult to guarantee given customer service staff turnover and the training of new staff;
3) for the second mode, the rules are set manually and only for specific services, which limits their use as follows:
firstly, service orders outside the rules cannot be audited or given suggestions, so the usage scenario is narrow and order coverage is low;
secondly, services interact with one another, conflicts among rules must be resolved when the rules are set, and the maintenance cost is high;
and thirdly, there is no upfront, stricter data verification support, so the method is subjective and uncertain.
In summary, current service order auditing is subjective, uncertain, and uncontrollable, and its timeliness cost is high.
Disclosure of Invention
In view of this, embodiments of the present invention provide a document auditing method and apparatus based on data dimensionality, which can at least solve the prior-art problems that current service order auditing is subjective, uncertain, and uncontrollable and incurs a high timeliness cost.
In order to achieve the above object, according to an aspect of the embodiments of the present invention, there is provided a document auditing method based on data dimension, including:
determining data dimensions, extracting feature data corresponding to the data dimensions from documents, and generating a data set by combining the feature data with the audit results of the documents;
training and testing a data model according to the data set to obtain a trained data model;
and receiving a document to be audited, extracting the feature data corresponding to the data dimensions from the document to be audited, and inputting the extracted feature data into the trained data model to obtain the audit result of the document to be audited.
Optionally, the determining the data dimension and extracting feature data corresponding to the data dimension in the document include:
and if the characteristic data does not exist in the document, acquiring a data record of the document according to the document identification and the generation time of the document so as to extract the characteristic data from the data record.
Optionally, after determining the data dimension and extracting feature data corresponding to the data dimension in the document, the method further includes:
and counting the data volume of the feature data under each data dimension, and eliminating the data dimension of which the data volume exceeds a preset data volume threshold value and the feature data corresponding to the eliminated data dimension.
Optionally, the training and testing the data model according to the data set to obtain the trained data model includes:
dividing the data set into a training set and a test set;
inputting the training data and audit results in the training set into the data model, obtaining a total training accuracy from the training accuracy under each audit result, and generating a data model to be tested;
inputting the test data in the test set into the data model to be tested, and if the total test accuracy is greater than or equal to the total training accuracy and the test accuracy under each audit result is greater than or equal to the corresponding training accuracy, determining that the data model to be tested is the trained data model.
In order to achieve the above object, according to another aspect of the embodiments of the present invention, there is provided a document auditing apparatus based on data dimension, including:
the data dimension determining module is used for determining data dimensions, extracting feature data corresponding to the data dimensions from documents, and generating a data set by combining the feature data with the audit results of the documents;
the data model training module is used for training and testing a data model according to the data set to obtain a trained data model;
and the document auditing module is used for receiving a document to be audited, extracting the feature data corresponding to the data dimensions from the document to be audited, and inputting the extracted feature data into the trained data model to obtain the audit result of the document to be audited.
Optionally, the data dimension determining module is configured to:
and if the characteristic data does not exist in the document, acquiring a data record of the document according to the document identification and the generation time of the document so as to extract the characteristic data from the data record.
Optionally, the data dimension determining module is further configured to:
and counting the data volume of the feature data under each data dimension, and eliminating the data dimension of which the data volume exceeds a preset data volume threshold value and the feature data corresponding to the eliminated data dimension.
Optionally, the data model training module is configured to:
dividing the data set into a training set and a test set;
inputting the training data and audit results in the training set into the data model, obtaining a total training accuracy from the training accuracy under each audit result, and generating a data model to be tested;
inputting the test data in the test set into the data model to be tested, and if the total test accuracy is greater than or equal to the total training accuracy and the test accuracy under each audit result is greater than or equal to the corresponding training accuracy, determining that the data model to be tested is the trained data model.
To achieve the above object, according to a further aspect of the embodiments of the present invention, an electronic device for document auditing based on data dimensionality is provided.
The electronic device of the embodiment of the invention comprises: one or more processors; and the storage device is used for storing one or more programs, and when the one or more programs are executed by the one or more processors, the one or more processors realize any one of the above document auditing methods based on the data dimension.
To achieve the above object, according to a further aspect of the embodiments of the present invention, there is provided a computer readable medium, on which a computer program is stored, the computer program, when executed by a processor, implementing any of the above-mentioned document auditing method based on data dimension.
According to the scheme provided by the invention, one embodiment of the invention has the following advantages or beneficial effects: a model is built and evaluated on the large amount of existing document data, and the expected value of the classification result can be dynamically configured according to the training accuracy. This addresses the subjectivity and uncertainty of manual auditing, and the method can automatically audit service orders in place of customer service staff.
Further effects of the above optional implementations are described below in connection with the embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:
FIG. 1 is a schematic main flow diagram of a document auditing method based on data dimensionality according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the model construction evaluation results according to the embodiment of the present invention;
FIG. 3 is a schematic diagram of a model training process according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of decision tree-J48 in weka, according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a weka classifier evaluation strategy usage according to an embodiment of the invention;
FIG. 6 is a flowchart illustrating an alternative data dimension-based document review method according to an embodiment of the present invention;
FIG. 7 is a diagram illustrating a server architecture of an automatic audit system for a service ticket according to an embodiment of the present invention;
FIG. 8 is a block diagram of an automated audit system according to an embodiment of the invention;
FIG. 9 is a schematic diagram of the main modules of a document auditing apparatus based on data dimension according to an embodiment of the present invention;
FIG. 10 is an exemplary system architecture diagram in which embodiments of the present invention may be employed;
FIG. 11 is a schematic block diagram of a computer system suitable for use with a mobile device or server implementing an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
It should be noted that the embodiment of the present invention is applicable to a scenario with an audit (approval) action, for example, document audit (approval), where the document is not limited to a service form. For convenience of explaining the whole operation process, the present invention is described by taking a service list as an example.
The terms used in the invention are explained as follows:
weka: open-source machine learning and data mining software for the Java environment.
Binary classification problem: for example, identifying whether a picture shows a cat. A classifier is trained that takes a picture, represented by the feature vector x, as input and outputs whether a cat is present, represented by y being 0 or 1.
Principle of J48: a top-down, recursive divide-and-conquer strategy. An attribute is selected and placed at the root node, a branch is generated for each possible value of that attribute, and the instances are divided into subsets, one per branch of the root node. The process is then repeated recursively on each branch and stops when all instances at a node have the same classification.
Enumerated types: in practical applications, some variables have only a few possible values. For example, a person's sex has only two possible values and the day of the week has only seven. In C, such variables can be defined as enumerated types. Enumeration means listing the values of a variable one by one; the variable is limited to the listed values.
Referring to fig. 1, a main flowchart of a document auditing method based on data dimension provided by the embodiment of the present invention is shown, and includes the following steps:
S101: determining data dimensions, extracting feature data corresponding to the data dimensions from documents, and generating a data set by combining the feature data with the audit results of the documents;
S102: training and testing a data model according to the data set to obtain a trained data model;
S103: receiving a document to be audited, extracting the feature data corresponding to the data dimensions from the document to be audited, and inputting the extracted feature data into the trained data model to obtain the audit result of the document to be audited.
For step S101 in the above embodiment, feature selection preliminarily selects multiple dimensions based on business research; for example, up to 200 dimensions may be selected according to the actual business scenario.
The invention trains the data model on historical data; the trained model then predicts the current result from that historical data.
However, some data changes over time, such as the level of a user who applied for a service order a year ago. In other cases the business system was not fully built earlier, so some feature data was never recorded and stored. Examples include the user level in the scoring system, the user's overall after-sales rate, and the overall after-sales rate of the corresponding SKU (Stock Keeping Unit) goods.
It should be noted that real-time feature data here means data that may change over time; it does not mean that current data should replace the data obtained for the document. Because the documents are historical, changing part of their information could change the audit result and thereby affect the training or testing of the data model.
In consideration of these factors, the information in the document needs to be collected as it was when the document was generated, to ensure the timeliness and accuracy of the data. The collection may proceed as follows:
1) determining a data record during generation according to the generation time of the document and the document identification, and calling the real-time characteristic data from the data record;
2) JSF (Java Server Faces, Java building framework) can be used for remote calling, or calling is carried out from another department through one interface; if locally stored, it can be directly acquired locally. The present invention is primarily directed to the first mode.
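The first collection mode (looking up the data record that was valid when the document was generated) can be sketched as a point-in-time lookup. This is a minimal illustration, not the patent's implementation: the record layout, the `feature_at` helper, and the sample document identifier and user levels are all hypothetical.

```python
from bisect import bisect_right
from datetime import datetime

def feature_at(records, document_id, generated_at):
    """Return the feature value that was in effect when the document was generated.

    records: hypothetical mapping of document id -> list of
             (effective_from, value) tuples sorted by effective_from.
    """
    history = records.get(document_id, [])
    times = [effective_from for effective_from, _ in history]
    idx = bisect_right(times, generated_at) - 1  # last record at or before generated_at
    return history[idx][1] if idx >= 0 else None

# Hypothetical history of the applicant's user level for one service order.
records = {
    "SO-001": [
        (datetime(2018, 1, 1), "bronze"),
        (datetime(2019, 3, 1), "silver"),
        (datetime(2019, 9, 1), "gold"),
    ]
}

level_when_generated = feature_at(records, "SO-001", datetime(2019, 5, 20))
```

The lookup deliberately ignores the user's current level ("gold" here) and returns the level recorded at generation time, which matches the note above that current data should not replace the data in the historical document.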
Historical documents have already been fully processed, so they carry audit results; the number of distinct audit results may differ between business scenarios.
Based on the number of data dimensions SA and the number of audit results SR (e.g., audit success, audit failure), the number of samples actually required is determined as SA × SR × 1000 (or other values are also possible).
TABLE 1 sample data
| | Dimension 1 | Dimension 2 | Dimension 3 | Dimension 4 | Dimension 5 | Audit results |
| Document 1 | | | | | | |
| Document 2 | | | | | | |
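The sample-size rule stated before Table 1 (SA × SR × 1000) is simple arithmetic; a short sketch makes the scale concrete. The function name and the per-combination factor parameter are illustrative assumptions; the text only says that 1000 "or other values" may be used.

```python
def required_samples(num_dimensions, num_audit_results, per_combination=1000):
    """Sample count per the rule SA x SR x 1000 (the factor is configurable)."""
    return num_dimensions * num_audit_results * per_combination

# 200 dimensions and two audit results (success, failure).
samples_needed = required_samples(200, 2)
```

With 200 dimensions and two audit results the rule asks for 400,000 historical documents, which is why a large existing document base is a precondition for the method.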
Further, before inputting the data set into the data model, a preprocessing operation may be performed on the data, specifically:
1) normalization (numeric conversion to enumeration): commodity type, brand, order type, etc.;
2) missing value processing: for example, for commodity price, missing values are replaced with the average value; for commodity category, missing values are replaced with the most frequent value;
3) discretization: for example, a continuous feature such as a level is converted into a categorical attribute. The level is a numerical attribute, and the continuous levels are divided into bins, for example, 200-.
For the above data preprocessing, weka, tensorflow, the python language, the R language, and the like can be used as the data analysis tool. Taking weka as an example, the preprocessing steps map to weka filters as follows: enumeration uses StringToNominal and NumericToNominal; missing value processing uses ReplaceMissingValues; discretization uses Discretize.
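The three preprocessing operations above can be sketched in plain Python for readers not using weka. This is an assumed, simplified equivalent rather than the weka filters themselves; the helper names, the sample values, and the bin edges are hypothetical.

```python
from bisect import bisect_right
from collections import Counter
from statistics import mean

def to_nominal(values):
    """Enumeration: map each distinct value to a small integer code."""
    codes = {value: code for code, value in enumerate(sorted(set(values)))}
    return [codes[value] for value in values]

def fill_with_mean(values):
    """Missing numeric values (None) are replaced with the average of the known ones."""
    average = mean(v for v in values if v is not None)
    return [average if v is None else v for v in values]

def fill_with_mode(values):
    """Missing categorical values (None) are replaced with the most frequent value."""
    most_frequent = Counter(v for v in values if v is not None).most_common(1)[0][0]
    return [most_frequent if v is None else v for v in values]

def discretize(value, edges):
    """Discretization: return the index of the bin a numeric value falls into.

    edges are ascending bin boundaries (hypothetical); a value equal to an
    edge goes into the higher bin.
    """
    return bisect_right(edges, value)

brands = to_nominal(["acme", "zeta", "acme"])
prices = fill_with_mean([10.0, None, 30.0])
categories = fill_with_mode(["phone", None, "phone", "book"])
level_bin = discretize(350, [200, 500])
```

Mean imputation suits numeric attributes such as price, while mode imputation suits categorical attributes such as commodity category, mirroring the two examples given in item 2).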
Furthermore, the training of the data model is an iterative process, so that the data in the provided data set and the auditing result may be reused. In order to facilitate the subsequent re-extraction of the data for application, the data may be transferred to a persistent state for storage, for example, written to MySQL.
In addition, given the processing bottleneck or storage capacity limits of the existing MySQL, data can be moved to the data mart (data migration) during low-traffic periods to reduce pressure on the business system, for example by moving out old data from 2012.
For step S102, the data set may be divided into a training set and a test set in a certain proportion (e.g., 60% to 80%); the output value (audit result) of every sample is known. The training set is used to generate the model and the test set is used to validate it.
The prior art generally judges whether a model is usable by comparing its test result with a predetermined expected value: if the expected value is exceeded, the model is considered usable.
In the invention, feasibility is judged by comparing the test accuracy with the training accuracy. Training the data model on the training set also yields an accuracy (the model accuracy); if the accuracy drops when the test set is used, this indicates overfitting (the model fits the training set too closely and does not necessarily perform well on other data).
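A minimal sketch of the split, assuming a shuffled split with a configurable training share. The function name, the default ratio of 0.7, and the fixed seed are illustrative choices, not values taken from the patent.

```python
import random

def split_dataset(rows, train_ratio=0.7, seed=42):
    """Shuffle a copy of the rows and cut it into (training set, test set)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

# 20 placeholder rows, 75% of them used for training.
training_set, test_set = split_dataset(range(20), train_ratio=0.75)
```

Shuffling before cutting avoids a split that follows the storage order of the documents, for example oldest documents in training and newest in testing.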
After testing the data model based on the test set, the obtained test results may include audit results to which the test data belongs. The auditing result can be probability values of test data belonging to different categories or TOP1, and is set according to actual conditions.
It should be noted that, in practical applications, each class in a service scenario has a corresponding probability value, and if the predicted auditing result of each class is lower than the corresponding training accuracy, the model needs to be adjusted. In addition, the maximum probability value TOP1 in the obtained result may also be extracted, and if the maximum probability value is also lower than the corresponding training accuracy, the model needs to be adjusted.
The business scenarios here include, but are not limited to, audit closing, direct claims, door-to-door pickup, door-to-door replacement, and delivery by the customer; current customer service audit results can be summarized as these six types. The different classification results represent the direction in which the business party will subsequently process the document.
For example, as shown in fig. 2, the overall accuracy of the evaluation result is about 70%, but the accuracy of the classes 0 and 10 is significantly lower than the overall accuracy, and model adjustment is required.
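The per-class comparison described above can be sketched as follows: compute the accuracy for each audit result, then flag every class whose test accuracy falls below its training accuracy. The function names and the sample labels are hypothetical.

```python
from collections import defaultdict

def overall_accuracy(actual, predicted):
    """Share of documents whose predicted audit result matches the actual one."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)

def per_class_accuracy(actual, predicted):
    """Accuracy computed separately for each actual audit result."""
    hits, totals = defaultdict(int), defaultdict(int)
    for a, p in zip(actual, predicted):
        totals[a] += 1
        hits[a] += int(a == p)
    return {label: hits[label] / totals[label] for label in totals}

def classes_needing_adjustment(train_accuracy, test_accuracy):
    """Audit results whose test accuracy is below the training accuracy."""
    return sorted(
        label for label, accuracy in test_accuracy.items()
        if accuracy < train_accuracy.get(label, 0.0)
    )

actual = ["pass", "pass", "fail", "fail"]
predicted = ["pass", "fail", "fail", "fail"]
test_accuracy = per_class_accuracy(actual, predicted)
flagged = classes_needing_adjustment({"pass": 0.8, "fail": 0.9}, test_accuracy)
```

In this toy example the overall accuracy is 0.75, yet the "pass" class reaches only 0.5 and is flagged, which is the same situation as an acceptable overall figure hiding weak individual classes.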
In addition, the auditing result of each classification can also be compared with the expected classification, for example, the accuracy of each classification is required to reach 80%, and a certain dimension can be adjusted under the condition that other dimensions are not changed, so that the algorithm or the classifier is perfected through dimension adjustment.
If the result does not meet the expected value, for example because the error rate is too high, the training dimensions in the model can be adjusted and the data retrained. Alternatively, the flow can return to the step of selecting data dimensions, because some data may not actually be needed. As shown in fig. 3, the invention returns to the dimension selection step so that the complete flow is considered.
The expected value may be derived from a cost loss model and a customer satisfaction model; for example, if the cost loss rate is 0.1 and the customer satisfaction is 0.8, the final expected value is (1 - 0.1) × 0.8 = 0.72. The expected value should not be set too low, otherwise several classes may exceed it at once; Top1 or Top N may therefore be selected according to actual needs.
The model is trained continuously so that the accuracy approaches the expected value. For n classes, the expected value should be greater than 1/n and less than or equal to 1 (a value equal to 1/n is equivalent to rolling an n-sided die, a value below 1/n is worse than rolling the die, and a value of 1 is a perfect model). In practice, the expected value also needs to be adjusted according to the sample distribution.
A note on the classification algorithm: many classification algorithms exist in current data mining research. The invention mainly uses a supervised classification algorithm in weka (other algorithms can also be used; the invention is not limited in this respect), namely decision tree-J48. A simple decision tree is shown in fig. 4.
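The expected-value calculation and its bounds can be sketched as below. The formula is inferred from the single worked example in the text (loss rate 0.1, satisfaction 0.8, result 0.72), so treat it as an assumption; the function names are hypothetical.

```python
def expected_value(cost_loss_rate, customer_satisfaction):
    """Combine the two models as (1 - cost loss rate) x customer satisfaction."""
    return (1 - cost_loss_rate) * customer_satisfaction

def is_reasonable_expectation(value, num_classes):
    """An expected value must beat random guessing (1/n) and cannot exceed 1."""
    return 1 / num_classes < value <= 1

target = expected_value(0.1, 0.8)
usable = is_reasonable_expectation(target, num_classes=6)
```

With six audit results, random guessing gives about 0.167, so a target near 0.72 is well inside the valid range.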
In the decision tree, each node represents a point where a decision must be made based on the input; traversal moves from one node to the next until a leaf node is reached, where the predicted output is obtained.
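The traversal just described can be illustrated with a hand-built tree. This sketch only shows prediction on an existing tree, not how J48 learns one; the attribute names, branch values, and audit results in the tree are invented for illustration.

```python
def predict(node, features):
    """Walk from the root to a leaf, following the branch for each attribute value."""
    while "label" not in node:  # internal nodes carry an attribute, leaves a label
        value = features[node["attribute"]]
        node = node["branches"][value]
    return node["label"]

# Hypothetical two-level tree over made-up document features.
tree = {
    "attribute": "order_type",
    "branches": {
        "third_party": {"label": "audit_closed"},
        "self_operated": {
            "attribute": "price_band",
            "branches": {
                "low": {"label": "direct_claim"},
                "high": {"label": "door_to_door_pickup"},
            },
        },
    },
}

result = predict(tree, {"order_type": "self_operated", "price_band": "low"})
```

Each internal node tests one attribute, so the number of decisions for a document is at most the depth of the tree.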
For step S103, once the model has been trained, it can be applied as the basis for auditing (for example, thousands of service applications), so that document auditing is completed efficiently.
The constructed model is deployed into the production environment. The verification process is similar to the data training process, except that the customer service audit results are recorded at the same time. Data is collected for a period of time, and the overall accuracy and the accuracy under each audit result are then compared.
Regarding the recording of customer service audit results: before the model is confirmed to be fully reliable, document auditing is usually performed by the data model and by customer service in parallel, and if the two results differ, the model dimensions and the feature data need to be adjusted. In other words, after the model has been tested on the test set, customer service verifies it once more.
Of course, if the model is already fully usable at this point, recording the customer service audit results may be unnecessary.
For documents to be tested in practical application, operations such as preprocessing, feature data extraction and the like are required before the documents are input into a data model.
If the test result is greater than or equal to the preset expected value, the data is stored in the audit-success database cluster. Successfully audited documents can then be retrieved by the after-sales system through a callback. This is a system architecture design choice aimed at decoupling: the after-sales system is a business system, and the predicted audit result can also be pushed to it directly.
If the test result does not reach the preset expected value, the data can be stored in the audit-failure database cluster and returned to the after-sales system for customer service audit.
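The threshold-based routing described above can be sketched as follows. This is only an illustration under assumptions: `score` stands for whatever per-document test result the model reports, the two store names are hypothetical labels for the audit-success and audit-failure database clusters, and the range check mirrors the stated requirement that the expected value lie between 1/n (exclusive) and 1 (inclusive).

```python
def validate_expected_value(expected_value: float, n_results: int) -> float:
    """Check that the expected value lies in the range (1/n, 1]."""
    if n_results < 2:
        raise ValueError("at least two audit results are required")
    if not (1.0 / n_results < expected_value <= 1.0):
        raise ValueError("expected value must be greater than 1/n and at most 1")
    return expected_value


def route_document(score: float, expected_value: float) -> str:
    """Return the (hypothetical) store that should receive the document."""
    if score >= expected_value:
        # Meets the preset expected value: audit succeeded.
        return "audit_success_cluster"
    # Below the expected value: hand back for customer service audit.
    return "audit_failure_cluster"
```

Keeping the decision in one small function makes it straightforward to change the expected value dynamically without touching the storage or after-sales code.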
In addition to the above, model verification may also be combined with other evaluation strategies, for example:
Cross-validation: the initial sample is divided into K subsamples; one subsample is held out as data for validating the model, and the other K-1 subsamples are used for training. This is repeated K times so that each subsample is used for validation exactly once, and the K results are averaged (or combined in some other way) to obtain a single estimate;
Classifier evaluation strategies: see, for example, the weka classifier evaluation strategies in fig. 5.
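The K-fold cross-validation procedure described above can be sketched in plain Python as follows. The splitting scheme (assigning every K-th sample to the same fold) and the `fit` / `predict` callables are assumptions chosen for illustration; weka provides its own cross-validation facilities, which are not reproduced here.

```python
from collections import Counter


def k_fold_splits(n_samples, k):
    """Yield (train_indices, validation_indices) for each of the K folds."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def cross_validate(features, labels, k, fit, predict):
    """Average the validation accuracy over K rounds (requires len(labels) >= k)."""
    scores = []
    for train, validation in k_fold_splits(len(labels), k):
        model = fit([features[i] for i in train], [labels[i] for i in train])
        correct = sum(predict(model, features[i]) == labels[i] for i in validation)
        scores.append(correct / len(validation))
    return sum(scores) / k


# Hypothetical baseline "model": always predict the most frequent training label.
def fit_majority(train_features, train_labels):
    return Counter(train_labels).most_common(1)[0][0]


def predict_majority(model, sample):
    return model
```

Any classifier can be plugged in through `fit` and `predict`; the majority-label baseline is included only so that the sketch runs on its own.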
During model construction, evaluation needs to be carried out iteratively (training accuracy and testing accuracy that are similar and both close to 1 indicate a stable and highly accurate model). The basic data mining metrics are shown in table 2 below:
TABLE 2 Basic data mining metrics
Predicted \ Actual | p | n | Total |
---|---|---|---|
p' | TP (True Positives) | FP (False Positives) | P' |
n' | FN (False Negatives) | TN (True Negatives) | N' |
Total | P | N | |
Wherein:
TP: True Positive. The sample is predicted as positive by the model and is actually positive; the prediction is correct. tprate = TP/(TP + FN).
TN: True Negative. The sample is predicted as negative by the model and is actually negative; the prediction is correct. tnrate = TN/(FP + TN).
FN: False Negative. The sample is predicted as negative by the model but is actually positive; the prediction is wrong. fnrate = FN/(TP + FN).
FP: False Positive. The sample is predicted as positive by the model but is actually negative; the prediction is wrong. fprate = FP/(FP + TN).
Precision: among the samples predicted as positive, the proportion that are actually positive. precision = TP/(TP + FP).
Recall: among the samples in the whole data set that are actually positive, the proportion that are correctly predicted as positive. recall = TP/(TP + FN) = tprate.
F-Measure: the weighted harmonic mean of Precision and Recall. When the parameter α of the F-measure function is 1, F1 combines Precision and Recall with equal weight, and a higher F1 indicates a more effective test method. F1 = 2TP/(2TP + FP + FN).
ROC: the ROC curve, whose abscissa is the False Positive Rate (FPR) and whose ordinate is the True Positive Rate (TPR), remains essentially unchanged when the distribution of positive and negative samples in the test set changes.
PRC: the precision-recall curve. When the distribution of positive and negative samples is very uneven (highly skewed data), it reflects the quality of the classifier more effectively than the ROC curve.
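As a worked illustration of the metrics listed above, the following Python sketch computes them from the four confusion-matrix counts using the standard definitions (in particular, the true negative rate is taken as TN/(FP + TN)). The function name and the example counts are assumptions made for illustration.

```python
def confusion_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute basic classification metrics from confusion-matrix counts."""
    return {
        "tprate": tp / (tp + fn),           # true positive rate
        "tnrate": tn / (fp + tn),           # true negative rate
        "fnrate": fn / (tp + fn),           # false negative rate
        "fprate": fp / (fp + tn),           # false positive rate
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),           # identical to tprate
        "f1": 2 * tp / (2 * tp + fp + fn),  # harmonic mean of precision and recall
    }


# Hypothetical counts: 40 TP, 10 FP, 20 FN, 30 TN.
metrics = confusion_metrics(tp=40, fp=10, fn=20, tn=30)
```

With these example counts, precision is 0.8, recall is 2/3, and F1 is 8/11, which equals the harmonic mean of the two.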
The method provided by this embodiment builds and evaluates the model using a large amount of existing document data and can dynamically configure the expected value of the classification result according to the training accuracy. It addresses the subjectivity and uncertainty of manual auditing and can replace customer service in auditing service orders, doing so automatically. The automatic audit model is stable and supports supervised or unsupervised learning, and as the number of documents grows, the model base can keep learning and improving itself.
Referring to fig. 6, a schematic flow chart of an optional data dimension-based document auditing method according to an embodiment of the present invention is shown, including the following steps:
S601: determining data dimensions, and extracting feature data corresponding to the data dimensions from the document;
S602: counting the data volume of the feature data under each data dimension, and eliminating any data dimension whose data volume exceeds a preset data volume threshold, together with the feature data corresponding to the eliminated data dimension;
S603: generating a data set from the feature data remaining after dimension screening, in combination with the audit result of the document;
S604: training and testing a data model according to the data set to obtain a trained data model;
S605: receiving a document to be audited, extracting the feature data corresponding to the data dimensions from the document to be audited, and inputting the extracted feature data into the trained data model to obtain the audit result of the document to be audited.
In the above embodiment, steps S601, S603, S604 and S605 can refer to the descriptions of steps S101 to S103 shown in fig. 1, and are not described again here.
In the above embodiment, for step S602, in the initial stage of building the data model, a large and comprehensive feature set is usually collected first and then screened step by step, so dimensions may subsequently be reduced or added.
Dimension reduction is taken as an example:
The enumerated values of a feature may be severely skewed (e.g., order type, where 90% are self-run) or severely dispersed (e.g., more than 5000 commodity categories). If such a feature is subsequently measured to have a low impact on the accuracy of the classifier (< 0.1%), it can be deleted or, when dispersed, merged.
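The screening in step S602 can be sketched as follows. This sketch rests on an assumption: the "data volume" of a dimension is interpreted here as its number of distinct values, which matches the dispersed-category example but is not stated explicitly in the text. The function name, the record layout (one dictionary per document), and the threshold are likewise hypothetical.

```python
def screen_dimensions(records, max_distinct_values):
    """Drop every dimension whose number of distinct values exceeds the threshold.

    records: list of dicts mapping dimension name -> feature value.
    Returns (kept_dimension_names, screened_records).
    """
    distinct = {}
    for record in records:
        for dimension, value in record.items():
            distinct.setdefault(dimension, set()).add(value)

    # Keep only dimensions within the preset threshold.
    kept = [d for d, values in distinct.items() if len(values) <= max_distinct_values]

    # Remove the feature data of eliminated dimensions from every record.
    screened = [{d: r[d] for d in kept if d in r} for r in records]
    return kept, screened
```

A merge step (grouping many categories into fewer) would be an alternative to outright deletion for dispersed features; it is not shown here.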
The invention mainly audits documents such as service orders, which usually have multiple possible audit results, so the corresponding problem is a multi-class classification problem.
Because some feature in the document data may have an excessively large feature space, the model may become too large to load when a classifier is used in subsequent training. Moreover, redundant features not only affect or mislead the classifier but also increase the risk of overfitting.
To solve this problem, the invention uses an algorithm (such as a clustering algorithm) to adjust the dimensions (hereinafter, dimension reduction) before model training. The fewer the dimensions, the smaller the data set required for training and testing, the shorter the training time, and the better the result that can be obtained.
The clustering algorithm used by the invention is mainly the KNN algorithm. The KNN algorithm is not unduly affected by an overly large feature space. It can be used to test how the accuracy changes when a feature is deleted; the feature screening result is determined through this test, and the overly large feature space is finally reduced, which as a whole resembles the discretization of numeric values. For example, 5000 original categories may be grouped into 200 categories after dimension reduction.
It should also be noted that deleting features via the KNN algorithm may have some effect on the training of the model, but this effect may be negligible.
The invention can also perform dimension reduction in the following two ways:
1) Feature extraction: the new features obtained by feature extraction are a mapping of the original features, and can also be understood as combinations of the original features. For example, the original features are [a, b, c, d, e], and the new features are [a + b, c × d, e / a];
2) Feature selection: the features obtained by feature selection are a subset of the original features, and can also be understood as the original features with useless features removed. For example, the original features are [a, b, c, d, e], and the new features are [a, c, e].
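The two examples above can be written out directly. In this minimal Python sketch the function names are hypothetical, and the numeric values are chosen only so that the mapping is easy to follow.

```python
def extract_features(a, b, c, d, e):
    """Feature extraction: map [a, b, c, d, e] to [a + b, c * d, e / a]."""
    return [a + b, c * d, e / a]


def select_features(features, keep):
    """Feature selection: keep a subset of the original named features."""
    return [features[name] for name in keep]


original = {"a": 2, "b": 3, "c": 4, "d": 5, "e": 6}
extracted = extract_features(**original)                 # three combined features
selected = select_features(original, ["a", "c", "e"])    # subset [a, c, e]
```

Extraction produces new values that did not exist in the input, whereas selection only drops columns, which is why selection preserves the interpretability of each remaining feature.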
Through dimension reduction, the data dimensions can be planned reasonably, and a classifier or classification algorithm such as J48 is then applied after the dimensions have been reduced. Dimension reduction may also be performed before data preprocessing; no strict order is required.
It should be noted that features may also be added while training and testing the data model, for example when important features, such as user-level features, were initially omitted.
Through dimension reduction, the method provided by this embodiment can eliminate features that have little influence on the classified audit result, thereby improving the accuracy of the audit result and reducing the amount of data stored in the data set.
The method provided by the embodiment of the invention can draw on the original documents and has strong generalization capability; the data model is highly usable, can fully replace complex business rules, supports dynamically set expected values, and has a high fault tolerance; it is stable and supports supervised or unsupervised learning, and as the number of documents keeps growing, the model base can learn and improve by itself, achieving efficient and accurate document processing.
Referring to fig. 7, which shows the technical architecture of the present invention: in terms of server architecture, the entire system consists of a client module, an after-sales system server cluster, an automatic audit server cluster, and a data storage system.
1) Client module
Comprises an APP client and a PC client, which are used to submit or modify service order applications;
2) After-sales system server cluster
Mainly handles the requests sent by the clients and performs data preprocessing; it belongs to the application-layer server cluster;
3) Automatic audit server cluster
Given the large number of service orders per day, the automatic audit servers need to form a cluster (a set of loosely integrated computers, connected through software and/or hardware, that cooperate closely to perform computing tasks).
4) Data storage system
The cluster storage servers that store the feature data (i.e., service orders) can be divided into three types: servers storing data that passed the audit, servers storing data that failed the audit, and servers storing the automatic audit models.
Referring to fig. 8, in terms of service details, the whole system consists of an after-sales system server, an automatic audit server, and a data storage system.
1) After-sales system server
Includes a main thread and worker threads. The main thread manages the worker threads, which include a message forwarding thread and a service calling thread; the division between the main thread and the worker threads is not described in detail here.
In general, however, such a division is made so that, for example, when one worker thread fails, the main thread designates another thread to replace it.
2) Automatic audit system server
Performs three steps: feature filling, feature preprocessing, and model prediction. It mainly uses the existing data model generated by the classification algorithm to make predictions on the existing data, and judges from the conditions whether the audit succeeds.
3) The audit-success storage server is used for storing service orders that passed the audit.
4) The audit-failure storage server is used for storing service orders that failed the audit.
5) The automatic audit model set storage server is used for storing the data models generated by the J48 algorithm.
Referring to fig. 9, a schematic diagram of main modules of a document auditing apparatus 900 based on data dimension according to an embodiment of the present invention is shown, including:
a data dimension determining module 901, configured to determine data dimensions, extract feature data corresponding to the data dimensions from a document, and generate a data set in combination with the audit result of the document;
a data model training module 902, configured to train and test a data model according to the data set to obtain a trained data model;
and a document auditing module 903, configured to receive a document to be audited, extract the feature data corresponding to the data dimensions from the document to be audited, and input the extracted feature data into the trained data model to obtain an audit result of the document to be audited.
In the implementation apparatus of the present invention, the data dimension determining module 901 is configured to:
and if the characteristic data does not exist in the document, acquiring a data record of the document according to the document identification and the generation time of the document so as to extract the characteristic data from the data record.
In the implementation apparatus of the present invention, the data dimension determining module 901 is further configured to:
and counting the data volume of the feature data under each data dimension, and eliminating the data dimension of which the data volume exceeds a preset data volume threshold value and the feature data corresponding to the eliminated data dimension.
In the implementation apparatus of the present invention, the data model training module 902 is configured to:
dividing the data set into a training set and a test set;
inputting training data and audit results in the training set into the data model, obtaining total training accuracy according to the training accuracy under each audit result, and generating a data model to be tested;
inputting the test data in the test set into the data model to be tested, and if the total test accuracy is greater than or equal to the training accuracy and the test accuracy under each audit result is greater than or equal to the training accuracy, determining that the data model to be tested is the trained data model.
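The acceptance condition applied by the data model training module can be sketched as follows. The text does not say whether the per-result test accuracy is compared with the per-result or the total training accuracy; this sketch assumes the per-result comparison, and all function names are hypothetical.

```python
def overall_accuracy(actual, predicted):
    """Fraction of samples whose predicted audit result matches the actual one."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def per_result_accuracy(actual, predicted):
    """Accuracy computed separately for each actual audit result."""
    totals, correct = {}, {}
    for a, p in zip(actual, predicted):
        totals[a] = totals.get(a, 0) + 1
        correct[a] = correct.get(a, 0) + (1 if a == p else 0)
    return {label: correct[label] / totals[label] for label in totals}


def accept_model(train_actual, train_pred, test_actual, test_pred):
    """Return True if the model under test counts as the trained data model."""
    # Total test accuracy must not fall below total training accuracy.
    if overall_accuracy(test_actual, test_pred) < overall_accuracy(train_actual, train_pred):
        return False
    # Assumption: each audit result is compared with its own training accuracy.
    train_acc = per_result_accuracy(train_actual, train_pred)
    test_acc = per_result_accuracy(test_actual, test_pred)
    return all(test_acc.get(label, 0.0) >= acc for label, acc in train_acc.items())
```

An audit result that appears in training but not in the test set is treated as having test accuracy 0 here, which rejects the model; a real implementation might instead require a test set that covers every audit result.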
In addition, since the specific implementation of the device in the embodiment of the present invention has already been described in detail in the method above, it is not repeated here.
Fig. 10 shows an exemplary system architecture 1000 of a data dimension-based document auditing method or a data dimension-based document auditing apparatus to which an embodiment of the present invention may be applied.
As shown in fig. 10, the system architecture 1000 may include terminal devices 1001, 1002, 1003, a network 1004, and a server 1005 (by way of example only). The network 1004 is used to provide a medium for communication links between the terminal devices 1001, 1002, 1003 and the server 1005. Network 1004 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
A user may use the terminal devices 1001, 1002, 1003 to interact with the server 1005 via the network 1004 to receive or transmit messages and the like. Various communication client applications, such as shopping applications, web browser applications, search applications, instant messaging tools, mailbox clients, and social platform software, may be installed on the terminal devices 1001, 1002, 1003 (by way of example only).
The terminal devices 1001, 1002, 1003 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 1005 may be a server that provides various services, such as a backend management server (for example only) that supports shopping websites browsed by users using the terminal devices 1001, 1002, 1003. The backend management server may analyze and perform other processing on the received data such as the product information query request, and feed back a processing result (for example, target push information, product information — just an example) to the terminal device.
It should be noted that the document auditing method based on the data dimension provided by the embodiment of the present invention is generally executed by the server 1005, and accordingly, the document auditing apparatus based on the data dimension is generally disposed in the server 1005.
It should be understood that the number of terminal devices, networks, and servers in fig. 10 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring now to FIG. 11, shown is a block diagram of a computer system 1100 suitable for use with a terminal device implementing an embodiment of the present invention. The terminal device shown in fig. 11 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 11, the computer system 1100 includes a Central Processing Unit (CPU)1101, which can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)1102 or a program loaded from a storage section 1108 into a Random Access Memory (RAM) 1103. In the RAM 1103, various programs and data necessary for the operation of the system 1100 are also stored. The CPU 1101, ROM 1102, and RAM 1103 are connected to each other by a bus 1104. An input/output (I/O) interface 1105 is also connected to bus 1104.
The following components are connected to the I/O interface 1105: an input portion 1106 including a keyboard, a mouse, and the like; an output portion 1107 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 1108 including a hard disk and the like; and a communication section 1109 including a network interface card such as a LAN card, a modem, or the like. The communication section 1109 performs communication processing via a network such as the internet. A drive 1110 is also connected to the I/O interface 1105 as necessary. A removable medium 1111 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 1110 as necessary, so that a computer program read out therefrom is installed into the storage section 1108 as necessary.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication portion 1109 and/or installed from the removable medium 1111. The above-described functions defined in the system of the present invention are executed when the computer program is executed by a Central Processing Unit (CPU) 1101.
It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present invention may be implemented by software or hardware. The described modules may also be provided in a processor, which may be described as: a processor including a data dimension determining module, a data model training module, and a document auditing module. In some cases the names of the modules do not limit the modules themselves; for example, the data model training module may also be described as a "module for training a data model based on feature data and audit results".
As another aspect, the present invention also provides a computer-readable medium that may be contained in the apparatus described in the above embodiments, or may exist separately without being incorporated into the apparatus. The computer-readable medium carries one or more programs which, when executed by a device, cause the device to perform:
determining data dimensions, extracting feature data corresponding to the data dimensions from the document, and generating a data set in combination with the audit result of the document;
training and testing a data model according to the data set to obtain a trained data model;
and receiving a document to be audited, extracting the characteristic data corresponding to the data dimension in the document to be audited, and inputting the extracted characteristic data into the trained data model to obtain the auditing result of the document to be audited.
According to the technical scheme of the embodiment of the invention, the original documents can be drawn on, and the generalization capability is strong; the data model is highly usable, can fully replace complex business rules, supports dynamically set expected values, and has a high fault tolerance; it is stable and supports supervised or unsupervised learning, and as the number of documents keeps growing, the model base can learn and improve by itself, achieving efficient and accurate document processing.
The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (10)
1. A document auditing method based on data dimension is characterized by comprising the following steps:
determining data dimensions, extracting feature data corresponding to the data dimensions from the document, and generating a data set in combination with the audit result of the document;
training and testing a data model according to the data set to obtain a trained data model;
and receiving a document to be audited, extracting the characteristic data corresponding to the data dimension in the document to be audited, and inputting the extracted characteristic data into the trained data model to obtain the auditing result of the document to be audited.
2. The method of claim 1, wherein the determining the data dimension and extracting feature data corresponding to the data dimension from the document comprises:
and if the characteristic data does not exist in the document, acquiring a data record of the document according to the document identification and the generation time of the document so as to extract the characteristic data from the data record.
3. The method of claim 1, wherein after determining the data dimension and extracting feature data corresponding to the data dimension from the document, the method further comprises:
and counting the data volume of the feature data under each data dimension, and eliminating the data dimension of which the data volume exceeds a preset data volume threshold value and the feature data corresponding to the eliminated data dimension.
4. The method of claim 1, wherein the training and testing the data model according to the data set to obtain the trained data model comprises:
dividing the data set into a training set and a test set;
inputting training data and audit results in the training set into the data model, obtaining total training accuracy according to the training accuracy under each audit result, and generating a data model to be tested;
inputting the test data in the test set into the data model to be tested, and if the total test accuracy is greater than or equal to the training accuracy and the test accuracy under each audit result is greater than or equal to the training accuracy, determining that the data model to be tested is the trained data model.
5. A document auditing device based on data dimensionality is characterized by comprising:
the data dimension determining module is used for determining data dimensions, extracting feature data corresponding to the data dimensions from the document, and generating a data set in combination with the audit result of the document;
the data model training module is used for carrying out training test on a data model according to the data set to obtain a trained data model;
and the document auditing module is used for receiving a document to be audited, extracting the characteristic data corresponding to the data dimension in the document to be audited, and inputting the extracted characteristic data into the trained data model to obtain an auditing result of the document to be audited.
6. The apparatus of claim 5, wherein the data dimension determination module is configured to:
and if the characteristic data does not exist in the document, acquiring a data record of the document according to the document identification and the generation time of the document so as to extract the characteristic data from the data record.
7. The apparatus of claim 5, wherein the data dimension determination module is further configured to:
and counting the data volume of the feature data under each data dimension, and eliminating the data dimension of which the data volume exceeds a preset data volume threshold value and the feature data corresponding to the eliminated data dimension.
8. The apparatus of claim 5, wherein the data model training module is configured to:
dividing the data set into a training set and a test set;
inputting training data and audit results in the training set into the data model, obtaining total training accuracy according to the training accuracy under each audit result, and generating a data model to be tested;
inputting the test data in the test set into the data model to be tested, and if the total test accuracy is greater than or equal to the training accuracy and the test accuracy under each audit result is greater than or equal to the training accuracy, determining that the data model to be tested is the trained data model.
9. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-4.
10. A computer-readable medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911029683.2A CN112734352A (en) | 2019-10-28 | 2019-10-28 | Document auditing method and device based on data dimensionality |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112734352A true CN112734352A (en) | 2021-04-30 |
Family
ID=75589058
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911029683.2A Pending CN112734352A (en) | 2019-10-28 | 2019-10-28 | Document auditing method and device based on data dimensionality |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112734352A (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107679995A (en) * | 2017-08-31 | 2018-02-09 | 平安科技(深圳)有限公司 | Electronic installation, insurance case Claims Review method and computer-readable recording medium |
US20190163666A1 (en) * | 2017-11-29 | 2019-05-30 | International Business Machines Corporation | Assessment of machine learning performance with limited test data |
CN108960782A (en) * | 2018-07-10 | 2018-12-07 | 北京木瓜移动科技股份有限公司 | content auditing method and device |
CN109858927A (en) * | 2019-01-16 | 2019-06-07 | 深圳壹账通智能科技有限公司 | A kind of trade company's checking method, device, computer readable storage medium and server |
CN110264342A (en) * | 2019-06-19 | 2019-09-20 | 深圳前海微众银行股份有限公司 | A kind of business audit method and device based on machine learning |
CN110333886A (en) * | 2019-07-02 | 2019-10-15 | 上海企创信息科技有限公司 | A kind of review procedure iteration update method, device, server and storage medium |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114529309A (en) * | 2022-02-09 | 2022-05-24 | 北京沃东天骏信息技术有限公司 | Information auditing method and device, electronic equipment and computer readable medium |
CN114638681A (en) * | 2022-03-30 | 2022-06-17 | 浪潮通用软件有限公司 | Receipt identification method, system and computer readable storage medium |
CN116469120A (en) * | 2023-05-31 | 2023-07-21 | 国网浙江省电力有限公司营销服务中心 | Automatic data processing method and device for electric charge bill and storage medium |
CN116469120B (en) * | 2023-05-31 | 2023-09-05 | 国网浙江省电力有限公司营销服务中心 | Automatic data processing method and device for electric charge bill and storage medium |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN110119413A (en) | The method and apparatus of data fusion | |
CN110705719A (en) | Method and apparatus for performing automatic machine learning | |
CN111145009A (en) | Method and device for evaluating risk after user loan and electronic equipment | |
US20230103753A1 (en) | Generating adaptive textual explanations of output predicted by trained artificial-intelligence processes | |
CN114078050A (en) | Loan overdue prediction method and device, electronic equipment and computer readable medium | |
CN112950359B (en) | User identification method and device | |
CN112734352A (en) | Document auditing method and device based on data dimensionality | |
CN113313279A (en) | Document auditing method and device | |
CN111210109A (en) | Method and device for predicting user risk based on associated user and electronic equipment | |
CN112990311A (en) | Method and device for identifying admitted client | |
CN111062600B (en) | Model evaluation method, system, electronic device, and computer-readable storage medium | |
CN115545886A (en) | Overdue risk identification method, overdue risk identification device, overdue risk identification equipment and storage medium | |
CN110895761B (en) | After-sales service application information processing method and device | |
CN110930238A (en) | Method, device, equipment and computer readable medium for improving audit task efficiency | |
CN112907362A (en) | Loan transaction processing method and device, electronic equipment and storage medium | |
CN114511022B (en) | Feature screening, behavior recognition model training and abnormal behavior recognition method and device | |
CN117437019A (en) | Credit card overdue risk prediction method, apparatus, device, medium and program product | |
CN111429257B (en) | Transaction monitoring method and device | |
CN115795345A (en) | Information processing method, device, equipment and storage medium | |
CN115147195A (en) | Bidding purchase risk monitoring method, apparatus, device and medium | |
CN114066603A (en) | Post-loan risk early warning method and device, electronic equipment and computer readable medium | |
KR20230103025A (en) | Method, Apparatus, and System for provision of corporate credit analysis and rating information | |
CN114357523A (en) | Method, device, equipment, storage medium and program product for identifying risk object | |
CN113450208A (en) | Loan risk change early warning and model training method and device | |
CN113094595A (en) | Object recognition method, device, computer system and readable storage medium |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |