CN116720118A - Label quality intelligent analysis method and device, electronic equipment and storage medium - Google Patents
- Publication number
- CN116720118A (application CN202310702551.1A)
- Authority
- CN
- China
- Prior art keywords
- label
- tag
- quality
- data set
- index
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0639—Performance analysis of employees; Performance analysis of enterprise or organisation operations
- G06Q10/06395—Quality analysis or management
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/30—Computing systems specially adapted for manufacturing
Abstract
The invention relates to intelligent decision-making technology in the financial field and discloses a label quality intelligent analysis method that can be used to evaluate the quality of user labels for financial products. The method comprises the following steps: extracting sample data corresponding to class labels of a label system to be evaluated according to a service scene; selecting an initial feature data set with distinction (discriminative power) from the sample data, and calculating the information gain between the initial feature data set and the class labels so as to select a target feature data set of the class labels from the initial feature data set; constructing tag test data of the target feature data set, and identifying predicted tags of the tag test data; querying a label quality analysis index of the label system, and calculating an index score of the label quality analysis index according to the predicted tags and the tag test data; and calculating the quality score of each label in the label system according to the index score, and generating a label quality analysis report of the label system. The invention can improve the accuracy of label quality analysis for financial products.
Description
Technical Field
The present invention relates to the field of intelligent decision making, and in particular, to a method and apparatus for intelligent analysis of label quality, an electronic device, and a storage medium.
Background
In recent years, with the rapid development of information technology, we have entered the era of big data. Big data has come to influence daily life and has become a valuable and abundant asset. For enterprises, and finance companies in particular, making better use of big data to serve users requires knowing those users fully and comprehensively. A complete and usable user tag system helps a finance company describe and profile users from different dimensions, so that financial product recommendation, such as recommending insurance products or other financial products, can be carried out more effectively. A tag is a classification or content that identifies a target. Tag quality measures how well the tag system addresses users' needs.
At present, label quality assessment in the industry mostly relies on two indexes, saturation and label usage. There is no specific index for aspects such as whether the value segmentation of some labels is reasonable, how accurate the user labels are, or how much the data fluctuates. For financial products with many customer classification dimensions, the label quality of the user label system is therefore not evaluated comprehensively across multiple indexes, which can reduce the accuracy of label quality analysis.
Disclosure of Invention
The invention provides a label quality intelligent analysis method, a device, electronic equipment and a storage medium, and mainly aims to improve label quality analysis accuracy of user labels of financial products.
In order to achieve the above object, the present invention provides a method for intelligently analyzing quality of a label, comprising:
identifying a service scene of a label system to be evaluated and a class label corresponding to the service scene, and extracting sample data corresponding to the class label according to the service scene;
selecting a data set with distinction in the sample data as an initial characteristic data set, calculating information gain between the initial characteristic data set and the class label, and selecting a target characteristic data set of the class label from the initial characteristic data set according to the information gain;
constructing tag test data of the target feature data set, and identifying a predicted tag of the tag test data by using a tag classifier in a trained tag analysis model;
inquiring a tag quality analysis index of the tag system, and calculating an index score of the tag quality analysis index by using a tag quality analysis function in the trained tag analysis model according to the predicted tag and the tag test data;
and calculating the quality score of each label in the label system according to the index score, and generating a label quality analysis report of the label system according to the quality score.
Optionally, said calculating an information gain between said initial feature dataset and said class label comprises:
calculating the information gain between the initial feature data set and the class labels using the formula:
IG(f_i; T) = H(f_i) + H(T) - H(f_i, T)
wherein IG(f_i; T) represents the information gain between the i-th feature f_i in the initial feature data set and the class label set T, H(f_i) represents the information entropy of the i-th feature f_i, H(T) represents the information entropy of the class label set T, and H(f_i, T) represents the joint information entropy of the i-th feature f_i and the class label set T.
Optionally, the selecting the target feature data set of the class label from the initial feature data set according to the information gain includes:
normalizing the information gain to obtain a normalized gain;
calculating the average value of the normalized gain, taking the average value as a threshold value, selecting the features in the initial feature data set which are not smaller than the threshold value, and generating a target feature data set.
Optionally, the normalizing the information gain to obtain a normalized gain includes:
normalizing the information gain using the following formula:
SU(f_i; T) = 2 · IG(f_i; T) / (H(f_i) + H(T))
wherein SU(f_i; T) represents the normalized information gain between the i-th feature f_i in the initial feature data set and the class label set T, IG(f_i; T) represents the information gain between the i-th feature f_i and the class label set T, H(f_i) represents the information entropy of the i-th feature f_i, and H(T) represents the information entropy of the class label set T.
Optionally, the constructing of the tag classifier includes:
wherein γ_t represents the weight vector of the label classifier at the t-th training iteration, t represents the training iteration number of the label classifier, γ_{t-1} represents the weight vector of the label classifier at the (t-1)-th training iteration, x_i represents the i-th training sample data vector, ρ represents the label weight learning rate, and T represents the transpose operator of a vector.
Optionally, the tag quality analysis function includes:
Acc = (1/n) · Σ_{j=1}^{n} |R_j ∩ Z_j| / |R_j ∪ Z_j|
wherein Acc represents the accuracy score of label prediction, R_j represents the true labels of the j-th label test data, Z_j represents the predicted labels corresponding to the j-th label test data, |R_j ∩ Z_j| represents the number of correctly predicted labels, |R_j ∪ Z_j| represents the total number of labels appearing among the true labels and the predicted labels, ∩ represents the intersection symbol, ∪ represents the union symbol, and n represents the number of label test data.
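The accuracy index described above can be sketched in a few lines of Python. This is a minimal illustration rather than the patented implementation: it assumes Acc is the mean, over the n test items, of |R_j ∩ Z_j| / |R_j ∪ Z_j|, which is what the symbol definitions suggest. The function name `label_accuracy`, the sample label names, and the convention that an item with no true and no predicted labels counts as fully correct are assumptions made for the example.

```python
def label_accuracy(true_labels, predicted_labels):
    """Mean of |R_j ∩ Z_j| / |R_j ∪ Z_j| over all test items (assumed form of Acc)."""
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("need two non-empty sequences of equal length")
    total = 0.0
    for r, z in zip(true_labels, predicted_labels):
        r, z = set(r), set(z)
        union = r | z
        # Assumption: an item with no true and no predicted labels scores 1.0.
        total += len(r & z) / len(union) if union else 1.0
    return total / len(true_labels)


# Hypothetical test data: true label sets R_j and predicted label sets Z_j.
true_sets = [{"high_income", "young"}, {"saver"}]
pred_sets = [{"high_income"}, {"saver"}]
acc = label_accuracy(true_sets, pred_sets)  # (1/2 + 1/1) / 2
```

A score of 1.0 means every predicted label set matches its true label set exactly; missing labels and spurious labels both lower the score.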
Optionally, the calculating a quality score of each tag in the tag system according to the index score includes:
performing uniform processing on the index corresponding to the index score to obtain an index uniform score, and performing data standardization processing on the index uniform score to obtain an index standardization score;
analyzing the importance degree of the index in the service scene of the tag system, and distributing a weight coefficient for the index according to the importance degree;
and calculating the product sum of the weight coefficient and the index score according to the weight coefficient and the index standardization score to obtain the quality score of each label in the label system.
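The three steps above (unifying the indexes, standardizing them, and taking a weighted sum) can be sketched as follows. The text does not specify the unification or standardization method, so this sketch assumes that unification means flipping "lower is better" indexes so that larger always means better, and that standardization is min-max scaling across tags. The function name `quality_scores`, the index names, and the weights are hypothetical.

```python
def quality_scores(index_scores, higher_is_better, weights):
    """index_scores: {tag: {index: raw score}} -> {tag: weighted quality score}."""
    tags = list(index_scores)
    standardized = {tag: {} for tag in tags}
    for index in weights:
        # Unify direction (assumption): negate indexes where a lower value is better.
        sign = 1.0 if higher_is_better[index] else -1.0
        unified = {t: sign * index_scores[t][index] for t in tags}
        lo, hi = min(unified.values()), max(unified.values())
        span = hi - lo
        for t in tags:
            # Min-max standardization to [0, 1] (assumption); constant columns map to 0.
            standardized[t][index] = (unified[t] - lo) / span if span else 0.0
    # Quality score: sum of weight coefficient times standardized index score.
    return {t: sum(weights[i] * standardized[t][i] for i in weights) for t in tags}


# Hypothetical index scores for two tags and weights reflecting index importance.
raw = {
    "age": {"accuracy": 0.9, "hamming_loss": 0.1},
    "income": {"accuracy": 0.7, "hamming_loss": 0.3},
}
direction = {"accuracy": True, "hamming_loss": False}
weights = {"accuracy": 0.6, "hamming_loss": 0.4}
scores = quality_scores(raw, direction, weights)
```

Z-score standardization could be substituted for min-max scaling without changing the weighted-sum step.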
In order to solve the above problems, the present invention also provides an intelligent tag quality analysis apparatus, the apparatus comprising:
a class label identification module, configured to identify a service scene of a label system to be evaluated and class labels corresponding to the service scene, and to extract sample data corresponding to the class labels according to the service scene;
the target feature data set generation module is used for selecting a data set with distinction in the sample data as an initial feature data set, calculating information gain between the initial feature data set and the class label, and selecting the target feature data set of the class label from the initial feature data set according to the information gain;
The predicted tag identification module is used for constructing tag test data of the target characteristic data set and identifying predicted tags of the tag test data by using a tag classifier in the trained tag analysis model;
the index score calculation module is used for inquiring the label quality analysis index of the label system and calculating the index score of the label quality analysis index by utilizing the label quality analysis function in the trained label analysis model according to the predicted label and the label test data;
and the label quality analysis report generation module is used for calculating the quality score of each label in the label system according to the index score and generating a label quality analysis report of the label system according to the quality score.
In order to solve the above-mentioned problems, the present invention also provides an electronic apparatus including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores a computer program executable by the at least one processor to implement the tag quality intelligent analysis method described above.
In order to solve the above-mentioned problems, the present invention also provides a computer-readable storage medium having stored therein at least one computer program that is executed by a processor in an electronic device to implement the above-mentioned tag quality intelligent analysis method.
According to the embodiment of the invention, identifying the class labels of the label system to be evaluated provides an operation object for the subsequent evaluation of label quality. Extracting the sample data corresponding to the class labels according to the service scene preliminarily removes redundant data from the original data, and selecting a data set with distinction in the sample data as the initial feature data set further removes redundant data and retains a feature data set with higher classification capability. Calculating the information gain between the initial feature data set and the class labels measures the degree of association between each feature in the initial feature data set and the label set, and selecting the target feature data set of the class labels from the initial feature data set according to the information gain more effectively improves the classification performance of the subsequent multi-label classifier. Constructing the label test data of the target feature data set provides an operation object for subsequently testing the classification effect of the label classifier, and identifying the predicted labels of the label test data with the label classifier in the trained label analysis model yields a label test result that measures the classification quality of the label classifier.
Further, querying the label quality analysis index of the label system helps to control label quality and guides label managers and developers to keep improving label quality so as to better serve users. Calculating the index score of the label quality analysis index from the predicted labels and the label test data, using the label quality analysis function in the trained label analysis model, establishes an intuitive quantitative analysis of the label quality of the label system. Calculating the quality score of each label in the label system from the index score makes it possible to judge the practical value of the label system and provides a basis for further dynamic adjustment of its labels. Generating the label quality analysis report of the label system from the quality score allows label quality to be analyzed more intuitively and quantitatively, which improves the label classification accuracy of the label system and serves users more accurately. Therefore, the intelligent tag quality analysis method, the intelligent tag quality analysis device, the electronic equipment and the storage medium provided by the embodiment of the invention can improve the accuracy of customer tag quality analysis for financial products with many customer classification dimensions.
Drawings
FIG. 1 is a flow chart of a method for intelligent analysis of tag quality according to an embodiment of the present invention;
FIG. 2 is a schematic block diagram of an intelligent tag quality analyzer according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the internal structure of an electronic device for implementing a tag quality intelligent analysis method according to an embodiment of the present invention.
The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
The embodiment of the invention provides an intelligent analysis method for label quality. The execution subject of the tag quality intelligent analysis method includes, but is not limited to, at least one of a server, a terminal, and the like, which can be configured to execute the method provided by the embodiment of the invention. In other words, the tag quality intelligent analysis method may be performed by software or hardware installed in a terminal device or a server device, and the software may be a blockchain platform. The service end includes but is not limited to: a single server, a server cluster, a cloud server or a cloud server cluster, and the like. The server may be an independent server, or may be a cloud server that provides cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, content delivery networks (Content Delivery Network, CDN), and basic cloud computing services such as big data and artificial intelligence platforms.
Referring to fig. 1, a flow chart of a method for intelligent analysis of tag quality according to an embodiment of the invention is shown. In the embodiment of the invention, the intelligent label quality analysis method comprises the following steps:
S1, identifying a service scene of a label system to be evaluated and a class label corresponding to the service scene, and extracting sample data corresponding to the class label according to the service scene;
By identifying the service scene of the label system to be evaluated and the corresponding class labels, the embodiment of the invention obtains the application scenario and label categories of the labels, and provides an operation object for the subsequent evaluation of label quality. The label system to be evaluated refers to a system that describes and profiles users from different dimensions to meet business requirements; for example, a financial product system may include labels that describe users in terms of age, income, consumption preference, financial habits and the like. The label system to be evaluated can be obtained through a data script, which may be written in the JS scripting language. A tag is a category or content that identifies a target object. The business scenario describes the application environment in which a product or service is provided to the user, together with the needs associated with it, such as investment and financing or online shopping commodity pushing. The class labels are labels that characterize the semantic category of data, such as class labels for age, income, consumption preference and financial habits.
Further, in an optional embodiment of the present invention, the identifying the service scenario of the tag system to be evaluated and the class tag corresponding to the service scenario may be implemented by analyzing the service requirement of the tag system and the semantic meaning of the data representation.
Further, according to the service scene, the embodiment of the invention extracts the sample data corresponding to the class label to preliminarily remove redundant data in the original data, and provides support for the subsequent extraction of the characteristic data set.
Further, according to the service scenario, an optional embodiment of the present invention extracts sample data corresponding to the class label, including: analyzing the semantic meaning of the class label, and filtering redundant data in the original data of the label system according to the semantic meaning to obtain the sample data.
S2, selecting a data set with distinction in the sample data as an initial characteristic data set, calculating information gain between the initial characteristic data set and the class label, and selecting a target characteristic data set of the class label from the initial characteristic data set according to the information gain;
By selecting a data set with distinction in the sample data as the initial feature data set (for example, data sets such as user income, consumption preference and financial habits), the embodiment of the invention effectively removes redundant data from the sample data and retains a feature data set with higher classification capability, which improves the subsequent multi-label classification effect. Here, distinction (discriminative power) refers to the ability to effectively measure or distinguish certain characteristics or conditions. User income data, for example, measures a user's purchasing power for financial products and can therefore serve as a data set with distinction. A data set is a collection of structured data containing many members, each of which has many features.
Further, in an alternative embodiment of the present invention, the selecting the data set having the distinction in the sample data as the initial feature data set may be implemented through a distinction analysis method, such as a differential index method, a correlation coefficient method, and the like.
Further, the embodiment of the invention can measure the association degree of each feature in the initial feature data set and the label set by calculating the information gain between the initial feature data set and the class labels, and provides basis for subsequent further feature selection.
Further, in an alternative embodiment of the present invention, the information gain between the initial feature data set and the class label may be calculated by the following formula:
IG(f_i; T) = H(f_i) + H(T) - H(f_i, T)
wherein IG(f_i; T) represents the information gain between the i-th feature f_i in the initial feature data set (e.g., the i-th feature f_i may be "high consumption") and the class label set T (e.g., a class label set of age, annual income, consumption preference, financial habits, etc.), H(f_i) represents the information entropy of the i-th feature f_i, H(T) represents the information entropy of the class label set T, and H(f_i, T) represents the joint information entropy of the i-th feature f_i and the class label set T.
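The information gain formula above can be illustrated with a short Python sketch for discrete features. It is only an illustration under stated assumptions: entropy is Shannon entropy in bits estimated from value frequencies, and the function names and toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def information_gain(feature, labels):
    """IG(f; T) = H(f) + H(T) - H(f, T) for two aligned discrete sequences."""
    joint = list(zip(feature, labels))  # joint outcomes (f, T) per sample
    return entropy(feature) + entropy(labels) - entropy(joint)


# Hypothetical toy data: one feature value and one class label per user.
consumption = ["high", "high", "low", "low"]
label = ["premium", "premium", "basic", "basic"]
unrelated = ["a", "b", "a", "b"]

ig_related = information_gain(consumption, label)   # feature determines the label
ig_unrelated = information_gain(unrelated, label)   # feature independent of the label
```

A feature that fully determines the label attains the maximum gain H(T), while a feature that is independent of the label yields a gain of zero.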
Further, according to the information gain, the embodiment of the invention selects the target feature data set of the class label from the initial feature data set, so that the classification performance of the subsequent multi-label classifier can be improved more effectively.
Further, in an optional embodiment of the present invention, the selecting, according to the information gain, the target feature data set of the class label from the initial feature data set includes: normalizing the information gain to obtain a normalized gain; calculating the average value of the normalized gain, taking the average value as a threshold value, selecting the features in the initial feature data set whose normalized gain is not smaller than the threshold value, and generating a target feature data set. For example, after calculating the normalized gains SU(f_1; T), SU(f_2; T), SU(f_3; T) and SU(f_4; T) of the age, annual income, consumption preference and financial habit data sets, the embodiment of the invention calculates their mean SU(f_avg; T); if the normalized gain SU(f_2; T) of the annual income data set is greater than the mean SU(f_avg; T), the annual income data set is taken as the target feature data set.
Still further, in an alternative embodiment of the present invention, the normalization of the information gain may be implemented by the following formula:
SU(f_i; T) = 2 · IG(f_i; T) / (H(f_i) + H(T))
wherein SU(f_i; T) represents the normalized information gain between the i-th feature f_i in the initial feature data set and the class label set T, IG(f_i; T) represents the information gain between the i-th feature f_i and the class label set T, H(f_i) represents the information entropy of the i-th feature f_i, and H(T) represents the information entropy of the class label set T.
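The selection rule described above (normalize the gain, then keep the features whose normalized gain is not smaller than the mean) can be sketched as follows. The sketch assumes that the normalized gain SU is the standard symmetric uncertainty, 2·IG(f_i; T) / (H(f_i) + H(T)); the function names, feature names and toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(feature, labels):
    """Assumed normalization: SU = 2 * IG / (H(f) + H(T)), which lies in [0, 1]."""
    h_f, h_t = entropy(feature), entropy(labels)
    if h_f + h_t == 0:
        return 0.0  # both sequences are constant; no information to share
    ig = h_f + h_t - entropy(list(zip(feature, labels)))
    return 2.0 * ig / (h_f + h_t)


def select_features(features, labels):
    """Keep features whose normalized gain is not smaller than the mean gain."""
    scores = {name: symmetric_uncertainty(col, labels) for name, col in features.items()}
    threshold = sum(scores.values()) / len(scores)
    selected = [name for name, s in scores.items() if s >= threshold]
    return selected, scores


# Hypothetical initial feature data set with one class label per user.
labels = ["premium", "premium", "basic", "basic"]
features = {
    "annual_income": ["high", "high", "low", "low"],
    "age_band": ["young", "old", "young", "old"],
    "region": ["north", "north", "north", "north"],
}
selected, scores = select_features(features, labels)
```

In this toy example only `annual_income` carries information about the label, so it is the only feature that reaches the mean threshold.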
S3, constructing tag test data of the target feature data set, and identifying a predicted tag of the tag test data by using a tag classifier in the trained tag analysis model.
In the embodiment of the invention, constructing the label test data of the target feature data set provides an operation object for subsequently testing the classification effect of the label classifier. The label test data are used for testing the label classification effect and quality of the label classifier in the neural network model.
Further, in an optional embodiment of the present invention, the tag test data for constructing the target feature data set may be constructed by screening, according to semantic features of the target feature data set, data having the semantic features as the tag test data.
Further, in the embodiment of the present invention, identifying the predicted labels of the label test data with the label classifier in the trained label analysis model yields a label test result that measures the classification quality of the label classifier. The tag analysis model is a highly parallel system model with strong adaptive learning capability and good robustness to changes in system parameters and to external interference of the controlled object; it does not depend on a mathematical model of the object under study, and it comprises an input layer, a hidden layer and an output layer, with the different layers fully connected. The label classifier judges the category to which a new observation sample belongs on the basis of training data marked with label categories; examples include a decision tree classifier, a selection tree classifier and a naive Bayesian classifier.
Further, in an optional embodiment of the present invention, identifying the predicted tag of the tag test data by using the tag classifier in the trained tag analysis model makes it possible to test the prediction accuracy and reliability of the tag classifier, providing a basis for the subsequent evaluation of tag quality.
Further, in an optional embodiment of the present invention, the constructing of the tag classifier includes:
wherein γ_t represents the weight vector of the label classifier at the t-th training iteration, t represents the training iteration number of the label classifier, γ_{t-1} represents the weight vector of the label classifier at the (t-1)-th training iteration, x_i represents the i-th training sample data vector, ρ represents the label weight learning rate, and T represents the transpose operator of a vector.
S4, inquiring a label quality analysis index of the label system, and calculating an index score of the label quality analysis index by using a label quality analysis function in the trained label analysis model according to the predicted label and the label test data.
In the embodiment of the invention, querying the label quality analysis index of the label system helps to control label quality and guides label managers and label developers to continuously improve label quality so as to better serve users. The label quality analysis index refers to a standard that can measure the quality of the class labels of the label system, such as concentration, stability, accuracy, richness, saturation, coverage of application personnel, usage, label service value and the like.
Further, according to an alternative embodiment of the present invention, the querying the tag quality analysis index of the tag system may be implemented by analyzing the service scenario and the service requirement of the tag system to determine the associated tag quality analysis index.
Further, in the embodiment of the present invention, calculating the index score of the tag quality analysis index according to the predicted tag and the tag test data, by using the tag quality analysis function in the trained tag analysis model, establishes an intuitive quantitative analysis of the tag quality of the tag system and supports tag system developers in deciding how to adjust tags. The tag quality analysis function is an index function that represents tag quality so as to evaluate the classification performance of the tags in the tag system, such as an accuracy function, a Hamming loss function, a precision function, a recall function and the like.
Further, according to an alternative embodiment of the present invention, according to the predicted tag and the tag test data, an index score of the tag quality analysis index is calculated using a tag quality analysis function in the trained tag analysis model, where the tag quality analysis function is as follows:
Wherein Acc denotes the accuracy score of tag prediction, $R_j$ denotes the true tags of the j-th tag test data item, $Z_j$ denotes the predicted tags corresponding to the j-th tag test data item, $|R_j \cap Z_j|$ denotes the number of correctly predicted tags, $|R_j \cup Z_j|$ denotes the total number of tags appearing in the true tags or the predicted tags, $\cap$ denotes the intersection symbol, $\cup$ denotes the union symbol, and $n$ denotes the number of tag test data items.
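The symbol definitions above correspond to the example-based multi-label accuracy, which in its usual form is Acc = (1/n) Σ_j |R_j ∩ Z_j| / |R_j ∪ Z_j|. A minimal Python sketch under that assumption follows. The function name `multilabel_accuracy` and the treatment of an item that has neither true nor predicted tags are illustrative choices and do not come from the embodiment.

```python
from typing import Sequence, Set


def multilabel_accuracy(true_sets: Sequence[Set[str]],
                        pred_sets: Sequence[Set[str]]) -> float:
    """Mean of |R_j & Z_j| / |R_j | Z_j| over the n tag test data items."""
    if len(true_sets) != len(pred_sets):
        raise ValueError("true and predicted tag lists must have equal length")
    if not true_sets:
        raise ValueError("at least one tag test data item is required")

    total = 0.0
    for r_j, z_j in zip(true_sets, pred_sets):
        union = r_j | z_j
        if not union:
            # Assumption: no true tags and no predicted tags counts as a match.
            total += 1.0
        else:
            total += len(r_j & z_j) / len(union)
    return total / len(true_sets)
```

For two test items with true tags {a, b} and {c} and predicted tags {a} and {c, d}, each item contributes 1/2, so the score is 0.5.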
S5, calculating the quality score of each label in the label system according to the index score, and generating a label quality analysis report of the label system according to the quality score.
According to the embodiment of the invention, the quality score of each label in the label system is calculated according to the index score, so that the quality score of the whole label system can be obtained, the practical value of the label system is judged, and a guarantee is provided for further dynamically adjusting the labels of the label system.
Further, according to an alternative embodiment of the present invention, calculating a quality score of each tag in the tag system according to the index score includes: performing uniform processing on the index corresponding to the index score to obtain an index uniform score, and performing data standardization processing on the index uniform score to obtain an index standardization score; analyzing the importance degree of the index in the service scene of the tag system, and distributing the weight coefficient of the index according to the importance degree; and calculating the weighted sum of the index standardization scores with the weight coefficients to obtain the quality score of each tag in the tag system. Index uniform processing, also called index consistency processing, refers to the process of converting indexes of different natures (positive indexes and reverse indexes) into indexes of the same nature; for reverse indexes it can be implemented by the reciprocal method, the subtraction method and the like. Data standardization processing means transforming the original data proportionally through a mathematical transformation so that it falls into a specific interval, such as [0,1] or [-1,1]. The weight coefficient can be set according to the service scene or can be generated by a random function.
Further, in an optional embodiment of the present invention, the data standardization processing may be implemented by methods such as min-max normalization, log function transformation, atan function transformation, z-score normalization, fuzzy quantization and the like.
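The three steps just described (making reverse indexes point the same way as positive ones, standardizing, then weighting) can be sketched as follows. This is only an illustration under stated assumptions: it uses the subtraction method for reverse indexes and min-max normalization, and the names `min_max`, `quality_scores`, `reverse_indexes` and the sample index names are hypothetical.

```python
from typing import Dict, List, Set


def min_max(values: List[float]) -> List[float]:
    """Scale values into [0, 1]; a constant column maps to 0.0 (assumption)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def quality_scores(index_scores: Dict[str, List[float]],
                   reverse_indexes: Set[str],
                   weights: Dict[str, float]) -> List[float]:
    """Return one quality score per tag.

    index_scores maps an index name to its raw scores, one entry per tag.
    """
    n_tags = len(next(iter(index_scores.values())))
    totals = [0.0] * n_tags
    for name, scores in index_scores.items():
        if len(scores) != n_tags:
            raise ValueError(f"index {name!r} has a different number of tags")
        if name in reverse_indexes:
            # Subtraction method: turn "smaller is better" into "larger is better".
            top = max(scores)
            scores = [top - s for s in scores]
        for k, value in enumerate(min_max(scores)):
            totals[k] += weights[name] * value
    return totals
```

With accuracy scores [0.9, 0.6, 0.3] weighted 0.7 and Hamming loss scores [0.1, 0.2, 0.5] weighted 0.3 (treated as a reverse index), the three tags score approximately 1.0, 0.575 and 0.0.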
Further, in the embodiment of the present invention, by generating the label quality analysis report of the label system according to the quality score, the label quality can be analyzed more intuitively and quantitatively, so as to improve the accuracy of label classification of the label system, and more accurately serve users.
Further, according to an alternative embodiment of the present invention, generating a label quality analysis report of the label system according to the quality score includes: constructing a relation pair matrix of the index score and the corresponding weight, and marking class labels of the label system corresponding to the index score; and generating a label quality analysis report of the label system according to the relation pair matrix, the class labels and the quality scores.
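One possible reading of the report generation step is sketched below. It assumes that each row of the relation pair matrix holds the (index score, weight) pairs of one class label and that the report is a plain-text table sorted by quality score. The function name `build_report`, the layout and the sample label names are hypothetical and are not taken from the embodiment.

```python
from typing import Dict, List, Sequence, Tuple


def build_report(index_names: Sequence[str],
                 weights: Sequence[float],
                 scores_by_label: Dict[str, Sequence[float]]) -> str:
    """Render a plain-text label quality report, highest quality first."""
    rows: List[Tuple[str, List[Tuple[float, float]], float]] = []
    for label, scores in scores_by_label.items():
        if len(scores) != len(weights) or len(weights) != len(index_names):
            raise ValueError("scores, weights and index names must align")
        # One row of the relation pair matrix: (index score, weight) pairs.
        pairs = list(zip(scores, weights))
        quality = sum(s * w for s, w in pairs)
        rows.append((label, pairs, quality))
    rows.sort(key=lambda row: row[2], reverse=True)

    lines = ["label | " + " | ".join(index_names) + " | quality"]
    for label, pairs, quality in rows:
        cells = " | ".join(f"{s:.2f} x {w:.2f}" for s, w in pairs)
        lines.append(f"{label} | {cells} | {quality:.3f}")
    return "\n".join(lines)
```

Keeping the score and weight pairs visible in each row lets a reader of the report see which index drives a label's quality score.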
It can be seen that, in the embodiment of the invention, identifying the class labels of the label system to be evaluated provides an operation object for the subsequent evaluation of label quality; extracting the sample data corresponding to the class labels according to the service scene preliminarily removes redundant data from the original data; and selecting the data set with distinction in the sample data as the initial feature data set further removes redundant data from the sample data and retains the feature data with higher classification capability. Next, calculating the information gain between the initial feature data set and the class labels measures the degree of association between each feature in the initial feature data set and the label set, and selecting the target feature data set of the class labels from the initial feature data set according to the information gain improves the classification performance of the subsequent multi-label classifier. Constructing the label test data of the target feature data set provides an operation object for subsequently testing the classification effect of the label classifier, and identifying the predicted labels of the label test data by using the label classifier in the trained label analysis model yields the label test result, which measures the classification quality of the label classifier.
Further, querying the label quality analysis index of the label system helps to control label quality and guides label managers and developers to continuously improve label quality so as to better serve users. Calculating the index score of the label quality analysis index according to the predicted label and the label test data, by using the label quality analysis function in the trained label analysis model, establishes an intuitive quantitative analysis of the label quality of the label system. Calculating the quality score of each label in the label system according to the index score makes it possible to judge the practical value of the label system and supports further dynamic adjustment of its labels. Generating the label quality analysis report of the label system according to the quality score allows label quality to be analyzed more intuitively and quantitatively, which improves the label classification accuracy of the label system and serves users more accurately. Therefore, the intelligent tag quality analysis method, device, electronic equipment and storage medium provided by the embodiment of the invention can improve the accuracy of customer tag quality analysis for financial products with many customer classification dimensions.
FIG. 2 is a functional block diagram of the intelligent tag quality analyzer according to the present invention.
The tag quality intelligent analysis apparatus 100 of the present invention may be installed in an electronic device. Depending on the implemented functionality, the tag quality intelligent analysis apparatus may include a class label identification module 101, a target feature data set generation module 102, a predicted tag identification module 103, an index score calculation module 104, and a tag quality analysis report generation module 105. A module according to the invention, which may also be referred to as a unit, refers to a series of computer program segments that are stored in the memory of the electronic device, can be executed by the processor of the electronic device, and perform a fixed function.
In the present embodiment, the functions concerning the respective modules/units are as follows:
the class label identification module 101 is configured to identify a service scenario of a label system to be evaluated and class labels corresponding to the service scenario, and extract sample data corresponding to the class labels according to the service scenario;
the target feature data set generating module 102 is configured to select a data set with a distinction degree in the sample data as an initial feature data set, calculate an information gain between the initial feature data set and the class label, and select a target feature data set of the class label from the initial feature data set according to the information gain;
the predicted tag identification module 103 is configured to construct tag test data of the target feature data set, and identify a predicted tag of the tag test data by using a tag classifier in a trained tag analysis model;
the index score calculation module 104 is configured to query a label quality analysis index of the label system, and calculate an index score of the label quality analysis index according to the predicted label and the label test data by using a label quality analysis function in the trained label analysis model;
the tag quality analysis report generating module 105 is configured to calculate a quality score of each tag in the tag system according to the index score, and generate a tag quality analysis report of the tag system according to the quality score.
In detail, the modules in the tag quality intelligent analysis apparatus 100 in the embodiment of the present invention use the same technical means as the tag quality intelligent analysis method described in fig. 1, and can produce the same technical effects, which are not described herein.
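The division into modules 101 to 105 can be pictured as a simple pipeline in which each module consumes the output of the previous one. The sketch below is only a structural illustration: the class name `TagQualityAnalyzer`, the field names and the single-argument callables are assumptions, and the real modules would carry the logic of the method steps.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class TagQualityAnalyzer:
    """Chains five callables that stand in for modules 101 to 105."""
    identify_class_labels: Callable[[Any], Any]   # module 101
    select_target_features: Callable[[Any], Any]  # module 102
    predict_tags: Callable[[Any], Any]            # module 103
    score_indexes: Callable[[Any], Any]           # module 104
    generate_report: Callable[[Any], Any]         # module 105

    def run(self, tag_system: Any) -> Any:
        samples = self.identify_class_labels(tag_system)
        features = self.select_target_features(samples)
        predictions = self.predict_tags(features)
        index_scores = self.score_indexes(predictions)
        return self.generate_report(index_scores)
```

Passing the stages in as callables keeps the logical division of modules independent of how they are physically deployed.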
Fig. 3 is a schematic structural diagram of an electronic device 1 for implementing the intelligent analysis method of label quality according to the present invention.
The electronic device 1 may comprise a processor 10, a memory 11, a communication bus 12 and a communication interface 13, and may further comprise a computer program, such as a tag quality intelligent analysis program, stored in the memory 11 and executable on the processor 10.
The processor 10 may be formed by an integrated circuit in some embodiments, for example, a single packaged integrated circuit, or may be formed by a plurality of integrated circuits packaged with the same function or different functions, including one or more central processing units (Central Processing Unit, CPU), a microprocessor, a digital processing chip, a graphics processor, a combination of various control chips, and so on. The processor 10 is the control unit (Control Unit) of the electronic device 1; it connects the respective components of the entire electronic device 1 using various interfaces and lines, runs or executes programs or modules (for example, a tag quality intelligent analysis program or the like) stored in the memory 11, and invokes data stored in the memory 11 to perform various functions of the electronic device 1 and process the data.
The memory 11 includes at least one type of readable storage medium including flash memory, a removable hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a magnetic memory, a magnetic disk, an optical disk, etc. The memory 11 may in some embodiments be an internal storage unit of the electronic device 1, such as a removable hard disk of the electronic device 1. The memory 11 may in other embodiments also be an external storage device of the electronic device 1, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like, which are provided on the electronic device 1. Further, the memory 11 may also include both an internal storage unit and an external storage device of the electronic device 1. The memory 11 may be used not only for storing application software installed in the electronic device 1 and various types of data, such as codes of a tag quality intelligent analysis program, etc., but also for temporarily storing data that has been output or is to be output.
The communication bus 12 may be a peripheral component interconnect standard (peripheral component interconnect, PCI) bus, or an extended industry standard architecture (extended industry standard architecture, EISA) bus, among others. The bus may be classified as an address bus, a data bus, a control bus, etc. The bus is arranged to enable a connection communication between the memory 11 and at least one processor 10 etc.
The communication interface 13 is used for communication between the electronic device 1 and other devices, and includes a network interface and an employee interface. Optionally, the network interface may comprise a wired interface and/or a wireless interface (e.g. a Wi-Fi interface, a Bluetooth interface, etc.), typically used to establish a communication connection between the electronic device 1 and other electronic devices. The employee interface may be a display (Display) or an input unit such as a keyboard (Keyboard), or alternatively a standard wired interface or wireless interface. Optionally, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display may also be referred to as a display screen or display unit, as appropriate, and is used for displaying information processed in the electronic device 1 and for displaying a visualized employee interface.
Fig. 3 shows only an electronic device 1 with components, it being understood by a person skilled in the art that the structure shown in fig. 3 does not constitute a limitation of the electronic device 1, and may comprise fewer or more components than shown, or may combine certain components, or may be arranged in different components.
For example, although not shown, the electronic device 1 may further include a power source (such as a battery) for supplying power to each component. Preferably, the power source may be logically connected to the at least one processor 10 through a power management device, so that functions of charge management, discharge management, power consumption management, and the like are implemented through the power management device. The power source may also include one or more of a direct current or alternating current power supply, a recharging device, a power failure detection circuit, a power converter or inverter, a power status indicator, etc. The electronic device 1 may further include various sensors, Bluetooth modules, Wi-Fi modules, etc., which will not be described herein.
It should be understood that the embodiments described are for illustrative purposes only, and that the scope of the patent application is not limited by this configuration.
The intelligent analysis program of tag quality stored in the memory 11 of the electronic device 1 is a combination of a plurality of computer programs, which when run in the processor 10, can implement:
identifying a service scene of a label system to be evaluated and a class label corresponding to the service scene, and extracting sample data corresponding to the class label according to the service scene;
selecting a data set with distinction in the sample data as an initial characteristic data set, calculating information gain between the initial characteristic data set and the class label, and selecting a target characteristic data set of the class label from the initial characteristic data set according to the information gain;
constructing tag test data of the target feature data set, and identifying a predicted tag of the tag test data by using a tag classifier in a trained tag analysis model;
inquiring a tag quality analysis index of the tag system, and calculating an index score of the tag quality analysis index by using a tag quality analysis function in the trained tag analysis model according to the predicted tag and the tag test data;
and calculating the quality score of each label in the label system according to the index score, and generating a label quality analysis report of the label system according to the quality score.
In particular, the specific implementation method of the processor 10 on the computer program may refer to the description of the relevant steps in the corresponding embodiment of fig. 1, which is not repeated herein.
Further, the integrated modules/units of the electronic device 1 may be stored in a non-volatile computer readable storage medium if implemented in the form of software functional units and sold or used as a stand alone product. The computer readable storage medium may be volatile or nonvolatile. For example, the computer readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a U disk, a removable hard disk, a magnetic disk, an optical disk, a computer Memory, a Read-Only Memory (ROM).
The present invention also provides a computer readable storage medium storing a computer program which, when executed by a processor of an electronic device 1, may implement:
identifying a service scene of a label system to be evaluated and a class label corresponding to the service scene, and extracting sample data corresponding to the class label according to the service scene;
selecting a data set with distinction in the sample data as an initial characteristic data set, calculating information gain between the initial characteristic data set and the class label, and selecting a target characteristic data set of the class label from the initial characteristic data set according to the information gain;
constructing tag test data of the target feature data set, and identifying a predicted tag of the tag test data by using a tag classifier in a trained tag analysis model;
inquiring a tag quality analysis index of the tag system, and calculating an index score of the tag quality analysis index by using a tag quality analysis function in the trained tag analysis model according to the predicted tag and the tag test data;
and calculating the quality score of each label in the label system according to the index score, and generating a label quality analysis report of the label system according to the quality score.
In the several embodiments provided in the present invention, it should be understood that the disclosed apparatus, device and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative; the division of the modules is merely a logical function division, and other divisions are possible in an actual implementation.
The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical units, may be located in one place, or may be distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional modules in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units can be realized in the form of hardware, or in the form of hardware plus software functional modules.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof.
The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.
The embodiment of the invention can acquire and process the related data based on artificial intelligence technology. Artificial intelligence (Artificial Intelligence, AI) is the theory, method, technique and application system that uses a digital computer or a digital computer-controlled machine to simulate, extend and expand human intelligence, sense the environment, acquire knowledge and use knowledge to obtain optimal results.
Furthermore, it is evident that the word "comprising" does not exclude other elements or steps, and that the singular does not exclude a plurality. A plurality of units or means recited in the system claims can also be implemented by one unit or means through software or hardware. The terms first, second, etc. are used to denote names and do not denote any particular order.
Finally, it should be noted that the above-mentioned embodiments are merely for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made to the technical solution of the present invention without departing from the spirit and scope of the technical solution of the present invention.
Claims (10)
1. An intelligent analysis method for tag quality, which is characterized by comprising the following steps:
identifying a service scene of a label system to be evaluated and a class label corresponding to the service scene, and extracting sample data corresponding to the class label according to the service scene;
selecting a data set with distinction in the sample data as an initial characteristic data set, calculating information gain between the initial characteristic data set and the class label, and selecting a target characteristic data set of the class label from the initial characteristic data set according to the information gain;
constructing tag test data of the target feature data set, and identifying a predicted tag of the tag test data by using a tag classifier in a trained tag analysis model;
inquiring a tag quality analysis index of the tag system, and calculating an index score of the tag quality analysis index by using a tag quality analysis function in the trained tag analysis model according to the predicted tag and the tag test data;
and calculating the quality score of each label in the label system according to the index score, and generating a label quality analysis report of the label system according to the quality score.
2. The method of claim 1, wherein said calculating information gain between said initial feature dataset and said class labels comprises:
calculating the information gain between the initial feature data set and the class labels using the formula:
$IG(f_i; T) = H(f_i) + H(T) - H(f_i, T)$
wherein $IG(f_i; T)$ denotes the information gain between the i-th feature $f_i$ in the initial feature data set and the class label set $T$, $H(f_i)$ denotes the information entropy of the i-th feature $f_i$, $H(T)$ denotes the information entropy of the class label set $T$, and $H(f_i, T)$ denotes the joint information entropy of the i-th feature $f_i$ and the class label set $T$.
3. The method for intelligent analysis of tag quality according to claim 1, wherein selecting the target feature data set of the class tag from the initial feature data set according to the information gain comprises:
normalizing the information gain to obtain a normalized gain;
calculating the average value of the normalized gain, taking the average value as a threshold value, selecting the features in the initial feature data set which are not smaller than the threshold value, and generating a target feature data set.
4. The method for intelligent analysis of tag quality according to claim 1, wherein said normalizing the information gain to obtain a normalized gain comprises:
normalization of information gain using the following formula:
wherein $SU(f_i, T)$ denotes the normalized gain between the i-th feature $f_i$ in the initial feature data set and the class label set $T$, $IG(f_i; T)$ denotes the information gain between the i-th feature $f_i$ in the initial feature data set and the class label set $T$, $H(f_i)$ denotes the information entropy of the i-th feature $f_i$, and $H(T)$ denotes the information entropy of the class label set $T$.
5. The method for intelligent analysis of tag quality according to claim 1, wherein the constructing of the tag classifier comprises:
wherein $\gamma_t$ denotes the weight vector of the label classifier at the t-th training iteration, $t$ denotes the number of training iterations of the label classifier, $\gamma_{t-1}$ denotes the weight vector of the label classifier at the (t-1)-th training iteration, $x_i$ denotes the i-th training sample data vector, $\rho$ denotes the label weight learning rate, and $T$ denotes the vector transpose operator.
6. The method of claim 1, wherein the tag quality analysis function comprises:
wherein Acc denotes the accuracy score of tag prediction, $R_j$ denotes the true tags of the j-th tag test data item, $Z_j$ denotes the predicted tags corresponding to the j-th tag test data item, $|R_j \cap Z_j|$ denotes the number of correctly predicted tags, $|R_j \cup Z_j|$ denotes the total number of tags appearing in the true tags or the predicted tags, $\cap$ denotes the intersection symbol, $\cup$ denotes the union symbol, and $n$ denotes the number of tag test data items.
7. The method of claim 1, wherein calculating a quality score for each tag in the tag system based on the index score comprises:
performing uniform processing on the index corresponding to the index score to obtain an index uniform score, and performing data standardization processing on the index uniform score to obtain an index standardization score;
analyzing the importance degree of the index in the service scene of the tag system, and distributing the weight coefficient of the index according to the importance degree;
and calculating the product sum of the weight coefficient and the index score according to the weight coefficient and the index standardization score to obtain the quality score of each label in the label system.
8. An intelligent tag quality analysis device, the device comprising:
the class label identification module is used for identifying a service scene of a label system to be evaluated and class labels corresponding to the service scene, and extracting sample data corresponding to the class labels according to the service scene;
the target feature data set generation module is used for selecting a data set with distinction in the sample data as an initial feature data set, calculating information gain between the initial feature data set and the class label, and selecting the target feature data set of the class label from the initial feature data set according to the information gain;
the predicted tag identification module is used for constructing tag test data of the target characteristic data set and identifying predicted tags of the tag test data by using a tag classifier in the trained tag analysis model;
the index score calculation module is used for inquiring the label quality analysis index of the label system and calculating the index score of the label quality analysis index by utilizing the label quality analysis function in the trained label analysis model according to the predicted label and the label test data;
and the label quality analysis report generation module is used for calculating the quality score of each label in the label system according to the index score and generating a label quality analysis report of the label system according to the quality score.
9. An electronic device, the electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the tag quality intelligent analysis method of any one of claims 1 to 7.
10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the tag quality intelligent analysis method according to any one of claims 1 to 7.
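The feature selection recited in claims 2 to 4 (information gain, normalization, then a mean-value threshold) can be illustrated for discrete features as follows. The information gain follows the formula of claim 2. The normalization is assumed here to be the usual symmetric uncertainty, SU = 2·IG / (H(f_i) + H(T)), which is an assumption of this sketch. All function names and the sample feature names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy (base 2) of a discrete sequence."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def information_gain(feature: Sequence[Hashable],
                     labels: Sequence[Hashable]) -> float:
    """IG(f; T) = H(f) + H(T) - H(f, T)."""
    joint = list(zip(feature, labels))
    return entropy(feature) + entropy(labels) - entropy(joint)


def normalized_gain(feature: Sequence[Hashable],
                    labels: Sequence[Hashable]) -> float:
    """Assumed normalization: symmetric uncertainty 2*IG / (H(f) + H(T))."""
    denom = entropy(feature) + entropy(labels)
    return 0.0 if denom == 0 else 2 * information_gain(feature, labels) / denom


def select_features(features: Dict[str, Sequence[Hashable]],
                    labels: Sequence[Hashable]) -> List[str]:
    """Keep the features whose normalized gain is not below the mean."""
    gains = {name: normalized_gain(col, labels) for name, col in features.items()}
    threshold = sum(gains.values()) / len(gains)
    return [name for name, gain in gains.items() if gain >= threshold]
```

A feature that determines the class label exactly receives a normalized gain of 1, while a feature that is independent of the label receives 0 and falls below the mean threshold.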
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310702551.1A CN116720118A (en) | 2023-06-13 | 2023-06-13 | Label quality intelligent analysis method and device, electronic equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116720118A true CN116720118A (en) | 2023-09-08 |
Family
ID=87871161
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310702551.1A Pending CN116720118A (en) | 2023-06-13 | 2023-06-13 | Label quality intelligent analysis method and device, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116720118A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118225254A (en) * | 2024-05-07 | 2024-06-21 | 南京海汇装备科技有限公司 | Multi-parameter anomaly calibration system and method applied to thermal infrared imager |
CN118225254B (en) * | 2024-05-07 | 2024-11-08 | 南京海汇装备科技有限公司 | Multi-parameter anomaly calibration system and method applied to thermal infrared imager |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110704572B (en) | Suspected illegal fundraising risk early warning method, device, equipment and storage medium | |
CN109409677A (en) | Enterprise Credit Risk Evaluation method, apparatus, equipment and storage medium | |
CN111401777A (en) | Enterprise risk assessment method and device, terminal equipment and storage medium | |
CN115002200B (en) | Message pushing method, device, equipment and storage medium based on user portrait | |
CN114648392B (en) | Product recommendation method and device based on user portrait, electronic equipment and medium | |
CN115391669B (en) | Intelligent recommendation method and device and electronic equipment | |
CN112488507B (en) | Expert classification portrait method and device based on clustering and storage medium | |
CN113626607B (en) | Abnormal work order identification method and device, electronic equipment and readable storage medium | |
US20190080352A1 (en) | Segment Extension Based on Lookalike Selection | |
CN115081025A (en) | Sensitive data management method and device based on digital middlebox and electronic equipment | |
CN113887930B (en) | Question-answering robot health evaluation method, device, equipment and storage medium | |
CN117217812A (en) | User behavior prediction method and device, computer equipment and storage medium | |
Lo et al. | Do polluting firms suffer long term? Can government use data‐driven inspection policies to catch polluters? | |
CN116720118A (en) | Label quality intelligent analysis method and device, electronic equipment and storage medium | |
CN111784053A (en) | Transaction risk detection method, device and readable storage medium | |
CN114756669A (en) | Intelligent analysis method and device for problem intention, electronic equipment and storage medium | |
CN112836754A (en) | Image description model generalization capability evaluation method | |
Liu et al. | MuST: An interpretable multidimensional strain theory model for corporate misreporting prediction | |
CN116308416A (en) | Empty shell enterprise identification method and system | |
CN115099680A (en) | Risk management method, device, equipment and storage medium | |
CN113849464A (en) | Information processing method and apparatus | |
CN116484230B (en) | Method for identifying abnormal business data and training method of AI digital person | |
CN114781833B (en) | Capability assessment method, device and equipment based on business personnel and storage medium | |
Zhang et al. | Application of Adaboost Algorithm in Enterprise Financial Risk Analysis Model | |
CN114239595B (en) | Intelligent return visit list generation method, device, equipment and storage medium | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||