
CN114722189A - Multi-label unbalanced text classification method in budget execution audit - Google Patents

Info

Publication number
CN114722189A
Authority
CN
China
Prior art keywords
label
word
training
sentence
matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111534284.9A
Other languages
Chinese (zh)
Other versions
CN114722189B (en)
Inventor
伍之昂
张璐
方昌健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Weishen Information Technology Co ltd
NANJING AUDIT UNIVERSITY
Original Assignee
Guangdong Weishen Information Technology Co ltd
NANJING AUDIT UNIVERSITY
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Weishen Information Technology Co ltd, NANJING AUDIT UNIVERSITY
Priority to CN202111534284.9A
Publication of CN114722189A
Application granted
Publication of CN114722189B
Current legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/38 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/381 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using identifiers, e.g. barcodes, RFIDs
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a multi-label unbalanced text classification method for budget execution audit, comprising the following steps: constructing a keyword library for the budget execution audit domain and selecting seed words from it as label descriptions; segmenting the text with a word segmentation tool and the keyword library, and computing the embedding matrices of the labels and of the segmented words; building a neural network that computes the similarity matrices between words, phrases, and labels (that is, label descriptions), deriving word weights from a purpose-built pooling layer, combining the weights with the word embedding matrix to obtain a sentence embedding matrix, and feeding the sentence embedding matrix to a classifier to obtain a prediction; and introducing weights for unbalanced data into the loss function and adding the label descriptions to the loss function to strengthen learning of the small categories and of the labels. The model is trained by minimizing the loss function, after which payment summary text data with unknown labels can be classified effectively. The invention effectively solves the multi-label unbalanced classification problem for payment voucher summary texts in budget execution audit.

Description

Multi-label unbalanced text classification method in budget execution audit
Technical Field
The invention relates to the field of text classification, in particular to a multi-label unbalanced text classification method in budget execution audit.
Background
In financial budget execution audits, payment summaries must be classified to determine whether the use of funds is consistent with the budget items, to review payment compliance, and even to identify high-risk transactions. At present, a large share of this text classification work still depends on manual labeling by auditors, which is increasingly unable to cope with the explosive growth of audit data in a big data environment. Although text classification has long been studied, research and application aimed specifically at classifying payment summary text for budget execution audit remain scarce, and general-purpose text classification algorithms and tools are clearly difficult to apply directly to such a highly specialized field. Text analysis in budget execution audit involves many domain-specific terms, many budget subject categories, and unbalanced sample sizes. In addition, traditional text classification methods rely on an unsupervised sentence representation based on averaged word vectors, which struggles to capture how strongly different words influence the classification model. To address these problems, the invention provides a multi-label unbalanced text classification method in budget execution audit that integrates sentence representation learning and multi-label unbalanced classification model training in a supervised manner. It is intended to classify payment purpose summaries quickly and accurately and to improve the efficiency of audit work.
Disclosure of Invention
The purpose of the invention is as follows: the invention provides a method for classifying multi-label unbalanced texts in budget execution audit, which can solve the problem of classifying the multi-label unbalanced texts in the budget execution audit.
The technical scheme is as follows:
a multi-label unbalanced text classification method in budget execution audit comprises the following steps:
Step one: data preprocessing and word embedding training, to obtain the input data of the model. Labeled payment voucher summary text data is given, in which the number of samples differs across categories and the number of categories is $K$. A keyword library for budget execution audit, that is, the proper nouns of the domain, is constructed from the given text, and representative seed words are selected from it as the description of each label. The text is segmented using the keyword library and a word segmentation tool, and word embedding vectors are pre-trained on the full audit text data to obtain the word matrix $E_i = [e_{i1}, \dots, e_{iL}]^T$, where $i$ is the index of the sentence, $l$ is the index of a word in the sentence, and $L$ is the length of the sentence. The seed words are mapped to their word embeddings, and the embeddings of each category's seed words are averaged to obtain the embedding matrix of all labels, $\mathbf{L} = [l_1, \dots, l_K]^T$.
Step two: model construction, building the classification framework for multi-label unbalanced text. First, a similarity matrix is computed from the words in the sentence and the labels. A neural network is then used to compute the similarity between context information, that is, phrases, and the labels; this involves two groups of parameters, $W_1$ and $b_1$, that need to be trained. Next, a newly constructed pooling layer is used to obtain the weight vector between the phrases and all category labels. Finally, the weight vector is used to weight the original words, so that after training a suitable sentence embedding matrix, that is, one fused with domain knowledge, is obtained. The formula is as follows:
$Z_i = f_1(E_i, \mathbf{L})$ (1)
where $Z_i$ is the embedding matrix of the $i$-th sentence, and $f_1$ is the mapping function with $E_i$ and the label embedding matrix $\mathbf{L}$ as input and $Z_i$ as output;
the sentence embedding matrix is then used as input to a classifier that classifies the sentence, in which two groups of parameters, $W_2$ and $b_2$, need to be trained. The formula is as follows:
$\hat{y}_i = f_2(Z_i)$ (2)
where $\hat{y}_i$ is the predicted class probability matrix of sentence $Z_i$, and $f_2$ is the mapping function with $Z_i$ as input and $\hat{y}_i$ as output;
Step three: constructing an objective function that unifies sentence embedding and unbalanced multi-class classification, to guide neural network training. The cross-entropy loss function is used as the base objective function; weights are introduced so that the loss function is biased toward the small categories, strengthening the classifier's training on them; finally, the label word embedding matrix is introduced into the loss function to strengthen learning of the labels. The model is trained by minimizing the unbalanced objective function so constructed; after training, payment summary text data with unknown labels can be classified effectively;
Further, in step two, the model is constructed and the classification framework for multi-label unbalanced text is built: first, a similarity matrix is computed from the words in the sentence and the labels; a neural network is then used to compute the similarity between context information, that is, phrases, and the labels, which involves two groups of parameters, $W_1$ and $b_1$, that need to be trained; next, a newly constructed pooling layer is used to obtain the weight vector between the phrases and all category labels; finally, the weight vector is used to weight the original words, so that after training a suitable sentence embedding matrix, that is, one fused with domain knowledge, is obtained;
specifically, in the first stage, the similarity matrix is computed first, as follows:
Figure BDA0003412580650000025
The similarity matrix $G_i$ has size $L \times K$, where $\lVert \cdot \rVert$ denotes the $L_2$ norm.
Then the similarity between the phrases in the sentence, which carry context semantics, and the labels is computed, as follows:
$c_j = \mathrm{ReLU}(G_{i,\,j-p:j+p} W_1^T + b_1), \quad 1 \le j \le L$ (4)
where $j$ is the index of the word at the center of the phrase, $j-p$ and $j+p$ are the indices of the leftmost and rightmost words of the phrase, and $W_1$ and $b_1$ are the two groups of neural network parameters that are iteratively updated during training;
then the related weight matrix of the words is computed:
Figure BDA0003412580650000031
where $c_{jk}$ is the similarity between the $j$-th word and the corresponding $k$-th category label;
$\beta_j$ is then normalized, as follows:
Figure BDA0003412580650000032
where $\exp$ denotes the exponential function with base $e$, and $\beta_{j'}$ is the similarity value of the $j'$-th word in the sentence;
finally, the embedding matrix of the sentence is obtained, as follows:
Figure BDA0003412580650000033
the above process is expressed as equation (1) as a whole;
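The first-stage pipeline described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the patented implementation: the equation images for (3), (5), (6), and (7) did not survive in this text, so the sketch assumes cosine similarity for the similarity matrix, max pooling over labels for the word weights, a softmax for the normalization, and a weighted sum of word vectors for the sentence embedding. The function names, the shape of `w1` (one weight per window position), and the zero padding at the sentence boundaries are all hypothetical.

```python
import math

def cosine_similarity_matrix(E, labels):
    """G[j][k] = cosine similarity of word vector e_j and label vector l_k (ASSUMED form of eq. 3)."""
    def norm(v):
        return math.sqrt(sum(x * x for x in v)) or 1.0  # guard against zero vectors
    return [[sum(a * b for a, b in zip(e, l)) / (norm(e) * norm(l)) for l in labels]
            for e in E]

def phrase_label_similarity(G, w1, b1, p):
    """c[j][k] = ReLU(sum_t w1[t+p] * G[j+t][k] + b1[k]) for t in -p..p, zero-padded (one reading of eq. 4)."""
    n_words, n_labels = len(G), len(G[0])
    C = []
    for j in range(n_words):
        row = []
        for k in range(n_labels):
            s = b1[k]
            for t in range(-p, p + 1):
                if 0 <= j + t < n_words:
                    s += w1[t + p] * G[j + t][k]
            row.append(max(0.0, s))
        C.append(row)
    return C

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [v / total for v in exps]

def sentence_embedding(E, labels, w1, b1, p=1):
    """Return (z, beta): the weighted sentence vector and the per-word weights."""
    G = cosine_similarity_matrix(E, labels)
    C = phrase_label_similarity(G, w1, b1, p)
    beta = softmax([max(row) for row in C])  # ASSUMED: max pooling over labels, then softmax
    dim = len(E[0])
    z = [sum(beta[j] * E[j][i] for j in range(len(E))) for i in range(dim)]
    return z, beta

# Toy input: three 2-d word vectors, two label vectors, window half-width p = 1.
E = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
labels = [[1.0, 0.0], [0.0, 1.0]]
z, beta = sentence_embedding(E, labels, w1=[0.0, 1.0, 0.0], b1=[0.0, 0.0], p=1)
```

In the toy input the first two words each align perfectly with one label, so they receive equal weights that are larger than the weight of the third word, which is only partly aligned with either label.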
in the second stage, a three-layer fully connected neural network classifier is built; the sentence embedding matrix $Z_i$ is input into the classifier, and training yields the prediction output $\hat{y}_i$. The overall process is expressed as formula (2);
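A forward pass through a three-layer fully connected classifier of this kind might look like the following sketch. The text only states that the classifier has three fully connected layers with trainable parameters $W_2$ and $b_2$; the ReLU hidden activations, the softmax output, and the helper names `dense`, `relu`, and `classify` are assumptions made for illustration.

```python
import math

def dense(x, W, b):
    """Affine map: W is out_dim x in_dim, b has out_dim entries."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def relu(v):
    return [max(0.0, x) for x in v]

def softmax(v):
    m = max(v)
    exps = [math.exp(x - m) for x in v]
    s = sum(exps)
    return [e / s for e in exps]

def classify(z, layers):
    """Apply ReLU after every layer except the last, then softmax (ASSUMED activations)."""
    h = z
    for W, b in layers[:-1]:
        h = relu(dense(h, W, b))
    W, b = layers[-1]
    return softmax(dense(h, W, b))

# Three layers with identity weights, so the output is softmax of the input itself.
identity = [[1.0, 0.0], [0.0, 1.0]]
zero_bias = [0.0, 0.0]
layers = [(identity, zero_bias), (identity, zero_bias), (identity, zero_bias)]
probs = classify([2.0, 1.0], layers)
```

With identity weights the example reduces to a softmax over `[2.0, 1.0]`, which makes the output easy to check by hand before real trained weights are substituted.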
Further, in step three, an objective function that unifies sentence embedding and unbalanced multi-class classification is constructed to guide neural network training. The cross-entropy loss function is used as the base objective function; weights are introduced so that the loss function is biased toward the small categories, strengthening the classifier's training on them; finally, the label word embedding matrix is introduced into the loss function to strengthen learning of the labels, and the model is trained by minimizing the unbalanced objective function so constructed. After training, payment summary text data with unknown labels can be classified effectively;
specifically, the inverse weight of each category is computed first, as follows:
Figure BDA0003412580650000035
where $c(\cdot)$ is the number of samples in a class, $\mathrm{median}(\cdot)$ denotes the median, $y_k$ is the label vector of class $k$, class $k'$ is the class whose sample count is the median over all classes, and $y_{k'}$ is the label vector of class $k'$;
the inverse weights are then smoothed to obtain the final weight vector, as follows:
Figure BDA0003412580650000036
where $S(\cdot)$ denotes the sigmoid function, $r_k$ is the inverse weight of the $k$-th category, and $r_{k'}$ is the inverse weight of the $k'$-th category;
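One plausible reading of the weighting scheme just described is median-frequency inverse weighting followed by a sigmoid squash. The exact formulas are not recoverable from this text, so the sketch below is an assumption: it takes $r_k$ as the median class count divided by the count of class $k$, and applies the sigmoid directly to $r_k$. How the original formula combines $r_k$ with $r_{k'}$ is unknown and is not modeled here. The class counts are borrowed, for illustration only, from the support column of the results table in this document.

```python
import math
import statistics

def inverse_class_weights(counts):
    """r_k = median(counts) / counts[k]: rarer classes get larger inverse weights (ASSUMED form)."""
    med = statistics.median(counts)
    return [med / c for c in counts]

def smooth_weights(r):
    """Squash inverse weights with a sigmoid so extreme ratios stay bounded (ASSUMED form)."""
    return [1.0 / (1.0 + math.exp(-x)) for x in r]

# Illustrative class sizes, largest to smallest; the median class has 826 samples.
counts = [17573, 11075, 3955, 1983, 826, 719, 691, 519, 189]
r = inverse_class_weights(counts)
sigma = smooth_weights(r)
```

Under these assumptions the median class gets an inverse weight of exactly 1, larger classes fall below it, and smaller classes rise above it, while the sigmoid keeps every final weight between 0.5 and 1 so that a very rare class cannot dominate the loss.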
Then the weight vector is introduced to construct the loss function, as follows:
Figure BDA0003412580650000037
where $N$ is the total number of sentences in the data set and $CE(\cdot)$ is the cross-entropy loss function;
Figure BDA0003412580650000043
Figure BDA0003412580650000044
means that the function $f$ can be decomposed into two parts, $f_1$ and $f_2$, with the output of $f_1$ serving as the input of $f_2$; $y_i$ is the actual label matrix of the $i$-th sentence, $\sigma$ is the weight vector, $\sigma^T$ is the transpose of the weight vector, and $y_{ik}$ is the value of the $k$-th label of the $i$-th sentence, equal to 1 at the position of the actual label and 0 elsewhere;
$\hat{y}_{ik}$ is the predicted probability of the $k$-th label of the $i$-th sentence;
In order to increase the importance of the labels in training, a dedicated label loss function is added, as follows:
Figure BDA0003412580650000042
where $k$ is the index of the corresponding category, $\alpha$ is the penalty coefficient, and $y_k$ is the category label matrix;
finally, the model is trained with the Adam algorithm, with the objective of minimizing equation (11).
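The objective described above combines a class-weighted cross-entropy with a label term scaled by the penalty coefficient $\alpha$. Since the equation images are missing, the following is only a sketch of one plausible form: each sentence contributes $-\sum_k \sigma_k\, y_{ik} \log \hat{y}_{ik}$, and the label term is the cross-entropy between each category's one-hot vector and the classifier's prediction for that category's label embedding. The function names, the `eps` guard, and the default value of `alpha` are hypothetical, and the Adam optimization step itself is not shown.

```python
import math

def weighted_cross_entropy(y_true, y_prob, class_weights, eps=1e-12):
    """Mean over sentences of -sum_k sigma_k * y_ik * log(p_ik) (ASSUMED form)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        total += -sum(w * yk * math.log(pk + eps)
                      for w, yk, pk in zip(class_weights, y, p))
    return total / len(y_true)

def label_loss(label_probs, alpha, eps=1e-12):
    """alpha * sum_k CE(one_hot(k), label_probs[k]), where label_probs[k] is the
    classifier output for the embedding of label k (ASSUMED form)."""
    return alpha * sum(-math.log(label_probs[k][k] + eps)
                       for k in range(len(label_probs)))

def total_loss(y_true, y_prob, class_weights, label_probs, alpha=0.1):
    return (weighted_cross_entropy(y_true, y_prob, class_weights)
            + label_loss(label_probs, alpha))

# Two sentences, two classes; the second class is weighted twice as heavily.
y_true = [[1, 0], [0, 1]]
y_prob = [[0.5, 0.5], [0.5, 0.5]]
wce = weighted_cross_entropy(y_true, y_prob, class_weights=[1.0, 2.0])
perfect_labels = label_loss([[1.0, 0.0], [0.0, 1.0]], alpha=0.1)
```

In the example an uninformative prediction of 0.5 costs twice as much on the sentence from the heavier-weighted class, and the label term vanishes when every label embedding is classified into its own category.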
Beneficial effects: the invention effectively solves the multi-label unbalanced classification problem for payment voucher summary texts in budget execution audit. By introducing label similarity computation, it significantly improves recall and overall performance on the small categories, and greatly improves the efficiency with which auditors check budget execution compliance and identify high-risk transactions.
Drawings
Fig. 1 is a flowchart of an unbalanced text classification method for the audit field according to an embodiment of the present invention.
Fig. 2 is a diagram illustrating a neural network framework according to a first embodiment of the present invention.
FIG. 3 is a schematic diagram of a model training process according to an embodiment of the present invention.
Detailed Description
The invention is further explained below with reference to the drawings. Fig. 1 is a diagram illustrating an unbalanced text classification method for the audit field according to an embodiment of the present invention. As shown in fig. 1, the present embodiment includes the following steps:
Step one: data preprocessing and word embedding training, to obtain the input data of the model. Labeled payment voucher summary text data is given, in which the number of samples differs across categories and the number of categories is $K$. A keyword library for budget execution audit, that is, the proper nouns of the domain, is constructed from the given text, and representative seed words are selected from it as the description of each label. The text is segmented using the keyword library and a word segmentation tool, and word embedding vectors are pre-trained on the full audit text data to obtain the word matrix $E_i = [e_{i1}, \dots, e_{iL}]^T$, where $i$ is the index of the sentence, $l$ is the index of a word in the sentence, and $L$ is the length of the sentence. The seed words are mapped to their word embeddings, and the embeddings of each category's seed words are averaged to obtain the embedding matrix of all labels, $\mathbf{L} = [l_1, \dots, l_K]^T$.
Step two: model construction, building the classification framework for multi-label unbalanced text. First, as shown in Fig. 2, a similarity matrix is computed from the words in the sentence and the labels. A neural network is then used to compute the similarity between context information, that is, phrases, and the labels; this involves two groups of parameters, $W_1$ and $b_1$, that need to be trained. Next, a newly constructed pooling layer is used to obtain the weight vector between the phrases and all category labels. Finally, the weight vector is used to weight the original words, so that after training a suitable sentence embedding matrix, that is, one fused with domain knowledge, is obtained. The formula is as follows:
$Z_i = f_1(E_i, \mathbf{L})$ (1)
where $Z_i$ is the embedding matrix of the $i$-th sentence, and $f_1$ is the mapping function with $E_i$ and the label embedding matrix $\mathbf{L}$ as input and $Z_i$ as output;
finally, the sentence embedding matrix is used as input to a classifier that classifies the sentence, in which two groups of parameters, $W_2$ and $b_2$, need to be trained. The formula is as follows:
$\hat{y}_i = f_2(Z_i)$ (2)
where $\hat{y}_i$ is the predicted class probability matrix of sentence $Z_i$, and $f_2$ is the mapping function with $Z_i$ as input and $\hat{y}_i$ as output;
Step three: constructing an objective function that unifies sentence embedding and unbalanced multi-class classification, to guide neural network training. The cross-entropy loss function is used as the base objective function; weights are introduced so that the loss function is biased toward the small categories, strengthening the classifier's training on them; finally, the label word embedding matrix is introduced into the loss function to strengthen learning of the labels, and the model is trained by minimizing the unbalanced objective function so constructed. After training, payment summary text data with unknown labels can be classified effectively;
In a specific embodiment, the multi-label unbalanced text classification method in budget execution audit is described in detail:
First, based on the existing budget execution audit text data, sentences are segmented using the word segmentation tool LAC (Lexical Analysis of Chinese), the word frequencies within each category are counted, and the keyword library and seed words for the budget execution audit domain are constructed from the segmentation results and a collected professional domain lexicon.
The keyword library and seed words for the budget execution audit domain are shown in the following table:
Figure BDA0003412580650000054
The word segmentation results obtained with LAC, based on the budget execution audit domain lexicon and conventional stop words, are shown in the following table:
Serial number | Sentence | Word segmentation result
1 | Lodging fee for Shenzhen specialist attending zhushai following project | Lodging fee for Shenzhen specialist attending zhushai following project
The seed words are represented using CBOW (Continuous Bag of Words) to obtain the embedding matrix corresponding to each label. Taking the travelling fee category as an example, the embedding matrix of the seed words and the embedding matrix of the label are shown in the following table:
Figure BDA0003412580650000061
The seed word embedding matrices of the travelling fee category are averaged to obtain the embedding matrix of the label, as shown in the following table:
Figure BDA0003412580650000062
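The averaging step can be illustrated with a short sketch. The seed words and their three-dimensional vectors below are made up for illustration and are not taken from the tables in the source; real CBOW vectors would have a much higher dimension. The helper name `label_embedding` is also hypothetical.

```python
def label_embedding(seed_vectors):
    """Element-wise mean of the seed-word vectors of one category."""
    n = len(seed_vectors)
    dim = len(seed_vectors[0])
    return [sum(v[i] for v in seed_vectors) / n for i in range(dim)]

# Hypothetical seed words for the travelling fee category with toy 3-d embeddings.
travel_seed_vectors = {
    "lodging fee": [1.0, 2.0, 3.0],
    "airfare": [3.0, 0.0, 1.0],
}
travel_label = label_embedding(list(travel_seed_vectors.values()))
# travel_label == [2.0, 1.0, 2.0]
```

Repeating this for each of the K categories and stacking the results row by row gives the label embedding matrix used as the second input to the sentence embedding stage.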
Then the word segmentation results are represented using CBOW to obtain the embedding matrices corresponding to the words, as shown in the following table:
Figure BDA0003412580650000063
the data is divided into a training set and a test set according to the scores, the training set is input into a model for training, and the training process is shown in fig. 3.
After training, the test set is input into the trained model; $\beta_j$ is computed and applied as the weight of the word embedding matrix to obtain the sentence embedding matrix, as shown in the following table:
Figure BDA0003412580650000064
Figure BDA0003412580650000071
The final prediction result obtained after the sentence embedding matrix is input into the classifier is shown in the following table:
Figure BDA0003412580650000072
The overall prediction results are shown in the following table:
Category                              Precision  Recall  F1-score  Support
Five insurances and one housing fund  0.965      0.971   0.968     17573
Salary and subsidy of personnel       0.905      0.907   0.906     11075
Office expenses                       0.931      0.905   0.918     3955
Property management fee               0.874      0.873   0.874     1983
Cost of infrastructure                0.896      0.791   0.840     826
Travelling fee                        0.780      0.751   0.765     719
Special procurement                   0.697      0.685   0.677     691
Official expenses                     0.645      0.690   0.667     519
Others                                0.500      0.757   0.602     189
Macro Avg                             0.799      0.811   0.805     37530
Weighted Avg                          0.922      0.921   0.921     37530
Big Avg                               0.911      0.867   0.888     15856
Small Avg                             0.743      0.783   0.759     21674
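As a worked example of how the two summary rows relate to the per-category rows, the sketch below recomputes the macro and support-weighted averages for the precision column only, using the nine category values and support counts listed in the table. It assumes the standard definitions: the macro average is the unweighted mean over categories, and the weighted average weights each category by its support.

```python
# Per-category precision and support, in the order the categories appear in the table.
precision = [0.965, 0.905, 0.931, 0.874, 0.896, 0.780, 0.697, 0.645, 0.500]
support = [17573, 11075, 3955, 1983, 826, 719, 691, 519, 189]

# Macro average: every category counts equally, however few samples it has.
macro_precision = sum(precision) / len(precision)

# Weighted average: each category counts in proportion to its number of samples.
weighted_precision = sum(p * s for p, s in zip(precision, support)) / sum(support)

print(round(macro_precision, 3), round(weighted_precision, 3))
```

The gap between the two averages reflects the imbalance: the large categories pull the weighted figure up, while the macro figure gives the small, harder categories the same influence as the large ones.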
The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, various modifications and decorations can be made without departing from the principle of the present invention, and these modifications and decorations should also be regarded as the protection scope of the present invention.

Claims (3)

1. A multi-label unbalanced text classification method in budget execution audit, characterized by comprising the following steps:
Step one: data preprocessing and word embedding training, to obtain the input data of the model: labeled payment voucher summary text data is given, in which the number of samples differs across categories and the number of categories is $K$; a keyword library for budget execution audit, that is, the proper nouns of the domain, is constructed from the given text, and representative seed words are selected from it as the description of each label; the text is segmented using the keyword library and a word segmentation tool, and word embedding vectors are pre-trained on the full audit text data to obtain the word matrix $E_i = [e_{i1}, \dots, e_{iL}]^T$, where $i$ is the index of the sentence, $l$ is the index of a word in the sentence, and $L$ is the length of the sentence; the seed words are mapped to their word embeddings, and the embeddings of each category's seed words are averaged to obtain the embedding matrix of all labels, $\mathbf{L} = [l_1, \dots, l_K]^T$;
Step two: model construction, building the classification framework for multi-label unbalanced text; first, a similarity matrix is computed from the words in the sentence and the labels; a neural network is then used to compute the similarity between context information, that is, phrases, and the labels, which involves two groups of parameters, $W_1$ and $b_1$, that need to be trained; next, a newly constructed pooling layer is used to obtain the weight vector between the phrases and all category labels; finally, the weight vector is used to weight the original words, so that after training a suitable sentence embedding matrix, that is, one fused with domain knowledge, is obtained; the formula is as follows:
$Z_i = f_1(E_i, \mathbf{L})$ (1)
where $Z_i$ is the embedding matrix of the $i$-th sentence, and $f_1$ is the mapping function with $E_i$ and the label embedding matrix $\mathbf{L}$ as input and $Z_i$ as output;
the sentence embedding matrix is then used as input to a classifier that classifies the sentence, in which two groups of parameters, $W_2$ and $b_2$, need to be trained; the formula is as follows:
$\hat{y}_i = f_2(Z_i)$ (2)
where $\hat{y}_i$ is the predicted class probability matrix of sentence $Z_i$, and $f_2$ is the mapping function with $Z_i$ as input and $\hat{y}_i$ as output;
Step three: constructing an objective function that unifies sentence embedding and unbalanced multi-class classification, to guide neural network training; the cross-entropy loss function is used as the base objective function; weights are introduced so that the loss function is biased toward the small categories, strengthening the classifier's training on them; finally, the label word embedding matrix is introduced into the loss function to strengthen learning of the labels, and the model is trained by minimizing the unbalanced objective function so constructed; after training, payment summary text data with unknown labels is classified effectively.
2. The multi-label unbalanced text classification method in budget execution audit according to claim 1, characterized in that, in step two, the model is constructed and the classification framework for multi-label unbalanced text is built: first, a similarity matrix is computed from the words in the sentence and the labels; a neural network is then used to compute the similarity between context information, that is, phrases, and the labels, which involves two groups of parameters, $W_1$ and $b_1$, that need to be trained; next, a newly constructed pooling layer is used to obtain the weight vector between the phrases and all category labels; finally, the weight vector is used to weight the original words, so that after training a suitable sentence embedding matrix, that is, one fused with domain knowledge, is obtained;
specifically, in the first stage, the similarity matrix is computed first, as follows:
Figure FDA0003412580640000021
the similarity matrix $G_i$ has size $L \times K$, where $\lVert \cdot \rVert$ denotes the $L_2$ norm;
then the similarity between the phrases in the sentence, which carry context semantics, and the labels is computed, as follows:
$c_j = \mathrm{ReLU}(G_{i,\,j-p:j+p} W_1^T + b_1), \quad 1 \le j \le L$ (4)
where $j$ is the index of the word at the center of the phrase, $j-p$ and $j+p$ are the indices of the leftmost and rightmost words of the phrase, and $W_1$ and $b_1$ are the two groups of neural network parameters that are iteratively updated during training;
then the related weight matrix of the words is computed:
Figure FDA0003412580640000022
where $c_{jk}$ is the similarity between the phrase corresponding to the $j$-th word and the corresponding $k$-th category label;
$\beta_j$ is then normalized, as follows:
Figure FDA0003412580640000023
where $\exp$ denotes the exponential function with base $e$, and $\beta_{j'}$ is the similarity value of the $j'$-th word in the sentence;
finally, the embedding matrix of the sentence is obtained, as follows:
Figure FDA0003412580640000024
the above process is expressed as equation (1) as a whole;
in the second stage, a three-layer fully connected neural network classifier is built; the sentence embedding matrix $Z_i$ is input into the classifier, and training yields the prediction output $\hat{y}_i$. The overall process is expressed as equation (2).
3. The multi-label unbalanced text classification method in budget execution audit according to claim 1, characterized in that, in step three, an objective function that unifies sentence embedding and unbalanced multi-class classification is constructed to guide neural network training; the cross-entropy loss function is used as the base objective function; weights are introduced so that the loss function is biased toward the small categories, strengthening the classifier's training on them; finally, the label word embedding matrix is introduced into the loss function to strengthen learning of the labels, and the model is trained by minimizing the unbalanced objective function so constructed; after training, payment summary text data with unknown labels is classified effectively;
specifically, the inverse weight of each category is computed first, as follows:
Figure FDA0003412580640000031
where $c(\cdot)$ is the number of samples in a class, $\mathrm{median}(\cdot)$ denotes the median, $y_k$ is the label vector of class $k$, class $k'$ is the class whose sample count is the median over all classes, and $y_{k'}$ is the label vector of class $k'$;
the inverse weights are then smoothed to obtain the final weight vector, as follows:
Figure FDA0003412580640000032
where $S(\cdot)$ denotes the sigmoid function, $r_k$ is the inverse weight of the $k$-th category, and $r_{k'}$ is the inverse weight of the $k'$-th category;
then the weight vector is introduced to construct the loss function, as follows:
Figure FDA0003412580640000033
where $N$ is the total number of sentences in the data set and $CE(\cdot)$ is the cross-entropy loss function;
Figure FDA0003412580640000034
Figure FDA0003412580640000035
means that the function $f$ can be decomposed into two parts, $f_1$ and $f_2$, with the output of $f_1$ serving as the input of $f_2$; $y_i$ is the actual label matrix of the $i$-th sentence, $\sigma$ is the weight vector, $\sigma^T$ is the transpose of the weight vector, and $y_{ik}$ is the value of the $k$-th label of the $i$-th sentence, equal to 1 at the position of the actual label and 0 elsewhere;
$\hat{y}_{ik}$ is the predicted probability of the $k$-th label of the $i$-th sentence;
in order to increase the importance of the labels in training, a label loss function is added, as follows:
Figure FDA0003412580640000037
where $k$ is the index of the corresponding category, $\alpha$ is the penalty coefficient, and $y_k$ is the category label matrix;
finally, the model is trained with the Adam algorithm, with the objective of minimizing equation (11).
CN202111534284.9A 2021-12-15 2021-12-15 Multi-label unbalanced text classification method in budget execution audit Active CN114722189B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111534284.9A CN114722189B (en) 2021-12-15 2021-12-15 Multi-label unbalanced text classification method in budget execution audit

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111534284.9A CN114722189B (en) 2021-12-15 2021-12-15 Multi-label unbalanced text classification method in budget execution audit

Publications (2)

Publication Number Publication Date
CN114722189A (en) 2022-07-08
CN114722189B (en) 2023-06-23

Family

ID=82236185

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111534284.9A Active CN114722189B (en) 2021-12-15 2021-12-15 Multi-label unbalanced text classification method in budget execution audit

Country Status (1)

Country Link
CN (1) CN114722189B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008102737A (en) * 2006-10-19 2008-05-01 Nippon Telegr & Teleph Corp <Ntt> Stored document classification apparatus, stored document classification method, program, and recording medium
CN110609898A (en) * 2019-08-19 2019-12-24 中国科学院重庆绿色智能技术研究院 Self-classification method for unbalanced text data
CN111737476A (en) * 2020-08-05 2020-10-02 腾讯科技(深圳)有限公司 Text processing method and device, computer readable storage medium and electronic equipment
US20200327193A1 (en) * 2019-04-10 2020-10-15 International Business Machines Corporation Displaying text classification anomalies predicted by a text classification model
US20200327381A1 (en) * 2019-04-10 2020-10-15 International Business Machines Corporation Evaluating text classification anomalies predicted by a text classification model
US20210042580A1 (en) * 2018-10-10 2021-02-11 Tencent Technology (Shenzhen) Company Limited Model training method and apparatus for image recognition, network device, and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ZHIANG WU ET AL: "hPSD: A Hybrid PU-Learning-Based Spammer Detection Model for Product Reviews", vol. 50, no. 4, pages 1595 - 1606, XP011774311, DOI: 10.1109/TCYB.2018.2877161 *
陈志 et al.: "Deep learning-based text classification under imbalanced training data", vol. 41, no. 1, pages 1 - 5 *

Also Published As

Publication number Publication date
CN114722189B (en) 2023-06-23

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant