CN114722189A - Multi-label unbalanced text classification method in budget execution audit - Google Patents
Multi-label unbalanced text classification method in budget execution audit
- Publication number
- CN114722189A
- Authority
- CN
- China
- Prior art keywords
- label
- word
- training
- sentence
- matrix
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/38—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/381—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using identifiers, e.g. barcodes, RFIDs
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Library & Information Science (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a multi-label unbalanced text classification method for budget execution audit, comprising the following steps. First, a keyword library for the budget execution audit field is constructed, seed words are selected from it as label descriptions, the text is segmented using a word segmentation tool together with the keyword library, and embedding matrices are computed for the labels and for the segmented words. Second, a neural network is built to compute the similarity matrices between words and labels and between phrases and labels (that is, the label descriptions); word weights are obtained from a purpose-built pooling layer and combined with the word embedding matrix to produce a sentence embedding matrix, which is fed to a classifier to obtain the prediction. Third, weights for unbalanced data are introduced into the loss function, and the label descriptions are added to the loss function to strengthen learning of small categories and of the labels; the model is trained by minimizing this loss function and can then classify payment summary text with unknown labels. The invention thereby addresses the multi-label unbalanced classification of payment voucher summary text in budget execution audit.
Description
Technical Field
The invention relates to the field of text classification, in particular to a multi-label unbalanced text classification method in budget execution audit.
Background
In audits of fiscal budget execution, payment summaries must be classified in order to determine whether the use of funds is consistent with the budget items, to review payment compliance, and to identify high-risk transactions. At present, much of this text classification still depends on manual labeling by auditors, which copes less and less well with the explosive growth of audit data in a big-data environment. Although text classification has been studied for a long time, research and applications that target payment summary classification specifically for budget execution audit are still lacking, and general-purpose text classification algorithms and tools are difficult to apply directly to such a highly specialized field. Text analysis in budget execution audit involves a large specialized vocabulary, many budget subject categories, and unbalanced sample sizes across categories. In addition, traditional text classification methods use an unsupervised sentence representation based on averaged word vectors, which cannot capture how much each word matters to the classification model. To address these problems, the invention provides a multi-label unbalanced text classification method for budget execution audit that integrates sentence representation learning and multi-label unbalanced classification model training in a supervised manner. It is intended to classify payment purpose summaries quickly and accurately and thereby improve the efficiency of audit work.
Disclosure of Invention
Purpose of the invention: the invention provides a multi-label unbalanced text classification method for budget execution audit, which addresses the classification of multi-label unbalanced text in budget execution audit.
The technical scheme is as follows:
A multi-label unbalanced text classification method in budget execution audit comprises the following steps:
Step one: data preprocessing and word embedding training, to obtain the input data for the model. Labeled payment voucher summary text is given, in which the number of samples differs between categories and the number of categories is K. A keyword library for budget execution audit, that is, the proper nouns of the field, is constructed from the given text, and representative seed words are selected from the keyword library as the description of each label. The text is segmented using the keyword library and a word segmentation tool, word embedding vectors are pre-trained on the full audit text data, and a word embedding matrix $E_i = [e_{i1}, \dots, e_{iL}]^T$ is obtained for each sentence, where i is the index of the sentence and L is the length of the sentence (its words are indexed from 1 to L). The seed words are mapped to their word embeddings, and the embeddings of the seed words of each category are averaged to obtain the embedding matrix of all labels, $\mathbf{L} = [l_1, \dots, l_K]^T$;
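The averaging of seed word embeddings into one row per label can be illustrated with a short Python sketch. This is a minimal illustration, not the patented implementation: the function name `label_embedding_matrix`, the toy three-dimensional vectors, and the example seed words are all hypothetical, and in practice the vectors would come from the pre-trained word embedding model.

```python
import numpy as np

def label_embedding_matrix(seed_words_by_label, word_vectors):
    """Average the seed-word vectors of each label into one row (K rows in total)."""
    rows = []
    for label, seeds in seed_words_by_label.items():
        vectors = np.stack([word_vectors[w] for w in seeds])  # (number of seeds, d)
        rows.append(vectors.mean(axis=0))                     # one d-dimensional label vector
    return np.stack(rows)                                     # (K, d)

# Hypothetical toy embeddings standing in for pre-trained word vectors.
word_vectors = {
    "lodging": np.array([1.0, 0.0, 0.0]),
    "airfare": np.array([0.0, 1.0, 0.0]),
    "stationery": np.array([0.0, 0.0, 2.0]),
}
seed_words_by_label = {"travel": ["lodging", "airfare"], "office": ["stationery"]}

label_matrix = label_embedding_matrix(seed_words_by_label, word_vectors)
print(label_matrix)
```

Each row of `label_matrix` corresponds to one label vector $l_k$, in the order in which the labels appear in the dictionary.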
Step two: model construction, building the classification framework for multi-label unbalanced text. First, a similarity matrix is computed from the words in the sentence and the labels. A neural network is then used to compute the similarity between the context information, that is, the phrases, and the labels; this involves 2 groups of parameters, $W_1$ and $b_1$, that need to be trained. Next, a newly constructed chiny pooling layer is used to obtain the weight vector between the phrases and all category labels. Finally, the weight vector is used to weight the original words; once training is complete, this yields a suitable sentence embedding matrix, that is, a sentence embedding matrix that incorporates domain knowledge. The formula is as follows:
$Z_i = f_1(E_i, \mathbf{L})$ (1)
where $Z_i$ is the embedding matrix of the i-th sentence, and $f_1$ is the mapping function with $E_i$ and $\mathbf{L}$ as input and $Z_i$ as output;
The sentence embedding matrix is then used as input to a classifier that classifies the sentence, where 2 groups of parameters, $W_2$ and $b_2$, need to be trained. The formula is as follows:
$\hat{y}_i = f_2(Z_i)$ (2)
where $\hat{y}_i$ is the predicted class probability matrix for sentence $Z_i$, and $f_2$ is the mapping function with $Z_i$ as input and $\hat{y}_i$ as output;
Step three: construct an objective function that unifies sentence embedding and unbalanced multi-class classification, and use it to guide neural network training. A cross-entropy loss function is used as the base objective function. Weight data are introduced so that the loss function is biased toward small categories, which strengthens the classifier's training on those categories. Finally, the label word embedding matrix is introduced into the loss function to strengthen learning of the labels, and the model is trained by minimizing the resulting unbalanced objective function. After training, payment summary text with unknown labels can be classified;
Further, in step two, the model is constructed and the classification framework for multi-label unbalanced text is built: first, a similarity matrix is computed from the words in the sentence and the labels; a neural network is then used to compute the similarity between the context information, that is, the phrases, and the labels, where 2 groups of parameters, $W_1$ and $b_1$, need to be trained; next, a newly constructed chiny pooling layer is used to obtain the weight vector between the phrases and all category labels; finally, the weight vector is used to weight the original words, and once training is complete a suitable sentence embedding matrix is obtained, that is, a sentence embedding matrix that incorporates domain knowledge;
Specifically, in the first stage, a similarity matrix is first obtained. The formula is as follows:
The similarity matrix $G_i$ has size $L \times K$, where $\|\cdot\|$ denotes the $L_2$ norm.
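The formula for the similarity matrix is not reproduced in this text. Since $G_i$ is stated to be $L \times K$ and its definition refers to the $L_2$ norm, one reading consistent with the description is the cosine similarity between each word embedding and each label embedding. The Python sketch below rests on that assumption; the function name `similarity_matrix` and the small `eps` guard against division by zero are illustrative and not part of the described method.

```python
import numpy as np

def similarity_matrix(E, label_matrix, eps=1e-12):
    """Cosine similarity between every word (row of E) and every label (row of label_matrix).

    E: (L, d) word embeddings of one sentence; label_matrix: (K, d). Returns (L, K).
    """
    word_norms = np.linalg.norm(E, axis=1, keepdims=True)                # (L, 1)
    label_norms = np.linalg.norm(label_matrix, axis=1, keepdims=True)    # (K, 1)
    return (E @ label_matrix.T) / (word_norms @ label_norms.T + eps)

# Toy sentence of 3 words and 2 labels, all in 2 dimensions.
E = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
labels = np.array([[1.0, 0.0], [0.0, 1.0]])
G = similarity_matrix(E, labels)
print(G.shape)
```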
Then, the similarity between the labels and the phrases in the sentence, which carry context semantics, is computed. The formula is as follows:
$c_i = \mathrm{ReLU}(G_{i,\,j-p:j+p} W_1^T + b_1), \quad 1 \le j \le L$ (4)
where j is the index of the word at the center of the phrase, $j-p$ and $j+p$ are the indices of the leftmost and rightmost words of the phrase, and $W_1$ and $b_1$ are two groups of neural network parameters that are trained iteratively during training;
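Formula (4), as given in claim 2, applies a ReLU to a window of rows $j-p$ to $j+p$ of the similarity matrix. The sketch below shows one way this could be computed. The parameter shapes are assumptions: `W1` is taken to hold one weight per window position (length $2p+1$) and `b1` one bias per label (length K), and sentence boundaries are zero-padded, none of which is stated in the text.

```python
import numpy as np

def phrase_label_similarity(G, W1, b1):
    """Windowed ReLU over the word-label similarity matrix.

    G: (L, K); W1: (2p+1,) window weights (assumed shape); b1: (K,) biases (assumed shape).
    Returns C of shape (L, K), where C[j, k] relates the phrase centred on word j to label k.
    """
    p = (len(W1) - 1) // 2
    G_pad = np.pad(G, ((p, p), (0, 0)))          # zero-pad so boundary words get a full window
    C = np.empty_like(G, dtype=float)
    for j in range(G.shape[0]):
        window = G_pad[j:j + 2 * p + 1]          # rows j-p .. j+p of the original G
        C[j] = np.maximum(W1 @ window + b1, 0.0) # weighted sum over the window, then ReLU
    return C

G = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
C = phrase_label_similarity(G, W1=np.ones(3), b1=np.zeros(2))
print(C)
```

With all window weights equal to 1 and zero bias, each row of `C` is simply the sum of the neighbouring rows of `G`, which makes the effect of the window easy to check by hand.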
Then, the weight matrix of the words is computed:
where $c_{jk}$ is the similarity between the j-th word and the k-th category label;
Next, $\beta_j$ is normalized. The formula is as follows:
where exp denotes the exponential function with base e, and $\beta_{j'}$ is the similarity value of the $j'$-th word in the sentence;
Finally, the embedding matrix of the sentence is obtained. The formula is as follows:
The above process as a whole is expressed as formula (1);
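The pooling, normalization, and weighting formulas are not reproduced in this text, so the following Python sketch is only one possible reading. It substitutes plain max pooling over the labels for the newly constructed pooling layer (whose definition is not available here), applies an exponential normalization across the words of the sentence, and forms a weighted sum of the word embeddings. The function name and the choice to return a single vector are assumptions.

```python
import numpy as np

def sentence_embedding(C, E):
    """Turn phrase-label similarities into word weights and a weighted sentence embedding.

    C: (L, K) phrase-label similarities; E: (L, d) word embeddings.
    """
    beta = C.max(axis=1)                  # ASSUMPTION: max over labels stands in for the pooling layer
    weights = np.exp(beta - beta.max())   # exponential normalization over the L words (stable softmax)
    weights /= weights.sum()
    return weights @ E, weights           # weighted sum of word vectors, and the weights themselves

C = np.zeros((2, 2))
E = np.array([[2.0, 0.0], [0.0, 2.0]])
z, weights = sentence_embedding(C, E)
print(z, weights)
```

When all similarities are equal, as in this toy input, every word receives the same weight and the result reduces to the plain average of the word vectors, which is the unweighted baseline that the Background section criticizes.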
In the second stage, a neural network classifier with three fully connected layers is built, the sentence embedding matrix $Z_i$ is input to the classifier, and training yields the prediction output $\hat{y}_i$. The overall process is expressed as formula (2);
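A forward pass through a classifier with three fully connected layers might look like the following sketch. The text does not state the layer sizes, the hidden activation, or the output activation, so the ReLU hidden layers, the softmax output, and the sizes used here are all assumptions chosen for illustration.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())   # subtract the maximum for numerical stability
    return e / e.sum()

def classify(z, params):
    """Forward pass through fully connected layers; params is a list of (W, b) pairs."""
    h = z
    for W, b in params[:-1]:
        h = np.maximum(W @ h + b, 0.0)   # hidden layers with ReLU (assumed activation)
    W, b = params[-1]
    return softmax(W @ h + b)            # class probabilities (softmax assumed)

rng = np.random.default_rng(0)
d, hidden, K = 4, 8, 3                   # hypothetical sizes
params = [
    (rng.normal(size=(hidden, d)), np.zeros(hidden)),
    (rng.normal(size=(hidden, hidden)), np.zeros(hidden)),
    (rng.normal(size=(K, hidden)), np.zeros(K)),
]
probs = classify(rng.normal(size=d), params)
print(probs)
```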
Further, in step three, an objective function that unifies sentence embedding and unbalanced multi-class classification is constructed and used to guide neural network training. Finally, the label word embeddings are introduced into the loss function to strengthen learning of the labels, and the model is trained by minimizing the resulting unbalanced objective function. After training, payment summary text with unknown labels can be classified;
Specifically, the inverse weight of each category is first computed. The formula is as follows:
where $c(\cdot)$ is the number of samples in a category, mean(·) denotes the median, $y_k$ is the label vector of category k, the number of samples in category $k'$ is the median of the sample counts of all categories, and $y_{k'}$ is the label vector of category $k'$;
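The inverse-weight formula is not reproduced in this text. From the definitions given, a plausible reading is that each category's weight is the median category size divided by that category's own size, so that categories smaller than the median receive weights above 1. The sketch below implements that assumed reading; the counts are the support values from the results table of the embodiment, used here only as illustrative category sizes.

```python
import numpy as np

def inverse_weights(class_counts):
    """Assumed reading: r_k = (median category size) / (size of category k)."""
    counts = np.asarray(class_counts, dtype=float)
    return np.median(counts) / counts

counts = [17573, 11075, 3955, 1983, 826, 719, 691, 519, 189]
r = inverse_weights(counts)
print(np.round(r, 3))
```

Under this reading the median-sized category (826 samples) gets weight 1, the largest category gets a weight well below 1, and the smallest gets a weight well above 1.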
The inverse weights are then smoothed to obtain the final weight vector. The formula is as follows:
where $S(\cdot)$ denotes the sigmoid function, $r_k$ is the inverse weight of the k-th category, and $r_{k'}$ is the inverse weight of the $k'$-th category;
A loss function is then constructed using the weight vector. The formula is as follows:
where N is the total number of sentences in the data set and CE(·) is the cross-entropy loss function; the composition symbol in the formula means that the function f can be decomposed into two parts, $f_1$ and $f_2$, with the output of $f_1$ serving as the input of $f_2$; $y_i$ is the actual label matrix of the i-th sentence; $\Sigma$ is the weight vector and $\Sigma^T$ its transpose; $y_{ik}$ is the value of the k-th label of the i-th sentence, equal to 1 at the position of the actual label and 0 elsewhere; and $\hat{y}_{ik}$ is the predicted probability of the k-th label of the i-th sentence;
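The weighted loss formula is not reproduced in this text. Based on the definitions above, a natural form is a cross-entropy in which each category's term is multiplied by that category's weight and the result is averaged over the N sentences. The sketch below implements that assumed form; the function name and the `eps` term that avoids taking the logarithm of zero are illustrative.

```python
import numpy as np

def weighted_cross_entropy(Y, P, w, eps=1e-12):
    """Class-weighted cross-entropy averaged over sentences (assumed form).

    Y: (N, K) actual labels (1 at the true label, 0 elsewhere);
    P: (N, K) predicted probabilities; w: (K,) category weights.
    """
    per_sentence = -(Y * np.log(P + eps)) @ w   # weight each category's term, sum over categories
    return per_sentence.mean()                  # average over the N sentences

Y = np.array([[1.0, 0.0], [0.0, 1.0]])
P = np.array([[0.5, 0.5], [0.5, 0.5]])
loss = weighted_cross_entropy(Y, P, w=np.array([1.0, 2.0]))
print(loss)
```

In this toy case the second sentence belongs to the category with weight 2, so its error counts twice as much as that of the first sentence; this is the mechanism by which the loss is biased toward small categories.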
To raise the importance of the labels in training, a dedicated label loss function is added. The formula is as follows:
where k is the index of the category, α is the penalty coefficient, and $y_k$ is the category label matrix;
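The label loss formula is not reproduced in this text. One form consistent with the description, and common in label-embedding classifiers, passes each label's own embedding through the classifier and penalizes the cross-entropy against that label's one-hot vector $y_k$, scaled by α. The sketch below implements this assumed form; averaging over the K labels (rather than summing) and the default value of `alpha` are arbitrary choices, and `classify` is a hypothetical stand-in for the trained classifier.

```python
import numpy as np

def label_loss(label_matrix, classify, alpha=0.5, eps=1e-12):
    """Penalty encouraging the classifier to map each label embedding to its own category.

    label_matrix: (K, d) label embeddings; classify: function from a (d,) vector
    to a (K,) probability vector; alpha: penalty coefficient.
    """
    K = label_matrix.shape[0]
    total = 0.0
    for k in range(K):
        probs = classify(label_matrix[k])   # prediction for the embedding of label k
        total += -np.log(probs[k] + eps)    # cross-entropy against the one-hot vector y_k
    return alpha * total / K                # ASSUMPTION: mean over labels, scaled by alpha

# Hypothetical classifier that always predicts a uniform distribution over 2 categories.
uniform = lambda v: np.array([0.5, 0.5])
penalty = label_loss(np.eye(2), uniform, alpha=0.5)
print(penalty)
```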
Finally, the model is trained with the Adam algorithm, with the objective of minimizing formula (11).
Beneficial effects: the invention addresses the multi-label unbalanced classification of payment voucher summary text in budget execution audit. By introducing the label similarity calculation, it markedly improves recall on small categories as well as overall performance, and it greatly improves the efficiency with which auditors check budget execution compliance and identify high-risk transactions.
Drawings
Fig. 1 is a flowchart of an unbalanced text classification method for the audit field according to an embodiment of the present invention.
Fig. 2 is a diagram illustrating a neural network framework according to a first embodiment of the present invention.
FIG. 3 is a schematic diagram of a model training process according to an embodiment of the present invention.
Detailed Description
The invention is further explained below with reference to the drawings. Fig. 1 is a diagram illustrating an unbalanced text classification method for the audit field according to an embodiment of the present invention. As shown in fig. 1, the present embodiment includes the following steps:
Step one: data preprocessing and word embedding training are carried out to obtain the input data for the model. Labeled payment voucher summary text is given, in which the number of samples differs between categories and the number of categories is K. A keyword library for budget execution audit, that is, the proper nouns of the field, is constructed from the given text, and representative seed words are selected from the keyword library as the description of each label. The text is segmented using the keyword library and a word segmentation tool, word embedding vectors are pre-trained on the full audit text data, and a word embedding matrix $E_i = [e_{i1}, \dots, e_{iL}]^T$ is obtained for each sentence, where i is the index of the sentence and L is the length of the sentence (its words are indexed from 1 to L). The seed words are mapped to their word embeddings, and the embeddings of the seed words of each category are averaged to obtain the embedding matrix of all labels, $\mathbf{L} = [l_1, \dots, l_K]^T$;
Step two: model construction, building the classification framework for multi-label unbalanced text. As shown in Fig. 2, a similarity matrix is first computed from the words in the sentence and the labels. A neural network is then used to compute the similarity between the context information, that is, the phrases, and the labels; this involves 2 groups of parameters, $W_1$ and $b_1$, that need to be trained. Next, a newly constructed chiny pooling layer is used to obtain the weight vector between the phrases and all category labels. Finally, the weight vector is used to weight the original words; once training is complete, this yields a suitable sentence embedding matrix, that is, a sentence embedding matrix that incorporates domain knowledge. The formula is as follows:
$Z_i = f_1(E_i, \mathbf{L})$ (1)
where $Z_i$ is the embedding matrix of the i-th sentence, and $f_1$ is the mapping function with $E_i$ and $\mathbf{L}$ as input and $Z_i$ as output;
Finally, the sentence embedding matrix is used as input to a classifier that classifies the sentence, where 2 groups of parameters, $W_2$ and $b_2$, need to be trained. The formula is as follows:
$\hat{y}_i = f_2(Z_i)$ (2)
where $\hat{y}_i$ is the predicted class probability matrix for sentence $Z_i$, and $f_2$ is the mapping function with $Z_i$ as input and $\hat{y}_i$ as output;
Step three: an objective function that unifies sentence embedding and unbalanced multi-class classification is constructed and used to guide neural network training. Finally, the label word embeddings are introduced into the loss function to strengthen learning of the labels, and the model is trained by minimizing the resulting unbalanced objective function. After training, payment summary text with unknown labels can be classified;
In a specific embodiment, the multi-label unbalanced text classification method in budget execution audit is described in detail:
First, using existing budget execution audit text data, the sentences are segmented with the word segmentation tool LAC (Lexical Analysis of Chinese), the word frequencies within each category are counted, and the keyword library and seed words for the budget execution audit field are constructed from the segmentation results and a collected word library of the professional field.
The keyword library and seed words for the budget execution audit field are shown in the following table:
The word segmentation results obtained with LAC, based on the budget execution audit word library and conventional stop words, are shown in the following table:
Serial number | Sentence | Word segmentation result |
---|---|---|
1 | Lodging fee for Shenzhen specialist attending zhushai following project | Lodging fee for Shenzhen specialist attending zhushai following project |
The seed words are then represented with CBOW (Continuous Bag of Words) to obtain the embedding matrix corresponding to each label. Taking the travel expense category as an example, the embedding matrix of the seed words and the embedding matrix of the label are shown in the following table:
Averaging the seed word embedding matrix of the travel expense category gives the embedding matrix of the label, shown in the following table:
Then, the word segmentation results are represented with CBOW to obtain the embedding matrix corresponding to the words, shown in the following table:
The data are divided in proportion into a training set and a test set, and the training set is input into the model for training; the training process is shown in Fig. 3.
After training, the test set is input into the trained model. Once $\beta_j$ has been computed and applied as a weight, the sentence embedding matrix is obtained, as shown in the following table:
The final prediction obtained after the sentence embedding matrix is input to the classifier is shown in the following table:
The overall prediction results are shown in the following table:
Category | Precision | Recall | F1-score | Support |
---|---|---|---|---|
Social insurance and housing fund (five insurances and one fund) | 0.965 | 0.971 | 0.968 | 17573 |
Salary and subsidy of personnel | 0.905 | 0.907 | 0.906 | 11075 |
Office expenses | 0.931 | 0.905 | 0.918 | 3955 |
Property management fee | 0.874 | 0.873 | 0.874 | 1983 |
Cost of infrastructure | 0.896 | 0.791 | 0.840 | 826 |
Travel expenses | 0.780 | 0.751 | 0.765 | 719 |
Special procurement | 0.697 | 0.685 | 0.677 | 691 |
Official expenses | 0.645 | 0.690 | 0.667 | 519 |
Others | 0.500 | 0.757 | 0.602 | 189 |
Macro Avg | 0.799 | 0.811 | 0.805 | 37530 |
Weighted Avg | 0.922 | 0.921 | 0.921 | 37530 |
Big Avg | 0.911 | 0.867 | 0.888 | 15856 |
Small Avg | 0.743 | 0.783 | 0.759 | 21674 |
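The difference between the macro and weighted averages in the table can be checked directly from the per-category precision and support columns. The short Python sketch below uses the usual definitions (an unweighted mean over categories for the macro average, and a support-weighted mean for the weighted average); the text does not state how its averages were computed, so these definitions are an assumption, although the two precision averages they give agree with the table.

```python
# Per-category precision and support copied from the results table.
precision = [0.965, 0.905, 0.931, 0.874, 0.896, 0.780, 0.697, 0.645, 0.500]
support = [17573, 11075, 3955, 1983, 826, 719, 691, 519, 189]

# Macro average: every category counts equally, regardless of size.
macro_precision = sum(precision) / len(precision)

# Weighted average: each category counts in proportion to its number of samples.
weighted_precision = sum(p * s for p, s in zip(precision, support)) / sum(support)

print(round(macro_precision, 3))     # 0.799
print(round(weighted_precision, 3))  # 0.922
```

Because the macro average gives the small categories the same influence as the large ones, it is lower than the weighted average here, which is why it is the more informative figure for judging performance on unbalanced data.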
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various modifications and refinements without departing from the principle of the present invention, and these modifications and refinements should also be regarded as falling within the protection scope of the present invention.
Claims (3)
1. A multi-label unbalanced text classification method in budget execution audit, characterized by comprising the following steps:
step one: data preprocessing and word embedding training are carried out to obtain the input data for the model: labeled payment voucher summary text is given, in which the number of samples differs between categories and the number of categories is K; a keyword library for budget execution audit, that is, the proper nouns of the field, is constructed from the given text, and representative seed words are selected from the keyword library as the description of each label; the text is segmented using the keyword library and a word segmentation tool, word embedding vectors are pre-trained on the full audit text data, and a word embedding matrix $E_i = [e_{i1}, \dots, e_{iL}]^T$ is obtained for each sentence, where i is the index of the sentence and L is the length of the sentence (its words are indexed from 1 to L); the seed words are mapped to their word embeddings, and the embeddings of the seed words of each category are averaged to obtain the embedding matrix of all labels, $\mathbf{L} = [l_1, \dots, l_K]^T$;
step two: model construction, building the classification framework for multi-label unbalanced text; first, a similarity matrix is computed from the words in the sentence and the labels; a neural network is then used to compute the similarity between the context information, that is, the phrases, and the labels, where 2 groups of parameters, $W_1$ and $b_1$, need to be trained; next, a newly constructed chiny pooling layer is used to obtain the weight vector between the phrases and all category labels; finally, the weight vector is used to weight the original words, and once training is complete a suitable sentence embedding matrix is obtained, that is, a sentence embedding matrix that incorporates domain knowledge, and the formula is as follows:
$Z_i = f_1(E_i, \mathbf{L})$ (1)
where $Z_i$ is the embedding matrix of the i-th sentence, and $f_1$ is the mapping function with $E_i$ and $\mathbf{L}$ as input and $Z_i$ as output;
the sentence embedding matrix is then used as input to a classifier that classifies the sentence, where 2 groups of parameters, $W_2$ and $b_2$, need to be trained, and the formula is as follows:
$\hat{y}_i = f_2(Z_i)$ (2)
where $\hat{y}_i$ is the predicted class probability matrix for sentence $Z_i$, and $f_2$ is the mapping function with $Z_i$ as input and $\hat{y}_i$ as output;
step three: an objective function that unifies sentence embedding and unbalanced multi-class classification is constructed and used to guide neural network training; a cross-entropy loss function is used as the base objective function, and weight data are introduced so that the loss function is biased toward small categories, which strengthens the classifier's training on those categories; finally, the label word embedding matrix is introduced into the loss function to strengthen learning of the labels, and the model is trained by minimizing the resulting unbalanced objective function; after training, payment summary text with unknown labels is classified.
2. The multi-label unbalanced text classification method in budget execution audit according to claim 1, characterized in that, in step two, the model is constructed and the classification framework for multi-label unbalanced text is built: first, a similarity matrix is computed from the words in the sentence and the labels; a neural network is then used to compute the similarity between the context information, that is, the phrases, and the labels, where 2 groups of parameters, $W_1$ and $b_1$, need to be trained; next, a newly constructed chiny pooling layer is used to obtain the weight vector between the phrases and all category labels; finally, the weight vector is used to weight the original words, and once training is complete a suitable sentence embedding matrix is obtained, that is, a sentence embedding matrix that incorporates domain knowledge;
specifically: in the first stage, a similarity matrix is first obtained, and the formula is as follows:
the similarity matrix $G_i$ has size $L \times K$, where $\|\cdot\|$ denotes the $L_2$ norm;
then, the similarity between the labels and the phrases in the sentence, which carry context semantics, is computed, and the formula is as follows:
$c_i = \mathrm{ReLU}(G_{i,\,j-p:j+p} W_1^T + b_1), \quad 1 \le j \le L$ (4)
where j is the index of the word at the center of the phrase, $j-p$ and $j+p$ are the indices of the leftmost and rightmost words of the phrase, and $W_1$ and $b_1$ are two groups of neural network parameters that are trained iteratively during training;
then, the weight matrix of the words is computed:
where $c_{jk}$ is the similarity between the phrase corresponding to the j-th word and the k-th category label;
next, $\beta_j$ is normalized, and the formula is as follows:
where exp denotes the exponential function with base e, and $\beta_{j'}$ is the similarity value of the $j'$-th word in the sentence;
finally, the embedding matrix of the sentence is obtained, and the formula is as follows:
the above process as a whole is expressed as formula (1);
3. The multi-label unbalanced text classification method in budget execution audit according to claim 1, characterized in that, in step three, an objective function that unifies sentence embedding and unbalanced multi-class classification is constructed and used to guide neural network training; a cross-entropy loss function is used as the base objective function, and weight data are introduced so that the loss function is biased toward small categories, which strengthens the classifier's training on those categories; finally, the label word embeddings are introduced into the loss function to strengthen learning of the labels, and the model is trained by minimizing the resulting unbalanced objective function; after training, payment summary text with unknown labels is classified;
specifically: first, the inverse weight of each category is computed, and the formula is as follows:
where $c(\cdot)$ is the number of samples in a category, mean(·) denotes the median, $y_k$ is the label vector of category k, the number of samples in category $k'$ is the median of the sample counts of all categories, and $y_{k'}$ is the label vector of category $k'$;
the inverse weights are then smoothed to obtain the final weight vector, and the formula is as follows:
where $S(\cdot)$ denotes the sigmoid function, $r_k$ is the inverse weight of the k-th category, and $r_{k'}$ is the inverse weight of the $k'$-th category;
a loss function is then constructed using the weight vector, and the formula is as follows:
where N is the total number of sentences in the data set and CE(·) is the cross-entropy loss function; the composition symbol in the formula means that the function f can be decomposed into two parts, $f_1$ and $f_2$, with the output of $f_1$ serving as the input of $f_2$; $y_i$ is the actual label matrix of the i-th sentence; $\Sigma$ is the weight vector and $\Sigma^T$ its transpose; $y_{ik}$ is the value of the k-th label of the i-th sentence, equal to 1 at the position of the actual label and 0 elsewhere; and $\hat{y}_{ik}$ is the predicted probability of the k-th label of the i-th sentence;
to raise the importance of the labels in training, a label loss function is added, and the formula is as follows:
where k is the index of the category, α is the penalty coefficient, and $y_k$ is the category label matrix;
finally, the model is trained with the Adam algorithm, with the objective of minimizing formula (11).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111534284.9A CN114722189B (en) | 2021-12-15 | 2021-12-15 | Multi-label unbalanced text classification method in budget execution audit |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111534284.9A CN114722189B (en) | 2021-12-15 | 2021-12-15 | Multi-label unbalanced text classification method in budget execution audit |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114722189A (en) | 2022-07-08 |
CN114722189B (en) | 2023-06-23 |
Family
ID=82236185
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111534284.9A Active CN114722189B (en) | 2021-12-15 | 2021-12-15 | Multi-label unbalanced text classification method in budget execution audit |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114722189B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2008102737A (en) * | 2006-10-19 | 2008-05-01 | Nippon Telegr & Teleph Corp <Ntt> | Stored document classification apparatus, stored document classification method, program, and recording medium |
CN110609898A (en) * | 2019-08-19 | 2019-12-24 | 中国科学院重庆绿色智能技术研究院 | Self-classification method for unbalanced text data |
CN111737476A (en) * | 2020-08-05 | 2020-10-02 | 腾讯科技(深圳)有限公司 | Text processing method and device, computer readable storage medium and electronic equipment |
US20200327193A1 (en) * | 2019-04-10 | 2020-10-15 | International Business Machines Corporation | Displaying text classification anomalies predicted by a text classification model |
US20200327381A1 (en) * | 2019-04-10 | 2020-10-15 | International Business Machines Corporation | Evaluating text classification anomalies predicted by a text classification model |
US20210042580A1 (en) * | 2018-10-10 | 2021-02-11 | Tencent Technology (Shenzhen) Company Limited | Model training method and apparatus for image recognition, network device, and storage medium |
Non-Patent Citations (2)
Title |
---|
ZHIANG WU ET AL: "hPSD: A Hybrid PU-Learning-Based Spammer Detection Model for Product Reviews", vol. 50, no. 4, pages 1595 - 1606, XP011774311, DOI: 10.1109/TCYB.2018.2877161 * |
陈志等: "不平衡训练数据下的基于深度学习的文本分类", vol. 41, no. 1, pages 1 - 5 * |
Also Published As
Publication number | Publication date |
---|---|
CN114722189B (en) | 2023-06-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110245229B (en) | Deep learning theme emotion classification method based on data enhancement | |
CN107861951A (en) | Session subject identifying method in intelligent customer service | |
CN111177374A (en) | Active learning-based question and answer corpus emotion classification method and system | |
CN106980608A (en) | A kind of Chinese electronic health record participle and name entity recognition method and system | |
Yang et al. | Automatic academic paper rating based on modularized hierarchical convolutional neural network | |
CN113255321B (en) | Financial field chapter-level event extraction method based on article entity word dependency relationship | |
CN108875809A (en) | The biomedical entity relationship classification method of joint attention mechanism and neural network | |
CN110807084A (en) | Attention mechanism-based patent term relationship extraction method for Bi-LSTM and keyword strategy | |
CN113420145B (en) | Semi-supervised learning-based bid-bidding text classification method and system | |
CN109492105B (en) | Text emotion classification method based on multi-feature ensemble learning | |
CN108959305A (en) | A kind of event extraction method and system based on internet big data | |
CN111222318A (en) | Trigger word recognition method based on two-channel bidirectional LSTM-CRF network | |
CN109492230A (en) | A method of insurance contract key message is extracted based on textview field convolutional neural networks interested | |
CN110781297B (en) | Classification method of multi-label scientific research papers based on hierarchical discriminant trees | |
CN111191031A (en) | Entity relation classification method of unstructured text based on WordNet and IDF | |
CN115952292B (en) | Multi-label classification method, apparatus and computer readable medium | |
CN112989830B (en) | Named entity identification method based on multiple features and machine learning | |
CN115544252A (en) | Text emotion classification method based on attention static routing capsule network | |
Yu et al. | Using BiLSTM with attention mechanism to automatically detect self-admitted technical debt | |
CN113312907B (en) | Remote supervision relation extraction method and device based on hybrid neural network | |
CN110287495A (en) | A kind of power marketing profession word recognition method and system | |
CN110245234A (en) | A kind of multi-source data sample correlating method based on ontology and semantic similarity | |
CN112489689B (en) | Cross-database voice emotion recognition method and device based on multi-scale difference countermeasure | |
CN114722810A (en) | Real estate customer portrait method and system based on information extraction and multi-attribute decision | |
Gillmann et al. | Quantification of Economic Uncertainty: a deep learning approach |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||