CN117473096A - Knowledge point labeling method fusing LATEX labels and model thereof - Google Patents
- Publication number
- CN117473096A (application CN202311834982.XA)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/20—Education
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a knowledge point labeling method fusing LATEX labels, and a model thereof, comprising the following steps: constructing a data set; inputting the original problem text in the constructed data set into a sentence encoder module and obtaining its outputs; feeding those outputs into a discipline knowledge fusion module, whose computed results serve respectively as the final semantic representations; inputting the final semantic representations into a gating screening module, whose output is the information of the original problem text finally retained under the influence of discipline knowledge information; and feeding that output into a linear layer with a sigmoid function to obtain the final classification probability vector, which is converted into predicted labels by a threshold classifier. The beneficial effects of the invention are as follows: two kinds of finer-grained discipline knowledge, namely LATEX label concepts and term types, are introduced, providing key information for labeling most knowledge points under unbalanced sample distribution.
Description
Technical Field
The invention relates to the field related to multi-label text classification tasks, in particular to a knowledge point labeling method and a model thereof fused with LATEX labels.
Background
Since the late 1990s, with the development of the internet and the massive generation of digitized information, researchers have extensively explored text classification, moving from traditional single-label methods to multi-label methods. In recent years, with the expansion of internet education and the growing demand for online learning, big data technology has become increasingly important in the education field, and problems (exercises) play a very important role in course teaching. Students' mastery of knowledge points is evaluated by analyzing the problems they solve, but accurately labeling the knowledge points examined by each problem is a key issue for optimizing problem base construction and personalized learning.
In the field of mathematics, mathematical knowledge points are the basic units of organization and transmission in mathematical education information, used to describe and express the core concepts and key points of the discipline. The problem knowledge point labeling task aims to label the core concepts and key points examined in a problem. Because these are not unique, the task can be regarded as a multi-label text classification task. However, the task suffers from unbalanced sample distribution, hierarchical labels, and a restricted domain. More critically, the specificity of mathematical discipline knowledge prevents models from deeply understanding the semantics of problem text. For example, problems exhibit symbolization, formulation, logical complexity, and condensed expression, which are the core difficulties of the problem knowledge point labeling task.
The number of knowledge point labels in the automatic labeling task is large, and statistics on sampled data show that most problem instances contain only 1 to 3 knowledge points, so the label space is sparse. This label sparsity causes existing models to label knowledge points with few training examples poorly, making model performance difficult to improve.
Most traditional knowledge point labeling methods combine statistics with machine learning algorithms; later works generate space vectors based on vector space models (Vector Space Model, VSM) and label knowledge points of domain texts by computing text similarity. However, such methods rely only on shallow features, ignore the contextual information of the text, depend excessively on a corpus, and generalize poorly. Deep learning methods based on word vector representations have therefore been proposed in recent years, but their word vectors are static and cannot effectively learn contextual representations for newly added training problems. With the advent of BERT (a deep learning model based on the attention mechanism), the word vector characterization problem was alleviated, and more and more works improve domain-model performance by embedding a pre-training framework.
Although directly embedding a pre-training framework is very powerful for vocabulary and semantic expression, its semantic encoding of domain-specific prior knowledge is poor, especially in the mathematical discipline domain. Therefore, recent work combines pre-trained models with the specificity of mathematical text, integrating prior knowledge such as mathematical symbols, formulas, and problem analysis, and thereby further improves performance on the problem knowledge point labeling task. However, when fusing prior knowledge, these models directly concatenate (Concat) the knowledge vector representation with the original problem text representation and send the concatenated result to the classifier; such explicit fusion actually introduces noise that interferes with the original semantic representation of the problem. Other methods label knowledge points on an intermediate representation obtained by cleaning and replacing the original problem text in advance using domain knowledge, which damages the complete semantic representation of the original text and loses effective-information features during classification.
Disclosure of Invention
In order to solve the problems, the invention provides a knowledge point labeling method and a model thereof integrating LATEX labels, which take the particularities of formulation, expression refining and the like of the representation of mathematical discipline knowledge into consideration, introduce two kinds of finer discipline knowledge, namely information of LATEX label concepts and term types, and further provide key information for labeling most knowledge points under the condition of unbalanced sample distribution.
The technical scheme of the invention is as follows: a knowledge point labeling method integrating LATEX labels comprises the following steps:
step S1, constructing a data set: collecting problems from junior middle school mathematics test papers and preprocessing the collected problems; labeling the knowledge points of the collected problems after preprocessing; and finally obtaining a problem data set, in which each problem is called an original problem text w;
step S2, inputting the original problem text w constructed in the step S1, and LATEX label concept text lc and term type text tt in the original problem text w into a sentence encoder module of the knowledge point automatic labeling model, and outputting an original problem text representation e and a LATEX label concept representation e as the output results lc And the term type represents e tt ;
Step S3, inputting the output results obtained in step S2 into the discipline knowledge fusion module, which uses a cross-attention mechanism to fuse the LATEX label concept representation e^lc and the term type representation e^tt respectively with the original problem text representation e, and outputs the deep semantic representation M^lc of the LATEX label concept and the deep semantic representation M^tt of the term type; the results of the average pooling operation in the discipline knowledge fusion module serve respectively as the final semantic representations of the LATEX label concept and the term type, namely the pooled representation p^lc of the LATEX label concept and the pooled representation p^tt of the term type;
Step S4, inputting the final semantic representations of step S3 into the gating screening module, which, through a gating screening mechanism that implicitly fuses the two kinds of discipline knowledge, retains with few parameters the key information related to discipline knowledge in the original problem text representation e; the output of the gating screening module is the information of the original problem text w finally retained under the influence of the LATEX label concept information and the term type information, referred to as the finally retained information e^cls-remain2;
Step S5, taking the finally retained information e^cls-remain2 output by the gating screening module in step S4 as the input of the prediction module, and passing it through a linear layer with a sigmoid function to obtain the final classification probability vector, which is the representation of the predicted labels and is converted into the predicted labels by a threshold classifier.
Further, in step S1, the data set is constructed specifically as follows:
step S11, collecting 16226 problems from 800 junior middle school mathematics test papers, the collected problems covering all knowledge points of junior middle school mathematics and four problem types: multiple-choice, fill-in-the-blank, free-response, and true-or-false;
step S12, preprocessing the collected problems: first performing invalid-character removal, deduplication, and completion cleaning operations on the problems to obtain 14200 problems; then using a mathematical formula recognition tool to convert formulas that exist as pictures into a formula format supported by Word;
step S13, after preprocessing, labeling the knowledge points of the problems in an automated manner; the labeled knowledge points come from two sources: on the one hand, query results from an online education platform, and on the other hand, a knowledge point grading standard constructed with reference to the People's Education Press junior middle school textbooks;
step S14, finally obtaining a data set containing 12073 problems through problem preprocessing and knowledge point labeling.
Further, in step S13, knowledge points of the problem are labeled, specifically:
step S131, finding the third-level knowledge points corresponding to each problem by means of the problem query function of the online education platform;
step S132, querying the first-, second-, and third-level knowledge points corresponding to the problems in the knowledge point grading standard;
step S133, taking the third-level knowledge points obtained from the online education platform as primary, filtering the third-level knowledge points queried from the knowledge point grading standard, and then querying the first-level and second-level knowledge points to which each third-level knowledge point belongs;
step S134, judging the similarity of the knowledge point labeling results of all problems by means of the Levenshtein similarity algorithm and a semantic similarity model, unifying labeling results with high similarity, and ensuring that the labeled knowledge points are not redundant;
and step S135, removing the knowledge points not examined in the senior high school entrance examination, together with their corresponding problems, according to the examination syllabus provided by junior middle school mathematics education experts.
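The similarity merging of step S134 can be sketched in pure Python with a normalized Levenshtein similarity. The `merge_labels` helper and the 0.85 threshold below are illustrative assumptions (the patent additionally uses a semantic similarity model, omitted here):

```python
def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance between two strings.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def similarity(a: str, b: str) -> float:
    # Normalized Levenshtein similarity in [0, 1].
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))

def merge_labels(labels, threshold=0.85):
    # Greedily map each label to the first earlier label it closely
    # matches, so near-duplicate knowledge points are unified.
    canonical, mapping = [], {}
    for lab in labels:
        for c in canonical:
            if similarity(lab, c) >= threshold:
                mapping[lab] = c
                break
        else:
            canonical.append(lab)
            mapping[lab] = lab
    return mapping
```

A label differing from an earlier one only by a trailing character collapses onto it, while unrelated labels stay distinct.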
Further, in step S2, the sentence encoder module specifically includes:
step S21, the sentence encoder module selects RoBERTa, a robustly optimized BERT approach, as the pre-trained language model; the inputs of the sentence encoder module comprise the original problem text w, the LATEX label concept text lc, and the term type text tt, and the three share the parameters of the RoBERTa pre-trained language model;
step S22, treating the RoBERTa pre-trained language model as a function, with w_i the original problem text of the ith index, lc_i the LATEX label concept text of the ith index, and tt_i the term type text of the ith index, the specific calculation is shown in formula (1);

e_i = RoBERTa(w_i), e_i^lc = RoBERTa(lc_i), e_i^tt = RoBERTa(tt_i) (1);

wherein e_i is the vector representation obtained by passing the original problem text w_i of the ith index through the RoBERTa pre-trained language model, called the original problem text representation e_i of the ith index; e_i^lc is the vector representation of the LATEX label concept text of the ith index, called the LATEX label concept representation e_i^lc of the ith index; and e_i^tt is the vector representation of the term type text of the ith index, called the term type representation e_i^tt of the ith index;
step S23, extracting the last-layer hidden states of the pre-trained model as the text word vector representations, i.e. the original problem text representation e_i, the LATEX label concept representation e_i^lc, and the term type representation e_i^tt of the ith index.
Further, in step S3, the discipline knowledge fusion module specifically includes:
step S31, taking as input the last-layer text word vector representations output by the pre-trained model in the sentence encoder module;
step S32, using a cross-attention mechanism to fuse the LATEX label concept representation e_i^lc and the term type representation e_i^tt of the ith index respectively with the original problem text representation e_i of the ith index, and outputting the deep semantic representation M_i^lc of the LATEX label concept and the deep semantic representation M_i^tt of the term type of the ith index;
step S33, meanwhile, to let the knowledge point automatic labeling model learn stable feature representations in multiple independent feature spaces, a multi-head attention mechanism is introduced; the attention calculation is shown in formula (2) and formula (3);

head_ij^lc = softmax( (e_i^lc W_j^Q)(e_i W_j^K)^T / sqrt(d_K) ) (e_i W_j^V) (2);

M_i^lc = [head_i1^lc, ..., head_ih^lc], M_i^tt = [head_i1^tt, ..., head_ih^tt] (3);
wherein head_ij^lc is the feature representation of the jth attention calculation for the LATEX label concept representation of the ith index; softmax is the activation function converting unnormalized scores into a probability distribution; W_j^Q, W_j^K, W_j^V are the projection parameter matrices of the query, key, and value vectors in the jth attention calculation; T denotes the transpose; and d_K is the size of the second dimension of the original problem text representation e_i of the ith index;

head_ij^tt is the feature representation of the jth attention calculation for the term type representation of the ith index;

M_i^lc is the deep semantic representation of the LATEX label concept obtained by concatenating the results of h attention calculations on the LATEX label concept representation of the ith index, called the deep semantic representation M_i^lc of the LATEX label concept of the ith index; [·, ·] denotes the concatenation operation, and h is the number of attention calculations;

M_i^tt is the deep semantic representation of the term type obtained by concatenating the results of h attention calculations on the term type representation of the ith index, called the deep semantic representation M_i^tt of the term type of the ith index;
step S34, extracting the average pooling of the model's last-layer embedding vectors as the sentence information representation: the deep semantic representation M_i^lc of the LATEX label concept and the deep semantic representation M_i^tt of the term type of the ith index are average-pooled, and the results serve respectively as the final semantic characterizations of the LATEX label concept and the term type, as shown in formula (4);

p_i^lc = AvgPool(M_i^lc), p_i^tt = AvgPool(M_i^tt) (4);

wherein p_i^lc is the result of average pooling the deep semantic representation of the LATEX label concept of the ith index, called the pooled representation p_i^lc of the LATEX label concept of the ith index; p_i^tt is the result of average pooling the deep semantic representation of the term type of the ith index, called the pooled representation p_i^tt of the term type of the ith index; and AvgPool denotes the average pooling operation applied to M_i^lc and M_i^tt respectively.
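The fusion of formulas (2)–(4) can be sketched in NumPy as a single-example computation (shapes and weight tensors below are hypothetical; the real model operates on RoBERTa hidden states):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_head(e_know, e_text, W_q, W_k, W_v):
    # One head of formula (2): queries come from the knowledge text
    # (LATEX label concept or term type), keys/values from the
    # original problem text representation.
    Q, K, V = e_know @ W_q, e_text @ W_k, e_text @ W_v
    scores = softmax(Q @ K.T / np.sqrt(K.shape[-1]))
    return scores @ V

def fuse_and_pool(e_know, e_text, heads):
    # Formula (3): concatenate the h heads into M_i; formula (4):
    # average-pool over the sequence dimension to get p_i.
    M = np.concatenate(
        [cross_attention_head(e_know, e_text, *w) for w in heads], axis=-1)
    return M.mean(axis=0)
```

With h = 2 heads of size 4, for example, this yields an 8-dimensional pooled vector p_i.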
Further, in step S4, the gating and screening module specifically includes:
step S41, the input data are the pooled representation p_i^lc of the LATEX label concept of the ith index and the pooled representation p_i^tt of the term type of the ith index;
step S42, a gating mechanism acting on the pooled representation p_i^lc of the LATEX label concept of the ith index and the CLS tag vector e_cls (a specially position-encoded vector representing the whole sequence or sentence meaning, used here as the sentence representation of the original problem text) calculates the proportion of original problem text information to be kept under the influence of the LATEX label concept information, so as to screen out the related key information in the original problem text; the calculation is shown in formula (5);

r_i^lc = σ(W_lc [e_cls ; p_i^lc] + b_lc), e_i^cls-remain1 = r_i^lc ⊙ e_cls (5);

wherein r_i^lc is the weight retained under the influence of the LATEX label concept information of the ith index; σ is the sigmoid activation function; W_lc is the learnable matrix applied to the concatenation of the CLS tag vector e_cls and the pooled representation p_i^lc of the LATEX label concept of the ith index; b_lc is a bias vector; and [e_cls ; p_i^lc] is the result of concatenating e_cls with p_i^lc;

e_i^cls-remain1 is the result of the element-wise multiplication of the retained weight r_i^lc and the CLS tag vector e_cls; it represents the information of the original problem text retained under the influence of the LATEX label concept information of the ith index, called the preliminarily retained information e_i^cls-remain1;
step S43, the information e_i^cls-remain2 of the original problem text finally retained under the influence of the LATEX label concept information and the term type information of the ith index is calculated as shown in formula (6);

r_i^tt = σ(W_tt [e_i^cls-remain1 ; p_i^tt] + b_tt), e_i^cls-remain2 = r_i^tt ⊙ e_i^cls-remain1 (6);

wherein r_i^tt is the weight retained under the influence of the term type information of the ith index; σ is the sigmoid activation function; the input is the preliminarily retained information e_i^cls-remain1; W_tt is the learnable matrix applied to the concatenation of e_i^cls-remain1 and the pooled representation p_i^tt of the term type of the ith index; b_tt is a bias vector; and [e_i^cls-remain1 ; p_i^tt] is the result of concatenating them;

e_i^cls-remain2 is the final output of the gating screening module, obtained by the element-wise multiplication of e_i^cls-remain1 and r_i^tt; it represents the information of the original problem text finally retained under the influence of the LATEX label concept information and the term type information of the ith index, called the finally retained information e_i^cls-remain2 under the influence of the discipline knowledge information of the ith index;
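The two-stage gate of formulas (5) and (6) amounts to two sigmoid gates applied in sequence. A minimal NumPy sketch with hypothetical dimensions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gate(kept, pooled, W, b):
    # Formulas (5)/(6): a sigmoid gate over the concatenation of the
    # currently kept text vector and one pooled knowledge vector; the
    # gate value rescales the kept vector element-wise.
    r = sigmoid(W @ np.concatenate([kept, pooled]) + b)
    return r * kept

def gated_screening(e_cls, p_lc, p_tt, W_lc, b_lc, W_tt, b_tt):
    remain1 = gate(e_cls, p_lc, W_lc, b_lc)    # e^cls-remain1
    return gate(remain1, p_tt, W_tt, b_tt)     # e^cls-remain2
```

Because each gate value lies strictly in (0, 1), the module can only attenuate components of the sentence vector, never amplify them — the "screening" behavior the text describes.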
Step S5, taking the finally retained information e_i^cls-remain2 under the influence of the discipline knowledge information of the ith index, output by the gating screening module, as the input of the prediction module, and passing it through a linear layer with a sigmoid function to obtain the final classification probability vector, which is the representation of the predicted labels; a threshold classifier converts the classification probability vector into the predicted labels.
Further, the prediction module in step S5 specifically includes:
step S51, after the finally retained information e_i^cls-remain2 under the influence of the discipline knowledge information of the ith index, output by the gating screening module, is input into a linear layer with a sigmoid function, the final classification probability vector is obtained as shown in formula (7);

q_j = sigmoid(W_c e_i^cls-remain2 + b_c)_j (7);

wherein q_j is the jth classification probability obtained from the linear layer with the sigmoid function, sigmoid is the activation function, W_c is the learnable matrix applied to the finally retained information e_i^cls-remain2 under the influence of the discipline knowledge information of the ith index, and b_c is a bias vector;
step S52, introducing a classification threshold δ: the jth knowledge point label ŷ_j of the current problem is obtained by comparing the jth classification probability q_j with the classification threshold δ, as in formula (8);

ŷ_j = 1 if q_j > δ, otherwise ŷ_j = 0 (8);
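Formulas (7) and (8) together form the prediction head. A sketch in NumPy (W_c, b_c, and the δ = 0.5 default below are illustrative, not the patent's trained values):

```python
import numpy as np

def predict_labels(e_remain, W_c, b_c, delta=0.5):
    # Formula (7): a linear layer with a sigmoid gives one probability
    # per knowledge-point label; formula (8): threshold each at delta.
    probs = 1.0 / (1.0 + np.exp(-(W_c @ e_remain + b_c)))
    return (probs > delta).astype(int), probs
```

A label is emitted exactly when its sigmoid probability exceeds the threshold, so a single problem can receive several knowledge-point labels at once.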
step S53, adopting the distribution-balanced loss to balance the number of examples among all knowledge point labels; the specific loss function is calculated as shown in formula (9);

L_DB = (1/C) Σ_k Σ_j r̂_j^k [ y_j^k log(1 + e^{-(z_j^k - v_j)}) + (1/λ)(1 - y_j^k) log(1 + e^{λ(z_j^k - v_j)}) ] (9);

wherein L_DB is the resulting distribution-balanced loss; C is the total number of knowledge points; k indexes the kth problem in the data set; r̂_j^k is a rebalancing weight coefficient added during training to close the gap between the expected and actual sampling probabilities; y_j^k ∈ {0,1} is the true label of the jth knowledge point of the kth problem; log denotes the logarithm; z_j^k is the predicted logit of the jth knowledge point of the kth problem; v_j is a class-specific bias representing the intrinsic model bias; and λ is a decisive factor influencing the loss gradient, representing the degree of "tolerance" toward the classification logit z_j^k.
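A sketch of the per-problem distribution-balanced loss of formula (9), following the negative-tolerant form described above (variable names are illustrative; r_hat stands for the rebalancing weights):

```python
import numpy as np

def db_loss(z, y, r_hat, v, lam):
    # z: predicted logits per knowledge point; y: 0/1 true labels;
    # r_hat: rebalancing weights; v: class-specific bias; lam: the
    # "tolerance" factor applied to negative labels.
    pos = y * np.log1p(np.exp(-(z - v)))                     # positive-label term
    neg = (1.0 - y) / lam * np.log1p(np.exp(lam * (z - v)))  # tolerant negative term
    return float(np.mean(r_hat * (pos + neg)))
```

Logits that agree with the labels (large for positives, small for negatives) give a near-zero loss, while flipped logits are heavily penalized; raising lam makes the penalty on negative-label logits steeper.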
Further, the knowledge point automatic labeling model fusing LATEX labels, applied to the above knowledge point labeling method fusing LATEX labels, is mainly divided into four modules: the sentence encoder module, the discipline knowledge fusion module, the gating screening module, and the prediction module; the sentence encoder module is the first module of the knowledge point automatic labeling model, and the four modules are connected sequentially in series.
The invention has the advantages that: (1) Considering that the representation of mathematical discipline knowledge has particularities such as formulation and condensed expression, the invention introduces two kinds of finer-grained discipline knowledge, namely LATEX label concepts and term types, providing key information for labeling most knowledge points under the unbalanced distribution of the constructed problem data set.
(2) The invention designs a gating mechanism for the implicit fusion of discipline knowledge, which uses fewer parameters to retain the key information related to the two kinds of discipline knowledge in the original problem text representation, thereby reducing the noise generated during feature fusion.
(3) The problem knowledge point automatic labeling model fusing discipline knowledge introduces two refined kinds of mathematical discipline knowledge, LATEX label concepts and term types, as prompts; it updates their deep semantic characterizations with an attention mechanism, then implicitly fuses them through a gating mechanism without disturbing the original problem text representation, and balances the number of examples among all knowledge point labels with the distribution-balanced loss.
Drawings
FIG. 1 is a diagram of an overall model framework of the present invention.
Detailed Description
The invention constructs a junior middle school problem knowledge point labeling data set. First, text is collected from People's Education Press junior middle school mathematics textbooks and test papers to construct the data set; a large number of preprocessing operations clean and templatize the problems, and several experts carry out multiple rounds of knowledge point labeling, with a labeling consistency rate of 96.02%. Then, detailed experiments are carried out on the data set. The results show that the proposed knowledge point automatic labeling model: (1) improves the micro-F1, macro-F1 and weighted-F1 evaluation indexes by 1.99%, 2.99% and 2.12% respectively over the reference model; (2) improves the labeling effect for knowledge points with fewer training examples; and (3) exceeds the selected baselines in F1 value (an indicator for evaluating the performance of a classification model) in four sets of baseline comparison experiments based on different pre-trained models.
The technical scheme of the invention is as follows: a knowledge point labeling method integrating LATEX labels comprises the following steps:
step S1, constructing a data set, collecting problems in a junior middle school mathematics test paper, and preprocessing the collected problems; marking the knowledge points of the collected problems after pretreatment; finally, obtaining a problem data set, wherein any problem in the problem data set comprises two parts, one part is an original problem text w, and the other part is a real label Q;
Step S2, inputting the original problem text w constructed in step S1, together with the LATEX label concept text lc and the term type text tt in the original problem text w, into the sentence encoder module of the knowledge point automatic labeling model; the output results are the original problem text representation e, the LATEX label concept representation e^lc and the term type representation e^tt;
Step S3, inputting the output result obtained in step S2 into the discipline knowledge fusion module, and using a cross-attention mechanism to fuse the LATEX label concept representation e^lc and the term type representation e^tt respectively with the original problem text representation e; the output results are the deep semantic representation M^lc of the LATEX label concept and the deep semantic representation M^tt of the term type. The calculation results after the average pooling operation in the discipline knowledge fusion module serve respectively as the final semantic characterizations of the LATEX label concept and the term type, namely the pooling representation ē^lc of the LATEX label concept and the pooling representation ē^tt of the term type;
Step S4, inputting the final semantic characterizations from step S3 into the gating and screening module, which, through a gating and screening mechanism that implicitly fuses the two kinds of discipline knowledge, retains with few parameters the key information related to the discipline knowledge in the original problem text representation e; the output result of the gating and screening module is the information of the original problem text w finally retained under the influence of the LATEX label concept information and the term type information, abbreviated as the finally retained information e^cls-remain2;
Step S5, taking the finally retained information e^cls-remain2 output by the gating and screening module in step S4 as the input of the prediction module; the input passes through a linear layer with a sigmoid function to obtain the final classification probability vector, which is a representation of the predicted label and is converted into the predicted label by a threshold classifier.
Further, in step S1, the data set is constructed specifically as follows:
step S11, collecting 16226 problems from 800 junior middle school mathematics test papers, wherein the collected problems cover all knowledge points related to junior middle school mathematics and comprise four problem types: multiple-choice problems, fill-in-the-blank problems, free-response problems and true-or-false problems;
step S12, preprocessing the collected problems: first performing invalid-character removal, deduplication and completion cleaning operations on the problems to obtain 14200 problems, then adopting a mathematical formula recognition tool to convert the formulas existing in picture form into a formula format supported by Word;
step S13, labeling the knowledge points of the problems in an automated manner after preprocessing, wherein the labeled knowledge points are derived from two sources: on the one hand, the query results of an online education platform, and on the other hand, a knowledge point grading standard constructed with reference to the People's Education Press junior middle school textbooks;
Step S14, finally obtaining a data set containing 12073 problems through problem preprocessing and knowledge point labeling.
Further, in step S13, knowledge points of the problem are labeled, specifically:
step S131, finding a plurality of three-level knowledge points corresponding to the problems by means of the problem query function of the online education platform;
step S132, inquiring first, second and third knowledge points corresponding to the problems in the knowledge point grading standard;
step S133, taking the three-level knowledge points obtained from the online education platform as the primary ones, screening them against the three-level knowledge points queried from the knowledge point grading standard, and querying, from the three-level knowledge points, the first-level and second-level knowledge points to which they belong;
step S134, judging the similarity of the knowledge point labeling results of all problems by means of the Levenshtein similarity algorithm and a semantic similarity model, and unifying labeling results with high similarity to ensure that the labeled knowledge points are not redundant;
step S135, removing the knowledge points that are not examined in the high school entrance examination, together with the corresponding problems, according to the examination syllabus provided by junior middle school mathematics education experts.
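The similarity unification of step S134 can be sketched in plain Python. The Levenshtein ratio and the `unify_labels` helper below are illustrative stand-ins (the patent also combines a semantic similarity model, omitted here), with the 0.95 similarity threshold taken from the experimental settings later in the text.

```python
# Illustrative sketch of step S134: merging near-duplicate knowledge-point
# labels by Levenshtein similarity. Helper names are hypothetical.

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]: 1 - distance / max length."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))

def unify_labels(labels, threshold=0.95):
    """Map each label to the first earlier label whose similarity exceeds the threshold."""
    canonical, mapping = [], {}
    for lab in labels:
        for c in canonical:
            if similarity(lab, c) >= threshold:
                mapping[lab] = c
                break
        else:
            canonical.append(lab)
            mapping[lab] = lab
    return mapping
```

For example, "solving linear equations" and "solving linear equation" differ by one edit over 24 characters (similarity ≈ 0.958), so they would be unified under the 0.95 threshold.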
Further, in step S2, the sentence encoder module specifically includes:
step S21, the sentence encoder module selects RoBERTa as the pre-trained language model, wherein RoBERTa is a Robustly Optimized BERT Pretraining Approach; the inputs of the sentence encoder module comprise the original problem text w, the LATEX label concept text lc and the term type text tt, the three sharing the parameters of the RoBERTa pre-trained language model;
Step S22, treating the RoBERTa pre-trained language model as a function, with w_i the original problem text of the i-th index, lc_i the LATEX label concept text of the i-th index and tt_i the term type text of the i-th index, the specific calculation process is shown in formula (1);
e_i = RoBERTa(w_i),  e_i^lc = RoBERTa(lc_i),  e_i^tt = RoBERTa(tt_i)    (1);
wherein e_i is the vector representation of the original problem text w_i of the i-th index obtained through the RoBERTa pre-trained language model, called the original problem text representation e_i of the i-th index; e_i^lc is the vector representation of the LATEX label concept text of the i-th index obtained through the RoBERTa pre-trained language model, called the LATEX label concept representation e_i^lc of the i-th index; e_i^tt is the vector representation of the term type text of the i-th index obtained through the RoBERTa pre-trained language model, called the term type representation e_i^tt of the i-th index;
Step S23, extracting the output of the last layer of the natural language processing model as the text word vector representations, namely the original problem text representation e_i of the i-th index, the LATEX label concept representation e_i^lc of the i-th index and the term type representation e_i^tt of the i-th index.
Further, in step S3, the discipline knowledge fusion module specifically comprises:
step S31, inputting the text word vector representations output by the last layer of the natural language processing model in the sentence encoder module;
step S32, using a cross-attention mechanism to fuse the LATEX label concept representation e_i^lc of the i-th index and the term type representation e_i^tt of the i-th index respectively with the original problem text representation e_i of the i-th index, and outputting as results the deep semantic representation M_i^lc of the LATEX label concept of the i-th index and the deep semantic representation M_i^tt of the term type of the i-th index;
step S33, meanwhile, to let the knowledge point automatic labeling model learn stable feature representations in several independent feature spaces, a multi-head attention mechanism is introduced; the final attention calculation process is shown in formula (2) and formula (3);
head_ij^lc = softmax( (e_i W_j^Q)(e_i^lc W_j^K)^T / √d_K ) (e_i^lc W_j^V),  head_ij^tt = softmax( (e_i W_j^Q)(e_i^tt W_j^K)^T / √d_K ) (e_i^tt W_j^V)    (2);
M_i^lc = head_i1^lc ⊕ head_i2^lc ⊕ … ⊕ head_ih^lc,  M_i^tt = head_i1^tt ⊕ head_i2^tt ⊕ … ⊕ head_ih^tt    (3);
wherein head_ij^lc is the feature representation of the j-th attention calculation for the LATEX label concept representation of the i-th index; softmax, as an activation function, converts the input unnormalized scores into a probability distribution; W_j^Q, W_j^K and W_j^V are the projection parameter matrices of the query, key and value vectors in the j-th attention calculation; T denotes the transpose applied to the product of the LATEX label concept representation e_i^lc of the i-th index and the key projection matrix W_j^K; d_K is the size of the second dimension of the original problem text representation e_i of the i-th index;
head_ij^tt is the feature representation of the j-th attention calculation for the term type representation of the i-th index;
M_i^lc is the deep semantic representation of the LATEX label concept obtained by cascading the results of h attention calculations on the LATEX label concept representation of the i-th index, called the deep semantic representation M_i^lc of the LATEX label concept of the i-th index; ⊕ represents the cascade (concatenation) operation, and h represents the number of attention calculations;
M_i^tt is the deep semantic representation of the term type obtained by cascading the results of h attention calculations on the term type representation of the i-th index, called the deep semantic representation M_i^tt of the term type of the i-th index;
Step S34, extracting the average pooling result of the last-layer embedding vectors of the natural language processing model as the sentence information representation: the deep semantic representation M_i^lc of the LATEX label concept of the i-th index and the deep semantic representation M_i^tt of the term type of the i-th index are average-pooled, and the calculation results serve respectively as the final semantic characterizations of the LATEX label concept and the term type, as shown in formula (4);
ē_i^lc = AvgPool(M_i^lc),  ē_i^tt = AvgPool(M_i^tt)    (4);
wherein ē_i^lc is the result of average pooling the deep semantic representation of the LATEX label concept of the i-th index, called the pooling representation ē_i^lc of the LATEX label concept of the i-th index; ē_i^tt is the result of average pooling the deep semantic representation of the term type of the i-th index, called the pooling representation ē_i^tt of the term type of the i-th index; AvgPool denotes the average pooling operation applied respectively to the deep semantic representation M_i^lc of the LATEX label concept of the i-th index and the deep semantic representation M_i^tt of the term type of the i-th index.
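The multi-head cross-attention and average pooling of formulas (2)-(4) can be sketched in NumPy. The embedding dimension (768) and head count (h = 6) match the experimental settings, but the random weights, token counts and function names are illustrative assumptions, and the attention direction (queries from the problem text, keys/values from the discipline-knowledge text) is an assumption consistent with the transpose described after formula (2); this is not the trained model.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    # Numerically stable softmax over the chosen axis.
    x = x - x.max(axis=axis, keepdims=True)
    ex = np.exp(x)
    return ex / ex.sum(axis=axis, keepdims=True)

def cross_attention_head(e, e_know, Wq, Wk, Wv):
    # One head of formula (2): queries from the problem text e,
    # keys/values from the discipline-knowledge text (LATEX label concept or term type).
    Q, K, V = e @ Wq, e_know @ Wk, e_know @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    return softmax(scores) @ V

def deep_semantic_representation(e, e_know, heads):
    # Formula (3): cascade (concatenate) the h attention heads.
    return np.concatenate([cross_attention_head(e, e_know, *w) for w in heads], axis=-1)

d, h = 768, 6                              # 768-dim embeddings, 6 heads
d_head = d // h
e = rng.standard_normal((12, d))           # original problem text tokens
e_lc = rng.standard_normal((4, d))         # LATEX label concept tokens
heads = [tuple(0.02 * rng.standard_normal((d, d_head)) for _ in range(3)) for _ in range(h)]

M_lc = deep_semantic_representation(e, e_lc, heads)  # deep semantic representation M^lc
pooled_lc = M_lc.mean(axis=0)                        # formula (4): AvgPool → pooling representation
```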
Further, in step S4, the gating and screening module specifically includes:
step S41, the input data are the pooling representation ē_i^lc of the LATEX label concept of the i-th index and the pooling representation ē_i^tt of the term type of the i-th index;
step S42, a gate acting on the pooling representation ē_i^lc of the LATEX label concept of the i-th index and the CLS tag vector e^cls (the CLS tag vector is a specially position-coded vector representing the meaning of the whole sequence or sentence, here used as the sentence representation replacing the original problem text) calculates the proportion of the original problem text information to be kept under the influence of the LATEX label concept information, so as to screen out the key information related to the original problem text; the calculation process is shown in formula (5);
r_i^lc = σ(W^lc [e^cls, ē_i^lc] + b^lc),  e_i^cls-remain1 = r_i^lc ⊙ e^cls    (5);
wherein r_i^lc is the weight value retained under the influence of the LATEX label concept information of the i-th index; σ is the activation function; W^lc is the learnable matrix applied to the concatenation of the CLS tag vector e^cls and the pooling representation ē_i^lc of the LATEX label concept of the i-th index; b^lc is a bias vector; [e^cls, ē_i^lc] is the result of concatenating the CLS tag vector e^cls and the pooling representation ē_i^lc of the LATEX label concept of the i-th index;
e_i^cls-remain1 is the result of multiplying the weight value r_i^lc, retained under the influence of the LATEX label concept information of the i-th index, by the CLS tag vector e^cls; it represents the information of the original problem text retained under the influence of the LATEX label concept information of the i-th index, abbreviated as the preliminarily retained information e_i^cls-remain1;
Step S43, the information of the original problem text finally retained under the influence of the LATEX label concept information of the i-th index and the term type information of the i-th index is e_i^cls-remain2; the calculation process is shown in formula (6);
r_i^tt = σ(W^tt [e_i^cls-remain1, ē_i^tt] + b^tt),  e_i^cls-remain2 = r_i^tt ⊙ e_i^cls-remain1    (6);
wherein r_i^tt is the weight value retained under the influence of the term type information of the i-th index; σ represents the sigmoid activation function, whose input is the preliminarily retained information e_i^cls-remain1; W^tt is the learnable matrix applied to the concatenation of the preliminarily retained information e_i^cls-remain1 and the pooling representation ē_i^tt of the term type of the i-th index; b^tt is a bias vector; [e_i^cls-remain1, ē_i^tt] is the result of concatenating the preliminarily retained information e_i^cls-remain1 and the pooling representation ē_i^tt of the term type of the i-th index;
e_i^cls-remain2 is then the final output of the gating and screening module, obtained by multiplying the preliminarily retained information e_i^cls-remain1 by r_i^tt; it represents the information of the original problem text finally retained under the influence of the LATEX label concept information of the i-th index and the term type information of the i-th index, called the finally retained information e_i^cls-remain2 under the influence of the discipline knowledge information of the i-th index;
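The two-stage gating of formulas (5) and (6) amounts to screening the CLS sentence vector first by the LATEX-label-concept gate and then by the term-type gate. A minimal NumPy sketch follows; the random matrices stand in for the learnable W^lc, W^tt and biases b^lc, b^tt, so the numbers are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 768

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gate(kept, pooled, W, b):
    # r = sigmoid(W [kept ; pooled] + b), then element-wise screening r * kept.
    r = sigmoid(W @ np.concatenate([kept, pooled]) + b)
    return r * kept

e_cls = rng.standard_normal(d)       # CLS tag vector of the problem text
pooled_lc = rng.standard_normal(d)   # pooling representation of the LATEX label concept
pooled_tt = rng.standard_normal(d)   # pooling representation of the term type

W_lc, b_lc = 0.02 * rng.standard_normal((d, 2 * d)), np.zeros(d)
W_tt, b_tt = 0.02 * rng.standard_normal((d, 2 * d)), np.zeros(d)

e_remain1 = gate(e_cls, pooled_lc, W_lc, b_lc)      # formula (5): preliminarily retained
e_remain2 = gate(e_remain1, pooled_tt, W_tt, b_tt)  # formula (6): finally retained
```

Because each gate value lies in (0, 1), every component of the retained vector can only shrink, which is how the mechanism keeps key information without letting the fused knowledge overwrite the sentence representation.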
Step S5, taking the finally retained information e_i^cls-remain2 under the influence of the discipline knowledge information of the i-th index, output by the gating and screening module, as the input of the prediction module; the input passes through a linear layer with a sigmoid function to obtain the final classification probability vector, which is a representation of the predicted label and can be converted into the predicted label by a threshold classifier.
Further, the prediction module in step S5 specifically includes:
step S51, the finally retained information e_i^cls-remain2 under the influence of the discipline knowledge information of the i-th index, output by the gating and screening module, is input into a linear layer with a sigmoid function to obtain the final classification probability vector, as shown in formula (7);
ẑ_j = sigmoid(W^c e_i^cls-remain2 + b^c)    (7);
wherein ẑ_j is the j-th classification probability obtained from the linear layer with the sigmoid function; sigmoid is the activation function; W^c is the learnable matrix applied to the finally retained information e_i^cls-remain2 under the influence of the discipline knowledge information of the i-th index; b^c is a bias vector;
step S52, a classification threshold δ is introduced; by judging the magnitude relation between the j-th classification probability ẑ_j, corresponding to the j-th knowledge point label of the current problem, and the classification threshold δ, the j-th knowledge point label ŷ_j corresponding to the current problem is obtained, as in formula (8);
ŷ_j = 1 if ẑ_j ≥ δ, and ŷ_j = 0 otherwise    (8);
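Formulas (7) and (8) amount to a sigmoid linear layer followed by thresholding, sketched below in NumPy. The label count C = 10 and the random weights are toy assumptions; the threshold δ = 0.5 follows the experimental settings stated later in the text.

```python
import numpy as np

rng = np.random.default_rng(2)
d, C = 768, 10   # C: number of knowledge point labels (toy value)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

W_c = 0.02 * rng.standard_normal((C, d))   # stand-in for the learnable matrix W^c
b_c = np.zeros(C)                          # stand-in for the bias vector b^c
e_remain2 = rng.standard_normal(d)         # finally retained information from the gating module

z_hat = sigmoid(W_c @ e_remain2 + b_c)     # formula (7): classification probability vector
delta = 0.5                                # classification threshold
y_hat = (z_hat >= delta).astype(int)       # formula (8): threshold classifier
```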
step S53, adopting the distribution-balanced loss to balance the number of examples among all knowledge point labels, wherein the calculation of the specific loss function is shown in formula (9);
L_DB = (1/C) Σ_{j=1}^{C} r̂_j^k [ y_j^k log(1 + e^{−(z_j^k − v_j)}) + (1/λ)(1 − y_j^k) log(1 + e^{λ(z_j^k − v_j)}) ]    (9);
wherein L_DB represents the resulting distribution-balanced loss, C represents the total number of knowledge points, k indexes the k-th problem in the data set, r̂_j^k is a weighting coefficient added during training to close the gap between the expected and actual sampling probabilities, y_j^k represents the true label of the j-th knowledge point corresponding to the k-th problem, y_j^k ∈ {0,1}, log represents the logarithm, z_j^k represents the prediction output for the j-th knowledge point of the k-th problem, and v_j is a class-specific bias representing the intrinsic bias of the model; λ is a decisive factor influencing the loss gradient, representing the degree of "tolerance" toward the classification output z_j^k.
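The loss of formula (9) can be sketched in NumPy as follows, following the cited Distribution-Balanced Loss formulation: a re-balancing weight r̂ per label, a class-specific bias v_j, and a tolerance factor λ that softens the penalty on negative labels. The inputs below are toy values, not experiment data.

```python
import numpy as np

def db_loss(z, y, r_hat, v, lam):
    # Positive-label term: y * log(1 + e^{-(z - v)})
    pos = y * np.log1p(np.exp(-(z - v)))
    # Negative-label term with tolerance: (1/lam) * (1 - y) * log(1 + e^{lam * (z - v)})
    neg = (1.0 - y) / lam * np.log1p(np.exp(lam * (z - v)))
    # Average the re-weighted per-label penalties over the C knowledge points.
    return float(np.mean(r_hat * (pos + neg)))

z = np.array([2.0, -1.0, 0.5])   # model outputs for C = 3 knowledge points
y = np.array([1.0, 0.0, 1.0])    # true labels y_j^k
r_hat = np.ones(3)               # uniform re-balancing weights (toy case)
v = np.zeros(3)                  # class-specific biases v_j
loss = db_loss(z, y, r_hat, v, lam=2.0)
```

A larger λ lowers the penalty on well-separated negative labels, which is the "tolerance" the text describes.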
Further, the knowledge point automatic labeling model fusing LATEX labels is applied to the above knowledge point automatic labeling method fusing LATEX labels. It is divided into four modules arranged sequentially in series: a sentence encoder module, a discipline knowledge fusion module, a gating and screening module and a prediction module, with the sentence encoder module serving as the first module of the knowledge point automatic labeling model.
As shown in FIG. 1, the data required by the sentence encoder module are constructed first: the original problem text w is taken from the constructed mathematical data set and input, together with the LATEX label concept text lc and the term type text tt of the problem, into the sentence encoder module, the three sharing the module's parameters; after processing by the sentence encoder module, the output of the last layer of the natural language processing model (Transformer) is obtained as the text word vector representations, comprising the LATEX label concept representation e^lc, the term type representation e^tt and the original problem text representation e.
Then, the LATEX label concept representation e^lc, the term type representation e^tt and the original problem text representation e are input into the discipline knowledge fusion module, and a cross-attention mechanism fuses the LATEX label concept representation e^lc and the term type representation e^tt respectively with the original problem text representation e; the output results serve as the deep semantic representations updated by the two kinds of discipline knowledge, namely the deep semantic representation M^lc of the LATEX label concept and the deep semantic representation M^tt of the term type. Meanwhile, to let the model learn stable feature representations in several independent feature spaces, the invention introduces a multi-head attention mechanism, and average-pools the deep semantic representation M^lc of the LATEX label concept and the deep semantic representation M^tt of the term type to obtain respectively the pooling representation ē^lc of the LATEX label concept and the pooling representation ē^tt of the term type, which serve respectively as the final semantic characterizations of the LATEX label concept and the term type.
The pooling representation ē^lc of the LATEX label concept, the pooling representation ē^tt of the term type and the CLS tag vector e^cls are input into the gating and screening module. Here, multiple gating mechanisms are used in turn to control the amount of effective information of the original problem text that should be retained. First, a gate acting on the pooling representation ē^lc of the LATEX label concept and the CLS tag vector e^cls calculates the proportion of the original problem text information to be kept under the influence of the LATEX label concept information, so as to screen out the key information related to the original problem text; similarly, another gating mechanism considers the influence of the term type information and preserves the key information in the sentence representation, its input being the output of the previous gating mechanism. The finally retained information e^cls-remain2 then serves as the final output of the gating and screening module.
The classifier serves as the final prediction module: the finally retained information e^cls-remain2 output by the gating and screening module is input into a linear layer with a sigmoid activation function to obtain the j-th classification probability ẑ_j; a threshold classifier is introduced, and the predicted knowledge points are finally obtained through a label decoder.
Because most knowledge point labels in the data set correspond to relatively little problem data, and several knowledge point labels have only single-digit numbers of examples, the unbalanced label distribution greatly increases the complexity of the multi-knowledge-point labeling task. Therefore, the classification probability vector ẑ and the real labels Q of the problems shown in FIG. 1 are fed into the Distribution-Balanced Loss function (Distribution-Balanced Loss for Multi-Label Classification in Long-Tailed Datasets, DB-Loss) to balance the number of instances between knowledge point labels, where the loss is L_DB.
In the experiments, the knowledge point automatic labeling model adopts PyTorch as the deep learning framework. The text embedding dimensions of the original problem text, the LATEX label concept and the term type are all 768. The similarity threshold is set to 0.95, the number of heads h of the multi-head attention mechanism is set to 6, the initial learning rate is set to 0.00003, and the classification threshold δ is set to 0.5.
Claims (8)
1. A knowledge point labeling method integrating LATEX labels is characterized by comprising the following steps of: the method comprises the following steps:
Step S1, constructing a data set, collecting problems in junior middle school mathematics test papers, and preprocessing the collected problems; labeling the knowledge points of the collected problems after preprocessing; finally obtaining a problem data set, wherein each problem in the problem data set is called an original problem text w;
step S2, inputting the original problem text w constructed in step S1, together with the LATEX label concept text lc and the term type text tt in the original problem text w, into the sentence encoder module of the knowledge point automatic labeling model, and outputting the original problem text representation e, the LATEX label concept representation e^lc and the term type representation e^tt;
Step S3, inputting the output result obtained in step S2 into the discipline knowledge fusion module, and using a cross-attention mechanism to fuse the LATEX label concept representation e^lc and the term type representation e^tt respectively with the original problem text representation e; the output results are the deep semantic representation M^lc of the LATEX label concept and the deep semantic representation M^tt of the term type; the calculation results after the average pooling operation in the discipline knowledge fusion module serve respectively as the final semantic characterizations of the LATEX label concept and the term type, namely the pooling representation ē^lc of the LATEX label concept and the pooling representation ē^tt of the term type;
Step S4, inputting the final semantic characterizations from step S3 into the gating and screening module, which, through a gating and screening mechanism that implicitly fuses the two kinds of discipline knowledge, retains with few parameters the key information related to the discipline knowledge in the original problem text representation e; the output result of the gating and screening module is the information of the original problem text w finally retained under the influence of the LATEX label concept information and the term type information, abbreviated as the finally retained information e^cls-remain2;
Step S5, taking the finally retained information e^cls-remain2 output by the gating and screening module in step S4 as the input of the prediction module; the input passes through a linear layer with a sigmoid function to obtain the final classification probability vector, which is a representation of the predicted label and is converted into the predicted label by a threshold classifier.
2. The knowledge point labeling method fused with LATEX labels according to claim 1, wherein the method comprises the following steps: in step S1, the data set is constructed specifically as follows:
step S11, collecting 16226 problems from 800 junior middle school mathematics test papers, wherein the collected problems cover all knowledge points related to junior middle school mathematics and comprise four problem types: multiple-choice problems, fill-in-the-blank problems, free-response problems and true-or-false problems;
step S12, preprocessing the collected problems: first performing invalid-character removal, deduplication and completion cleaning operations on the problems to obtain 14200 problems, then adopting a mathematical formula recognition tool to convert the formulas existing in picture form into a formula format supported by Word;
step S13, labeling the knowledge points of the problems in an automated manner after preprocessing, wherein the labeled knowledge points are derived from two sources: on the one hand, the query results of an online education platform, and on the other hand, a knowledge point grading standard constructed with reference to the People's Education Press junior middle school textbooks;
step S14, finally obtaining a data set containing 12073 problems through problem preprocessing and knowledge point labeling.
3. The knowledge point labeling method fused with LATEX labels according to claim 2, wherein the method comprises the following steps: in step S13, the knowledge points of the problem are labeled, specifically:
step S131, finding a plurality of three-level knowledge points corresponding to the problems by means of the problem query function of the online education platform;
step S132, inquiring first, second and third knowledge points corresponding to the problems in the knowledge point grading standard;
step S133, taking the three-level knowledge points obtained from the online education platform as the primary ones, screening them against the three-level knowledge points queried from the knowledge point grading standard, and querying, from the three-level knowledge points, the first-level and second-level knowledge points to which they belong;
step S134, judging the similarity of the knowledge point labeling results of all problems by means of the Levenshtein similarity algorithm and a semantic similarity model, and unifying labeling results with high similarity to ensure that the labeled knowledge points are not redundant;
step S135, removing the knowledge points that are not examined in the high school entrance examination, together with the corresponding problems, according to the examination syllabus provided by junior middle school mathematics education experts.
4. A method for labeling knowledge points by fusing LATEX labels according to claim 3, wherein: the sentence encoder module in step S2 specifically includes:
step S21, the sentence encoder module selects RoBERTa as the pre-trained language model, wherein RoBERTa is a Robustly Optimized BERT Pretraining Approach; the inputs of the sentence encoder module comprise the original problem text w, the LATEX label concept text lc and the term type text tt, the three sharing the parameters of the RoBERTa pre-trained language model;
step S22, treating the RoBERTa pre-trained language model as a function, with w_i the original problem text of the i-th index, lc_i the LATEX label concept text of the i-th index and tt_i the term type text of the i-th index, the specific calculation process is shown in formula (1);
e_i = RoBERTa(w_i),  e_i^lc = RoBERTa(lc_i),  e_i^tt = RoBERTa(tt_i)    (1);
wherein e_i is the vector representation of the original problem text w_i of the i-th index obtained through the RoBERTa pre-trained language model, called the original problem text representation e_i of the i-th index; e_i^lc is the vector representation of the LATEX label concept text of the i-th index obtained through the RoBERTa pre-trained language model, called the LATEX label concept representation e_i^lc of the i-th index; e_i^tt is the vector representation of the term type text of the i-th index obtained through the RoBERTa pre-trained language model, called the term type representation e_i^tt of the i-th index;
Step S23, extracting the output of the last layer of the natural language processing model as the text word vector representations, namely the original problem text representation e_i of the i-th index, the LATEX label concept representation e_i^lc of the i-th index and the term type representation e_i^tt of the i-th index.
5. The knowledge point labeling method fusing LATEX labels according to claim 4, characterized in that the discipline knowledge fusion module in step S3 specifically comprises:
step S31, inputting the text word vector representations output by the last layer of the natural language processing model in the sentence encoder module;
step S32, using a cross-attention mechanism to fuse the LATEX label concept representation e_i^lc of the i-th index and the term type representation e_i^tt of the i-th index respectively with the original problem text representation e_i of the i-th index, and outputting as results the deep semantic representation M_i^lc of the LATEX label concept of the i-th index and the deep semantic representation M_i^tt of the term type of the i-th index;
step S33, meanwhile, to let the knowledge point automatic labeling model learn stable feature representations in several independent feature spaces, a multi-head attention mechanism is introduced; the final attention calculation process is shown in formula (2) and formula (3);
head_ij^lc = softmax( (e_i W_j^Q)(e_i^lc W_j^K)^T / √d_K ) (e_i^lc W_j^V),  head_ij^tt = softmax( (e_i W_j^Q)(e_i^tt W_j^K)^T / √d_K ) (e_i^tt W_j^V)    (2);
M_i^lc = head_i1^lc ⊕ head_i2^lc ⊕ … ⊕ head_ih^lc,  M_i^tt = head_i1^tt ⊕ head_i2^tt ⊕ … ⊕ head_ih^tt    (3);
wherein head_ij^lc is the feature representation of the j-th attention calculation for the LATEX label concept representation of the i-th index; softmax, as an activation function, converts the input unnormalized scores into a probability distribution; W_j^Q, W_j^K and W_j^V are the projection parameter matrices of the query, key and value vectors in the j-th attention calculation; T denotes the transpose applied to the product of the LATEX label concept representation e_i^lc of the i-th index and the key projection matrix W_j^K; d_K is the size of the second dimension of the original problem text representation e_i of the i-th index;
head_ij^tt is the feature representation of the j-th attention calculation for the term type representation of the i-th index;
M_i^lc is the deep semantic representation of the LATEX label concept obtained by cascading the results of h attention calculations on the LATEX label concept representation of the i-th index, called the deep semantic representation M_i^lc of the LATEX label concept of the i-th index; ⊕ represents the cascade (concatenation) operation, and h represents the number of attention calculations;
M_i^tt is the deep semantic representation of the term type obtained by cascading the results of h attention calculations on the term type representation of the i-th index, called the deep semantic representation M_i^tt of the term type of the i-th index;
Step S34, extracting the average pooling result of the last-layer embedding vectors of the natural language processing model as the sentence information representation: the deep semantic representation M_i^lc of the LATEX label concept of the i-th index and the deep semantic representation M_i^tt of the term type of the i-th index are average-pooled, and the calculation results serve respectively as the final semantic characterizations of the LATEX label concept and the term type, as shown in formula (4);
$\bar{e}_i^{lc} = \mathrm{AvgPool}(M_i^{lc})$, $\quad \bar{e}_i^{tt} = \mathrm{AvgPool}(M_i^{tt})$ (4);
wherein $\bar{e}_i^{lc}$ is the result of average pooling of the deep semantic representation of the LATEX label concept of the i-th index, called the pooled representation $\bar{e}_i^{lc}$ of the LATEX label concept of the i-th index; $\bar{e}_i^{tt}$ is the result of average pooling of the deep semantic representation of the term type of the i-th index, called the pooled representation $\bar{e}_i^{tt}$ of the term type of the i-th index; AvgPool denotes the average pooling operation applied respectively to the deep semantic representation $M_i^{lc}$ of the LATEX label concept of the i-th index and the deep semantic representation $M_i^{tt}$ of the term type of the i-th index.
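The average pooling of formula (4) reduces a sequence of token vectors to a single semantic characterization; a one-line NumPy sketch (the shape convention is an assumption):

```python
import numpy as np

def avg_pool(M):
    # M: (L, d) deep semantic representation over L tokens; returns the
    # (d,) mean vector used as the final semantic characterization.
    return M.mean(axis=0)
```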
6. The knowledge point labeling method fusing LATEX labels according to claim 5, characterized in that in step S4 the gating and screening module specifically comprises:
step S41, the input data are the pooled representation $\bar{e}_i^{lc}$ of the LATEX label concept of the i-th index and the pooled representation $\bar{e}_i^{tt}$ of the term type of the i-th index;
Step S42, a gate acting on the pooled representation $\bar{e}_i^{lc}$ of the LATEX label concept of the i-th index and the CLS tag vector $e_{cls}$ calculates the proportion of the original problem text information to be kept under the influence of the LATEX label concept information, so as to screen out the key information related to the original problem text; the calculation process is shown in formula (5);
$r_i^{lc} = \sigma\!\left(W_{lc}\,[e_{cls}, \bar{e}_i^{lc}] + b_{lc}\right)$, $\quad e_i^{cls\text{-}remain1} = r_i^{lc} \odot e_{cls}$ (5);
wherein $r_i^{lc}$ is the weight value retained under the influence of the LATEX label concept information of the i-th index; $\sigma$ is the activation function; $W_{lc}$ is the learnable matrix applied to the concatenation of the CLS tag vector $e_{cls}$ and the pooled representation $\bar{e}_i^{lc}$ of the LATEX label concept of the i-th index; $b_{lc}$ is a bias vector; $[e_{cls}, \bar{e}_i^{lc}]$ is the result of concatenating the CLS tag vector $e_{cls}$ and the pooled representation $\bar{e}_i^{lc}$ of the LATEX label concept of the i-th index;
$e_i^{cls\text{-}remain1}$ is the result of multiplying the weight value $r_i^{lc}$ retained under the influence of the LATEX label concept information of the i-th index by the CLS tag vector $e_{cls}$; it represents the information of the original problem text retained under the influence of the LATEX label concept information of the i-th index, and is referred to as the preliminary retained information $e_i^{cls\text{-}remain1}$;
Step S43, the information $e_i^{cls\text{-}remain2}$ of the original problem text finally retained under the influence of the LATEX label concept information of the i-th index and the term type information of the i-th index is computed; the calculation process is shown in formula (6);
$r_i^{tt} = \sigma\!\left(W_{tt}\,[e_i^{cls\text{-}remain1}, \bar{e}_i^{tt}] + b_{tt}\right)$, $\quad e_i^{cls\text{-}remain2} = r_i^{tt} \odot e_i^{cls\text{-}remain1}$ (6);
wherein $r_i^{tt}$ is the weight value retained under the influence of the term type information of the i-th index; $\sigma$ denotes the sigmoid activation function, whose input is built from the preliminary retained information $e_i^{cls\text{-}remain1}$; $W_{tt}$ is the learnable matrix applied to the concatenation of the preliminary retained information $e_i^{cls\text{-}remain1}$ and the pooled representation $\bar{e}_i^{tt}$ of the term type of the i-th index; $b_{tt}$ is a bias vector; $[e_i^{cls\text{-}remain1}, \bar{e}_i^{tt}]$ is the result of concatenating the preliminary retained information $e_i^{cls\text{-}remain1}$ and the pooled representation $\bar{e}_i^{tt}$ of the term type of the i-th index;
$e_i^{cls\text{-}remain2}$ is the final output of the gating and screening module, obtained by multiplying the preliminary retained information $e_i^{cls\text{-}remain1}$ by $r_i^{tt}$; it represents the information of the original problem text finally retained under the influence of the LATEX label concept information of the i-th index and the term type information of the i-th index, and is referred to as the final retained information $e_i^{cls\text{-}remain2}$ under the influence of the subject knowledge information of the i-th index;
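The two-stage gated screening of formulas (5) and (6) can be sketched as follows; weight shapes and variable names are illustrative assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_screen(e_cls, pooled_lc, pooled_tt, W_lc, b_lc, W_tt, b_tt):
    # Stage 1, formula (5): gate the CLS vector by the pooled
    # LATEX label concept representation.
    r_lc = sigmoid(W_lc @ np.concatenate([e_cls, pooled_lc]) + b_lc)
    remain1 = r_lc * e_cls  # preliminary retained information
    # Stage 2, formula (6): gate the preliminary retained information
    # by the pooled term type representation.
    r_tt = sigmoid(W_tt @ np.concatenate([remain1, pooled_tt]) + b_tt)
    return r_tt * remain1   # final retained information
```

With zero weights and biases both gates equal 0.5, so the output is 0.25 times the CLS vector, which makes the element-wise gating behavior easy to verify.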
Step S5, the final retained information $e_i^{cls\text{-}remain2}$ under the influence of the subject knowledge information of the i-th index, output by the gating and screening module, is used as the input of the prediction module; it is passed through a linear layer with a sigmoid function to obtain the final classification probability vector, which is a representation of the predicted label and can be converted into the predicted label by a threshold classifier.
7. The knowledge point labeling method fusing LATEX labels according to claim 6, characterized in that the prediction module in step S5 comprises the following specific steps:
step S51, the final retained information $e_i^{cls\text{-}remain2}$ under the influence of the subject knowledge information of the i-th index, output by the gating and screening module, is input to a linear layer with a sigmoid function to obtain the final classification probability vector, as shown in formula (7);
$p_j = \mathrm{sigmoid}\!\left(W_c\, e_i^{cls\text{-}remain2} + b_c\right)_j$ (7);
wherein $p_j$ is the j-th classification probability obtained from the linear layer with the sigmoid function; sigmoid is the activation function; $W_c$ is the learnable matrix applied to the final retained information $e_i^{cls\text{-}remain2}$ under the influence of the subject knowledge information of the i-th index; $b_c$ is a bias vector;
step S52, a classification threshold $\delta$ is introduced; the j-th classification probability $p_j$ obtained from the linear layer with the sigmoid function, corresponding to the j-th knowledge point label of the current problem, is compared with the classification threshold $\delta$ to obtain the j-th knowledge point label $\hat{y}_j$ of the current problem, as in formula (8);
$\hat{y}_j = \begin{cases} 1, & p_j > \delta \\ 0, & p_j \le \delta \end{cases}$ (8);
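The prediction step, linear layer plus sigmoid followed by thresholding, is a few lines of NumPy; shapes and the default threshold value are illustrative assumptions:

```python
import numpy as np

def predict(e_remain, W_c, b_c, delta=0.5):
    # e_remain: (d,) final retained information; W_c: (C, d) learnable
    # matrix over C knowledge points; b_c: (C,) bias.
    # Returns per-label probabilities (formula (7)) and the 0/1
    # knowledge point labels after thresholding at delta (formula (8)).
    p = 1.0 / (1.0 + np.exp(-(W_c @ e_remain + b_c)))
    return p, (p > delta).astype(int)
```

For example, with zero weights and biases of 2 and -2, the two logits are 2 and -2, giving probabilities on either side of 0.5 and labels 1 and 0.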
step S53, a distribution-balanced loss is adopted to balance the number of instances among all knowledge point labels; the specific loss function is calculated as shown in formula (9);
$L_{DB} = -\frac{1}{C}\sum_{j=1}^{C}\hat{r}_j^{\,k}\left[y_j^{k}\log\!\left(\frac{1}{1+e^{-(z_j^{k}-v_j)}}\right)+\frac{1}{\lambda}\,(1-y_j^{k})\log\!\left(\frac{1}{1+e^{\lambda(z_j^{k}-v_j)}}\right)\right]$ (9);
wherein $L_{DB}$ denotes the resulting distribution-balanced loss; C denotes the total number of knowledge points; k denotes the k-th problem in the data set; $\hat{r}$ is a weighting coefficient added in training to close the gap between the expected and actual sampling probabilities; $y_j^{k}$ denotes the true label of the j-th knowledge point of the k-th problem, $y_j^{k} \in \{0,1\}$; log denotes the logarithm; $z_j^{k}$ denotes the predicted probability of the j-th knowledge point of the k-th problem; $v_j$ is a class-specific bias representing the bias of the natural model; $\lambda$ is a determining factor influencing the loss gradient, representing the degree of "tolerance" to the classification probability $z_j^{k}$.
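A sketch of a distribution-balanced loss with the ingredients named above (rebalancing weights $\hat{r}$, class-specific biases $v_j$, tolerance factor $\lambda$). This follows the commonly used rebalanced, negative-tolerant binary cross-entropy formulation; it is an assumption-laden illustration, not the patent's exact expression, and the weights r_hat are taken as given:

```python
import numpy as np

def db_loss(z, y, r_hat, v, lam=2.0):
    # z: raw scores z_j^k for one problem k, shape (C,)
    # y: 0/1 true labels y_j^k, shape (C,)
    # r_hat: per-label rebalancing weights, shape (C,)
    # v: class-specific biases v_j; lam: tolerance factor lambda.
    # Positive labels use a standard log-sigmoid term; negative labels
    # use a lambda-scaled (tolerant) log-sigmoid term.
    pos = y * np.log(1.0 / (1.0 + np.exp(-(z - v))))
    neg = (1.0 - y) / lam * np.log(1.0 / (1.0 + np.exp(lam * (z - v))))
    return -np.sum(r_hat * (pos + neg)) / z.size
```

As a sanity check, with z = v, unit weights and lam = 1, every label contributes log 2 regardless of its sign, so the loss is exactly log 2.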
8. An automatic knowledge point labeling model fusing LATEX labels, applied to the knowledge point labeling method fusing LATEX labels, characterized in that: it mainly comprises four modules, namely a sentence encoder module, a subject knowledge fusion module, a gating and screening module and a prediction module; the sentence encoder module is the first module of the automatic knowledge point labeling model, and the four modules are connected sequentially in a serial structure.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311834982.XA CN117473096B (en) | 2023-12-28 | 2023-12-28 | Knowledge point labeling method fusing LATEX labels and model thereof |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117473096A true CN117473096A (en) | 2024-01-30 |
CN117473096B CN117473096B (en) | 2024-03-15 |
Family
ID=89638326
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311834982.XA Active CN117473096B (en) | 2023-12-28 | 2023-12-28 | Knowledge point labeling method fusing LATEX labels and model thereof |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117473096B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160063323A1 (en) * | 2014-09-02 | 2016-03-03 | Abbyy Development Llc | Methods and systems for processing of images of mathematical expressions |
CN109299281A (en) * | 2018-07-06 | 2019-02-01 | 浙江学海教育科技有限公司 | The mask method of knowledge point label |
JP2020161111A (en) * | 2019-03-27 | 2020-10-01 | ワールド ヴァーテックス カンパニー リミテッド | Method for providing prediction service of mathematical problem concept type using neural machine translation and math corpus |
CN112580361A (en) * | 2020-12-18 | 2021-03-30 | 蓝舰信息科技南京有限公司 | Formula based on unified attention mechanism and character recognition model method |
CN113420543A (en) * | 2021-05-11 | 2021-09-21 | 江苏大学 | Automatic mathematical test question labeling method based on improved Seq2Seq model |
CN116244445A (en) * | 2022-12-29 | 2023-06-09 | 中国航空综合技术研究所 | Aviation text data labeling method and labeling system thereof |
CN116578665A (en) * | 2022-12-29 | 2023-08-11 | 成都索贝数码科技股份有限公司 | Method and equipment for jointly extracting extensible text information based on prompt learning |
Non-Patent Citations (3)
Title |
---|
MINGWEN WANG 等: "Improved Chinese Word Segmentation Algorithm of Quantitative Units in Elementary Mathematics Application Problems", 《 2021 7TH ANNUAL INTERNATIONAL CONFERENCE ON NETWORK AND INFORMATION SYSTEMS FOR COMPUTERS (ICNISC)》, 8 April 2022 (2022-04-08), pages 493 - 9 * |
LUO WENBING et al.: "Robust Extraction of Middle School Mathematical Terms Based on Dependency Structure Learning", Journal of Chinese Information Processing, 14 December 2023 (2023-12-14), pages 75 - 85 *
GUO CHONGHUI; LYU ZHENGDA: "A Multi-Knowledge-Point Labeling Method for Test Questions Based on Ensemble Learning", Operations Research and Management Science, no. 02, 25 February 2020 (2020-02-25), pages 133 - 140 *
Also Published As
Publication number | Publication date |
---|---|
CN117473096B (en) | 2024-03-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107273490B (en) | Combined wrong question recommendation method based on knowledge graph | |
CN113656570B (en) | Visual question-answering method and device based on deep learning model, medium and equipment | |
CN106469560B (en) | Voice emotion recognition method based on unsupervised domain adaptation | |
CN110532557B (en) | Unsupervised text similarity calculation method | |
CN112508334A (en) | Personalized paper combining method and system integrating cognitive characteristics and test question text information | |
CN113962219A (en) | Semantic matching method and system for knowledge retrieval and question answering of power transformer | |
CN114969275A (en) | Conversation method and system based on bank knowledge graph | |
CN114818717A (en) | Chinese named entity recognition method and system fusing vocabulary and syntax information | |
CN111582506A (en) | Multi-label learning method based on global and local label relation | |
CN113420543B (en) | Mathematical test question automatic labeling method based on improved Seq2Seq model | |
CN113343690A (en) | Text readability automatic evaluation method and device | |
CN112347780B (en) | Judicial fact finding generation method, device and medium based on deep neural network | |
CN115659947A (en) | Multi-item selection answering method and system based on machine reading understanding and text summarization | |
CN118152547B (en) | Robot answer method, medium and system according to understanding capability of questioner | |
CN114722833A (en) | Semantic classification method and device | |
CN112966518B (en) | High-quality answer identification method for large-scale online learning platform | |
CN113901224A (en) | Knowledge distillation-based secret-related text recognition model training method, system and device | |
CN112749566B (en) | Semantic matching method and device for English writing assistance | |
CN117034921B (en) | Prompt learning training method, device and medium based on user data | |
CN117473096B (en) | Knowledge point labeling method fusing LATEX labels and model thereof | |
CN116306653A (en) | Regularized domain knowledge-aided named entity recognition method | |
CN114239575B (en) | Statement analysis model construction method, statement analysis method, device, medium and computing equipment | |
CN116362247A (en) | Entity extraction method based on MRC framework | |
CN116680407A (en) | Knowledge graph construction method and device | |
CN117216617A (en) | Text classification model training method, device, computer equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||