
CN114626425A - Multi-view interactive matching method for noise text and electronic device - Google Patents

Multi-view interactive matching method for noise text and electronic device

Info

Publication number
CN114626425A
Authority
CN
China
Prior art keywords
noise
vector
sections
interaction
coding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011456860.8A
Other languages
Chinese (zh)
Other versions
CN114626425B (en)
Inventor
井雅琪
李扬曦
佟玲玲
任博雅
段东圣
段运强
胡燕林
方芳
尹鹏飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Information Engineering of CAS
National Computer Network and Information Security Management Center
Original Assignee
Institute of Information Engineering of CAS
National Computer Network and Information Security Management Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Information Engineering of CAS, National Computer Network and Information Security Management Center filed Critical Institute of Information Engineering of CAS
Priority to CN202011456860.8A priority Critical patent/CN114626425B/en
Publication of CN114626425A publication Critical patent/CN114626425A/en
Application granted granted Critical
Publication of CN114626425B publication Critical patent/CN114626425B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides a multi-view interactive matching method and an electronic device for noise text. The method encodes two sections of noise text to be matched into two encoding vector sequences and adds position information to each encoding vector of the two sequences; performs internal interaction on the two position-encoded vector sequences to obtain two sections of internal interaction results; performs external interaction on the two sections of internal interaction results to construct two bidirectional noise text interaction matrices; and concatenates the two interaction matrices to judge whether the two sections of noise text to be matched match. The method uses an attention mechanism to capture the bidirectional matching pattern between noise texts; it is little affected by the logical order of sentences in the noise text, increases the influence of the semantically meaningful words of the text, improves both the time efficiency of the model and the matching quality on noise text, and avoids the transitive-matching problem.

Description

Multi-view interactive matching method for noise text and electronic device
Technical Field
The invention relates to the field of computers, in particular to a multi-view interactive matching method and an electronic device for a noise text.
Background
On today's internet there is a large amount of noise text, that is, text containing fragments with no practical meaning or with a disordered grammatical structure. Noise text poses two main problems. In content, the semantics expressed by the noise are unrelated to the original text and are usually ambiguous and repetitive. In form, noise text has a relatively complex grammatical structure, and its sequential structure varies widely. Given these two problems, a matching model insensitive to noise and word order is needed to solve the noise text matching problem. The current mainstream noise text matching methods first filter the noise in the text before matching, using rule-based methods and feature engineering. The filtered noise text is then fed into a time-sequence matching model, chiefly Markov conditional random fields, recurrent neural networks and the like. Finally, the model reads the input texts in order and produces a matching score, which is used to judge whether the two noise sentences match.
However, rules and feature engineering have a limited filtering effect: they can hardly cover all noise instances or correctly identify all noise, because noise takes many forms that are difficult to enumerate and generalize, and some noise still carries practical meaning in particular contexts. In addition, because the sequential structure of a text largely determines its true meaning, the disordered word order in noise text can degrade conventional time-sequence models.
Disclosure of Invention
The invention aims to provide a multi-view interactive matching method and an electronic device for noise text, in which attention weights are calculated by a scaled cosine mechanism and the interference of noise and word order on the matching result is suppressed by attention weighting, so that a good text matching result is still obtained when the text is noisy and its word order is scrambled.
The technical scheme of the invention is as follows:
a multi-view interactive matching method for noise text comprises the following steps:
1) encoding two sections of noise text to be matched to obtain two sections of encoding vector sequences, and adding position information to each encoding vector of the two sequences;
2) performing internal interaction on the two position-encoded vector sequences to obtain two sections of internal interaction results, whose dimensions are consistent with those of the encoding vector sequences;
3) performing external interaction on the two sections of internal interaction results by calculating bidirectional attention distributions, and constructing two bidirectional noise text interaction matrices;
4) concatenating the two noise text interaction matrices and judging whether the two sections of noise text to be matched match.
Further, the two sections of noise text to be matched are preprocessed before encoding; the preprocessing comprises removing punctuation marks, stop words and low-frequency words.
Further, the two sections of noise text are encoded using a pre-trained Word2vec or Bert model.
Further, position information is added to each encoding vector of the two encoding vector sequences using the position-vector generation method of the Bert model.
Further, before the internal interaction, the two position-encoded vector sequences are mapped to a unified semantic space by the following steps:
1) feeding the two position-encoded vector sequences into a bidirectional LSTM neural network for secondary encoding to obtain two final vector encoding sequences;
2) mapping each vector encoding of the two final sequences through the same residual network, so that the two position-encoded vector sequences are mapped to a unified semantic space.
Further, the vector dimension of the final vector encoding sequences is determined by the number of hidden-layer units of the second LSTM encoding layer in the bidirectional LSTM neural network.
Further, the internal interaction results are obtained by the following steps:
1) feeding the two position-encoded vector sequences into a first residual network;
2) taking each position-encoded noise encoding vector as a query item, performing internal interaction with the other vector encodings of the same noise text through a scaled cosine attention algorithm, and calculating self-attention weights;
3) combining the position-encoded vector sequence with the attention weights to obtain a weighted encoding vector sequence;
4) feeding the weighted encoding vector sequence into a second residual network to obtain an abstract vector representation of the noise text;
5) performing an L2 regularization operation on each abstract vector representation to obtain the internal interaction result of the corresponding position-encoded vector sequence.
Further, a noise text interaction matrix is constructed by the following steps:
1) taking each encoding vector of one section's internal interaction result as a query and performing cosine-similarity-attention external interaction with the internal interaction result of the other section of text, obtaining the attention weight distribution of the current encoding vector over the other encoding sequence;
2) weighting the corresponding internal interaction result with the attention weight distribution to obtain an external interaction vector for each encoding vector;
3) feeding the vector sequence obtained from the external interaction into a third residual network and performing an L2 regularization operation on its output to obtain a bidirectional noise interaction matrix.
Further, whether the two sections of noise text to be matched match is judged by the following steps:
1) obtaining the concatenation of the two noise text interaction matrices;
2) feeding the concatenation into a scorer to obtain a matching score;
3) judging from the matching score whether the two sections of noise text to be matched match.
Further, the scorer comprises a scoring network consisting of fully connected layers.
A storage medium having a computer program stored therein, wherein the computer program is arranged to perform the above-mentioned method when executed.
An electronic device comprising a memory having a computer program stored therein and a processor arranged to run the computer program to perform the method described above.
Compared with the prior art, the invention has the following advantages:
1. the method mainly adopts an attention mechanism to capture the matching mode between the noise texts, and can improve the time efficiency of the model through parallel computation.
2. The model is not a traditional time-sequence model: it is little influenced by the logical order of sentences in the noise text and remains effective on noise text pairs with disordered word order.
3. The scaled cosine attention mechanism effectively suppresses the interference of noise in the text through weighting and increases the influence of the semantically meaningful words, improving the matching quality on noise text.
4. In long-text matching, the attention mechanism avoids the representation difficulty and long-distance dependency problems that long documents cause in time-sequence models, and the matching effect is clearly superior to matching methods based on document representation.
5. A bidirectional matching mode is adopted: the matching pattern from noise text q to d is calculated while the matching degree from d to q is also considered, which avoids the transitive-matching problem.
Drawings
Fig. 1 is a flowchart of the multi-view interactive matching method for noise text according to the present invention.
Fig. 2 is a framework diagram of the multi-view interactive matching method for noise text according to the present invention.
Detailed Description
For the purpose of promoting an understanding of the principles, solutions and advantages of the invention, reference will now be made to the following detailed description taken in conjunction with the accompanying drawings.
The method provided by the invention is suitable for matching tasks over text pairs containing noise and disordered word order. Its main idea is to use several attention mechanisms to increase the weight of key information, reduce the interference of noise and sentence order, capture the matching patterns of sentences during interaction, and finally score the matching result with a scoring network.
As shown in fig. 1 and fig. 2, the method mainly comprises three processes: internal interaction, external interaction and matching scoring. Before these three processes, the text pair must first be encoded. Given a noise query text q and a corresponding text d to be matched, the words in the noise text pair are first encoded with a pre-trained Word2vec model to obtain word vector sequences. The word vector sequences are then encoded a second time by a bidirectional LSTM neural network, producing vector encoding sequences whose dimension is determined by the number of LSTM hidden-layer units. Finally, each vector encoding of a sequence is mapped through the same residual network, so that the vector encoding sequences of the two texts are mapped to a unified semantic space.
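For illustration only, the following PyTorch sketch shows one way this encoding pipeline could be realized. The class name, the hidden size, and the two-layer residual mapping are assumptions, not part of the original disclosure; the pre-trained Word2vec weights are assumed to be exported as an embedding matrix.

```python
import torch
import torch.nn as nn

class NoiseTextEncoder(nn.Module):
    """Word2vec lookup -> BiLSTM second encoding -> shared residual mapping."""
    def __init__(self, embed_weights: torch.Tensor, hidden: int = 128):
        super().__init__()
        # embed_weights: (vocab, d_e) matrix exported from pre-trained Word2vec
        self.embed = nn.Embedding.from_pretrained(embed_weights, freeze=False)
        d_e = embed_weights.size(1)
        self.bilstm = nn.LSTM(d_e, hidden, batch_first=True, bidirectional=True)
        d = 2 * hidden  # output dim fixed by the number of LSTM hidden units
        self.residual = nn.Sequential(nn.Linear(d, d), nn.ReLU(), nn.Linear(d, d))

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        e = self.embed(token_ids)       # (batch, seq_len, d_e)
        h, _ = self.bilstm(e)           # (batch, seq_len, 2 * hidden)
        return h + self.residual(h)     # shared residual net maps q and d
                                        # into one semantic space

# The same encoder instance is applied to both texts q and d.
```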
The internal interaction module of the model mainly uses a scaled cosine similarity attention mechanism to capture the key information inside a sentence, suppressing the expression of secondary information by weighting the vector encoding sequences. After the internal interaction, the external interaction module, also using scaled cosine similarity, captures the interaction matching patterns between sentences from the results of the internal interaction module. Finally, the scorer gives a matching score according to the results of the internal and external interactions.
The internal interaction process is defined as a Self-Attention process; the model adopts a scaled cosine attention mechanism whose input is the vector encoding sequence of a text. For the noise text q, the internal interaction takes each word of the vector encoding sequence of q as the query item of the attention algorithm and weights the vector encodings of the other words in the sequence, yielding the self-interaction attention weights of q. The vector encoding of each word in the sequence is then weighted by these attention weights, so that the model attends differently to different word vectors, reducing the weight of noise words and raising the weight of key words. In the same way, the invention weights the vector encoding sequence of the candidate document d with the same scaled cosine attention mechanism. This way of computing attention avoids, to a certain extent, the influence of the sequential structure of the text: even if the positions of words within the same layer are permuted, the weights computed by attention for those positions remain unchanged. The specific calculation process is as follows:
q = [q_1; q_2; ...; q_n] = W_q · E_q
k = [k_1; k_2; ...; k_n] = W_k · E_q
α_ij = softmax_j( λ · (q_i · k_j) / (‖q_i‖ ‖k_j‖) )
I_q = [ Σ_j α_1j e_j ; Σ_j α_2j e_j ; ... ; Σ_j α_nj e_j ]
where E_q = [e_1; e_2; ...; e_n] is the vector encoding sequence of the noise text q, q and k are the query and key values of the scaled cosine attention, W_q and W_k are trainable parameter matrices, λ is the scaling factor, α_ij is the degree of correlation between the i-th and the j-th word, and I_q, the stacked and weighted word vector sequence of the noise text, is the final result of the self-interaction process. The self-interaction result I_d of the candidate document d is obtained in the same manner.
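As a minimal sketch of this self-interaction step, assuming PyTorch and a softmax-normalized scaled cosine similarity (the exact normalization and the value of the scaling factor λ are not spelled out in the source and are assumptions here):

```python
import torch
import torch.nn.functional as F

def scaled_cosine_self_attention(E: torch.Tensor,
                                 W_q: torch.Tensor,
                                 W_k: torch.Tensor,
                                 lam: float = 5.0) -> torch.Tensor:
    """E: (n, d) vector encoding sequence of one text; returns I of shape (n, d)."""
    q = E @ W_q                                   # projected queries, (n, d)
    k = E @ W_k                                   # projected keys,    (n, d)
    # pairwise cosine similarity between every query and every key
    sim = F.cosine_similarity(q.unsqueeze(1), k.unsqueeze(0), dim=-1)  # (n, n)
    alpha = F.softmax(lam * sim, dim=-1)          # scaled, row-normalized weights
    return alpha @ E                              # weighted, stacked sequence I
```

Because the similarity is computed pairwise over all positions, permuting the word order permutes rows and columns of `alpha` consistently but leaves each word's weights unchanged, which matches the order-insensitivity argument above.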
The external interaction module of the model is defined as a bidirectional attention interaction process, likewise realized with the scaled cosine attention algorithm. The input of the bidirectional interaction layer is the output of the self-interaction layer: the model computes bidirectional attention distributions over the two vector encoding sequences to construct the bidirectional interaction matrices, thereby finding the key words of the current word vector within the other noise text. Assuming the noise text q has length n and the corresponding candidate document d has length m, the model first uses each word vector of q to attend over the word vector sequence of d and takes the weighted sum of the attention results, obtaining n fixed-dimension interaction vectors whose dimension equals that of the vector encoding sequence of d. In the same way, using the word vectors of d to attend over and weight the word vector encoding sequence of q yields m interaction vectors with the same dimension as the vector encoding sequence of q. The detailed calculation process is as follows:
q = [q_1; q_2; ...; q_n] = W_q · I_q
k = [k_1; k_2; ...; k_m] = W_k · I_d
α_ij = softmax_j( λ · (q_i · k_j) / (‖q_i‖ ‖k_j‖) )
O = [o_1; o_2; ...; o_n],  o_i = Σ_j α_ij d_j
where I_q and I_d = [d_1; d_2; ...; d_m] are the self-interaction results of the text q and the candidate document d respectively, α_ij is the correlation score between the i-th word of the noise text q and the j-th word of the text d, and O represents the interaction result of the noise text q toward the candidate document d. The interaction result of the candidate document d toward the noise text q is obtained in the same manner.
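A sketch of the two directions, reusing the scaled cosine attention above; the weight matrices and λ are again illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def bidirectional_interaction(I_q, I_d, W_q, W_k, lam=5.0):
    """Returns q's view of d (n vectors) and d's view of q (m vectors)."""
    def attend(src, tgt):
        q = src @ W_q                             # (len_src, d)
        k = tgt @ W_k                             # (len_tgt, d)
        sim = F.cosine_similarity(q.unsqueeze(1), k.unsqueeze(0), dim=-1)
        alpha = F.softmax(lam * sim, dim=-1)      # (len_src, len_tgt)
        return alpha @ tgt                        # one weighted sum per src word

    O_qd = attend(I_q, I_d)                       # interaction vectors of q -> d
    O_dq = attend(I_d, I_q)                       # interaction vectors of d -> q
    return O_qd, O_dq
```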
After the bidirectional interaction results are obtained, they are concatenated, and the final matching score is produced by a fully connected scoring network, as follows:
f(p, e) = softmax( W_y [u_m; u_e] + b_y )
where u_m and u_e are fixed-dimension global feature vectors of dimension d_e, extracted by a single-layer convolutional neural network from the external interaction results of the noise text pair q and d. Here d_e denotes the total number of convolution kernels: the width of each kernel is consistent with the width of the external interaction result vectors, the kernel heights are 3, 5 and 7 with the same number of kernels per height, and a scalar is obtained by max-pooling the feature vector produced by each kernel. W_y and b_y are the weight matrix and the bias term of the fully connected network, and the scoring function computes the probability distribution over the labels of the noise text pair (q, d), where the label 0 or 1 indicates whether the two sections of noise text match.
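A minimal sketch of such a scorer, assuming PyTorch; the number of kernels per height is an assumption (the source fixes only the heights 3, 5 and 7 and that the counts are equal):

```python
import torch
import torch.nn as nn

class ConvScorer(nn.Module):
    """Multi-height 1-D convolutions + max pooling give a fixed global feature
    per text; a fully connected softmax layer scores the concatenated pair."""
    def __init__(self, d: int, kernels_per_height: int = 50,
                 heights: tuple = (3, 5, 7)):
        super().__init__()
        self.convs = nn.ModuleList(
            [nn.Conv1d(d, kernels_per_height, h) for h in heights])
        d_e = kernels_per_height * len(heights)   # total number of kernels
        self.fc = nn.Linear(2 * d_e, 2)           # labels: no-match / match

    def pool(self, O: torch.Tensor) -> torch.Tensor:
        x = O.t().unsqueeze(0)                    # (seq_len, d) -> (1, d, seq_len)
        feats = [conv(x).max(dim=2).values for conv in self.convs]
        return torch.cat(feats, dim=1)            # (1, d_e), one scalar per kernel

    def forward(self, O_qd, O_dq):
        u_m, u_e = self.pool(O_qd), self.pool(O_dq)
        return torch.softmax(self.fc(torch.cat([u_m, u_e], dim=1)), dim=-1)
```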
In the training process, the invention uses a hinge loss function to compute the error and back-propagates it through the whole network. Given noise text pairs, the training objective minimizes the hinge loss between positive and negative examples, with the margin between them set by a manually chosen threshold. The loss function is defined as:
L = Σ_{p ∈ P} max( 0, γ − f(p, e⁺) + f(p, e⁻) )
where P is the set of noise query texts, E is the set of texts to be matched, e⁺ ∈ E is the positive-example document corresponding to the noise query text p, e⁻ ∈ E is the corresponding negative-example document, and γ is the margin hyperparameter of the hinge loss function.
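In code, the pairwise hinge term is straightforward; the default margin below is an assumed value:

```python
import torch

def hinge_loss(score_pos: torch.Tensor,
               score_neg: torch.Tensor,
               gamma: float = 1.0) -> torch.Tensor:
    """Push each positive pair's score above its negative pair's by >= gamma."""
    return torch.clamp(gamma - score_pos + score_neg, min=0).mean()

# Usage with the ConvScorer sketch: take the 'match' probability as the score.
# loss = hinge_loss(scorer(Oq_pos, Od_pos)[:, 1], scorer(Oq_neg, Od_neg)[:, 1])
# loss.backward()   # back-propagate through the whole network
```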
In an embodiment of the present invention, given a noise text q and a candidate document d, the matching score between the two texts is computed as follows:
(1) Segment the noise texts q and d into words; after punctuation marks, stop words and low-frequency words are removed, the text sequences have lengths n and m respectively.
(2) Encode each word of the noise texts q and d with the Word2vec model, converting each word into a fixed vector of length d_e.
(3) Generate position vectors for the texts q and d using the position-vector generation method of the existing Bert model, with dimension d_m; concatenate the position vector with each word vector, adding position information to each word vector to obtain word vector sequences containing position information (a sketch of this step follows step (15) below).
(4) (Optional) Feed the position-concatenated word vectors into a bidirectional LSTM network and use the LSTM hidden state of each step as the encoding of the word vector. Adding this step improves matching accuracy on shorter texts.
(5) Feed each position-concatenated word vector of q and d (the result of step 3) or each LSTM output encoding (the result of step 4) into the same residual network, which maps words from different texts into the same vector representation space; the output of the residual network is two fixed-dimension vector sequences of lengths n and m, the lengths of the vector encoding sequences of q and d respectively.
(6) Using the scaled cosine attention algorithm, compute the Self-Attention weights of the fixed-dimension vector corresponding to each word in q and d.
(7) Take the dot product of the Self-Attention weights from the previous step with the residual-mapped fixed-dimension vector sequences to obtain weighted word vectors.
(8) Feed the weighted word vectors obtained by self-interaction into a new residual network as the abstract representation of the self-interaction layer.
(9) Perform an L2 regularization operation on the output of the self-interaction residual network to obtain the output of the self-interaction layer, which is still two vector sequences of lengths n and m; the dimension of the output vectors stays consistent with that of the word vectors, (d_e + d_m), where d_m is the dimension of the position-encoding vector.
(10) Take each output vector that the noise text q obtains from the self-interaction layer as a query over the output vector sequence of text d and attend, obtaining the attention weight of each word of q over the word vector sequence of d.
(11) Take the dot product of the attention weights from the previous step with the self-interacted word sequence of the candidate document d, weighting the word sequence of d, and accumulate the weighted results; each word vector of the noise text q thus yields a fixed-length interaction vector.
(12) Feed each interaction vector into the same residual network and perform an L2 regularization operation on its output to obtain a vector sequence of length n as the external interaction result from the noise text q to the candidate document d.
(13) Likewise, take each word vector of the candidate document d as a query over the noise text q and attend; the external interaction result from d to q is then a vector sequence of length m.
(14) Encode the vector sequences obtained through the external interaction with a convolutional neural network, finally obtaining two fixed-length interaction vectors.
(15) Concatenate the two interaction vectors, feed them into a fully connected neural network, compute the probability distribution over the class labels with a Softmax function, and take the label with the largest probability as the final matching result; the probability of that label is the confidence of the matching result.
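As referenced in step (3), a sketch of concatenating position vectors onto word vectors; using a learned position-embedding table, as Bert does, is an assumption here (the source does not spell the generation method out):

```python
import torch
import torch.nn as nn

class ConcatPositionEncoding(nn.Module):
    """Look up a learned position vector of dim d_m for each position and
    concatenate it onto the word vector (dim d_e), giving d_e + d_m."""
    def __init__(self, max_len: int, d_m: int):
        super().__init__()
        self.pos = nn.Embedding(max_len, d_m)

    def forward(self, word_vecs: torch.Tensor) -> torch.Tensor:
        batch, n, _ = word_vecs.shape
        idx = torch.arange(n, device=word_vecs.device)
        p = self.pos(idx).unsqueeze(0).expand(batch, n, -1)
        return torch.cat([word_vecs, p], dim=-1)  # (batch, n, d_e + d_m)
```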
The above embodiments are intended only to better illustrate the objects, principles, technical solutions and advantages of the present invention. It should be understood that they are merely exemplary and are not intended to limit the invention; any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its protection scope.

Claims (10)

1. A multi-view interactive matching method for noise text comprises the following steps:
1) encoding two sections of noise text to be matched to obtain two sections of encoding vector sequences, and adding position information to each encoding vector of the two sequences;
2) performing internal interaction on the two position-encoded vector sequences to obtain two sections of internal interaction results, whose dimensions are consistent with those of the encoding vector sequences;
3) performing external interaction on the two sections of internal interaction results by calculating bidirectional attention distributions, and constructing two bidirectional noise text interaction matrices;
4) concatenating the two noise text interaction matrices and judging whether the two sections of noise text to be matched match.
2. The method of claim 1, wherein the two sections of noise text to be matched are preprocessed before encoding, the preprocessing comprising removing punctuation marks, stop words and low-frequency words.
3. The method of claim 1, wherein the two sections of noise text are encoded using a pre-trained Word2vec or Bert model, and position information is added to each encoding vector of the two encoding vector sequences using the position-vector generation method of the Bert model.
4. The method of claim 1, wherein before the internal interaction, the two position-encoded vector sequences are mapped to a unified semantic space by:
1) feeding the two position-encoded vector sequences into a bidirectional LSTM neural network for secondary encoding to obtain two final vector encoding sequences;
2) mapping each vector encoding of the two final sequences through the same residual network, so that the two position-encoded vector sequences are mapped to a unified semantic space.
5. The method of claim 4, wherein the vector dimension of the final vector encoding sequences is determined by the number of hidden-layer units of the second LSTM encoding layer in the bidirectional LSTM neural network.
6. The method of claim 1, wherein the internal interaction results are obtained by:
1) feeding the two position-encoded vector sequences into a first residual network;
2) taking each position-encoded noise encoding vector as a query item, performing internal interaction with the other vector encodings of the same noise text through a scaled cosine attention algorithm, and calculating self-attention weights;
3) combining the position-encoded vector sequence with the attention weights to obtain a weighted encoding vector sequence;
4) feeding the weighted encoding vector sequence into a second residual network to obtain an abstract vector representation of the noise text;
5) performing an L2 regularization operation on each abstract vector representation to obtain the internal interaction result of the corresponding position-encoded vector sequence.
7. The method of claim 1, wherein a noise text interaction matrix is constructed by:
1) taking each encoding vector of one section's internal interaction result as a query and performing cosine-similarity-attention external interaction with the internal interaction result of the other section of text, obtaining the attention weight distribution of the current encoding vector over the other encoding sequence;
2) weighting the corresponding internal interaction result with the attention weight distribution to obtain an external interaction vector for each encoding vector;
3) feeding the vector sequence obtained from the external interaction into a third residual network and performing an L2 regularization operation on its output to obtain a bidirectional noise interaction matrix.
8. The method of claim 1, wherein whether the two sections of noise text to be matched match is judged by:
1) obtaining the concatenation of the two noise text interaction matrices;
2) feeding the concatenation into a scorer to obtain a matching score, the scorer comprising a scoring network consisting of fully connected layers;
3) judging from the matching score whether the two sections of noise text to be matched match.
9. A storage medium having a computer program stored thereon, wherein the computer program is arranged to, when run, perform the method of any of claims 1-8.
10. An electronic device comprising a memory having a computer program stored therein and a processor arranged to run the computer program to perform the method according to any of claims 1-8.
CN202011456860.8A 2020-12-10 2020-12-10 Multi-view interactive matching method for noise text and electronic device Active CN114626425B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011456860.8A CN114626425B (en) 2020-12-10 2020-12-10 Multi-view interactive matching method for noise text and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011456860.8A CN114626425B (en) 2020-12-10 2020-12-10 Multi-view interactive matching method for noise text and electronic device

Publications (2)

Publication Number Publication Date
CN114626425A true CN114626425A (en) 2022-06-14
CN114626425B CN114626425B (en) 2024-11-08

Family

ID=81894894

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011456860.8A Active CN114626425B (en) 2020-12-10 2020-12-10 Multi-view interactive matching method for noise text and electronic device

Country Status (1)

Country Link
CN (1) CN114626425B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190349321A1 (en) * 2018-05-10 2019-11-14 Royal Bank Of Canada Machine natural language processing for summarization and sentiment analysis
KR20200071821A (en) * 2018-11-30 2020-06-22 고려대학교 산학협력단 Detection metohd of fake news using grammatic transformation on neural network, computer readable medium and apparatus for performing the method
CN110532353A (en) * 2019-08-27 2019-12-03 海南阿凡题科技有限公司 Text entities matching process, system, device based on deep learning
CN111160568A (en) * 2019-12-27 2020-05-15 北京百度网讯科技有限公司 Machine reading understanding model training method and device, electronic equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Pang Liang; Lan Yanyan; Xu Jun; Guo Jiafeng; Wan Shengxian; Cheng Xueqi: "A Survey on Deep Text Matching", Chinese Journal of Computers, no. 04, 20 September 2019 (2019-09-20) *

Also Published As

Publication number Publication date
CN114626425B (en) 2024-11-08

Similar Documents

Publication Publication Date Title
CN111783462B (en) Chinese named entity recognition model and method based on double neural network fusion
US11631007B2 (en) Method and device for text-enhanced knowledge graph joint representation learning
CN112800776B (en) Bidirectional GRU relation extraction data processing method, system, terminal and medium
CN109977861B (en) Off-line handwriting mathematical formula recognition method
CN112989834A (en) Named entity identification method and system based on flat grid enhanced linear converter
CN112232053B (en) Text similarity computing system, method and storage medium based on multi-keyword pair matching
CN113255320A (en) Entity relation extraction method and device based on syntax tree and graph attention machine mechanism
CN113392209B (en) Text clustering method based on artificial intelligence, related equipment and storage medium
CN111414481A (en) Chinese semantic matching method based on pinyin and BERT embedding
KR102379660B1 (en) Method for utilizing deep learning based semantic role analysis
CN111581392B (en) Automatic composition scoring calculation method based on statement communication degree
CN111125367A (en) Multi-character relation extraction method based on multi-level attention mechanism
CN114281982B (en) Book propaganda abstract generation method and system adopting multi-mode fusion technology
CN113434682B (en) Text emotion analysis method, electronic device and storage medium
CN114742069A (en) Code similarity detection method and device
CN114036246A (en) Commodity map vectorization method and device, electronic equipment and storage medium
CN114881042A (en) Chinese emotion analysis method based on graph convolution network fusion syntax dependence and part of speech
CN110276396A (en) Picture based on object conspicuousness and cross-module state fusion feature describes generation method
CN114780677B (en) Chinese event extraction method based on feature fusion
CN115759119A (en) Financial text emotion analysis method, system, medium and equipment
CN113051886B (en) Test question duplicate checking method, device, storage medium and equipment
CN112528168A (en) Social network text emotion analysis method based on deformable self-attention mechanism
CN114626425B (en) Multi-view interactive matching method for noise text and electronic device
CN113536797B (en) Method and system for extracting key information sheet model of slice document
CN112507081A (en) Similar sentence matching method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant