
CN115600685A - Model training method, model training device, text processing method, text processing device, model training equipment and storage medium - Google Patents


Info

Publication number
CN115600685A
Authority
CN
China
Prior art keywords
text
graph
representation vector
sample
machine learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211217493.5A
Other languages
Chinese (zh)
Inventor
李昕
邴立东
蔡登�
林伟
何镇升
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba China Co Ltd
Original Assignee
Alibaba China Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba China Co Ltd
Priority to CN202211217493.5A
Publication of CN115600685A
Priority to PCT/CN2023/121263 (WO2024074099A1)
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/151 Transformation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Machine Translation (AREA)

Abstract

The disclosure relates to a model training method, a text processing method, a model training device, a text processing device, a model training apparatus and a storage medium. The present disclosure converts a first sample text into an abstract semantic graph, converts the abstract semantic graph into a graph sequence, and converts the graph sequence into a graph representation vector. The machine learning model is then trained based on the graph representation vector and the text representation vector. Because the semantic concept corresponding to each node in the abstract semantic graph is a concept obtained after abstraction and normalization, the problem of representation sparseness is relieved and the abstract semantic graph has low ambiguity. In addition, the abstract semantic graph is introduced into a contrastive representation learning process based on the graph representation vector and the text representation vector, so that the graph representation vector output by the trained machine learning model can accurately reflect the text semantics, the graph representation vector can enhance the text representation vector, and the representation vector obtained from the graph representation vector and the text representation vector can fully and accurately express the text semantics.

Description

Model training method, model training device, text processing method, text processing device, model training equipment and storage medium
Technical Field
The present disclosure relates to the field of information technologies, and in particular, to a method, an apparatus, a device, and a storage medium for model training and text processing.
Background
Currently, a text can be mapped to a vector space through multilingual text/sentence embedding (Multilingual Text/Sentence Embeddings), so as to obtain a representation vector corresponding to the text. The representation vector reflects the semantics of the text, that is, it is related only to the semantics of the text and not to the language in which the text is written.
However, in a multilingual scenario, individual characters, words, or phrases in a text can take variant forms, and if a variant form has never been learned by the machine learning model, the representation vector output by the machine learning model cannot sufficiently express the semantics of the text. In addition, if the texts processed by the machine learning model in the training stage and in the inference stage are in different languages, the problem of cross-language ambiguity also exists.
Disclosure of Invention
In order to solve the above technical problems or at least partially solve the above technical problems, the present disclosure provides a method, an apparatus, a device, and a storage medium for model training and text processing, so as to sufficiently and accurately express semantics of a text, i.e., improve multi-language text embedding performance.
In a first aspect, an embodiment of the present disclosure provides a model training method, including:
converting a first sample text into an abstract semantic graph and converting the abstract semantic graph into a graph sequence, wherein the abstract semantic graph is determined by semantic information of the first sample text;
taking the graph sequence as an input to a machine learning model, such that the machine learning model outputs a graph representation vector;
training the machine learning model according to the graph representation vector and the text representation vector corresponding to the first sample text.
In a second aspect, an embodiment of the present disclosure provides a text processing method, including:
acquiring a target text;
converting the target text into an abstract semantic graph, and converting the abstract semantic graph into a graph sequence;
taking the graph sequence as an input to a machine learning model, such that the machine learning model outputs a graph representation vector, the machine learning model being trained according to the method of the first aspect;
and fusing the text representation vector corresponding to the target text and the graph representation vector to obtain a fusion result serving as the representation vector of the target text, wherein the representation vector is used for representing the semantics of the target text.
In a third aspect, an embodiment of the present disclosure provides a model training apparatus, including:
the conversion module is used for converting the first sample text into an abstract semantic graph and converting the abstract semantic graph into a graph sequence, wherein the abstract semantic graph is determined by semantic information of the first sample text;
an input module to take the graph sequence as an input to a machine learning model such that the machine learning model outputs a graph representation vector;
a training module for training the machine learning model according to the graph representation vector and the text representation vector corresponding to the first sample text.
In a fourth aspect, an embodiment of the present disclosure provides a text processing apparatus, including:
the acquisition module is used for acquiring a target text;
the conversion module is used for converting the target text into an abstract semantic graph and converting the abstract semantic graph into a graph sequence;
an input module, configured to take the graph sequence as an input of a machine learning model, so that the machine learning model outputs a graph representation vector, the machine learning model being trained according to the method of the first aspect;
and the fusion module is used for fusing the text representation vector corresponding to the target text and the graph representation vector to obtain a fusion result serving as the representation vector of the target text, wherein the representation vector is used for representing the semantics of the target text.
In a fifth aspect, an embodiment of the present disclosure provides an electronic device, including:
a memory;
a processor; and
a computer program;
wherein the computer program is stored in the memory and configured to be executed by the processor to implement the method of the first or second aspect.
In a sixth aspect, the disclosed embodiments provide a computer-readable storage medium having a computer program stored thereon, the computer program being executed by a processor to implement the method of the first or second aspect.
According to the model training and text processing method, apparatus, device and storage medium provided by the embodiments of the present disclosure, the first sample text is converted into an abstract semantic graph, the abstract semantic graph is further converted into a graph sequence, and the graph sequence is used as the input of the machine learning model, so that the machine learning model outputs a graph representation vector. The machine learning model is then trained based on the graph representation vector and the text representation vector corresponding to the first sample text. The abstract semantic graph is determined by the semantic information of the first sample text and is independent of the language of the first sample text, and the semantic concept corresponding to each node in the abstract semantic graph is a concept obtained after abstraction and normalization. Multiple variant forms of the same word share the same semantics and can therefore be abstracted and normalized into one semantic concept, which ensures that the variant forms are parsed into the same semantics and greatly relieves the problem of representation sparseness. In addition, an ambiguous word or phrase has different semantics in different texts, so words or phrases with the same surface form in different texts correspond to different abstract semantic graphs, and the semantic concepts of the ambiguous word or phrase in those abstract semantic graphs also differ, so that the abstract semantic graph has low ambiguity.
In addition, in this embodiment, the highly abstract semantic graph is introduced into a contrastive representation learning process based on the graph representation vector and the text representation vector, so that the machine learning model trained through this process can output a relatively accurate graph representation vector, that is, the machine learning model is guided to learn the core semantics of multilingual text. The graph representation vector of any text can thus accurately reflect the semantics of the text and can enhance the text representation vector of the text, so that the final representation vector obtained from the graph representation vector and the text representation vector can fully and accurately express the semantics of the text, that is, the multilingual text embedding performance is improved.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure.
In order to more clearly illustrate the embodiments of the present disclosure or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. It is obvious to those skilled in the art that other drawings can be obtained from these drawings without inventive effort.
FIG. 1 is a flowchart of a model training method provided by an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of an application scenario provided by an embodiment of the present disclosure;
FIG. 3 is a flowchart of a model training method provided by another embodiment of the present disclosure;
FIG. 4 is a flowchart of a model training method provided by another embodiment of the present disclosure;
FIG. 5 is a schematic diagram of an analysis and linearization process provided by another embodiment of the present disclosure;
FIG. 6 is a flowchart of a model training method provided by another embodiment of the present disclosure;
FIG. 7 is a flowchart of a model training method provided by another embodiment of the present disclosure;
FIG. 8 is a schematic structural diagram of a model training apparatus provided by an embodiment of the present disclosure;
FIG. 9 is a schematic structural diagram of a text processing apparatus provided by an embodiment of the present disclosure;
FIG. 10 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
Detailed Description
In order that the above objects, features and advantages of the present disclosure may be more clearly understood, aspects of the present disclosure will be further described below. It should be noted that the embodiments and features of the embodiments of the present disclosure may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure, but the present disclosure may be practiced in other ways than those described herein; it is to be understood that the embodiments disclosed in the specification are only a few embodiments of the present disclosure, and not all embodiments.
In general, a text may be mapped to a vector space through multilingual text/sentence embedding (Multilingual Text/Sentence Embeddings), so as to obtain a representation vector corresponding to the text. The representation vector reflects the semantics of the text, i.e., it is related only to the semantics of the text and not to the language in which the text is written. Because of this language-independent property, multilingual text/sentence embedding serves as a basic building block for multilingual natural language processing applications, and has been widely applied in fields such as bilingual text mining, multilingual text classification, and cross-language text inference. In practical applications, there is an increasing demand for high-quality multilingual text/sentence embedding capability. However, current machine learning models for multilingual text/sentence embedding are trained with a contrastive representation learning framework based on text pairs, which pulls together text pairs that are positive examples of each other and pushes apart text pairs that are negative examples of each other, so that the machine learning model can learn the intrinsic representation of the text. Contrastive Representation Learning is a representation learning paradigm that learns text representations by contrasting semantically related sentence pairs with semantically unrelated sentence pairs.
However, in a multilingual scenario, individual characters, words, or phrases in a text (e.g., a sentence) can take variant forms, such as very rich word variations (e.g., singular and plural forms, irregular spellings, misspellings, etc.). If a variant form has never been learned by the machine learning model, the representation vector output by the machine learning model cannot sufficiently express the semantics of the text. For example, suppose a word in a sentence is replaced with an irregular spelling: before the replacement, the sentence corresponds to one representation vector; after the replacement, the sentence corresponds to another representation vector. Because the replacement does not change the semantics of the sentence, the representation vectors before and after the replacement should in theory be the same. In practice, however, the machine learning model cannot recognize the irregular spelling, that is, the variant forms cannot be sufficiently covered, so the machine learning model cannot understand the semantics of the sentence after the replacement, and the representation vectors of the sentence before and after the replacement differ considerably. This is the problem of representation sparseness.
In addition, if the texts processed by the machine learning model in the training stage and in the inference stage are in different languages, the problem of cross-language ambiguity also exists. For example, suppose the machine learning model is trained on English text, in which the Apple phone (iPhone) and the apple as a fruit (apple) are distinguishable. If the trained machine learning model then processes Chinese text in the use stage or inference stage and "apple" appears in the Chinese text, the machine learning model cannot correctly resolve the semantics of "apple", and the problem of cross-language ambiguity occurs.
To solve this problem, embodiments of the present disclosure provide a model training method, which is described below with reference to specific embodiments.
Fig. 1 is a flowchart of a model training method provided in an embodiment of the present disclosure. The method may be performed by a model training apparatus, which may be implemented in software and/or hardware and may be configured in an electronic device, such as a server or a terminal, where the terminal specifically includes a mobile phone, a computer, or a tablet computer. In addition, the model training method described in this embodiment may be applied to the application scenario shown in fig. 2. As shown in fig. 2, the application scenario includes a terminal 21 and a server 22. The server 22 may execute the model training method to train a machine learning model, and the trained machine learning model may be stored in the server 22, or may be deployed in the terminal 21 or another server, so that the server 22, the terminal 21, or the other server may execute a text processing method according to the trained machine learning model, that is, process a target text to obtain a representation vector corresponding to the target text. The representation vector may represent the semantics of the target text, for example, its core semantics. The target text may be any text to be processed, or any text whose semantics or representation vector is to be determined. The method is described in detail below with reference to fig. 2. As shown in fig. 1, the method comprises the following specific steps:
S101, converting the first sample text into an abstract semantic graph, and converting the abstract semantic graph into a graph sequence, wherein the abstract semantic graph is determined by semantic information of the first sample text.
For example, the server 22 may obtain a large number of sample texts, or the server 22 may store a large number of sample texts in advance, where each sample text may be a sentence, a paragraph, or a set of words, of varying length. The following description takes sentences as an example. For example, the server 22 randomly selects a sample text from the large number of sample texts, and the selected sample text is denoted as the first sample text. Further, the first sample text is converted into an abstract semantic graph using a multilingual abstract semantic representation parsing model; the abstract semantic graph is independent of the language of the first sample text but related to the semantics of the first sample text. Specifically, the abstract semantic graph may also be referred to as an Abstract Meaning Representation (AMR), which is a highly abstract semantic structure graph determined only by the core semantic information of the sentence and independent of the specific expression (for example, word choice, word order, etc.) and the language used. In addition, the abstract semantic representation is a structured text semantic representation paradigm based on a single-rooted directed acyclic graph, in which nodes and edges respectively represent semantic concepts and the relations among the concepts. Further, the server 22 may convert the abstract semantic graph into a graph sequence according to the depth-first search principle.
S102, taking the graph sequence as an input of a machine learning model, so that the machine learning model outputs a graph representation vector.
In particular, the server 22 may take the graph sequence as an input to a machine learning model, which may be a pre-trained Bidirectional Encoder Representations from Transformers (BERT) model. The BERT model may act as a graph encoder that converts the graph sequence into a graph representation vector, such that the machine learning model outputs the graph representation vector.
S103, training the machine learning model according to the graph representation vector and the text representation vector corresponding to the first sample text.
In this embodiment, since the first sample text is converted into the abstract semantic graph, the abstract semantic graph is converted into the graph sequence, and the graph sequence is further converted into the graph representation vector, the graph representation vector is derived from the first sample text, that is, the graph representation vector reflects the core semantics of the first sample text. In addition, in this embodiment, the first sample text may also be processed by a text encoder to obtain a text representation vector corresponding to the first sample text, so the text representation vector also reflects the core semantics of the first sample text. Thus, the machine learning model, i.e., the graph encoder, may be trained by reducing the distance between the graph representation vector and the text representation vector. In particular, a Text Encoder may be a model that converts a text sequence into a text representation vector, and a Graph Encoder may be a model that converts a graph structure into a corresponding graph representation vector.
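The idea of training by reducing the distance between the two vectors can be illustrated with a minimal sketch. It assumes a squared Euclidean distance and a plain gradient-descent update applied directly to the graph representation vector; the disclosure does not fix the distance measure or the optimizer at this point, and in a real system the gradient would flow into the graph encoder's parameters rather than into the vector itself. The names `squared_distance` and `alignment_step` are hypothetical.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def alignment_step(graph_vec, text_vec, lr=0.25):
    """One gradient-descent step on the squared distance w.r.t. graph_vec.

    The gradient of ||g - t||^2 with respect to g is 2 * (g - t), so the
    update moves the graph vector toward the (fixed) text vector.
    """
    return [g - lr * 2.0 * (g - t) for g, t in zip(graph_vec, text_vec)]


graph_vec = [1.0, 0.0]   # toy graph representation vector (assumed values)
text_vec = [0.0, 1.0]    # toy text representation vector (assumed values)

before = squared_distance(graph_vec, text_vec)
graph_vec = alignment_step(graph_vec, text_vec)
after = squared_distance(graph_vec, text_vec)
print(before, after)  # the distance shrinks after the update
```

Keeping the text vector fixed in this sketch mirrors the description above, in which the graph encoder is the model being trained.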
The embodiment of the disclosure converts the first sample text into the abstract semantic graph, further converts the abstract semantic graph into a graph sequence, and uses the graph sequence as an input of the machine learning model, so that the machine learning model outputs a graph representation vector. The machine learning model is then trained based on the graph representation vector and the text representation vector corresponding to the first sample text. The abstract semantic graph is determined by the semantic information of the first sample text and is independent of the language of the first sample text, and the semantic concept corresponding to each node in the abstract semantic graph is a concept obtained after abstraction and normalization. Multiple variant forms of the same word share the same semantics and can therefore be abstracted and normalized into one semantic concept, which ensures that the variant forms are parsed into the same semantics and greatly relieves the problem of representation sparseness. In addition, an ambiguous word or phrase has different semantics in different texts, so words or phrases with the same surface form in different texts correspond to different abstract semantic graphs, and the semantic concepts of the ambiguous word or phrase in those abstract semantic graphs also differ, so that the abstract semantic graph has low ambiguity.
In addition, in this embodiment, the highly abstract semantic graph is introduced into a contrastive representation learning process based on the graph representation vector and the text representation vector, so that the machine learning model trained through this process can output a relatively accurate graph representation vector, that is, the machine learning model is guided to learn the core semantics of multilingual text. The graph representation vector of any text can thus accurately reflect the semantics of the text and can enhance the text representation vector of the text, so that the final representation vector obtained from the graph representation vector and the text representation vector can fully and accurately express the semantics of the text, that is, the multilingual text embedding performance is improved.
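How a final representation vector might be obtained from the two vectors can be sketched as follows. The text above does not specify the fusion operation, so both variants shown here (concatenation of L2-normalized vectors, and a weighted sum) are assumptions for illustration only; the function names and the weight `alpha` are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse_concat(text_vec, graph_vec):
    """Assumed fusion 1: concatenate the two normalized vectors."""
    return l2_normalize(text_vec) + l2_normalize(graph_vec)


def fuse_weighted_sum(text_vec, graph_vec, alpha=0.5):
    """Assumed fusion 2: weighted sum of the two normalized vectors.

    Requires both vectors to have the same dimension.
    """
    if len(text_vec) != len(graph_vec):
        raise ValueError("vectors must have the same dimension")
    t = l2_normalize(text_vec)
    g = l2_normalize(graph_vec)
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(t, g)]


fused = fuse_concat([3.0, 4.0], [0.0, 2.0])
print(len(fused))  # dimension is the sum of the two input dimensions
```

Concatenation keeps both views intact at the cost of a larger vector, while the weighted sum keeps the dimension fixed; which trade-off is appropriate depends on the downstream task.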
Fig. 3 is a flowchart of a model training method according to another embodiment of the present disclosure. In this embodiment, the method specifically includes the following steps:
S301, converting the first sample text into an abstract semantic graph, and converting the abstract semantic graph into a graph sequence, wherein the abstract semantic graph is determined by semantic information of the first sample text.
For example, the first sample text in this embodiment is "this fact is accessible to you" as shown in fig. 4. The parser shown in fig. 4 is the multilingual abstract semantic representation parsing model described above. Specifically, the parser may parse the first sample text; this parsing may be Abstract Meaning Representation parsing (AMR parsing), i.e., the process of converting unstructured natural language text into a corresponding structured abstract semantic graph. The semantic information of the first sample text is abstracted and normalized during parsing, so as to obtain the abstract semantic graph 41 shown in fig. 4. The abstract semantic graph 41 includes nodes 42, 43 and 44, where different nodes have different names and each node corresponds to a semantic concept. The name of any node may or may not appear in the first sample text. For example, the core meaning of "accessible" in the first sample text is permission, and thus the parser can abstract and normalize "accessible" to "permission-01". In addition, "permission-01" may serve as the core semantic concept in the abstract semantic graph 41, and "you" and "fact" respectively represent semantic concepts on which the core semantic concept depends, that is, "you" and "fact" are semantic concepts surrounding the core semantic concept. Argument (ARG) 0 represents the relationship between "permission-01" and "you", for example, "permission-01" depends on "you". Similarly, ARG1 represents the relationship between "permission-01" and "fact", for example, "permission-01" depends on "fact".
It is understood that sentences with the same semantics but different word orders or different languages correspond to the same abstract semantic graph. For example, as shown in fig. 4, "you have permission to access this fact" and "this fact is accessible to you" are two sentences with the same semantics and different word orders, and they may correspond to the same abstract semantic graph 41. That is, "permission" in "you have permission to access this fact" and "accessible" in "this fact is accessible to you" are words with different written forms and the same semantics, and thus "permission" and "accessible" may be uniformly abstracted and normalized to "permission-01", which greatly alleviates the problems of representation sparseness and ambiguity. Similarly, "You have permission to access this fact" and "You have access to the features" may also correspond to the same abstract semantic graph. Further, the abstract semantic graph 41 can be linearized, i.e., converted into a graph sequence according to the principle of depth-first search; the graph sequence is (permission-01 :ARG0 you :ARG1 fact), as shown in fig. 4.
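The depth-first linearization of such a graph can be sketched in a few lines. This is a simplified illustration, not the linearizer of the disclosure: the `Node` class and the function names are hypothetical, the bracketed output format with `:ARG0`-style relation labels is an assumed convention, and re-entrant nodes (a node reached from two parents) are not handled.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One semantic concept in the graph (hypothetical structure)."""
    concept: str
    # Outgoing edges as (relation label, child) pairs, kept in a fixed order.
    edges: list = field(default_factory=list)


def linearize(node):
    """Depth-first traversal that emits a bracketed token list."""
    tokens = ["(", node.concept]
    for relation, child in node.edges:
        tokens.append(":" + relation)
        if isinstance(child, Node):
            if child.edges:
                # Non-leaf child: recurse before visiting the next sibling.
                tokens.extend(linearize(child))
            else:
                tokens.append(child.concept)
        else:
            # Constant attribute value, e.g. a polarity marker.
            tokens.append(str(child))
    tokens.append(")")
    return tokens


def to_sequence(root):
    """Join the tokens and tidy the spaces around the brackets."""
    return " ".join(linearize(root)).replace("( ", "(").replace(" )", ")")


graph = Node("permission-01", [("ARG0", Node("you")), ("ARG1", Node("fact"))])
print(to_sequence(graph))  # (permission-01 :ARG0 you :ARG1 fact)
```

Because the traversal depends only on the graph, any two sentences that parse to the same graph yield the same sequence, which is the property the passage above relies on.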
S302, the graph sequence is used as the input of a machine learning model, so that the machine learning model outputs graph representation vectors.
For example, as shown in fig. 4, the graph encoder, which is the machine learning model to be trained in the embodiment of the present disclosure, takes the graph sequence (permission-01 :ARG0 you :ARG1 fact) as an input. The graph encoder may then convert the graph sequence into a graph representation vector and output the graph representation vector.
S303, inputting the first sample text into a text encoder, so that the text encoder outputs a text representation vector corresponding to the first sample text.
As shown in fig. 4, this embodiment may further input the first sample text into a text encoder, so that the text encoder outputs a text representation vector corresponding to the first sample text. In this embodiment, the text encoder may be any model that can convert text into a text representation vector. The text encoder is not limited to being trained with the contrastive representation learning framework based on text pairs and may also be trained by other methods; this embodiment does not specifically limit the training method of the text encoder. For example, the text encoder may be an existing multilingual text/sentence embedding model, such as the Language-agnostic BERT Sentence Embedding (LaBSE) model or the Multilingual Unsupervised and Supervised Embedding (MUSE) model, etc.
Optionally, the machine learning model and the text encoder form a two-tower structure, in which the encoding of the graph sequence by the machine learning model and the encoding of the first sample text by the text encoder are independent of each other. For example, the graph encoder and the text encoder shown in fig. 4 constitute a two-tower structure. That is, any sentence may pass sequentially through the parser, the linearizer, and the graph encoder to obtain the graph representation vector, and the same sentence may also pass through the text encoder to obtain the text representation vector; the graph encoder's encoding of the graph sequence produced by the parser and the linearizer and the text encoder's encoding of the sentence are independent of each other. A two-tower structure (also referred to as a two-tower network) is a network structure that separately and independently encodes the two members of a text pair that needs to be compared or matched.
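The independence of the two towers (the double tower structure described above) can be made concrete with a toy sketch. The two encoders below are deterministic hashing stand-ins invented for illustration; they are not BERT, LaBSE, or any model named in the disclosure, and `DIM`, `text_tower`, `graph_tower`, and `encode_pair` are hypothetical names. The point is only that neither tower reads the other tower's input or state.

```python
import hashlib

DIM = 8  # toy embedding dimension (assumption)


def _hash_embed(tokens, salt):
    """Toy stand-in for an encoder: hashed bag-of-tokens counts."""
    vec = [0.0] * DIM
    for tok in tokens:
        digest = hashlib.sha256((salt + tok).encode("utf-8")).digest()
        vec[digest[0] % DIM] += 1.0
    return vec


def text_tower(sentence):
    """Encodes the raw sentence only; it never sees the graph sequence."""
    return _hash_embed(sentence.lower().split(), salt="text:")


def graph_tower(graph_sequence):
    """Encodes the linearized graph only; it never sees the sentence."""
    tokens = graph_sequence.replace("(", " ").replace(")", " ").split()
    return _hash_embed(tokens, salt="graph:")


def encode_pair(sentence, graph_sequence):
    """Run both towers independently and return (text_vec, graph_vec)."""
    return text_tower(sentence), graph_tower(graph_sequence)


text_vec, graph_vec = encode_pair(
    "this fact is accessible to you",
    "(permission-01 :ARG0 you :ARG1 fact)",
)
print(len(text_vec), len(graph_vec))  # 8 8
```

Since the towers share nothing, the text encoder can be swapped for a different pre-trained model without touching the graph encoder, which is consistent with the statement that the training method of the text encoder is not limited.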
S304, a second sample text is obtained, and the first sample text and the second sample text are negative example texts.
In addition, the server 22 may obtain a second sample text from a large amount of sample texts, so that the second sample text and the first sample text are negative example texts of each other. The second sample text may be a sentence that does not match, is not equivalent, or is not similar to the abstract semantic graph 41 as shown in fig. 4. For example, the second sample text may be "you do not have access to this fact" as shown in fig. 5. Since the second sample text and the first sample text are negative example texts, the abstract semantic graph 51 corresponding to the second sample text is different from the abstract semantic graph 41 corresponding to the first sample text, and the graph sequence corresponding to the second sample text is also different from the graph sequence corresponding to the first sample text. For example, as shown in fig. 5, the abstract semantic graph 51 corresponding to the second sample text further includes a polarity corresponding to "right-01", for example, the polarity is "-" to indicate that the polarity is negative, i.e., no right is indicated. Similarly, the second sample text corresponds to a graph sequence of (Right-01: polarity-: ARG0 you: ARG1 fact) which also includes the polarity of "-".
S305, inputting the second sample text into a text encoder, so that the text encoder outputs a text representation vector corresponding to the second sample text.
As shown in fig. 6, the first sample text is input to the text encoder 1, so that the text encoder 1 outputs a text representation vector 1 corresponding to the first sample text. The second sample text is input to the text encoder 2 such that the text encoder 2 outputs a text representation vector 2 corresponding to the second sample text. In some embodiments, the text encoder 1 and the text encoder 2 may be the same text encoder, or may be different text encoders.
S306, training the machine learning model by reducing the distance between the graph representation vector and the text representation vector corresponding to the first sample text and increasing the distance between the graph representation vector and the text representation vector corresponding to the second sample text.
As shown in fig. 6, since the graph representation vector and the text representation vector 1 are both derived from the same text, i.e., the first sample text, both can reflect the core semantics of the first sample text. However, since the second sample text and the first sample text are negative examples of each other, the difference between the text representation vector 1 and the text representation vector 2 is large. In this case, the graph encoder may be trained by decreasing the distance between the graph representation vector and the text representation vector 1 and increasing the distance between the graph representation vector and the text representation vector 2. As a result, the trained graph encoder can generate graph representation vectors that are aligned with positive example text representations while being distinguished from negative example text representations, i.e., the trained graph encoder can generate more accurate graph representation vectors.
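The training objective of S306 can be sketched as a loss that falls when the graph representation vector moves toward text representation vector 1 and away from text representation vector 2. The disclosure does not fix a particular distance or loss function; the Euclidean distance, the margin-based (triplet-style) form, the `margin` parameter, and the example vectors below are all assumptions made for illustration.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(
    graph_vec: Sequence[float],
    pos_text_vec: Sequence[float],
    neg_text_vec: Sequence[float],
    margin: float = 1.0,
) -> float:
    """Margin loss: small when graph_vec is near the positive and far from the negative."""
    d_pos = euclidean(graph_vec, pos_text_vec)  # distance to be decreased
    d_neg = euclidean(graph_vec, neg_text_vec)  # distance to be increased
    return max(0.0, d_pos - d_neg + margin)


text_vec_1 = [3.0, 0.0]  # hypothetical vector for the first sample text
text_vec_2 = [0.0, 4.0]  # hypothetical vector for the second sample text

# A poorly placed graph vector (sits on the negative) versus a well placed one.
loss_before = contrastive_loss([0.0, 4.0], text_vec_1, text_vec_2)
loss_after = contrastive_loss([3.0, 0.0], text_vec_1, text_vec_2)
print(loss_before, loss_after)  # 6.0 0.0
```

In an actual training loop the graph encoder's parameters would be updated by gradient descent on such a loss; whether the text encoder is kept fixed during that update is not specified here and is left open in the sketch.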
It will be appreciated that in some embodiments, on the basis of fig. 5, the graph sequence corresponding to the second sample text may also be input to the graph encoder, so that the graph encoder outputs a graph representation vector corresponding to the second sample text. The graph encoder is then trained by decreasing the distance between the graph representation vector corresponding to the first sample text and the text representation vector 1, and increasing the distance between the graph representation vector corresponding to the first sample text and the graph representation vector corresponding to the second sample text.
In this embodiment, the second sample text is obtained such that the first sample text and the second sample text are negative examples of each other. The machine learning model is then trained by decreasing the distance between the graph representation vector corresponding to the first sample text and the text representation vector corresponding to the first sample text, and increasing the distance between that graph representation vector and the text representation vector corresponding to the second sample text. This realizes a contrastive representation learning framework based on the abstract semantic graph and the text, in which the input of the framework comprises multi-modal information (such as the abstract semantic graph and the text) and is not limited to inputs of a single modality. The training method of this embodiment therefore enables interaction between a structured representation, e.g., a graph representation vector, and an unstructured representation, e.g., a text representation vector. In addition, the parser, the linearizer, and the trained graph encoder described in this embodiment may constitute a plug-in. When the plug-in is installed on a device that has only a text encoder, the device can calculate not only a text representation vector of any text but also a graph representation vector of that text, and a final representation vector of the text can then be generated from the text representation vector and the graph representation vector. The plug-in is plug-and-play, can be applied to any embedding model, and is highly flexible.
It is understood that the above embodiments mainly describe the training process of the graph encoder; the following describes the usage process (also called the inference process) of the graph encoder. The training process and the usage process may be performed on the same device, e.g., on the same server 22. Alternatively, they may be performed on different devices. For example, the training process for the graph encoder may be performed on the server 22, and the parser, the linearizer, and the trained graph encoder may constitute a plug-in that is installed on other devices that have only text encoders. Or the trained graph encoder may be deployed to other devices that already have a parser, a linearizer, and a text encoder. Multiple experiments show that after the plug-in is applied to the model that performs best on a multilingual text classification task (LASER), the multilingual classification accuracy improves by 2.0 on average across 5 different types of classification tasks. After the plug-in is applied to the model that performs best on the multilingual text similarity task (Xpara), the similarity score improves by 1.58 on average in the monolingual setting (3 languages in total) and by 1.08 on average in the cross-lingual setting (7 languages in total).
Fig. 7 is a flowchart of a text processing method according to another embodiment of the disclosure. In this embodiment, the method specifically includes the following steps:
S701, acquiring a target text.
For example, after the graph encoder training is completed, the trained graph encoder is stored in the server 22. The server 22 may obtain the target text to be processed; for example, the terminal 21 may transmit the target text to be processed to the server 22.
S702, converting the target text into an abstract semantic graph, and converting the abstract semantic graph into a graph sequence.
The server 22 may employ a parser to convert the target text into an abstract semantic graph and then linearize the abstract semantic graph into a graph sequence.
S703, taking the graph sequence as an input of a machine learning model, so that the machine learning model outputs a graph representation vector, wherein the machine learning model is obtained by training according to the model training method described above.
For example, the server 22 may use the graph sequence as an input of the trained graph encoder; the training process of the graph encoder is the process described in the above embodiments and is not described here again. The trained graph encoder may convert the graph sequence into a graph representation vector and output the graph representation vector.
S704, fusing the text representation vector corresponding to the target text and the graph representation vector, wherein the obtained fusion result is used as the representation vector of the target text, and the representation vector is used for representing the semantics of the target text.
For example, the server 22 may fuse the text representation vector corresponding to the target text with the graph representation vector, and use the obtained fusion result as the representation vector of the target text, so that the representation vector of the target text may represent the semantics of the target text.
Optionally, before fusing the text representation vector corresponding to the target text and the graph representation vector, the method further includes: inputting the target text into a text encoder, so that the text encoder outputs a text representation vector corresponding to the target text.
For example, before the server 22 fuses the text representation vector corresponding to the target text with the graph representation vector, the target text may be input to a text encoder, so that the text encoder outputs the text representation vector corresponding to the target text.
Optionally, fusing the text representation vector corresponding to the target text and the graph representation vector includes: splicing the text representation vector corresponding to the target text and the graph representation vector.
For example, the server 22 may splice (i.e., concatenate) the text representation vector corresponding to the target text and the graph representation vector to obtain a fusion result. For example, the graph representation vector is directly appended to the text representation vector corresponding to the target text.
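Steps S701 to S704 can be sketched end to end as follows. This is a hedged illustration only: the function names `fuse` and `represent`, the callable parameters, and the lambda stand-ins for the parser plus linearizer, the graph encoder, and the text encoder are hypothetical, and real components would be trained models rather than the trivial functions used here so that the sketch runs.

```python
from typing import Callable, List

Vector = List[float]


def fuse(text_vec: Vector, graph_vec: Vector) -> Vector:
    """S704: append the graph representation vector to the text representation vector."""
    return list(text_vec) + list(graph_vec)


def represent(
    target_text: str,
    to_graph_sequence: Callable[[str], str],
    graph_encoder: Callable[[str], Vector],
    text_encoder: Callable[[str], Vector],
) -> Vector:
    """Compute the fused representation vector of one target text."""
    graph_sequence = to_graph_sequence(target_text)  # S702: parse and linearize
    graph_vec = graph_encoder(graph_sequence)        # S703: graph representation vector
    text_vec = text_encoder(target_text)             # text representation vector
    return fuse(text_vec, graph_vec)                 # S704: concatenation


# Hypothetical stand-ins so that the sketch runs; not real models.
fused = represent(
    "some target text",
    to_graph_sequence=lambda text: "(" + " ".join(text.split()) + ")",
    graph_encoder=lambda seq: [float(len(seq)), 0.5],
    text_encoder=lambda text: [float(len(text.split())), 1.0, 2.0],
)
print(fused)  # [3.0, 1.0, 2.0, 18.0, 0.5]
```

With concatenation, the fused vector's dimension is the sum of the two input dimensions, so the text encoder's output is preserved unchanged in the leading components and the graph information is added alongside it.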
This embodiment highlights the core semantics of the target text through the abstract semantic graph. Compared with existing multilingual text/sentence embedding models, which rely only on text information, the method enhances the multilingual text/sentence embedding model with the abstract semantic graph: it introduces the highly abstract semantic graph into a graph-text contrastive representation learning process, which alleviates the problems of sparse text representations and cross-language ambiguity. In addition, the graph representation vector output by the graph encoder from the abstract semantic graph and the text representation vector are brought into a contrastive representation learning framework for semantic interaction, so that the graph representation vector, which captures the core semantic information, can be used to enhance any existing multilingual text representation.
Fig. 8 is a schematic structural diagram of a model training apparatus according to an embodiment of the present disclosure. The model training apparatus provided in the embodiment of the present disclosure may execute the processing procedure provided in the embodiment of the model training method, as shown in fig. 8, the model training apparatus 80 includes:
a conversion module 81, configured to convert the first sample text into an abstract semantic graph, and convert the abstract semantic graph into a graph sequence, where the abstract semantic graph is determined by semantic information of the first sample text;
an input module 82 for taking the graph sequence as an input to a machine learning model, such that the machine learning model outputs a graph representation vector;
a training module 83 configured to train the machine learning model according to the graph representation vector and the text representation vector corresponding to the first sample text.
Optionally, the training module 83 includes: an obtaining unit 831 and a training unit 832, wherein the obtaining unit 831 is configured to obtain a second sample text, and the first sample text and the second sample text are negative examples of each other; the training unit 832 is configured to train the machine learning model by decreasing a distance between the graph representation vector and the text representation vector corresponding to the first sample text and increasing a distance between the graph representation vector and the text representation vector corresponding to the second sample text.
Optionally, the training module 83 further includes: an input unit 833, configured to input the second sample text into the text encoder after the obtaining unit 831 obtains the second sample text, so that the text encoder outputs a text representation vector corresponding to the second sample text.
Optionally, the input module 82 is further configured to, before the training module 83 trains the machine learning model according to the graph representation vector and the text representation vector corresponding to the first sample text, input the first sample text into a text encoder, so that the text encoder outputs the text representation vector corresponding to the first sample text.
Optionally, the machine learning model and the text encoder form a double tower structure, and in the double tower structure, the encoding process of the graph sequence by the machine learning model and the encoding process of the first sample text by the text encoder are independent of each other.
The model training apparatus in the embodiment shown in fig. 8 can be used to implement the technical solution of the above method embodiment, and the implementation principle and technical effect are similar, which are not described herein again.
Fig. 9 is a schematic structural diagram of a text processing apparatus according to an embodiment of the present disclosure. The text processing apparatus provided in the embodiment of the present disclosure may execute the processing procedure provided in the embodiment of the text processing method, as shown in fig. 9, the text processing apparatus 90 includes:
an obtaining module 91, configured to obtain a target text;
a conversion module 92, configured to convert the target text into an abstract semantic graph, and convert the abstract semantic graph into a graph sequence;
an input module 93 for taking the graph sequence as an input to a machine learning model, such that the machine learning model outputs a graph representation vector, the machine learning model being trained according to the method of any one of claims 1-5;
and a fusion module 94, configured to fuse the text representation vector corresponding to the target text and the graph representation vector, and use the obtained fusion result as a representation vector of the target text, where the representation vector is used to represent the semantics of the target text.
Optionally, the input module 93 is further configured to, before the fusion module 94 fuses the text representation vector corresponding to the target text and the graph representation vector, input the target text into a text encoder, so that the text encoder outputs the text representation vector corresponding to the target text.
Optionally, when fusing the text representation vector corresponding to the target text and the graph representation vector, the fusion module 94 is specifically configured to splice the text representation vector corresponding to the target text and the graph representation vector.
The text processing apparatus in the embodiment shown in fig. 9 may be used to implement the technical solutions in the above method embodiments, and the implementation principles and technical effects are similar, which are not described herein again.
The internal functions and structures of the model training apparatus, the text processing apparatus, and the like have been described above, and the apparatus can be implemented as an electronic device. Fig. 10 is a schematic structural diagram of an embodiment of an electronic device provided in the embodiment of the present disclosure. As shown in fig. 10, the electronic device includes a memory 101 and a processor 102.
The memory 101 is used to store programs. In addition to the above-described programs, the memory 101 may also be configured to store other various data to support operations on the electronic device. Examples of such data include instructions for any application or method operating on the electronic device, contact data, phonebook data, messages, pictures, videos, and so forth.
The memory 101 may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The processor 102 is coupled to the memory 101 and executes programs stored in the memory 101 for:
converting a first sample text into an abstract semantic graph and converting the abstract semantic graph into a graph sequence, wherein the abstract semantic graph is determined by semantic information of the first sample text;
taking the graph sequence as an input to a machine learning model, such that the machine learning model outputs a graph representation vector;
training the machine learning model according to the graph representation vector and the text representation vector corresponding to the first sample text.
Alternatively, the processor 102 is further configured to:
acquiring a target text;
converting the target text into an abstract semantic graph, and converting the abstract semantic graph into a graph sequence;
taking the graph sequence as an input of a machine learning model, so that the machine learning model outputs a graph representation vector, wherein the machine learning model is obtained by training according to the model training method;
and fusing the text representation vector corresponding to the target text and the graph representation vector to obtain a fusion result serving as the representation vector of the target text, wherein the representation vector is used for representing the semantics of the target text.
Further, as shown in fig. 10, the electronic device may further include: communication components 103, power components 104, audio components 105, display 106, and other components. Only some of the components are schematically shown in fig. 10, and the electronic device is not meant to include only the components shown in fig. 10.
The communication component 103 is configured to facilitate wired or wireless communication between the electronic device and other devices. The electronic device may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 103 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 103 further comprises a Near Field Communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra-Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
The power component 104 provides power to various components of the electronic device. The power component 104 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for an electronic device.
The audio component 105 is configured to output and/or input audio signals. For example, the audio component 105 includes a Microphone (MIC) configured to receive external audio signals when the electronic device is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signal may further be stored in the memory 101 or transmitted via the communication component 103. In some embodiments, audio component 105 also includes a speaker for outputting audio signals.
The display 106 includes a screen, which may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation.
In addition, the embodiment of the present disclosure also provides a computer readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to implement the model training method or the text processing method described in the above embodiment.
It is noted that, in this document, relational terms such as "first" and "second," and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in a process, method, article, or apparatus that comprises the element.
The previous description is only for the purpose of describing particular embodiments of the present disclosure, so as to enable those skilled in the art to understand or implement the present disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the disclosure. Thus, the present disclosure is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (12)

1. A method of model training, wherein the method comprises:
converting a first sample text into an abstract semantic graph and converting the abstract semantic graph into a graph sequence, wherein the abstract semantic graph is determined by semantic information of the first sample text;
taking the graph sequence as an input to a machine learning model, such that the machine learning model outputs a graph representation vector;
training the machine learning model according to the graph representation vector and the text representation vector corresponding to the first sample text.
2. The method of claim 1, wherein training the machine learning model according to the graph representation vector and the text representation vector corresponding to the first sample text comprises:
acquiring a second sample text, wherein the first sample text and the second sample text are negative examples of each other;
training the machine learning model by decreasing a distance between the graph representation vector and the text representation vector corresponding to the first sample text and increasing a distance between the graph representation vector and the text representation vector corresponding to the second sample text.
3. The method of claim 2, wherein after obtaining the second sample text, the method further comprises:
inputting the second sample text into a text encoder, such that the text encoder outputs a text representation vector corresponding to the second sample text.
4. The method of claim 1, wherein prior to training the machine learning model according to the graph representation vector and the text representation vector corresponding to the first sample text, the method further comprises:
inputting the first sample text into a text encoder such that the text encoder outputs a text representation vector corresponding to the first sample text.
5. The method of claim 4, wherein the machine learning model and the text encoder form a double tower structure in which the encoding process of the graph sequence by the machine learning model and the encoding process of the first sample text by the text encoder are independent of each other.
6. A method of text processing, wherein the method comprises:
acquiring a target text;
converting the target text into an abstract semantic graph, and converting the abstract semantic graph into a graph sequence;
taking the graph sequence as an input to a machine learning model, such that the machine learning model outputs a graph representation vector, the machine learning model being trained according to the method of any one of claims 1-5;
and fusing the text representation vector corresponding to the target text and the graph representation vector to obtain a fusion result serving as the representation vector of the target text, wherein the representation vector is used for representing the semantics of the target text.
7. The method of claim 6, wherein prior to fusing the text representation vector corresponding to the target text and the graph representation vector, the method further comprises:
inputting the target text into a text encoder, so that the text encoder outputs a text representation vector corresponding to the target text.
8. The method of claim 6, wherein fusing the text representation vector corresponding to the target text and the graph representation vector comprises:
splicing the text representation vector corresponding to the target text and the graph representation vector.
9. A model training apparatus, comprising:
a conversion module, configured to convert a first sample text into an abstract semantic graph, and convert the abstract semantic graph into a graph sequence, where the abstract semantic graph is determined by semantic information of the first sample text;
an input module to take the graph sequence as an input to a machine learning model such that the machine learning model outputs a graph representation vector;
a training module for training the machine learning model according to the graph representation vector and the text representation vector corresponding to the first sample text.
10. A text processing apparatus, comprising:
an acquisition module, configured to acquire a target text;
a conversion module, configured to convert the target text into an abstract semantic graph, and convert the abstract semantic graph into a graph sequence;
an input module, configured to take the graph sequence as an input to a machine learning model, such that the machine learning model outputs a graph representation vector, the machine learning model being trained according to the method of any one of claims 1-5;
and a fusion module, configured to fuse the text representation vector corresponding to the target text and the graph representation vector, and use the obtained fusion result as the representation vector of the target text, wherein the representation vector is used for representing the semantics of the target text.
11. An electronic device, comprising:
a memory;
a processor; and
a computer program;
wherein the computer program is stored in the memory and configured to be executed by the processor to implement the method of any one of claims 1-8.
12. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-8.
CN202211217493.5A 2022-10-04 2022-10-04 Model training method, model training device, text processing method, text processing device, model training equipment and storage medium Pending CN115600685A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202211217493.5A CN115600685A (en) 2022-10-04 2022-10-04 Model training method, model training device, text processing method, text processing device, model training equipment and storage medium
PCT/CN2023/121263 WO2024074099A1 (en) 2022-10-04 2023-09-25 Model training method and apparatus, text processing method and apparatus, device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211217493.5A CN115600685A (en) 2022-10-04 2022-10-04 Model training method, model training device, text processing method, text processing device, model training equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115600685A true CN115600685A (en) 2023-01-13

Family

ID=84845263

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211217493.5A Pending CN115600685A (en) 2022-10-04 2022-10-04 Model training method, model training device, text processing method, text processing device, model training equipment and storage medium

Country Status (2)

Country Link
CN (1) CN115600685A (en)
WO (1) WO2024074099A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116188618A (en) * 2023-04-24 2023-05-30 清华大学 Image generation method and device based on structured semantic graph
WO2024074099A1 (en) * 2022-10-04 2024-04-11 阿里巴巴达摩院(杭州)科技有限公司 Model training method and apparatus, text processing method and apparatus, device, and storage medium
WO2024199450A1 (en) * 2023-03-31 2024-10-03 北京罗克维尔斯科技有限公司 Vehicle control method and apparatus, and device and storage medium

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7698294B2 (en) * 2006-01-11 2010-04-13 Microsoft Corporation Content object indexing using domain knowledge
CN112015955B (en) * 2020-09-01 2021-07-30 清华大学 Multi-mode data association method and device
CN113761122A (en) * 2021-05-19 2021-12-07 清华大学 Event extraction method, related device, equipment and storage medium
CN113673201A (en) * 2021-07-15 2021-11-19 北京三快在线科技有限公司 Text representation vector generation method and device, storage medium and electronic equipment
CN114611521B (en) * 2022-04-13 2024-04-09 国家电网有限公司大数据中心 Entity identification method, device, equipment and storage medium
CN114881004A (en) * 2022-04-24 2022-08-09 四川语言桥信息技术有限公司 Document level machine translation method based on intermediate semantic representation
CN115600685A (en) * 2022-10-04 2023-01-13 阿里巴巴(中国)有限公司(Cn) Model training method, model training device, text processing method, text processing device, model training equipment and storage medium

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024074099A1 (en) * 2022-10-04 2024-04-11 阿里巴巴达摩院(杭州)科技有限公司 Model training method and apparatus, text processing method and apparatus, device, and storage medium
WO2024199450A1 (en) * 2023-03-31 2024-10-03 北京罗克维尔斯科技有限公司 Vehicle control method and apparatus, and device and storage medium
CN116188618A (en) * 2023-04-24 2023-05-30 清华大学 Image generation method and device based on structured semantic graph
CN116188618B (en) * 2023-04-24 2023-08-15 清华大学 Image generation method and device based on structured semantic graph

Also Published As

Publication number Publication date
WO2024074099A1 (en) 2024-04-11


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination