WO2020134154A1 - Artificial intelligence-based text data enhancement method and device, equipment and storage medium - Google Patents
Artificial intelligence-based text data enhancement method and device, equipment and storage medium
- Publication number
- WO2020134154A1 (PCT/CN2019/103684; CN2019103684W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- text
- fluency
- output
- word order
- model
- Prior art date
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
Definitions
- This application belongs to the field of artificial intelligence technology, and relates to text data enhancement methods, devices, equipment, and storage media based on artificial intelligence.
- the text generation model can convert one or more input texts into one or more output texts.
- Embodiments of the present application provide an artificial intelligence-based text data enhancement method, device, equipment, and storage medium, designed to increase the amount of input text data.
- the artificial intelligence-based text data enhancement method includes:
- the first output text is provided as a second input text to the text generation model, so that the text generation model converts the second input text into at least one second output text until the text generation model meets a preset condition, the word order fluency of the second output text being less than that of the correct text.
- the artificial intelligence-based text data enhancement device includes:
- a text training module, used to provide the first input text in the text database to the text generation model, the text generation model converting the first input text into at least one first output text;
- a word order fluency calculation module, used to calculate the word order fluency of the first output text;
- a word order fluency comparison module, used to compare the word order fluency of the first output text with that of the correct text;
- an input text increment module, used to, when the word order fluency of the first output text is greater than or equal to that of the correct text, provide the first output text as the second input text to the text generation model, so that the text generation model converts the second input text into at least one second output text until the text generation model meets a preset condition, the word order fluency of the second output text being less than that of the correct text.
- a computer device includes a memory and a processor.
- the memory stores computer-readable instructions.
- the processor executes the computer-readable instructions, any of the steps of the artificial intelligence-based text data enhancement method described above is implemented.
- A computer-readable storage medium having computer-readable instructions stored thereon which, when executed by one or more processors, implement the steps of any of the above artificial intelligence-based text data enhancement methods.
- By providing the first output text whose word order fluency is greater than or equal to that of the correct text as the second input text to the text generation model, the text generation model converts the second input text into at least one second output text. The text generation model performs error training on the second input text, so that the word order fluency of the second output text is less than that of the correct text.
- The amount of data used to train the text generation model is thereby further increased, which helps reduce the training time of the text generation model, allows the model to converge in a shorter time, and helps overcome the problem of insufficient training data for the text generation model.
- FIG. 1 is a schematic diagram of a text data enhancement method based on artificial intelligence described in an embodiment of the present application;
- FIG. 2 is another schematic diagram of an artificial intelligence-based text data enhancement method described in an embodiment of this application;
- FIG. 3 is a schematic diagram of text generation training of a seq2seq model of an intelligent customer service robot in an embodiment of this application;
- FIG. 4 is another schematic diagram of text generation training of the seq2seq model of the intelligent customer service robot in an embodiment of the present application;
- FIG. 5 is a schematic diagram of an artificial intelligence-based text data enhancement device according to an embodiment of the application;
- FIG. 6 is a block diagram of the basic structure of the computer device 100 in an embodiment of the present application.
- An embodiment of the present application discloses a text data enhancement method based on artificial intelligence.
- FIG. 1 is a schematic diagram of an artificial intelligence-based text data enhancement method according to an embodiment of the application; FIG. 2 is another schematic diagram of the method.
- the text data enhancement method based on artificial intelligence includes:
- S1 Provide the first input text in the text database to the text generation model, and convert the first input text into at least one first output text by the text generation model.
- S4a When the word order fluency of the first output text is greater than or equal to the word order fluency of the correct text, provide the first output text as the second input text to the text generation model, so that the text generation model converts the second input text into at least one second output text until the text generation model meets a preset condition, the word order fluency of the second output text being less than that of the correct text.
- the preset condition includes that the text generation model achieves convergence.
- The first output text whose word order fluency is greater than or equal to that of the correct text is provided as the second input text to the text generation model, and the text generation model converts the second input text into at least one second output text.
- The text generation model performs error training on the second input text, so that the word order fluency of the second output text is less than that of the correct text.
- The "error training" can be understood as providing the first output text whose word order fluency is greater than or equal to that of the correct text to the text generation model as the second input text for training, resulting in second output texts whose word order fluency is less than that of the correct text.
- The text generation model can recombine the morphemes of the second input text into combinations the second input text itself does not use. Therefore, if a second input text whose word order fluency is greater than or equal to that of the correct text is fed into the text generation model, it will be recombined into at least one second output text whose word order fluency is less than that of the correct text.
- By feeding the text generation model second input texts whose word order fluency is greater than or equal to that of the correct text, and converting each into at least one second output text, the amount of data used to train the text generation model is further increased, which reduces training time and allows the model to converge in a shorter time.
- S1, S2, S3, and S4a may be repeated until the text generation model converges, after which the second input text is no longer provided to the text generation model.
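A minimal sketch of this S1-S4a loop follows. The `generate` and `fluency` callables stand in for the text generation model and the word-order-fluency computation (both hypothetical names), and convergence is approximated by a fixed round budget rather than an actual convergence test:

```python
def augment_until_converged(text_db, generate, fluency, correct_fluency,
                            max_rounds=10):
    """Sketch of steps S1-S4a: outputs at least as fluent as the correct
    text are fed back to the model as new inputs, growing the training set.

    `generate(text)` returns a list of output texts; `fluency(text)` returns
    a word-order-fluency score. Both are assumed interfaces, not part of
    the patent's disclosure.
    """
    inputs = list(text_db)
    for _ in range(max_rounds):
        # S1: the model converts each input into one or more outputs.
        outputs = [out for text in inputs for out in generate(text)]
        # S2/S3/S4a: keep outputs whose fluency >= that of the correct text.
        inputs = [o for o in outputs if fluency(o) >= correct_fluency]
        if not inputs:          # nothing fluent enough: stop early
            break
        text_db.extend(inputs)  # the training data grows each round
    return text_db
```

Here the round budget merely caps the loop; a real implementation would stop when the model's training loss converges.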
- the text data enhancement method based on artificial intelligence after S3 further includes:
- The first output texts whose word order fluency is less than that of the correct text are re-provided to the text generation model, so the amount of data in the text database can be increased.
- This helps overcome the problem of an insufficient amount of data in the text database, reduces the difficulty of obtaining first input texts that meet the requirements, and improves the training efficiency of the text generation model.
- S1, S2, S3, and S4b may be repeated until the text generation model converges, after which the first input text is no longer provided to the text generation model.
- the calculating the word order fluency of the first output text includes:
- f(x) represents the word order fluency;
- P(x_i | x_<i) refers to the language model probability of the word x_i given the preceding context x_<i of the first output text.
- the language model probability is obtained by calculating a language model
- the language model includes an n-gram language model and a neural probabilistic language model.
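As a minimal sketch of the n-gram variant, a bigram language model can estimate P(x_i | x_<i) from counts with add-alpha smoothing. The function names and the smoothing choice are assumptions for illustration, not part of the patent's disclosure:

```python
from collections import Counter

def train_bigram_lm(corpus):
    """Count unigrams and bigrams over a list of tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence          # sentence-start marker
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def bigram_prob(unigrams, bigrams, prev, word, alpha=1.0):
    """Add-alpha smoothed estimate of P(word | prev)."""
    vocab = len(unigrams)
    return (bigrams[(prev, word)] + alpha) / (unigrams[prev] + alpha * vocab)
```

A neural probabilistic language model would replace these counts with a learned conditional distribution, but the interface (context in, probability out) is the same.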
- For example, if the first output text is "I like", then "I" is the preceding context and "like" is the following word.
- If the first output text is "I like apples", then "I like" is the preceding context and "apples" is the following word.
- The "preceding context" can be understood as the words and sentences that have already been given and determined, and the "following word" as the words and sentences that appear after the preceding context in the language model.
- "Language model probability" refers to the probability that a particular continuation appears given the preceding context. Different continuations of the same preceding context have different language model probabilities. Combined with the preceding context, continuations whose fluency is greater than or equal to that of the correct text usually have a relatively high language model probability. For example, given the preceding context "I like", the language model probability of the continuation "eating apples" is greater than that of the continuation "dislikes".
- H(x) can be understood as information entropy, and the greater the information entropy, the greater the uncertainty of a certain word or sentence appearing below.
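The exact form of f(x) is not legible in this record; one common construction consistent with the description scores a sentence by its average per-word log-probability, whose negative is the per-word cross-entropy H(x), so fluency rises as entropy falls. This is an assumed reading, sketched below with a hypothetical `prob_fn(prev, word)` interface:

```python
import math

def word_order_fluency(tokens, prob_fn):
    """Score a sentence by average log-probability under a language model.

    prob_fn(prev, word) -> P(word | prev). The returned score is the
    negative per-word cross-entropy: higher means more fluent word order.
    This mirrors, but only approximates, the patent's f(x) based on
    P(x_i | x_<i).
    """
    logp = 0.0
    prev = "<s>"
    for word in tokens:
        logp += math.log(prob_fn(prev, word))
        prev = word
    entropy = -logp / len(tokens)   # per-word cross-entropy H(x)
    return -entropy                 # fluency rises as entropy falls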
- the word order fluency of the correct text is 1.6.
- the text generation model converts five first output texts.
- the fluency of the word sequence of the five first output texts is 0.7, 0.9, 1.2, 1.8, and 1.4, respectively.
- the first output text with a word order fluency of 1.8 is free of speech problems, and the first output text with a flow degree of 0.7, 0.9, 1.2, and 1.4 is considered to have a speech disease. 4 said first output texts with language problems are stored in the text database, and then provided to the text generation model for training.
- the first output text without linguistics directly provides the text generation model for error training, and the word order fluency obtained by performing error training on the text generation model is less than the second output of the correct text
- the text is provided to the text generation model for training, and the amount of data used for training the text generation model is increased.
- the step of providing the first output text as the second input text to the text generation model includes: forming the first output text and the correct text into one text data Yes, the first output text in the text data pair is provided to the text generation model as the second input text.
- the first output text corresponds to only one correct text.
- the text generation model includes: an RNN (Recurrent Neural Network) structure model and a seq2seq model.
- RNN Recurrent Neural Network
- the purpose of providing the first input text and the second input text to the text generation model is to converge the text generation model, so when the text generation model converges, stop providing the text generation model with The first input text and the second input text.
- FIG. 3 it is a schematic diagram of text generation training of a seq2seq model of an intelligent customer service robot in an embodiment of the present application.
- the specific implementation process is detailed as follows:
- S51 Obtain a text data pair in a preset text database, and input the text data pair to the seq2seq model, where the text data pair includes the first output text.
- the first input text and the correct text for text generation training constitute the text data pair and are stored in the text database.
- the text data pair in the text database is retrieved, and the first output text in the text data pair is provided to the seq2seq model .
- S52 Calculate the word order fluency of the first output text through the seq2seq model, compare the word order fluency with the word order fluency of the correct text, and determine the comparison result.
- the seq2seq model converts the first output text into multiple pieces of the first output text. All the first output texts converted from the seq2seq model constitute an output text set. Then calculate the word order fluency of each of the first output texts. Compare the word order fluency of each first output text with the corresponding word order fluency of the correct text.
- the step of determining whether the seq2seq model has converged does not limit the position shown in FIG. 3. For example, after the seq2seq model converts the first input text into multiple first output texts, it can be determined whether the seq2seq model has converged.
- the above process of supplying the first output text with a word order fluency less than the correct text to the seq2seq model for text generation training is cyclically performed until it is determined that the seq2seq model has converged. After the seq2seq model converges, the loop will end, and the first input text will be stopped for the seq2seq model.
- FIG. 4 another schematic diagram of text generation training of the seq2seq model of the intelligent customer service robot in an embodiment of the present application is described in detail as follows:
- S61 Obtain a text data pair in a preset text database, and input the text data pair to the seq2seq model, where the text data pair includes the first output text.
- the first input text and the correct text used for text generation training form a text data pair and are stored in a text database.
- the text data pair in the text database is retrieved, and the first output text in the text data pair is provided to the seq2seq model.
- S62 Calculate the word order fluency of the first output text through the seq2seq model, compare the word order fluency with the word order fluency of the correct text, and determine the comparison result;
- the seq2seq model converts the first input text into multiple first output texts. All the first output texts converted from the seq2seq model constitute an output text set. Then calculate the word order fluency of each of the first output texts. Compare the word order fluency of each first output text with the corresponding word order fluency of the correct text.
- step S62 it is determined whether the seq2seq model has converged.
- the seq2seq model When the seq2seq model does not converge, the first output text in the output text set whose word order fluency is greater than or equal to the correct text is provided as the second input text to the seq2seq model, and then the The seq2seq model converts the second input text into a plurality of second output texts whose word order fluency is less than the correct text.
- the plurality of second output texts whose word order fluency is less than the correct text form a new output text set.
- Each second output text and the correct text form a new text data pair, and are stored in the text database.
- the first input text provided to the seq2seq model of the intelligent customer service robot is "the sun rises from the east", and the corresponding correct text is "the sun rises from the east”.
- the seq2seq model of the intelligent customer service robot converts the first input text "rising from the east sun” into multiple first output texts.
- Table 1 only shows a number of possible first output texts, not all possible first output texts of the first input text "Rising from the East Sun” after conversion by the seq2seq model.
- each first output text shown in Table 1 is less than 1, so there is a certain language disorder.
- the word order fluency of each first output text shown in Table 1 is less than 1. All the first output texts shown in Table 1 are paired with the correct texts to form text data pairs, and stored in the text database. At this time, all the first output text shown in Table 1 is converted into the first input text, and provided to the seq2seq model of the intelligent customer service robot for the next round of text generation training.
- the text database will be able to provide multiple times of the first input text to the seq2seq model of the intelligent customer service robot. Therefore, the seq2seq model of the intelligent customer service robot will be able to automatically increase the first input text during the training process, so that the text data is enhanced, which is beneficial to overcome the problem of insufficient input text data and reduces the amount of first input text that meets the requirements. difficult.
- the first input text of the seq2seq model provided to the intelligent customer service robot is "Guo Zu I You Love", and the corresponding correct text is "Mother I Love You”.
- the seq2seq model of the intelligent customer service robot converts the first input text "Guo Zu I You Love” into multiple first output texts.
- Table 2 only shows a number of possible first output texts, not all possible first output texts of the first input text "Guo Zu I You Love” after conversion by the seq2seq model.
- the word order fluency of the part of the first output text in Table 2 is less than 1, indicating that there is a linguistic disorder in the part of the first output text.
- the word order fluency of the first output text "I love your motherland” is greater than 1, so there is no linguistic disorder in the first output text.
- the first output text "I love your motherland” is provided as the second input text to the seq2seq model of the intelligent customer service robot.
- the seq2seq model of the intelligent customer service robot will erroneously train the second input text "I love your motherland” and convert it into several second output texts with a word order fluency of less than 1.
- a plurality of second output texts obtained by error training and having the second order text with a fluency of less than 1 and the correct text form text data pairs and stored in the text database.
- several second output texts with less than 1 word order fluency obtained from wrong training are retrieved from the text database and provided to the intelligent The seq2seq model of the customer service robot is trained.
- the above method for incorrectly training the second input text to obtain several second output texts with a word order fluency of less than 1 can also automatically increase the amount of text data, play a role in enhancing text data, and help to further overcome the lack of input text data The problem of reducing the difficulty of obtaining the first input text that meets the requirements.
- the value of the word order fluency in Table 1 and Table 2 is positive, in some possible embodiments of the present application, the value of the word order fluency may also be a negative value.
- An embodiment of the present application discloses a text data enhancement device based on artificial intelligence.
- FIG. 5 it is a schematic diagram of an artificial intelligence-based text data enhancement device according to an embodiment of the present application.
- the text data enhancement device based on artificial intelligence includes:
- the text training module 10 is used to provide the first input text in the text database to the text generation model, and the text generation model converts the first input text into at least one first output text;
- the word order fluency calculation module 20 is used to calculate the word order fluency of the first output text
- Word order fluency comparison module 30 used to compare the word order fluency of the first output text with the word order fluency of the correct text
- the input text increment module 40 is configured to provide the first output text as the second input text to the text when the word sequence fluency of the first output text is greater than or equal to the word sequence fluency of the correct text Generating a model so that the text generation model converts the second input text into at least one second output text until the text generation model meets a preset condition, and the fluency of the word sequence of the second output text is less than the Fluency of the word order of the correct text.
- the word order fluency calculation module 20 calculates the word order fluency of the first output text by the following formula:
- f(x) represents the fluency of the word order;
- x ⁇ i) refers to the language of the following P(x i ) given the above of the first output text Model probability.
- the word order fluency calculation module 20 obtains the language model probability through language model calculation, and the language model includes an n-gram language model and a neural probabilistic language model.
- the text training module 10 composes the first output text and the correct text into a text data pair, and sets the first output text in the text data pair as the The second input text is provided to the text generation model.
- the text training module 10 performs error training on the second input text through the text generation model so that the word order fluency of the second output text is less than the word order of the correct text Fluency.
- the input text increment module 40 is further configured to provide the first output text when the word order fluency of the first output text is less than the word order fluency of the correct text Generate a model for the text.
- the text training module 10 stops providing the text input model with the first input text and the second input text.
- FIG. 6 is a basic structural block diagram of the computer device 100 in an embodiment of the present application.
- the computer device 100 includes a memory 101, a processor 102, and a network interface 103 that communicate with each other through a system bus. It should be noted that FIG. 6 only shows the computer device 100 having the components 101-103, but it should be understood that it is not required to implement all the components shown, and more or fewer components may be implemented instead.
- the computer device here is a device that can automatically perform numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes but is not limited to a microprocessor, an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), programmable gate array (Field-Programmable Gate Array, FPGA), digital processor (Digital Signal Processor, DSP), embedded equipment, etc.
- ASIC Application Specific Integrated Circuit
- FPGA Field-Programmable Gate Array
- DSP Digital Signal Processor
- the computer device may be a computing device such as a desktop computer, a notebook, a palmtop computer and a cloud server.
- the computer device can interact with the user through a keyboard, a mouse, a remote control, a touchpad, or a voice control device.
- the memory 101 includes one or more computer-readable storage media storing computer-readable instructions, the computer-readable storage media including a flash memory, a hard disk, a multimedia card, a card-type memory (for example, SD or DX memory, etc.), Random access memory (RAM), static random access memory (SRAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), programmable read only memory (PROM), magnetic memory, magnetic disk, optical disk Wait.
- the memory 101 may be an internal storage unit of the computer device 100, such as a hard disk or a memory of the computer device 100.
- the memory 101 may also be an external storage device of the computer device 100, for example, a plug-in hard disk equipped on the computer device 100, a smart memory card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, flash card (Flash Card), etc.
- the memory 101 may also include both the internal storage unit of the computer device 100 and its external storage device.
- the memory 101 is generally used to store an operating system and various application software installed on the computer device 100, such as the computer-readable instructions of the artificial intelligence-based text data enhancement method described above.
- the memory 101 may also be used to temporarily store various types of data that have been output or will be output.
- the processor 102 may be a central processing unit (CPU), controller, microcontroller, microprocessor, or other data processing chip.
- the processor 102 is generally used to control the overall operation of the computer device 100.
- the processor 102 is configured to execute computer-readable instructions or process data stored in the memory 101, for example, computer-readable instructions to execute the above artificial intelligence-based text data enhancement method.
- the network interface 103 may include a wireless network interface or a wired network interface, and the network interface 103 is generally used to establish a communication connection between the computer device 100 and other electronic devices.
- the present application also provides another implementation manner, that is, to provide a computer-readable storage medium that stores computer-readable instructions corresponding to entry of document information, and computer-readable instructions corresponding to entry of document information
- the instructions may be executed by at least one processor, so that the at least one processor executes the steps of any of the above artificial intelligence-based text data enhancement methods.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
Abstract
The present application relates to the technical field of artificial intelligence, and relates to an artificial intelligence-based text data enhancement method and device, equipment and a storage medium. The method comprises: providing a first input text in a text database to a text generation model, and the text generation model converting the first input text into at least one first output text; calculating the word order fluency of the first output text; comparing the word order fluency of the first output text to the word order fluency of a correct text; and when the word order fluency of the first output text is greater than or equal to the word order fluency of the correct text, providing the first output text as a second input text to the text generation model, so that the text generation model converts the second input text into at least one second output text, the word order fluency of the second output text being less than that of the correct text. Thus, the data size of text generation model training is increased.
Description
【交叉引用】【cross reference】
本申请以2018年12月29日提交的申请号为201811641967.2,名称为“基于人工智能的文本数据增强方法、装置、设备及存储介质”的中国发明专利申请为基础,并要求其优先权。This application is based on the Chinese invention patent application filed on December 29, 2018, with the application number 201811641967.2, titled "Artificial Intelligence-based Text Data Enhancement Methods, Devices, Equipment, and Storage Media," and claims priority.
本申请属于人工智能技术领域,涉及基于人工智能的文本数据增强方法、装置、设备及存储介质。This application belongs to the field of artificial intelligence technology, and relates to text data enhancement methods, devices, equipment, and storage media based on artificial intelligence.
目前,文本生成模型能够将一条或一条以上的输入文本转化成一条或一条以上的输出文本。为了让所述文本生成模型能够生成语病少、语义更准确的输出文本,需要给所述文本生成模型提供大量的输入文本,使得所述文本生成模型能够收敛。Currently, the text generation model can convert one or more input texts into one or more output texts. In order for the text generation model to generate output text with less linguistics and more accurate semantics, it is necessary to provide the text generation model with a large amount of input text so that the text generation model can converge.
现有的技术条件下,要获得符合要求的大量的输入文本是非常困难的,使得对于所述文本生成模型进行的训练很难达到理想的效果,也即所述文本生成模型不易实现收敛。此外,现有技术中难以对所述文本生成模型转化获得的输出文本进行语病检查,因此制约了所述文本生成模型的实际应用。Under the existing technical conditions, it is very difficult to obtain a large amount of input text that meets the requirements, making it difficult to achieve the desired effect on the training of the text generation model, that is, the text generation model is not easy to achieve convergence. In addition, in the prior art, it is difficult to perform a linguistic check on the output text obtained by conversion of the text generation model, thus restricting the practical application of the text generation model.
【发明内容】[Invention content]
本申请实施例提供了一种基于人工智能的文本数据增强方法、装置、设备及存储介质,旨在增加输入文本的数据量。Embodiments of the present application provide an artificial intelligence-based text data enhancement method, device, device, and storage medium, which are designed to increase the amount of input text data.
一种基于人工智能的文本数据增强方法,所述基于人工智能的文本数据增强方法包括:An artificial intelligence-based text data enhancement method. The artificial intelligence-based text data enhancement method includes:
将文本数据库中的第一输入文本提供给文本生成模型,并由所述文本生成模型将所述第一输入文本转化成至少一条第一输出文本;Providing the first input text in the text database to a text generation model, and converting the first input text into at least one first output text by the text generation model;
计算所述第一输出文本的语序流畅度;Calculating the word order fluency of the first output text;
将所述第一输出文本的语序流畅度与正确文本的语序流畅度比较;Compare the word order fluency of the first output text with the word order fluency of the correct text;
当所述第一输出文本的语序流畅度大于或者等于所述正确文本的语序流畅度时,将所述第一输出文本作为第二输入文本提供给所述文本生成模型,以使得所述文本生成模型将所述第二输入文本转化成至少一条第二输出文本,直至所述文本生成模型满足预设条件,所述第二输出文本的语序流畅度小于所述正确文本的语序流畅度。When the word order fluency of the first output text is greater than or equal to the word order fluency of the correct text, the first output text is provided as a second input text to the text generation model, so that the text is generated The model converts the second input text into at least one second output text until the text generation model meets a preset condition, and the word order fluency of the second output text is less than the word order fluency of the correct text.
An artificial intelligence-based text data enhancement device, comprising:
a text training module, configured to provide a first input text in a text database to a text generation model, the text generation model converting the first input text into at least one first output text;
a word order fluency calculation module, configured to calculate the word order fluency of the first output text;
a word order fluency comparison module, configured to compare the word order fluency of the first output text with the word order fluency of a correct text; and
an input text increment module, configured to, when the word order fluency of the first output text is greater than or equal to the word order fluency of the correct text, provide the first output text as a second input text to the text generation model, so that the text generation model converts the second input text into at least one second output text, until the text generation model meets a preset condition, wherein the word order fluency of the second output text is less than the word order fluency of the correct text.
A computer device, comprising a memory and a processor, wherein the memory stores computer-readable instructions, and the processor, when executing the computer-readable instructions, implements the steps of any one of the artificial intelligence-based text data enhancement methods described above.
A computer-readable storage medium storing computer-readable instructions which, when executed by one or more processors, implement the steps of any one of the artificial intelligence-based text data enhancement methods described above.
Compared with the prior art, the technical solution disclosed in this application mainly has the following beneficial effects:
In the embodiments of this application, the first output text whose word order fluency is greater than or equal to that of the correct text is provided as the second input text to the text generation model, and the text generation model converts the second input text into at least one second output text. The text generation model performs error training on the second input text, so that the word order fluency of the second output text is less than that of the correct text. Because the second input text, whose word order fluency is greater than or equal to that of the correct text, is fed into the text generation model and converted into at least one second output text, the amount of data available for training the text generation model is further increased. This helps reduce the training time of the text generation model, allows the model to converge in a shorter time, and helps overcome the problem of an insufficient amount of training data for the text generation model.
To explain the technical solutions of the embodiments of this application more clearly, the drawings required by the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of this application; those of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram of the artificial intelligence-based text data enhancement method in an embodiment of this application;
FIG. 2 is another schematic diagram of the artificial intelligence-based text data enhancement method in an embodiment of this application;
FIG. 3 is a schematic diagram of text generation training of the seq2seq model of an intelligent customer service robot in an embodiment of this application;
FIG. 4 is another schematic diagram of text generation training of the seq2seq model of an intelligent customer service robot in an embodiment of this application;
FIG. 5 is a schematic diagram of the artificial intelligence-based text data enhancement device in an embodiment of this application;
FIG. 6 is a block diagram of the basic structure of a computer device 100 in an embodiment of this application.
Description of reference signs:
To facilitate understanding of this application, the application is described more fully below with reference to the related drawings, in which preferred embodiments of the application are shown. This application may, however, be implemented in many different forms and is not limited to the embodiments described herein; rather, these embodiments are provided so that the disclosure of this application will be thorough and complete.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the technical field to which this application belongs. The terminology used in the specification is for the purpose of describing specific embodiments only and is not intended to limit this application.
An embodiment of this application discloses an artificial intelligence-based text data enhancement method.
Referring to FIG. 1 and FIG. 2, FIG. 1 is a schematic diagram of the artificial intelligence-based text data enhancement method in an embodiment of this application, and FIG. 2 is another schematic diagram of the method.
As illustrated in FIG. 1, the artificial intelligence-based text data enhancement method includes:
S1: providing a first input text in a text database to a text generation model, and converting, by the text generation model, the first input text into at least one first output text.
S2: calculating the word order fluency of the first output text.
S3: comparing the word order fluency of the first output text with the word order fluency of a correct text.
S4a: when the word order fluency of the first output text is greater than or equal to the word order fluency of the correct text, providing the first output text as a second input text to the text generation model, so that the text generation model converts the second input text into at least one second output text, until the text generation model meets a preset condition; the word order fluency of the second output text is less than that of the correct text. The preset condition includes the text generation model achieving convergence.
In S4a of this embodiment, the first output text whose word order fluency is greater than or equal to that of the correct text is provided as the second input text to the text generation model, and the text generation model converts the second input text into at least one second output text. The text generation model performs error training on the second input text, so that the word order fluency of the second output text is less than that of the correct text. Here, "error training" means providing the first output text, whose word order fluency is greater than or equal to that of the correct text, as the second input text for training, and thereby obtaining second output texts whose word order fluency is less than that of the correct text.
Because the text generation model recombines the morphemes of the second input text, it usually does not reproduce the second input text itself. Therefore, when the second input text, whose word order fluency is greater than or equal to that of the correct text, is fed into the text generation model, the model produces at least one second output text whose word order fluency is less than that of the correct text. In this way, the amount of data used to train the text generation model is further increased, which helps shorten the training time and allows the text generation model to converge more quickly.
S1, S2, S3, and S4a may be repeated until the text generation model converges, at which point the second input text is no longer provided to the text generation model.
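A minimal runnable sketch of this repeated loop follows, with stub `generate()` and `fluency()` functions standing in for the real text generation model and the language-model fluency scorer; the corpus sentence, the fixed round count, and all function names here are illustrative assumptions, not the patent's implementation.

```python
import random

random.seed(0)

CORRECT = "the sun rises in the east"

def generate(text, n=3):
    # Stub generator: recombines the input's words into n distinct variants,
    # standing in for the text generation model's conversion step (S1).
    words = text.split()
    variants = set()
    while len(variants) < n:
        shuffled = words[:]
        random.shuffle(shuffled)
        variants.add(" ".join(shuffled))
    return sorted(variants)

def fluency(text):
    # Stub scorer: fraction of positions matching the correct text's order,
    # standing in for a language-model-based word order fluency (S2).
    ref = CORRECT.split()
    return sum(a == b for a, b in zip(text.split(), ref)) / len(ref)

threshold = fluency(CORRECT)           # fluency of the correct text itself

queue = ["rises the sun east the in"]  # first input text from the database
database = []                          # grows with low-fluency outputs
for _ in range(3):                     # stand-in for "until convergence"
    next_queue = []
    for input_text in queue:
        for output in generate(input_text):
            if fluency(output) >= threshold:   # S3: compare
                next_queue.append(output)      # S4a: recycle as second input
            else:
                database.append(output)        # store for later training rounds
    queue = next_queue

print(len(database))  # the pool of training texts has grown
```

Because the stub scorer only awards the maximum fluency to an exact match with the correct text, most shuffled outputs land in the database, illustrating how the training pool grows each round.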
As illustrated in FIG. 2, to further increase the amount of input text data, the artificial intelligence-based text data enhancement method further includes, after S3:
S4b: when the word order fluency of the first output text is less than the word order fluency of the correct text, providing the first output text to the text generation model.
In this embodiment, the first output text whose word order fluency is less than that of the correct text is fed back to the text generation model, which increases the amount of data in the text database. This helps overcome the problem of insufficient data in the text database, reduces the difficulty of obtaining first input texts that meet the requirements, and improves the training efficiency of the text generation model.
S1, S2, S3, and S4b may be repeated until the text generation model converges, at which point the first input text is no longer provided to the text generation model.
It should be noted that the steps shown in FIG. 1 and the steps shown in FIG. 2 may be performed simultaneously. In addition, S4a and S4b may be performed in either order.
In some embodiments of this application, calculating the word order fluency of the first output text includes computing:
f(x) = -H(x), where H(x) = -(1/n) Σ_i log P(x_i | x_<i)
Here, f(x) denotes the word order fluency; P(x_i | x_<i) denotes the language model probability of the following word x_i of the first output text, given its preceding words x_<i; n is the number of words in the first output text; and H(x) is its information entropy.
Further, in the embodiments of this application, the language model probability is obtained through a language model, which includes an n-gram language model and a neural probabilistic language model.
In the embodiments of this application, the "preceding context" (上文) and the "following text" (下文) can be understood as follows:
When the preceding context is the subject of the first output text, the following text is its predicate. For example, if the first output text is "I like", then "I" is the preceding context and "like" is the following text.
When the preceding context consists of the subject and predicate of the first output text, the following text is its object. For example, if the first output text is "I like apples", then "I like" is the preceding context and "apples" is the following text.
In short, the preceding context can be understood as the words that have already been given and determined, and the following text as the words that appear after the preceding context in the language model.
"Language model probability" refers to the probability that a particular following text appears, given the preceding context. Different following texts after the same preceding context have different language model probabilities. Given the preceding context, a following text whose word order fluency is greater than or equal to that of the correct text usually has a relatively large language model probability. For example, given the preceding context "I like", the language model probability of the following text "eating apples" is greater than that of the following text "dislike".
In the embodiments of this application, H(x) can be understood as information entropy: the greater the information entropy, the greater the uncertainty about which words appear next.
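Under these definitions, the word order fluency can be estimated with any language model that yields P(x_i | x_<i). The toy sketch below uses an add-one-smoothed bigram model; the corpus, the smoothing choice, and the convention that fluency is the average log-probability are illustrative assumptions, not the patent's exact model.

```python
import math
from collections import Counter

# Tiny illustrative corpus for estimating P(word | previous word).
corpus = [
    "the sun rises in the east",
    "the sun sets in the west",
    "i like to eat apples",
]

unigrams, bigrams = Counter(), Counter()
for sentence in corpus:
    tokens = ["<s>"] + sentence.split()
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))

V = len(unigrams)  # vocabulary size, used for add-one smoothing

def log_prob(word, prev):
    # Add-one smoothed bigram probability P(word | prev).
    return math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + V))

def fluency(text):
    # Average log-probability of each word given its context; higher is
    # more fluent, and larger entropy (more uncertainty) lowers the score.
    tokens = ["<s>"] + text.split()
    logps = [log_prob(w, p) for p, w in zip(tokens, tokens[1:])]
    return sum(logps) / len(logps)

# A sentence in natural word order scores higher than a scrambled one.
print(fluency("the sun rises in the east") > fluency("rises east the in sun the"))  # True
```

Note that under this convention the scores are negative log-probabilities averaged per word, so fluency values below zero are normal; only the relative ordering between candidate texts matters for the comparison step.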
The following example illustrates comparing the word order fluency of the first output text with that of the corresponding correct text.
Assume the word order fluency of the correct text is 1.6. After the first input text is fed into the text generation model, the model produces five first output texts, whose word order fluencies are 0.7, 0.9, 1.2, 1.8, and 1.4, respectively. The first output text with a fluency of 1.8 is considered free of grammatical errors, while those with fluencies of 0.7, 0.9, 1.2, and 1.4 are considered ungrammatical. The four ungrammatical first output texts are stored in the text database and then provided to the text generation model for training. The one grammatical first output text is directly provided to the text generation model for error training; the second output texts obtained from error training, whose word order fluency is less than that of the correct text, are then provided to the text generation model for training, increasing the amount of training data.
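The worked example above can be sketched as a simple partition of the five generated outputs against the correct text's fluency; the text labels are hypothetical placeholders for the five generated sentences.

```python
# Illustrative fluency values from the example: five generated outputs
# scored against a correct text whose fluency is 1.6.
correct_fluency = 1.6
outputs = [("text_a", 0.7), ("text_b", 0.9), ("text_c", 1.2),
           ("text_d", 1.8), ("text_e", 1.4)]

# Outputs at or above the correct text's fluency are fed back as second
# input texts for error training; the rest are stored as new training data.
error_training_inputs = [t for t, f in outputs if f >= correct_fluency]
stored_for_training = [t for t, f in outputs if f < correct_fluency]

print(error_training_inputs)  # ['text_d']
print(stored_for_training)    # ['text_a', 'text_b', 'text_c', 'text_e']
```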
In some embodiments of this application, providing the first output text as the second input text to the text generation model includes: forming a text data pair from the first output text and the correct text, and providing the first output text of the text data pair to the text generation model as the second input text. Each first output text corresponds to exactly one correct text.
Because the word order fluency of the first output text must be compared with that of its corresponding correct text, forming a text data pair from the first output text and the corresponding correct text makes it easy to quickly determine which correct text a given first output text should be compared against.
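One possible schema for such a text data pair is sketched below; the class and field names are assumptions for illustration, not the patent's data format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TextDataPair:
    # Pairs one generated output text with its unique correct text, so the
    # reference for the fluency comparison can be looked up immediately.
    output_text: str
    correct_text: str

pair = TextDataPair(output_text="rises from the east the sun",
                    correct_text="the sun rises from the east")
print(pair.correct_text)  # the reference text for the fluency comparison
```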
In some embodiments of this application, the text generation model includes an RNN (Recurrent Neural Network) structure model and a seq2seq model. The purpose of providing the first input text and the second input text to the text generation model is to make the model converge; therefore, once the text generation model converges, the first input text and the second input text are no longer provided to it.
The following takes text generation training of the seq2seq model of an intelligent customer service robot as an example to further explain the application of the text data enhancement method in the above embodiments.
Referring to FIG. 3, which is a schematic diagram of text generation training of the seq2seq model of an intelligent customer service robot in an embodiment of this application, the implementation process is as follows:
S51: obtaining a text data pair from a preset text database and inputting the text data pair into the seq2seq model, where the text data pair includes a first output text.
As illustrated in FIG. 3, the first input text used for text generation training and the correct text form a text data pair that is stored in the text database. When text generation training is performed on the seq2seq model of the intelligent customer service robot, the text data pair is retrieved from the text database, and the first output text of the pair is provided to the seq2seq model.
S52: calculating the word order fluency of the first output text through the seq2seq model, comparing that fluency with the word order fluency of the correct text, and determining the comparison result.
The seq2seq model converts the first input text into multiple first output texts. All the first output texts produced by the seq2seq model form an output text set. The word order fluency of each first output text is then calculated and compared with the word order fluency of the corresponding correct text.
S53: judging, according to the comparison result, whether the seq2seq model has converged.
Whether the seq2seq model has converged is judged. When the seq2seq model has not converged, each first output text whose word order fluency is less than that of the correct text forms a new text data pair with the correct text, which is stored in the text database.
It should be noted that the step of judging whether the seq2seq model has converged is not limited to the position shown in FIG. 3. For example, it may be performed as soon as the seq2seq model has converted the first input text into multiple first output texts.
S54: if the seq2seq model has not converged, returning to the step of calculating the word order fluency of the first output text through the seq2seq model and continuing until the seq2seq model converges, thereby obtaining a trained seq2seq model.
When it is judged that the seq2seq model has not converged, the word order fluency of each first output text is calculated and then compared with the word order fluency of the corresponding correct text.
The above process of providing first output texts whose word order fluency is less than that of the correct text to the seq2seq model for text generation training is repeated in a loop until the seq2seq model is judged to have converged. Once the seq2seq model converges, the loop ends and the first input text is no longer provided to the seq2seq model.
Referring to FIG. 4, which is another schematic diagram of text generation training of the seq2seq model of an intelligent customer service robot in an embodiment of this application, the implementation process is as follows:
S61: obtaining a text data pair from a preset text database and inputting the text data pair into the seq2seq model, where the text data pair includes a first output text.
As illustrated in FIG. 4, the first input text used for text generation training and the correct text form a text data pair that is stored in a text database. When text generation training is performed on the seq2seq model of the intelligent customer service robot, the text data pair is retrieved from the text database, and the first output text of the pair is provided to the seq2seq model.
S62: calculating the word order fluency of the first output text through the seq2seq model, comparing that fluency with the word order fluency of the correct text, and determining the comparison result.
The seq2seq model converts the first input text into multiple first output texts. All the first output texts produced by the seq2seq model form an output text set. The word order fluency of each first output text is then calculated and compared with the word order fluency of the corresponding correct text.
S63: judging, according to the comparison result, whether the seq2seq model has converged.
Whether the seq2seq model has converged is judged according to the word order fluency comparison result obtained in step S62.
S64: if the seq2seq model has not converged, taking the first output texts whose word order fluency is not less than that of the correct text as second input texts, segmenting them, and inputting them into the seq2seq model; then continuing to calculate the word order fluency of the resulting second output texts until the seq2seq model converges, thereby obtaining a trained seq2seq model.
When the seq2seq model has not converged, the first output texts in the output text set whose word order fluency is greater than or equal to that of the correct text are provided to the seq2seq model as second input texts, and the seq2seq model converts each second input text into multiple second output texts whose word order fluency is less than that of the correct text. These second output texts form a new output text set. Each second output text and the correct text form a new text data pair, which is stored in the text database. The process of converting second input texts whose word order fluency is greater than or equal to that of the correct text into multiple second output texts whose word order fluency is less than that of the correct text, and providing them to the seq2seq model for text generation training, is repeated in a loop until the seq2seq model is judged to have converged. Once the seq2seq model converges, the loop ends and the second input text is no longer provided to the seq2seq model.
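The "error training" conversion in S64 can be sketched as deliberately producing word-order variants of a fluent input. In practice the seq2seq model, not an exhaustive permutation, produces the variants; the function below is a simplified, assumed stand-in for that recombination step.

```python
import itertools

def error_variants(text, max_n=3):
    # Return up to max_n word-order permutations of the input, excluding the
    # input itself, as lower-fluency second output texts. The real model
    # would generate these; permutations are only an illustrative stand-in.
    words = text.split()
    variants = []
    for perm in itertools.permutations(words):
        candidate = " ".join(perm)
        if candidate != text:
            variants.append(candidate)
        if len(variants) == max_n:
            break
    return variants

print(error_variants("i love you motherland"))
```

Each returned variant would then be paired with the correct text and stored in the text database as a new training pair.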
Examples are given below to illustrate the technical solutions in the embodiments of this application.
Table 1
Referring to Table 1, the first input text provided to the seq2seq model of the intelligent customer service robot is "rises from the east the sun" (升起从东边太阳), and the corresponding correct text is "the sun rises from the east" (太阳从东边升起). The seq2seq model converts the first input text into multiple first output texts. Table 1 shows only some possible first output texts, not all possible first output texts produced by the seq2seq model from this first input text.
Assume the word order fluency of the correct text "the sun rises from the east" is 1. The word order fluency of each first output text shown in Table 1 is less than 1, so each contains some grammatical error. All the first output texts shown in Table 1 are paired with the correct text to form text data pairs, which are stored in the text database. These first output texts thereby become first input texts and are provided to the seq2seq model of the intelligent customer service robot for the next round of text generation training.
When the input text column of Table 1 contains more first input texts, more first output texts will be obtained. In the next round of text generation training, the text database will be able to provide several times as many first input texts to the seq2seq model of the intelligent customer service robot. The seq2seq model can therefore automatically increase the number of first input texts during training, so that the text data is enhanced; this helps overcome the problem of an insufficient amount of input text data and reduces the difficulty of obtaining first input texts that meet the requirements.
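The amplification effect can be estimated with simple arithmetic: if each input text yields k candidate outputs per round and every output is recycled as a new input, the cumulative pool after r rounds is initial × (1 + k + k² + … + kʳ). The values of k and the round counts below are illustrative, not taken from the patent.

```python
def pool_size(initial, k, rounds):
    # Cumulative number of texts after `rounds` rounds, assuming every text
    # yields k new outputs and all outputs are recycled as inputs.
    total, current = initial, initial
    for _ in range(rounds):
        current *= k
        total += current
    return total

print(pool_size(1, 5, 2))   # 1 + 5 + 25 = 31
print(pool_size(10, 3, 1))  # 10 + 30 = 40
```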
Table 2
Referring to Table 2, the first input text provided to the seq2seq model of the intelligent customer service robot is "land mother I you love" (国祖我你爱), and the corresponding correct text is "motherland, I love you" (祖国我爱你). The seq2seq model converts the first input text into multiple first output texts. Table 2 shows only some possible first output texts, not all possible first output texts produced by the seq2seq model from this first input text.
Assume the word order fluency of the correct text "motherland, I love you" is 1. Some of the first output texts in Table 2 have a word order fluency less than 1, indicating that they contain grammatical errors. In addition, in Table 2 the first output text "I love you, motherland" (我爱你祖国) has a word order fluency greater than 1 and therefore contains no grammatical error. This first output text is provided to the seq2seq model of the intelligent customer service robot as a second input text. The seq2seq model performs error training on the second input text "I love you, motherland", converting it into several second output texts whose word order fluency is less than 1. These second output texts and the correct text then form text data pairs that are stored in the text database. In the next round of text generation training, the second output texts with word order fluency less than 1 obtained from error training are retrieved from the text database and provided to the seq2seq model for training. This method of performing error training on the second input text to obtain second output texts with word order fluency less than 1 likewise automatically increases the amount of text data, enhances the text data, further helps overcome the problem of an insufficient amount of input text data, and reduces the difficulty of obtaining first input texts that meet the requirements.
It should be noted that although the word order fluency values in Table 1 and Table 2 are positive, in some possible embodiments of this application the word order fluency may also take negative values.
An embodiment of this application discloses an artificial intelligence-based text data enhancement device.
FIG. 5 is a schematic diagram of the artificial intelligence-based text data enhancement device in an embodiment of this application.
As illustrated in FIG. 5, the artificial intelligence-based text data enhancement device includes:
文本训练模块10,用于将文本数据库中的第一输入文本提供给文本生成模型,并由所述文本生成模型将所述第一输入文本转化成至少一条第一输出文本;The text training module 10 is used to provide the first input text in the text database to the text generation model, and the text generation model converts the first input text into at least one first output text;
a word-order fluency calculation module 20, configured to calculate the word-order fluency of the first output text;
a word-order fluency comparison module 30, configured to compare the word-order fluency of the first output text with the word-order fluency of a correct text; and
an input text increment module 40, configured to, when the word-order fluency of the first output text is greater than or equal to the word-order fluency of the correct text, provide the first output text as a second input text to the text generation model, so that the text generation model converts the second input text into at least one second output text, until the text generation model meets a preset condition, the word-order fluency of the second output text being less than the word-order fluency of the correct text.
In some embodiments of the present application, the word-order fluency calculation module 20 calculates the word-order fluency of the first output text by the following formula:
where f(x) denotes the word-order fluency, and P(x_i | x_<i) denotes the language-model probability of the word x_i of the first output text given its preceding words x_<i.
In some embodiments of the present application, the word-order fluency calculation module 20 obtains the language-model probability by means of a language model, the language model including an n-gram language model and a neural probabilistic language model.
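As a rough illustration of how an n-gram language model could supply these probabilities, the sketch below scores word-order fluency as the average per-token log-probability under a bigram model with add-one smoothing. The normalization, the smoothing choice, the class and function names, and the toy corpus are all assumptions made for illustration; the patent's exact formula is not reproduced here.

```python
import math
from collections import defaultdict

class BigramLM:
    """Minimal bigram language model with add-one smoothing, a stand-in
    for the n-gram language model the text mentions."""

    def __init__(self, corpus):
        self.unigrams = defaultdict(int)
        self.bigrams = defaultdict(int)
        for sent in corpus:
            tokens = ["<s>"] + sent.split()
            for a, b in zip(tokens, tokens[1:]):
                self.unigrams[a] += 1
                self.bigrams[(a, b)] += 1
        self.vocab = {t for s in corpus for t in s.split()} | {"<s>"}

    def prob(self, prev, word):
        # P(word | prev) with add-one (Laplace) smoothing
        return (self.bigrams[(prev, word)] + 1) / (self.unigrams[prev] + len(self.vocab))

def fluency(lm, text):
    """Word-order fluency f(x), here taken as the average log P(x_i | x_<i)
    over the tokens of x -- this normalization is an assumption."""
    tokens = ["<s>"] + text.split()
    logp = sum(math.log(lm.prob(a, b)) for a, b in zip(tokens, tokens[1:]))
    return logp / (len(tokens) - 1)

corpus = ["i love you motherland", "motherland i love you"]
lm = BigramLM(corpus)
high = fluency(lm, "i love you motherland")  # corpus-like word order
low = fluency(lm, "you motherland love i")   # scrambled word order
```

A sentence whose word order resembles the training corpus receives a higher fluency score than a scrambled variant, which is the comparison the fluency comparison module relies on.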
In some embodiments of the present application, the text training module 10 combines the first output text and the correct text into a text data pair, and provides the first output text in the text data pair to the text generation model as the second input text.
In some embodiments of the present application, the text training module 10 performs error training on the second input text through the text generation model, so that the word-order fluency of the second output text is less than the word-order fluency of the correct text.
In some embodiments of the present application, the input text increment module 40 is further configured to provide the first output text to the text generation model when the word-order fluency of the first output text is less than the word-order fluency of the correct text.
In some embodiments of the present application, when the text generation model converges, the text training module 10 stops providing the first input text and the second input text to the text generation model.
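The convergence-based stopping rule above can be sketched generically. The patent does not fix a particular convergence criterion, so the loss-difference threshold, the function name, and the toy loss sequence below are assumptions for illustration only:

```python
def train_until_converged(step, max_rounds=100, tol=1e-4):
    """Run training rounds until the model converges, then stop feeding
    input text. `step` runs one round and returns a loss; convergence is
    taken here to mean the loss change falls below `tol` (an assumption,
    since the text does not specify a criterion)."""
    prev_loss = float("inf")
    for round_no in range(1, max_rounds + 1):
        loss = step()
        if abs(prev_loss - loss) < tol:  # converged: stop providing input text
            return round_no
        prev_loss = loss
    return max_rounds

# Toy step whose loss decays geometrically, so the loop converges quickly.
losses = iter(1.0 / (2 ** k) for k in range(64))
rounds = train_until_converged(lambda: next(losses))
```

The same wrapper would apply whether the per-round loss comes from a seq2seq model or any other text generation model.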
An embodiment of the present application discloses a computer device. For details, refer to FIG. 6, which is a block diagram of the basic structure of a computer device 100 according to an embodiment of the present application.
As illustrated in FIG. 6, the computer device 100 includes a memory 101, a processor 102, and a network interface 103 that are communicatively connected to one another via a system bus. It should be noted that FIG. 6 shows only a computer device 100 having the components 101-103; it should be understood, however, that not all of the illustrated components are required, and more or fewer components may be implemented instead. Those skilled in the art will understand that the computer device here is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and that its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), an embedded device, and the like.
The computer device may be a computing device such as a desktop computer, a notebook, a palmtop computer, or a cloud server. The computer device may interact with the user through a keyboard, a mouse, a remote control, a touchpad, or a voice-control device.
The memory 101 includes one or more computer-readable storage media storing computer-readable instructions. The computer-readable storage media include a flash memory, a hard disk, a multimedia card, a card-type memory (for example, SD or DX memory), a random-access memory (RAM), a static random-access memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disc, and the like. In some embodiments, the memory 101 may be an internal storage unit of the computer device 100, such as a hard disk or internal memory of the computer device 100. In other embodiments, the memory 101 may also be an external storage device of the computer device 100, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card provided on the computer device 100. Of course, the memory 101 may also include both the internal storage unit of the computer device 100 and its external storage device. In this embodiment, the memory 101 is generally used to store the operating system and various application software installed on the computer device 100, such as the computer-readable instructions of the artificial intelligence-based text data enhancement method described above. In addition, the memory 101 may also be used to temporarily store various types of data that have been output or are to be output.
In some embodiments, the processor 102 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 102 is generally used to control the overall operation of the computer device 100. In this embodiment, the processor 102 is configured to run the computer-readable instructions stored in the memory 101 or to process data, for example to run the computer-readable instructions of the artificial intelligence-based text data enhancement method described above.
The network interface 103 may include a wireless network interface or a wired network interface, and is generally used to establish a communication connection between the computer device 100 and other electronic devices.
The present application also provides another implementation, namely a computer-readable storage medium. The computer-readable storage medium stores computer-readable instructions, and the computer-readable instructions are executable by at least one processor so that the at least one processor performs the steps of any one of the artificial intelligence-based text data enhancement methods described above.
Finally, it should be noted that the embodiments described above are obviously only some, not all, of the embodiments of the present application. The drawings show preferred embodiments of the present application but do not limit the patent scope of the present application. The present application may be implemented in many different forms; rather, these embodiments are provided so that the understanding of the disclosure of the present application will be thorough and complete. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in the foregoing specific embodiments or make equivalent replacements of some of the technical features therein. Any equivalent structure made using the contents of the description and drawings of the present application, whether used directly or indirectly in other related technical fields, likewise falls within the scope of patent protection of the present application.
Claims (20)
- An artificial intelligence-based text data enhancement method, comprising: providing a first input text in a text database to a text generation model, the text generation model converting the first input text into at least one first output text; calculating a word-order fluency of the first output text; comparing the word-order fluency of the first output text with a word-order fluency of a correct text; and when the word-order fluency of the first output text is greater than or equal to the word-order fluency of the correct text, providing the first output text as a second input text to the text generation model, so that the text generation model converts the second input text into at least one second output text, until the text generation model meets a preset condition, the word-order fluency of the second output text being less than the word-order fluency of the correct text.
- The artificial intelligence-based text data enhancement method according to claim 1, wherein calculating the word-order fluency of the first output text comprises calculating it by a formula in which f(x) denotes the word-order fluency and P(x_i | x_<i) denotes the language-model probability of the word x_i of the first output text given its preceding words x_<i.
- The artificial intelligence-based text data enhancement method according to claim 2, wherein the language-model probability is obtained by calculation with a language model, the language model including an n-gram language model and a neural probabilistic language model.
- The artificial intelligence-based text data enhancement method according to claim 1, wherein the step of providing the first output text as the second input text to the text generation model comprises: combining the first output text and the correct text into a text data pair, and providing the first output text in the text data pair to the text generation model as the second input text.
- The artificial intelligence-based text data enhancement method according to claim 1, wherein error training is performed on the second input text through the text generation model, so that the word-order fluency of the second output text is less than the word-order fluency of the correct text.
- The artificial intelligence-based text data enhancement method according to claim 1, further comprising: acquiring text data pairs in a preset text database and inputting the text data pairs into the seq2seq model, wherein the text data pairs include the first output text; calculating the word-order fluency of the first output text through the seq2seq model, comparing that word-order fluency with the word-order fluency of the correct text, and determining a comparison result; judging, according to the comparison result, whether the seq2seq model has converged; and if the seq2seq model has not converged, returning to the step of calculating the word-order fluency of the first output text through the seq2seq model and continuing until the seq2seq model converges, thereby obtaining a trained seq2seq model.
- The artificial intelligence-based text data enhancement method according to claim 1, further comprising: acquiring text data pairs in a preset text database and inputting the text data pairs into the seq2seq model, wherein the text data pairs include the first output text; calculating the word-order fluency of the first output text through the seq2seq model, comparing that word-order fluency with the word-order fluency of the correct text, and determining a comparison result; judging, according to the comparison result, whether the seq2seq model has converged; and if the seq2seq model has not converged, taking the first output text whose word-order fluency is not less than the word-order fluency of the correct text as a second output text, segmenting the second output text, inputting it into the seq2seq model, and continuing to calculate the word-order fluency of the segmented second output text until the seq2seq model converges, thereby obtaining a trained seq2seq model.
- An artificial intelligence-based text data enhancement device, comprising: a text training module, configured to provide a first input text in a text database to a text generation model, the text generation model converting the first input text into at least one first output text; a word-order fluency calculation module, configured to calculate the word-order fluency of the first output text; a word-order fluency comparison module, configured to compare the word-order fluency of the first output text with the word-order fluency of a correct text; and an input text increment module, configured to, when the word-order fluency of the first output text is greater than or equal to the word-order fluency of the correct text, provide the first output text as a second input text to the text generation model, so that the text generation model converts the second input text into at least one second output text, until the text generation model meets a preset condition, the word-order fluency of the second output text being less than the word-order fluency of the correct text.
- The artificial intelligence-based text data enhancement device according to claim 8, wherein the input text increment module comprises: a text pairing unit, configured to combine the first output text and the correct text into a text data pair and provide the first output text in the text data pair to the text generation model as the second input text.
- The artificial intelligence-based text data enhancement device according to claim 8, further comprising: a text selection module, configured to provide the first output text to the text generation model when the word-order fluency of the first output text is less than the word-order fluency of the correct text.
- A computer device, comprising a memory and a processor, wherein the memory stores computer-readable instructions executable on the processor, and the processor, when executing the computer-readable instructions, implements the steps of the following artificial intelligence-based text data enhancement method: providing a first input text in a text database to a text generation model, the text generation model converting the first input text into at least one first output text; calculating a word-order fluency of the first output text; comparing the word-order fluency of the first output text with a word-order fluency of a correct text; and when the word-order fluency of the first output text is greater than or equal to the word-order fluency of the correct text, providing the first output text as a second input text to the text generation model, so that the text generation model converts the second input text into at least one second output text, until the text generation model meets a preset condition, the word-order fluency of the second output text being less than the word-order fluency of the correct text.
- The computer device according to claim 11, wherein calculating the word-order fluency of the first output text comprises calculating it by a formula in which f(x) denotes the word-order fluency and P(x_i | x_<i) denotes the language-model probability of the word x_i of the first output text given its preceding words x_<i.
- The computer device according to claim 12, wherein the language-model probability is obtained by calculation with a language model, the language model including an n-gram language model and a neural probabilistic language model.
- The computer device according to claim 11, wherein the step of providing the first output text as the second input text to the text generation model comprises: combining the first output text and the correct text into a text data pair, and providing the first output text in the text data pair to the text generation model as the second input text.
- The computer device according to claim 11, wherein error training is performed on the second input text through the text generation model, so that the word-order fluency of the second output text is less than the word-order fluency of the correct text.
- A computer-readable storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the following steps: providing a first input text in a text database to a text generation model, the text generation model converting the first input text into at least one first output text; calculating a word-order fluency of the first output text; comparing the word-order fluency of the first output text with a word-order fluency of a correct text; and when the word-order fluency of the first output text is greater than or equal to the word-order fluency of the correct text, providing the first output text as a second input text to the text generation model, so that the text generation model converts the second input text into at least one second output text, until the text generation model meets a preset condition, the word-order fluency of the second output text being less than the word-order fluency of the correct text.
- The computer-readable storage medium according to claim 16, wherein calculating the word-order fluency of the first output text comprises calculating it by a formula in which f(x) denotes the word-order fluency and P(x_i | x_<i) denotes the language-model probability of the word x_i of the first output text given its preceding words x_<i.
- The computer-readable storage medium according to claim 17, wherein the language-model probability is obtained by calculation with a language model, the language model including an n-gram language model and a neural probabilistic language model.
- The computer-readable storage medium according to claim 16, wherein the step of providing the first output text as the second input text to the text generation model comprises: combining the first output text and the correct text into a text data pair, and providing the first output text in the text data pair to the text generation model as the second input text.
- The computer-readable storage medium according to claim 16, wherein error training is performed on the second input text through the text generation model, so that the word-order fluency of the second output text is less than the word-order fluency of the correct text.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811641967.2A CN109614492B (en) | 2018-12-29 | 2018-12-29 | Text data enhancement method, device, equipment and storage medium based on artificial intelligence |
CN201811641967.2 | 2018-12-29 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020134154A1 true WO2020134154A1 (en) | 2020-07-02 |
Family
ID=66017355
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2019/103684 WO2020134154A1 (en) | 2018-12-29 | 2019-08-30 | Artificial intelligence-based text data enhancement method and device, equipment and storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN109614492B (en) |
WO (1) | WO2020134154A1 (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109614492B (en) * | 2018-12-29 | 2024-06-18 | 平安科技(深圳)有限公司 | Text data enhancement method, device, equipment and storage medium based on artificial intelligence |
CN110580290B (en) | 2019-09-12 | 2022-12-13 | 北京小米智能科技有限公司 | Method and device for optimizing training set for text classification |
CN112818082A (en) * | 2019-11-15 | 2021-05-18 | 北京沃东天骏信息技术有限公司 | Evaluation text pushing method and device |
CN113570046B (en) * | 2021-09-22 | 2022-02-18 | 苏州浪潮智能科技有限公司 | Data enhancement method, system, device and computer readable storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107832310A (en) * | 2017-11-27 | 2018-03-23 | 首都师范大学 | Structuring argument generation method and system based on seq2seq models |
CN108427665A (en) * | 2018-03-15 | 2018-08-21 | 广州大学 | A kind of text automatic generation method based on LSTM type RNN models |
US20180365231A1 (en) * | 2017-06-19 | 2018-12-20 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Method and apparatus for generating parallel text in same language |
CN109062937A (en) * | 2018-06-15 | 2018-12-21 | 北京百度网讯科技有限公司 | The method of training description text generation model, the method and device for generating description text |
CN109614492A (en) * | 2018-12-29 | 2019-04-12 | 平安科技(深圳)有限公司 | Text data Enhancement Method, device, equipment and storage medium based on artificial intelligence |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
SG10201600432YA (en) * | 2011-02-21 | 2016-02-26 | Univ Singapore | Apparatus, system, and method for annotation of media files with sensor data |
CN103678285A (en) * | 2012-08-31 | 2014-03-26 | 富士通株式会社 | Machine translation method and machine translation system |
CN103810999B (en) * | 2014-02-27 | 2016-10-19 | 清华大学 | Language model training method based on Distributed Artificial Neural Network and system thereof |
US10540957B2 (en) * | 2014-12-15 | 2020-01-21 | Baidu Usa Llc | Systems and methods for speech transcription |
CN106484681B (en) * | 2015-08-25 | 2019-07-09 | 阿里巴巴集团控股有限公司 | A kind of method, apparatus and electronic equipment generating candidate translation |
CN107274903B (en) * | 2017-05-26 | 2020-05-19 | 北京搜狗科技发展有限公司 | Text processing method and device for text processing |
CN108647207B (en) * | 2018-05-08 | 2022-04-05 | 上海携程国际旅行社有限公司 | Natural language correction method, system, device and storage medium |
- 2018-12-29: CN application CN201811641967.2A filed (granted as CN109614492B, status: active)
- 2019-08-30: PCT application PCT/CN2019/103684 filed (published as WO2020134154A1, status: active, application filing)
Also Published As
Publication number | Publication date |
---|---|
CN109614492A (en) | 2019-04-12 |
CN109614492B (en) | 2024-06-18 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 122 | Ep: PCT application non-entry in European phase | Ref document number: 19904260; Country of ref document: EP; Kind code of ref document: A1 |