CN114781358A

CN114781358A - Text error correction method, apparatus, device and storage medium based on reinforcement learning

Info

Publication number: CN114781358A
Application number: CN202210404810.8A
Authority: CN
Inventors: 王伟; 黄勇其; 张黔
Original assignee: Runlian Software System Shenzhen Co Ltd
Current assignee: Runlian Software System Shenzhen Co Ltd
Priority date: 2022-04-18
Filing date: 2022-04-18
Publication date: 2022-07-22

Abstract

The present application discloses a text error correction method, apparatus, device and storage medium based on reinforcement learning, belonging to the technical field of natural language processing. The application performs pronunciation-similarity masking and glyph-similarity masking on the text in a training corpus, constructs a first training sample from the masked training corpus, imports the first training sample into a pre-trained language model and outputs a first pre-training text error correction result, adjusts the first training sample to generate a second training sample, iteratively trains the pre-trained language model with the second training sample to obtain a text error correction model, and finally imports the text to be corrected into the text error correction model and outputs the text error correction result. The application introduces pronunciation information and glyph information when training the error correction model, constructs noise-rich training samples through pronunciation-similarity masking and glyph-similarity masking, and further trains the error correction model with reinforcement learning, so that the model recognizes spelling errors better and generalizes better.

Description

Text error correction method, apparatus, device and storage medium based on reinforcement learning

Technical Field

The present application belongs to the technical field of natural language processing, and specifically relates to a text error correction method, apparatus, device and storage medium based on reinforcement learning.

Background

The purpose of text error correction is to detect and correct spelling errors in text. It is an important task in natural language processing, with applications in information retrieval, intelligent writing, intelligent customer service and many other fields. Traditional text error correction mostly works as follows: a confusion set containing a large number of easily misspelled Chinese characters is built, and when a character in the document to be corrected matches a character in the confusion set, it is replaced with the corresponding character from the confusion set. This mechanical matching does not consider contextual semantics, so the results are unsatisfactory.

In recent years, some researchers have performed text error correction by fine-tuning an existing pre-trained language model, BERT. In practice, error-prone characters are masked (replaced) with wrong characters to construct negative training samples. For example, in the sample "中国是一个拥有5000年历史的文明古国" ("China is an ancient civilization with a history of 5,000 years"), "历史" ("history") is replaced with the misspelling "厉史", and the pre-trained language model is then fine-tuned to recognize and correct the wrong characters. However, because the pre-trained language model was not designed specifically for the text error correction task, the fine-tuned model learns little about how to correct errors, and there is still considerable room for improvement in practice.

Summary of the Invention

The purpose of the embodiments of the present application is to provide a text error correction method, apparatus, computer device and storage medium based on reinforcement learning, so as to solve the technical problem that existing approaches based on pre-trained language models correct text errors poorly.

In order to solve the above technical problem, an embodiment of the present application provides a text error correction method based on reinforcement learning, which adopts the following technical solution:

A text error correction method based on reinforcement learning, including:

collecting a training corpus, and performing pronunciation-similarity masking and glyph-similarity masking on the text in the training corpus according to a preset text masking ratio;

constructing a first training sample from the masked training corpus;

performing vector transformation on the first training sample to obtain a first sample embedding vector;

importing the first sample embedding vector into a pre-trained language model, and outputting a first pre-training text error correction result;

adjusting the first training sample based on the first pre-training text error correction result to generate a second training sample;

iteratively training the pre-trained language model with the second training sample to obtain a text error correction model;

receiving a text error correction instruction, obtaining the text to be corrected, importing the text to be corrected into the text error correction model, and outputting a text error correction result.

Further, the step of collecting a training corpus and performing pronunciation-similarity masking and glyph-similarity masking on the text in the training corpus according to a preset text masking ratio specifically includes:

collecting the training corpus, and dividing the training corpus to obtain several training corpus segments;

determining the target masked text in the training corpus segments according to the preset text masking ratio;

determining, in a preset text confusion set, the pronunciation-similar text and the glyph-similar text corresponding to the target masked text;

performing pronunciation-similarity masking and glyph-similarity masking on the target masked text based on the pronunciation-similar text and the glyph-similar text.

Further, the step of constructing the first training sample from the masked training corpus specifically includes:

combining the training corpus segments that have undergone pronunciation-similarity masking and the training corpus segments that have undergone glyph-similarity masking to form the first training sample.

obtaining random text from a preset vocabulary, and performing random text masking on the target masked text with the random text;

combining the training corpus segments that have undergone pronunciation-similarity masking, the training corpus segments that have undergone glyph-similarity masking, the training corpus segments that have undergone random text masking, and the training corpus segments without text masking to form the first training sample.

Further, the step of adjusting the first training sample based on the first pre-training text error correction result to generate the second training sample specifically includes:

calculating the action value score of each training corpus segment in the first training sample based on the first pre-training text error correction result;

adjusting the proportion of each training corpus segment in the first training sample based on its action value score to obtain the second training sample.

Further, the step of calculating the action value score of each training corpus segment in the first training sample based on the first pre-training text error correction result specifically includes:

determining the F1 value of the pre-trained language model based on the first pre-training text error correction result;

obtaining the training cost of the pre-trained language model;

calculating the action value score of each training corpus segment in the first training sample based on the F1 value and the training cost of the pre-trained language model.
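The F1 value mentioned above can be computed in several ways, and the text does not fix one. The following is a minimal sketch that assumes a character-level correction F1: precision is the share of changed characters that were changed correctly, and recall is the share of erroneous characters that were fixed. The function name `correction_f1` and this particular convention are illustrative assumptions, not taken from the application.

```python
def correction_f1(source: str, predicted: str, target: str) -> float:
    """Character-level correction F1 (an assumed convention, see lead-in).

    source:    the noisy input text
    predicted: the model output
    target:    the correct text
    All three strings must have the same length.
    """
    if not (len(source) == len(predicted) == len(target)):
        raise ValueError("source, predicted and target must have equal length")

    triples = list(zip(source, predicted, target))
    changed = sum(s != p for s, p, _ in triples)             # edits the model made
    errors = sum(s != t for s, _, t in triples)              # edits that were needed
    correct = sum(s != t and p == t for s, p, t in triples)  # needed edits made correctly

    precision = correct / changed if changed else 0.0
    recall = correct / errors if errors else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# One of two errors is fixed and nothing is changed wrongly:
# precision = 1.0, recall = 0.5
f1 = correction_f1("厉史很金彩", "历史很金彩", "历史很精彩")
```

How this F1 value is combined with the training cost to produce the action value score is not specified at this point in the text, so the sketch stops at the F1 value.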

Further, the step of adjusting the proportion of each training corpus segment in the first training sample based on its action value score to obtain the second training sample specifically includes:

determining whether the action value score of each training corpus segment is positive or negative;

when the action value score is positive, increasing the proportion of the corresponding training corpus segment;

when the action value score is negative, decreasing the proportion of the corresponding training corpus segment.
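The sign-based adjustment above can be sketched as follows. The text only states the direction of the change, so the fixed step size, the lower bound `floor`, the renormalization to a total of 1, and the function name `adjust_proportions` are all illustrative assumptions.

```python
def adjust_proportions(proportions, scores, step=0.05, floor=0.01):
    """Raise the share of segments with a positive action value score and
    lower the share of segments with a negative one.

    proportions: mapping segment name -> current share of the training sample
    scores:      mapping segment name -> action value score
    step, floor: hypothetical tuning knobs, not specified in the source
    """
    adjusted = {}
    for name, share in proportions.items():
        score = scores.get(name, 0.0)
        if score > 0:
            share += step
        elif score < 0:
            share -= step
        adjusted[name] = max(floor, share)  # never drop a segment type entirely

    total = sum(adjusted.values())
    # Renormalize so the shares again sum to 1 (an assumption of this sketch).
    return {name: share / total for name, share in adjusted.items()}


first_sample = {"pronunciation": 0.25, "glyph": 0.25, "random": 0.25, "unmasked": 0.25}
scores = {"pronunciation": 0.3, "glyph": 0.1, "random": -0.2, "unmasked": 0.0}
second_sample = adjust_proportions(first_sample, scores)
```

Because of the renormalization, the share of a segment with a zero score also shifts slightly; only the relative ordering implied by the scores is preserved in this sketch.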

Further, the step of iteratively training the pre-trained language model with the second training sample to obtain the text error correction model specifically includes:

importing the second training sample into the pre-trained language model to obtain a second pre-training text error correction result;

calculating the action value score of each training corpus segment in the second training sample based on the second pre-training text error correction result;

summing the action value scores of the training corpus segments in the second training sample to obtain a final action value score;

iteratively training the pre-trained language model based on the final action value score to obtain the text error correction model.

Further, the first sample embedding vector includes a text embedding vector, a position embedding vector, a pronunciation embedding vector and a glyph embedding vector, and the step of performing vector transformation on the first training sample to obtain the first sample embedding vector specifically includes:

performing feature extraction on the first training sample to obtain text features, position features, pronunciation features and glyph features;

performing vector transformation on the text features, position features, pronunciation features and glyph features respectively to obtain the text embedding vector, the position embedding vector, the pronunciation embedding vector and the glyph embedding vector.

In order to solve the above technical problem, an embodiment of the present application further provides a text error correction apparatus based on reinforcement learning, which adopts the following technical solution:

A text error correction apparatus based on reinforcement learning, comprising:

a text masking module, configured to collect a training corpus and perform pronunciation-similarity masking and glyph-similarity masking on the text in the training corpus according to a preset text masking ratio;

a sample construction module, configured to construct a first training sample from the masked training corpus;

a vector transformation module, configured to perform vector transformation on the first training sample to obtain a first sample embedding vector;

a pre-training module, configured to import the first sample embedding vector into a pre-trained language model and output a first pre-training text error correction result;

a sample adjustment module, configured to adjust the first training sample based on the first pre-training text error correction result to generate a second training sample;

an iterative training module, configured to iteratively train the pre-trained language model with the second training sample to obtain a text error correction model;

a text error correction module, configured to receive a text error correction instruction, obtain the text to be corrected, import the text to be corrected into the text error correction model, and output a text error correction result.

In order to solve the above technical problem, an embodiment of the present application further provides a computer device, which adopts the following technical solution:

A computer device, comprising a memory and a processor, wherein the memory stores computer-readable instructions, and the processor, when executing the computer-readable instructions, implements the steps of the text error correction method based on reinforcement learning according to any one of the above.

In order to solve the above technical problem, an embodiment of the present application further provides a computer-readable storage medium, which adopts the following technical solution:

A computer-readable storage medium storing computer-readable instructions which, when executed by a processor, implement the steps of the text error correction method based on reinforcement learning according to any one of the above.

Compared with the prior art, the embodiments of the present application mainly have the following beneficial effects:

The present application discloses a text error correction method, apparatus, device and storage medium based on reinforcement learning, belonging to the technical field of natural language processing. The application collects a training corpus, performs pronunciation-similarity masking and glyph-similarity masking on the text in the training corpus, constructs a first training sample from the masked training corpus, imports the first training sample into a pre-trained language model and outputs a first pre-training text error correction result, adjusts the first training sample based on the first pre-training text error correction result to generate a second training sample, iteratively trains the pre-trained language model with the second training sample to obtain a text error correction model, obtains the text to be corrected, imports it into the text error correction model, and outputs the text error correction result. The application introduces pronunciation information and glyph information when training the error correction model, constructs noise-rich training samples through pronunciation-similarity masking and glyph-similarity masking, and further trains the error correction model with reinforcement learning, so that the model recognizes spelling errors better and generalizes better.

Description of the Drawings

In order to illustrate the solutions in the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show some embodiments of the present application, and those of ordinary skill in the art can obtain other drawings from them without creative effort.

FIG. 1 shows an exemplary system architecture to which the present application can be applied;

FIG. 2 shows a flowchart of an embodiment of the text error correction method based on reinforcement learning according to the present application;

FIG. 3 shows a flowchart of an embodiment of step S201 in FIG. 2;

FIG. 4 shows a schematic structural diagram of an embodiment of the text error correction apparatus based on reinforcement learning according to the present application;

FIG. 5 shows a schematic structural diagram of an embodiment of the text masking module 301 of the text error correction apparatus based on reinforcement learning;

FIG. 6 shows a schematic structural diagram of an embodiment of a computer device according to the present application.

Detailed Description

Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the technical field of the present application. The terms used in the specification are only for the purpose of describing specific embodiments and are not intended to limit the present application. The terms "comprising" and "having" and any variations thereof in the specification, the claims and the above description of the drawings are intended to cover non-exclusive inclusion. The terms "first", "second" and the like in the specification, the claims or the above drawings are used to distinguish different objects, not to describe a specific order.

Reference herein to an "embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment can be included in at least one embodiment of the present application. The appearances of the phrase in various places in the specification do not necessarily all refer to the same embodiment, nor to separate or alternative embodiments that are mutually exclusive of other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.

In order to help those skilled in the art better understand the solutions of the present application, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings.

As shown in FIG. 1, the system architecture 100 may include terminal devices 101, 102 and 103, a network 104 and a server 105. The network 104 is the medium that provides communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links or fiber optic cables.

A user can use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 to receive or send messages and the like. Various communication client applications may be installed on the terminal devices 101, 102 and 103, such as web browsers, shopping applications, search applications, instant messaging tools, email clients and social platform software.

The terminal devices 101, 102 and 103 may be various electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablet computers, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III), MP4 players (Moving Picture Experts Group Audio Layer IV), laptop computers and desktop computers.

The server 105 may be a server that provides various services, for example a background server that supports the pages displayed on the terminal devices 101, 102 and 103. The server may be an independent server, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a content delivery network (CDN), and big data and artificial intelligence platforms.

It should be noted that the text error correction method based on reinforcement learning provided by the embodiments of the present application is generally executed by the server, and accordingly the text error correction apparatus based on reinforcement learning is generally arranged in the server.

It should be understood that the numbers of terminal devices, networks and servers in FIG. 1 are merely illustrative. There can be any number of terminal devices, networks and servers according to implementation needs.

With continued reference to FIG. 2, a flowchart of an embodiment of the text error correction method based on reinforcement learning according to the present application is shown. The embodiments of the present application may acquire and process the relevant data based on artificial intelligence technology. Artificial intelligence (AI) is the theory, methods, technology and application systems that use digital computers, or machines controlled by digital computers, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results.

Basic artificial intelligence technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operation/interaction systems and mechatronics. AI software technologies mainly include computer vision, robotics, biometrics, speech processing, natural language processing, and machine learning/deep learning. The text error correction method based on reinforcement learning includes the following steps:

S201: collect a training corpus, and perform pronunciation-similarity masking and glyph-similarity masking on the text in the training corpus according to a preset text masking ratio.

Specifically, news articles are collected from public Internet websites as the training corpus, and each article is cut to fit the input length of the pre-trained language model (for example, the input length of BERT is 510 Chinese characters). The cut training corpus is used as training samples for the pre-trained language model.
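As an illustration of the cutting step, the sketch below splits an article into consecutive pieces of at most 510 characters. Whether the remainder of an over-long article is kept as further pieces or discarded is not stated in the text; keeping every piece is an assumption of this sketch, as is the function name `split_into_chunks`.

```python
def split_into_chunks(article: str, max_len: int = 510) -> list[str]:
    """Cut one article into consecutive pieces of at most max_len characters."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    return [article[i:i + max_len] for i in range(0, len(article), max_len)]


# A 1200-character article yields two full pieces and one shorter piece.
chunks = split_into_chunks("中" * 1200)
print([len(c) for c in chunks])  # [510, 510, 180]
```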

After collecting the training corpus, the server divides it into multiple training corpus segments, determines the text to be masked in each segment (the target masked text) according to the preset text masking ratio, and performs pronunciation-similarity masking and glyph-similarity masking on the target masked text. Pronunciation-similarity masking replaces the target masked text with text that sounds similar, for example "精" (jing) and "金" (jin) are similar in pronunciation. Glyph-similarity masking replaces the target masked text with text that looks similar, for example "历" and "厉" are similar in glyph.

It should be noted that the similar text can be selected from a publicly available character confusion set, which contains two types of similar characters: characters with similar pronunciation and characters with similar glyphs.

Further, referring to FIG. 3, step S201 of collecting a training corpus and performing pronunciation-similarity masking and glyph-similarity masking on the text in the training corpus according to a preset text masking ratio specifically includes:

S211: collect the training corpus, and divide it to obtain several training corpus segments;

S212: determine the target masked text in the training corpus segments according to the preset text masking ratio;

S213: determine, in a preset text confusion set, the pronunciation-similar text and the glyph-similar text corresponding to the target masked text;

S214: perform pronunciation-similarity masking and glyph-similarity masking on the target masked text based on the pronunciation-similar text and the glyph-similar text.

Specifically, the server collects the training corpus and divides it into several training corpus segments, determines the target masked text in the segments according to the preset text masking ratio, determines the corresponding pronunciation-similar text and glyph-similar text in the preset text confusion set, and performs pronunciation-similarity masking and glyph-similarity masking on the target masked text based on them.

In a specific embodiment of the present application, the server divides the training corpus into two training corpus segments, applies pronunciation-similarity masking to one of them, and applies glyph-similarity masking to the other.

In the above embodiment, the present application constructs noise-rich training samples by dividing the training corpus and applying pronunciation-similarity masking and glyph-similarity masking to the resulting segments.
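Steps S212 to S214 can be sketched as follows. The tiny `CONFUSION_SET` is a hypothetical stand-in for a real, publicly available confusion set, and the function name `mask_segment`, the rounding of the masking ratio, and the choice to leave characters without a confusion entry unchanged are assumptions of this sketch.

```python
import random

# Hypothetical toy confusion set; a real one would be loaded from a public resource.
CONFUSION_SET = {
    "历": {"pronunciation": ["力", "立"], "glyph": ["厉"]},
    "精": {"pronunciation": ["金", "京"], "glyph": ["情"]},
}


def mask_segment(segment: str, mode: str, mask_ratio: float, rng: random.Random) -> str:
    """Replace about mask_ratio of the characters with confusable characters.

    mode is "pronunciation" or "glyph". Characters that have no entry in the
    confusion set are left unchanged (an assumption of this sketch).
    """
    chars = list(segment)
    if not chars:
        return segment
    # S212: pick the target positions according to the masking ratio.
    n_targets = min(len(chars), max(1, round(len(chars) * mask_ratio)))
    targets = rng.sample(range(len(chars)), n_targets)
    for i in targets:
        # S213: look up the similar characters for this target.
        candidates = CONFUSION_SET.get(chars[i], {}).get(mode, [])
        if candidates:
            # S214: substitute one of them.
            chars[i] = rng.choice(candidates)
    return "".join(chars)


rng = random.Random(0)
glyph_masked = mask_segment("历史", "glyph", 1.0, rng)         # "厉史"
sound_masked = mask_segment("精彩", "pronunciation", 1.0, rng)  # "金彩" or "京彩"
```

Passing an explicit `random.Random` instance keeps the sampling reproducible, which is convenient when the same corpus has to be masked again in a later training iteration.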

S202: construct a first training sample from the masked training corpus.

Specifically, the server constructs the first training sample from the masked training corpus. Applying different kinds of text masking to the training corpus produces noise-rich training samples, so that the trained model can better recognize spelling errors.

Specifically, in the above embodiment, the first training sample is formed by combining the training corpus segments that have undergone pronunciation-similarity masking and those that have undergone glyph-similarity masking.

Specifically, the server obtains random text from a preset vocabulary and uses it to perform random text masking on the target masked text. It then combines the training corpus segments that have undergone pronunciation-similarity masking, glyph-similarity masking and random text masking with the segments without text masking to form the first training sample. Combining these segments yields training samples with more noise.

In another specific embodiment of the present application, let the number of corpus items used for training the pre-trained language model be batch_size. Each time, a% of the text in the batch_size is randomly selected for masking, and the selected text is masked in one of four ways, as follows:

(1) a1% of the corpus undergoes pronunciation-similarity masking; (2) a2% of the corpus undergoes glyph-similarity masking; (3) a3% of the corpus is left without text masking; (4) a4% of the corpus undergoes random text masking with random text drawn from the vocabulary. The vocabulary contains common characters (both Chinese and non-Chinese characters); a pre-built vocabulary usually contains tens of thousands of characters.

For example, 10,000 corpus items are collected for training the pre-trained language model: 2,500 are selected for pronunciation-similarity masking, 2,500 for glyph-similarity masking, 2,500 are left unprocessed, and 2,500 are selected for random text masking. These items are then combined to construct the first training sample.
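The 10,000-item example can be reproduced with the sketch below, which shuffles the corpus and assigns each item to one of the four masking strategies according to the ratios a1 to a4. The function name `assign_strategies`, the strategy labels and the rule that the last strategy absorbs any rounding remainder are illustrative assumptions.

```python
import random


def assign_strategies(corpus, ratios, rng):
    """Shuffle the corpus and split it into one bucket per masking strategy.

    ratios maps a strategy name to its share (a1..a4 as fractions summing to 1).
    The last strategy receives whatever remains after integer rounding.
    """
    if abs(sum(ratios.values()) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    items = list(corpus)
    rng.shuffle(items)

    buckets = {}
    names = list(ratios)
    start = 0
    for k, name in enumerate(names):
        if k == len(names) - 1:
            end = len(items)
        else:
            end = start + int(len(items) * ratios[name])
        buckets[name] = items[start:end]
        start = end
    return buckets


corpus = [f"sentence-{i}" for i in range(10000)]
ratios = {"pronunciation": 0.25, "glyph": 0.25, "unmasked": 0.25, "random": 0.25}
buckets = assign_strategies(corpus, ratios, random.Random(42))
print({name: len(items) for name, items in buckets.items()})
# {'pronunciation': 2500, 'glyph': 2500, 'unmasked': 2500, 'random': 2500}
```

Each bucket would then be passed to its own masking routine, and the four results concatenated to form the first training sample.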

S203: Convert the first training sample into vectors to obtain a first sample embedding vector.

Specifically, converting the first training sample into vectors yields the input vector of the pre-trained language model, that is, the first sample embedding vector.

Specifically, the text embedding vector represents the text features of the training sample and can be obtained by looking up a preset character-vector embedding table, where the vocabulary and the character-vector embedding table have the same dimension. The position embedding vector represents the position features of the training sample and is obtained from the sine and cosine functions used in the Transformer model. The pronunciation embedding vector represents the pronunciation features of the training sample. For Chinese, the pronunciation (pinyin) of a character is a sequence of lowercase letters; in practice the Unihan Database can be used to obtain the character-to-pronunciation mapping, and the letters representing each character's pronunciation are fed into an LSTM network to obtain the pronunciation embedding vector. The glyph embedding vector represents the glyph features of the training sample. The shape of a character is represented by its stroke order, which gives the order in which the strokes of a Chinese character are written (a stroke being one movement of the writing instrument on the writing surface). Stroke data is obtained from the Chaizi database, and, to model the visual relationships between characters, the stroke order of each character is fed into a second LSTM network to generate the glyph embedding vector.
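Of the four embeddings, the position embedding is fully determined by a closed-form rule, so it is the easiest to illustrate. The sketch below assumes the standard Transformer formulation, with sine on even dimensions and cosine on odd dimensions and a base of 10000; the function name and the dimension of 8 are illustrative choices, not values from the application.

```python
import math

def sinusoidal_position_embedding(position, dim):
    """Return the Transformer-style sine/cosine encoding of one position."""
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    vec = []
    for k in range(0, dim, 2):
        # k is the even dimension index 2i, so k / dim equals 2i / d_model.
        angle = position / (10000 ** (k / dim))
        vec.append(math.sin(angle))  # even dimension
        vec.append(math.cos(angle))  # odd dimension
    return vec

pe0 = sinusoidal_position_embedding(0, 8)
pe5 = sinusoidal_position_embedding(5, 8)
```

The pronunciation and glyph embeddings, by contrast, are learned by LSTM networks over pinyin letters and stroke sequences and cannot be reduced to a formula of this kind.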

S204: Import the first sample embedding vector into the pre-trained language model and output a first pre-training text error correction result.

Specifically, the first sample embedding vector is imported into the pre-trained language model, which processes it, predicts the erroneous text in the sample, corrects that text, and outputs the corrected text, that is, the first pre-training text error correction result. Note that the pre-trained language model may be any pre-trained language model widely used in the field, such as BERT or ALBERT; the present application places no limitation on this.

In a specific embodiment of the present application, given a training sample, the probability predicted for the character at the $i$-th position of the sample is:

$$P(y_i = j \mid X) = \mathrm{softmax}(W h_i + b)[j]$$

where $P(y_i = j \mid X)$ is the conditional probability that the true character at position $i$ is predicted to be the $j$-th character in the vocabulary, $X$ is the given corpus, $h_i$ is the output vector of the pre-trained language model at position $i$, $W$ is a pre-training parameter, and $b$ is the bias of the neural network. The present application trains the model by supervised learning, so that the pre-trained language model learns, as far as possible, the regularities with which correct Chinese characters occur and can therefore restore a wrongly written character at a given position to the correct one.
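The prediction formula above can be traced with a toy example. This is a minimal sketch: the three-character vocabulary, the hidden size of 2, and all numeric values are made up for illustration, and `char_distribution` is a hypothetical helper name.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def char_distribution(W, h, b):
    """Compute softmax(W h + b): one probability per vocabulary character."""
    logits = [sum(w * x for w, x in zip(row, h)) + bias
              for row, bias in zip(W, b)]
    return softmax(logits)

# Toy values: vocabulary of 3 characters, hidden size 2 (illustrative only).
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [0.0, 0.0, 0.0]
h_i = [0.5, 2.0]

probs = char_distribution(W, h_i, b)
# Index j of the character with the highest P(y_i = j | X).
predicted_index = max(range(len(probs)), key=probs.__getitem__)
```

Here the logits are 0.5, 2.0 and 2.5, so the third vocabulary entry (index 2) receives the highest probability and would be emitted as the corrected character at that position.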

S205: Adjust the first training sample based on the first pre-training text error correction result to generate a second training sample.

Specifically, in the present application, importing the first sample embedding vector into the pre-trained language model yields the first pre-training text error correction result, which is the corrected text. The error correction effect is determined from this result and the training corpus, and the proportion of each training corpus segment in the first training sample is then adjusted according to that effect, raising the proportion of samples that benefit model training and lowering the proportion of samples that do not. This produces a new training sample set and further improves the error correction ability of the pre-trained language model.

Specifically, the first pre-training text error correction result is compared with the training corpus to determine the number of texts that have been corrected, which gives the error correction effect of the pre-trained language model. The action value score of each training corpus segment in the first training sample is calculated from this effect, and the proportion of each training corpus segment in the first training sample is adjusted based on its action value score to obtain the second training sample. The pre-trained language model is then iteratively trained with the second training sample, further improving its error correction ability.

Here, the F1 value, also called the F1 score or balanced F score, is defined as the harmonic mean of precision and recall. Its maximum value is 1 and its minimum value is 0, and it is used to evaluate the performance of a binary classification model.
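For reference, the F1 value can be computed from counts as in the sketch below. How the counts are defined for text correction is an assumption here (for example, a true positive being an erroneous character that the model restores correctly); the application does not spell this out in this section.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Harmonic mean of precision and recall; returns 0.0 when undefined."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    if predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Illustrative counts: 8 correct fixes, 2 spurious edits, 2 missed errors.
f1 = f1_score(true_positives=8, false_positives=2, false_negatives=2)
```

With precision and recall both at 0.8, the harmonic mean is also 0.8.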

Specifically, the F1 value of the pre-trained language model is determined from the first pre-training text error correction result, the training cost of the pre-trained language model is obtained, and the action value score of each training corpus segment in the first training sample is calculated from the F1 value and the training cost. The action value score is calculated as follows:

[formula not reproduced in the source text]

In the formula, $S^{i}$ is the action value score of the $i$-th round of pre-training, $F1_i$ is the F1 value of the $i$-th round of pre-training, and $T_{Cost_i}$ is the training cost of the $i$-th round of pre-training. Note that the action value score can be positive or negative. If the F1 value obtained after a round of training is higher than the F1 value after the previous round, that round of training benefits model fitting, and its action value score is:

[formula not reproduced in the source text]

If the F1 value is lower than the F1 value after the previous round, that round of training does not benefit model fitting, and its action value score is:

[formula not reproduced in the source text]

In the present application, the training cost includes, but is not limited to, indicators such as a node's CPU clock frequency, memory capacity, GPU clock frequency, GPU memory capacity, hard disk I/O throughput, and uplink network bandwidth. Because these indicators have different units, the units must first be removed and each indicator then normalized. The specific method is to collect the values of an indicator $x$ over a given historical period and take three of them, namely the maximum $x_{max}$, the average $x_{ave}$, and the minimum $x_{min}$, and to normalize as follows:

[formula not reproduced in the source text]

where $x$ is the normalized indicator. Each normalized indicator is then assigned its own weight and multiplied by that weight to obtain the final training cost. The weights can be set with the common analytic hierarchy process (AHP) or with a machine learning method.
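The weighted combination of normalized indicators can be sketched as below. The normalization formula of the application is not available in this text, so the sketch substitutes plain min-max normalization, which uses only $x_{min}$ and $x_{max}$ and ignores $x_{ave}$; this substitution, the indicator names, the weights, and the use of a weighted sum are all assumptions made for illustration.

```python
def min_max_normalize(x, x_min, x_max):
    """Assumed stand-in normalization: map x into [0, 1] by its historical range."""
    if x_max == x_min:
        return 0.0
    return (x - x_min) / (x_max - x_min)

def training_cost(metrics, history, weights):
    """Weighted sum of normalized indicators (weights are hypothetical)."""
    total = 0.0
    for name, value in metrics.items():
        lo, hi = history[name]
        total += weights[name] * min_max_normalize(value, lo, hi)
    return total

# Illustrative indicators with their historical (min, max) ranges.
metrics = {"gpu_memory_gb": 12.0, "net_mbps": 500.0}
history = {"gpu_memory_gb": (8.0, 16.0), "net_mbps": (0.0, 1000.0)}
weights = {"gpu_memory_gb": 0.75, "net_mbps": 0.25}

cost = training_cost(metrics, history, weights)
```

Both indicators sit at the midpoint of their ranges, so each normalizes to 0.5 and the weighted cost is 0.5. In practice the weights would come from AHP or a learned model, as the text states.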

Specifically, the sign of each training corpus segment's action value score is checked. When the score is positive, the proportion of the corresponding training corpus segment is raised; when the score is negative, the proportion is lowered. For example, with an adjustment ratio of 5%, a positive action value score raises the share of the corresponding training corpus segment in the training sample by 5%, and a negative score lowers it by 5%. Raising the proportion of samples that benefit model training and lowering the proportion of samples that do not yields a new training sample set and further improves the error correction ability of the pre-trained language model.
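The adjustment rule can be sketched as follows. Two points are assumptions not stated in the text: the 5% step is read as five percentage points added to or subtracted from a segment's share, and the shares are renormalized afterwards so that they still sum to 1. The function name and the example scores are illustrative.

```python
def adjust_proportions(proportions, scores, step=0.05):
    """Raise shares with positive scores, lower shares with negative scores."""
    adjusted = {}
    for name, share in proportions.items():
        score = scores[name]
        if score > 0:
            share += step
        elif score < 0:
            share = max(0.0, share - step)  # a share cannot go below zero
        adjusted[name] = share
    total = sum(adjusted.values())
    if total == 0:
        raise ValueError("all proportions were reduced to zero")
    # Assumed renormalization so the shares remain a valid distribution.
    return {name: share / total for name, share in adjusted.items()}

proportions = {"pronunciation": 0.25, "glyph": 0.25,
               "unmasked": 0.25, "random": 0.25}
scores = {"pronunciation": 0.4, "glyph": 0.1,
          "unmasked": -0.2, "random": -0.3}

new_proportions = adjust_proportions(proportions, scores)
```

In this example the two segments with positive scores move from 25% to about 30% each, and the two with negative scores drop to about 20% each.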

S206: Iteratively train the pre-trained language model with the second training sample to obtain a text error correction model.

Specifically, the second training sample is imported into the pre-trained language model and the action value score of the second training sample is calculated. The model is then iteratively trained using the action value score of the second training sample and the loss function of the pre-trained language model, yielding the text error correction model.

Specifically, the second training sample is imported into the pre-trained language model to obtain a second pre-training text error correction result. The action value score of each training corpus segment in the second training sample is calculated in the manner described above, and these scores are summed to obtain the final action value score. The pre-trained language model is iteratively trained based on the final action value score to obtain the text error correction model. The final action value score is calculated with the following formula:

[formula not reproduced in the source text]

In the formula, $S$ is the final action value score, $n$ is the number of iterations, and $\gamma$ is the value decay coefficient.

Note that the action value score obtained after each round of pre-training affects subsequent training, and that this effect gradually decays as the number of iterations grows. The potential value score after training therefore also needs to be calculated: a value decay coefficient $\gamma$ is defined, and the final action value score $S$ is calculated using $\gamma$.
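The formula for $S$ is not available in this text. One form consistent with the description (a sum over $n$ iterations whose terms decay with $\gamma$) is the geometrically discounted sum familiar from reinforcement learning, $S = \sum_{i=1}^{n} \gamma^{\,i-1} S^{i}$. The sketch below implements that form purely as an assumption; the application's actual formula may differ, for instance in the direction of the decay.

```python
def discounted_score(step_scores, gamma):
    """Assumed form: sum of gamma**(i-1) * S^i over rounds i = 1..n."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma is expected to lie in [0, 1]")
    return sum((gamma ** k) * score for k, score in enumerate(step_scores))

# Illustrative per-round action value scores and decay coefficient.
final_score = discounted_score([1.0, 0.5, -0.2], gamma=0.9)
```

For these values the terms are 1.0, 0.45 and -0.162, giving a final score of 1.288.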

The present application optimizes with the policy gradient method from reinforcement learning so as to maximize the value score as far as possible. The specific process is as follows: train a multi-layer neural network whose output classes correspond to M actions. Taking a two-layer neural network as an example, the second sample embedding vector corresponding to the second training sample is fed to the network as the input vector v. Let the weight matrix of the first hidden layer be w1, with the relu activation function and bias b1, so that its output is o1 = relu(w1*v + b1). Let the weight matrix of the second hidden layer be w2, with bias b2, so that its output is o2 = relu(w2*o1 + b2). The network output o3 is then obtained through a softmax layer. In practice, more hidden layers can be used to achieve better results.
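The forward pass of the two-layer example can be sketched as below, using the same names v, w1, b1, w2, b2, o1, o2 and o3 as the text. The layer sizes, the weight values, and the choice of M = 4 actions are illustrative assumptions, and the policy gradient update itself is not shown.

```python
import math

def relu(vec):
    return [max(0.0, x) for x in vec]

def affine(w, vec, b):
    """Compute w * vec + b for a weight matrix given as a list of rows."""
    return [sum(wi * xi for wi, xi in zip(row, vec)) + bi
            for row, bi in zip(w, b)]

def softmax(vec):
    m = max(vec)
    exps = [math.exp(x - m) for x in vec]
    total = sum(exps)
    return [e / total for e in exps]

def policy_forward(v, w1, b1, w2, b2):
    o1 = relu(affine(w1, v, b1))    # first hidden layer
    o2 = relu(affine(w2, o1, b2))   # second hidden layer
    o3 = softmax(o2)                # probability of each of the M actions
    return o3

# Toy parameters: input size 2, hidden size 3, M = 4 actions.
v = [1.0, -1.0]
w1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b1 = [0.0, 0.0, 0.5]
w2 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
b2 = [0.0, 0.0, 0.0, 0.0]

o3 = policy_forward(v, w1, b1, w2, b2)
best_action = max(range(len(o3)), key=o3.__getitem__)
```

Here o1 is [1.0, 0.0, 0.5] and o2 is [1.0, 0.0, 0.5, 1.5], so the fourth action (index 3) has the highest probability. A policy gradient method would then adjust w1, b1, w2 and b2 to make high-scoring actions more likely.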

S207: Receive a text error correction instruction, obtain the text to be corrected, import the text to be corrected into the text error correction model, and output a text error correction result.

Specifically, once training is complete the text error correction model has good text error correction ability. After receiving a text error correction instruction, the server obtains the text to be corrected and imports it into the text error correction model to obtain the text error correction result.

In this embodiment, the electronic device on which the reinforcement-learning-based text error correction method runs (for example, the server shown in FIG. 1) may receive the text error correction instruction over a wired or wireless connection. Note that the wireless connection may include, but is not limited to, 3G/4G, WiFi, Bluetooth, WiMAX, Zigbee, UWB (ultra wideband), and other wireless connection methods now known or developed in the future.

In the above embodiment, the present application discloses a text error correction method based on reinforcement learning, belonging to the technical field of natural language processing. The application collects a training corpus, applies pronunciation-similarity masking and glyph-similarity masking to the text in the training corpus, constructs a first training sample from the masked training corpus, imports the first training sample into a pre-trained language model and outputs a first pre-training text error correction result, adjusts the first training sample based on that result to generate a second training sample, and iteratively trains the pre-trained language model with the second training sample to obtain a text error correction model. It then obtains the text to be corrected, imports it into the text error correction model, and outputs the text error correction result. By introducing pronunciation and glyph information when training the error correction model, constructing noise-rich training samples through pronunciation-similarity and glyph-similarity masking, and further training the error correction model with reinforcement learning, the application enables the model to recognize spelling errors better and to generalize more strongly.

Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be carried out by computer-readable instructions directing the relevant hardware. The computer-readable instructions can be stored in a computer-readable storage medium and, when executed, may include the processes of the method embodiments described above. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, or a read-only memory (ROM), or it may be a random access memory (RAM) or the like.

It should be understood that although the steps in the flowcharts of the accompanying drawings are shown in the sequence indicated by the arrows, they are not necessarily executed in that sequence. Unless explicitly stated herein, there is no strict restriction on the order of execution, and the steps may be executed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same moment and may be executed at different times; nor are they necessarily executed one after another, and they may instead be executed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

With further reference to FIG. 4, as an implementation of the method shown in FIG. 2, the present application provides an embodiment of a text error correction apparatus based on reinforcement learning. This apparatus embodiment corresponds to the method embodiment shown in FIG. 2, and the apparatus can be applied in various electronic devices.

As shown in FIG. 4, the text error correction apparatus based on reinforcement learning described in this embodiment includes:

a text masking module 301, configured to collect a training corpus and apply pronunciation-similarity masking and glyph-similarity masking to the text in the training corpus according to a preset text masking ratio;

a sample construction module 302, configured to construct a first training sample from the masked training corpus;

a vector conversion module 303, configured to convert the first training sample into vectors to obtain a first sample embedding vector;

a pre-training module 304, configured to import the first sample embedding vector into the pre-trained language model and output a first pre-training text error correction result;

a sample adjustment module 305, configured to adjust the first training sample based on the first pre-training text error correction result to generate a second training sample;

an iterative training module 306, configured to iteratively train the pre-trained language model with the second training sample to obtain a text error correction model;

a text error correction module 307, configured to receive a text error correction instruction, obtain the text to be corrected, import the text to be corrected into the text error correction model, and output a text error correction result.

Further, referring to FIG. 5, the text masking module 301 specifically includes:

a corpus dividing unit 311, configured to collect the training corpus and divide it into several training corpus segments;

a masked text determining unit 312, configured to determine the target masked text in the training corpus segments according to a preset text masking ratio;

a similar text determining unit 313, configured to determine, in a preset text confusion set, the pronunciation-similar text and glyph-similar text corresponding to the target masked text;

a text masking unit 314, configured to apply pronunciation-similarity masking and glyph-similarity masking to the target masked text based on the pronunciation-similar text and the glyph-similar text.

Further, the sample construction module 302 specifically includes:

a first corpus combining unit, configured to combine the training corpus segments that have undergone pronunciation-similarity masking with those that have undergone glyph-similarity masking to form the first training sample;

a random masking unit, configured to obtain random text from a preset vocabulary and use the random text to randomly mask the target masked text;

a second corpus combining unit, configured to combine the training corpus segments that have undergone pronunciation-similarity masking, those that have undergone glyph-similarity masking, those that have undergone random text masking, and those left unmasked to form the first training sample.

Further, the sample adjustment module 305 specifically includes:

a first action value score calculation unit, configured to calculate the action value score of each training corpus segment in the first training sample based on the first pre-training text error correction result;

a sample adjustment unit, configured to adjust the proportion of each training corpus segment in the first training sample based on its action value score to obtain the second training sample.

Further, the first action value score calculation unit specifically includes:

an F1 value calculation subunit, configured to determine the F1 value of the pre-trained language model based on the first pre-training text error correction result;

a training cost calculation subunit, configured to obtain the training cost of the pre-trained language model;

a first score calculation subunit, configured to calculate the action value score of each training corpus segment in the first training sample based on the F1 value and the training cost of the pre-trained language model.

Further, the sample adjustment unit specifically includes:

an action value judgment subunit, configured to determine the sign of the action value score of each training corpus segment;

a first judgment result subunit, configured to raise the proportion of the training corpus segment corresponding to an action value score when that score is positive;

a second judgment result subunit, configured to lower the proportion of the training corpus segment corresponding to an action value score when that score is negative.

Further, the iterative training module 306 specifically includes:

a training sample importing unit, configured to import the second training sample into the pre-trained language model to obtain a second pre-training text error correction result;

a second score calculation unit, configured to calculate the action value score of each training corpus segment in the second training sample based on the second pre-training text error correction result;

a score summation unit, configured to sum the action value scores of the training corpus segments in the second training sample to obtain the final action value score;

an iterative training unit, configured to iteratively train the pre-trained language model based on the final action value score to obtain the text error correction model.

Further, the first sample embedding vector includes a text embedding vector, a position embedding vector, a pronunciation embedding vector, and a glyph embedding vector, and the vector conversion module 303 specifically includes:

a feature extraction unit, configured to extract features from the first training sample to obtain text features, position features, pronunciation features, and glyph features;

a vector conversion unit, configured to convert the text features, position features, pronunciation features, and glyph features into vectors respectively, obtaining the text embedding vector, position embedding vector, pronunciation embedding vector, and glyph embedding vector.

In the above embodiment, the present application discloses a text error correction apparatus based on reinforcement learning, belonging to the technical field of natural language processing. The application collects a training corpus, applies pronunciation-similarity masking and glyph-similarity masking to the text in the training corpus, constructs a first training sample from the masked training corpus, imports the first training sample into a pre-trained language model and outputs a first pre-training text error correction result, adjusts the first training sample based on that result to generate a second training sample, and iteratively trains the pre-trained language model with the second training sample to obtain a text error correction model. It then obtains the text to be corrected, imports it into the text error correction model, and outputs the text error correction result. By introducing pronunciation and glyph information when training the error correction model, constructing noise-rich training samples through pronunciation-similarity and glyph-similarity masking, and further training the error correction model with reinforcement learning, the application enables the model to recognize spelling errors better and to generalize more strongly.

To solve the above technical problems, an embodiment of the present application further provides a computer device. Refer to FIG. 6, which is a block diagram of the basic structure of the computer device of this embodiment.

The computer device 6 includes a memory 61, a processor 62, and a network interface 63 that are communicatively connected to one another through a system bus. Note that the figure shows only a computer device 6 with components 61-63; it should be understood that not all of the components shown are required, and more or fewer components may be implemented instead. Those skilled in the art will understand that the computer device here is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and that its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), an embedded device, and the like.

The computer device may be a computing device such as a desktop computer, a notebook computer, a palmtop computer, or a cloud server. The computer device can interact with the user through a keyboard, a mouse, a remote control, a touch pad, a voice control device, or the like.

The memory 61 includes at least one type of readable storage medium, including flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, and the like. In some embodiments, the memory 61 may be an internal storage unit of the computer device 6, such as its hard disk or internal memory. In other embodiments, the memory 61 may be an external storage device of the computer device 6, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card fitted to the computer device 6. The memory 61 may also include both an internal storage unit of the computer device 6 and an external storage device. In this embodiment, the memory 61 is generally used to store the operating system and the various application software installed on the computer device 6, such as the computer-readable instructions of the text error correction method based on reinforcement learning. In addition, the memory 61 can be used to temporarily store various types of data that have been output or are about to be output.

In some embodiments, the processor 62 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 62 is generally used to control the overall operation of the computer device 6. In this embodiment, the processor 62 is configured to run the computer-readable instructions stored in the memory 61 or to process data, for example to run the computer-readable instructions of the text error correction method based on reinforcement learning.

The network interface 63 may include a wireless network interface or a wired network interface. The network interface 63 is generally used to establish a communication connection between the computer device 6 and other electronic devices.

In the above embodiment, the present application discloses a computer device, which belongs to the technical field of natural language processing. The application collects a training corpus, performs pronunciation-similarity masking and glyph-similarity masking on the text in the training corpus, constructs a first training sample from the masked training corpus, imports the first training sample into a pre-trained language model, and outputs a first pre-training text error correction result. The first training sample is then adjusted based on the first pre-training text error correction result to generate a second training sample, and the pre-trained language model is iteratively trained with the second training sample to obtain a text error correction model. Finally, the text to be corrected is obtained and imported into the text error correction model, which outputs the text error correction result. By introducing pronunciation information and glyph information when training the error correction model, constructing noise-rich training samples through pronunciation-similarity masking and glyph-similarity masking, and further training the error correction model through reinforcement learning, the application enables the model to recognize spelling errors better and to generalize better.

The present application also provides another embodiment, namely a computer-readable storage medium. The computer-readable storage medium stores computer-readable instructions that can be executed by at least one processor, so that the at least one processor performs the steps of the text error correction method based on reinforcement learning described above.

The present application discloses a storage medium, which belongs to the technical field of natural language processing. The application collects a training corpus, performs pronunciation-similarity masking and glyph-similarity masking on the text in the training corpus, constructs a first training sample from the masked training corpus, imports the first training sample into a pre-trained language model, and outputs a first pre-training text error correction result. The first training sample is then adjusted based on the first pre-training text error correction result to generate a second training sample, and the pre-trained language model is iteratively trained with the second training sample to obtain a text error correction model. Finally, the text to be corrected is obtained and imported into the text error correction model, which outputs the text error correction result. By introducing pronunciation information and glyph information when training the error correction model, constructing noise-rich training samples through pronunciation-similarity masking and glyph-similarity masking, and further training the error correction model through reinforcement learning, the application enables the model to recognize spelling errors better and to generalize better.

From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform. They can also be implemented by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disk) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods described in the embodiments of the present application.

The present application can be used in numerous general-purpose or special-purpose computer system environments or configurations, for example: personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, and distributed computing environments that include any of the above systems or devices. The application may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.

Obviously, the embodiments described above are only some of the embodiments of the present application, not all of them. The accompanying drawings show preferred embodiments of the present application but do not limit its patent scope. The present application can be implemented in many different forms; these embodiments are provided so that the disclosure of the present application can be understood more thoroughly and completely. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art can still modify the technical solutions described in the foregoing specific embodiments, or replace some of their technical features with equivalents. Any equivalent structure made using the contents of the description and drawings of the present application, and applied directly or indirectly in other related technical fields, likewise falls within the scope of patent protection of the present application.

Claims

1. A text error correction method based on reinforcement learning, characterized by comprising:

collecting a training corpus, and performing pronunciation-similarity masking and glyph-similarity masking on the text in the training corpus according to a preset text masking ratio;

constructing a first training sample from the masked training corpus;

performing vector transformation on the first training sample to obtain a first sample embedding vector;

importing the first sample embedding vector into a pre-trained language model, and outputting a first pre-training text error correction result;

adjusting the first training sample based on the first pre-training text error correction result to generate a second training sample;

iteratively training the pre-trained language model using the second training sample to obtain a text error correction model; and

receiving a text error correction instruction, obtaining a text to be corrected, importing the text to be corrected into the text error correction model, and outputting a text error correction result.

2. The text error correction method based on reinforcement learning according to claim 1, wherein the step of collecting a training corpus and performing pronunciation-similarity masking and glyph-similarity masking on the text in the training corpus according to a preset text masking ratio specifically comprises:

collecting a training corpus, and dividing the training corpus to obtain several training corpus segments;

determining target masked text in the training corpus segments according to the preset text masking ratio;

determining, in a preset text confusion set, pronunciation-similar text and glyph-similar text corresponding to the target masked text; and

performing pronunciation-similarity masking and glyph-similarity masking on the target masked text based on the pronunciation-similar text and the glyph-similar text.
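As an illustration only, the masking steps of claim 2 might be sketched as follows. The confusion set entries, the sentence-based segmentation, the function names `split_corpus` and `mask_segment`, and the rule that only characters present in the confusion set can be chosen as targets are all assumptions made for this sketch; the application does not specify them.

```python
import random

# Hypothetical toy confusion set (not taken from the application):
# each character maps to pronunciation-similar and glyph-similar candidates.
CONFUSION_SET = {
    "天": {"pronunciation": ["添", "田"], "glyph": ["夫", "无"]},
    "气": {"pronunciation": ["汽", "器"], "glyph": ["乞"]},
    "很": {"pronunciation": ["狠"], "glyph": ["恨", "根"]},
}


def split_corpus(corpus):
    """Divide the corpus into segments at sentence-final full stops."""
    return [s for s in corpus.split("。") if s]


def mask_segment(segment, ratio, kind, rng):
    """Replace about `ratio` of the characters with `kind`-similar characters."""
    chars = list(segment)
    # Assumption: only characters with confusion-set entries can be targets.
    candidates = [i for i, c in enumerate(chars)
                  if CONFUSION_SET.get(c, {}).get(kind)]
    n_target = min(len(candidates), max(1, round(len(chars) * ratio)))
    for i in rng.sample(candidates, n_target):
        chars[i] = rng.choice(CONFUSION_SET[chars[i]][kind])
    return "".join(chars)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
segments = split_corpus("今天天气很好。天气很热。")
pron_masked = [mask_segment(s, 0.3, "pronunciation", rng) for s in segments]
glyph_masked = [mask_segment(s, 0.3, "glyph", rng) for s in segments]
```

Each masked segment keeps its original length, so the unmasked segment can serve directly as the character-aligned correction target.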

3. The text error correction method based on reinforcement learning according to claim 2, wherein the step of constructing a first training sample from the masked training corpus specifically comprises:

combining the training corpus segments subjected to pronunciation-similarity masking and the training corpus segments subjected to glyph-similarity masking to form the first training sample.

4. The text error correction method based on reinforcement learning according to claim 2, wherein the step of constructing a first training sample from the masked training corpus specifically comprises:

obtaining random text from a preset vocabulary, and performing random text masking on the target masked text using the random text; and

combining the training corpus segments subjected to pronunciation-similarity masking, the training corpus segments subjected to glyph-similarity masking, the training corpus segments subjected to random text masking, and the training corpus segments without text masking to form the first training sample.

5. The text error correction method based on reinforcement learning according to claim 3 or 4, wherein the step of adjusting the first training sample based on the first pre-training text error correction result to generate a second training sample specifically comprises:

calculating an action value score of each training corpus segment in the first training sample based on the first pre-training text error correction result; and

adjusting the proportion of each training corpus segment in the first training sample based on the action value score of each training corpus segment to obtain the second training sample.

6. The text error correction method based on reinforcement learning according to claim 5, wherein the step of calculating an action value score of each training corpus segment in the first training sample based on the first pre-training text error correction result specifically comprises:

determining an F1 value of the pre-trained language model based on the first pre-training text error correction result;

obtaining a training cost of the pre-trained language model; and

calculating the action value score of each training corpus segment in the first training sample based on the F1 value of the pre-trained language model and the training cost.
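Claim 6 does not state how the F1 value and the training cost are combined. The sketch below uses the standard definition of F1 and then assumes, purely for illustration, that the action value score is the F1 gain over a baseline minus a weighted training cost. The function `action_value` and the parameters `baseline_f1` and `cost_weight` are hypothetical names, not taken from the application.

```python
def f1_score(tp, fp, fn):
    """Standard F1: harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def action_value(f1, baseline_f1, cost, cost_weight=0.1):
    """Assumed form: F1 gain over a baseline minus a weighted training cost."""
    return (f1 - baseline_f1) - cost_weight * cost


# 8 corrections right, 2 spurious corrections, 2 missed errors.
f1 = f1_score(tp=8, fp=2, fn=2)
gain = action_value(f1, baseline_f1=0.7, cost=0.5)
loss = action_value(0.6, baseline_f1=0.7, cost=0.5)
```

Under this assumed form the score can be positive or negative, which is what the sign test in claim 7 requires.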

7. The text error correction method based on reinforcement learning according to claim 5, wherein the step of adjusting the proportion of each training corpus segment in the first training sample based on the action value score of each training corpus segment to obtain the second training sample specifically comprises:

determining whether the action value score of each training corpus segment is positive or negative;

when the action value score is positive, increasing the proportion of the training corpus segment corresponding to the action value score; and

when the action value score is negative, decreasing the proportion of the training corpus segment corresponding to the action value score.
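The sign-based adjustment of claim 7 could look like the following minimal sketch. The fixed step size, the lower bound on a proportion, and the final renormalization so that the proportions sum to 1 are assumptions of this sketch; the claim only fixes the direction of the change.

```python
def adjust_proportions(proportions, scores, step=0.05, floor=0.01):
    """Raise the share of segment types with positive scores, lower negative ones."""
    adjusted = {}
    for name, share in proportions.items():
        score = scores.get(name, 0.0)
        if score > 0:
            share += step
        elif score < 0:
            share = max(floor, share - step)  # assumption: never drop to zero
        adjusted[name] = share
    total = sum(adjusted.values())
    # Assumption: renormalize so the adjusted proportions again sum to 1.
    return {name: share / total for name, share in adjusted.items()}


first_sample = {"pronunciation": 0.25, "glyph": 0.25, "random": 0.25, "unmasked": 0.25}
scores = {"pronunciation": 0.05, "glyph": 0.02, "random": -0.15, "unmasked": 0.0}
second_sample = adjust_proportions(first_sample, scores)
```

Because of the renormalization, the changes are relative: if every score were positive, the proportions in this sketch would end up unchanged.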

8. The text error correction method based on reinforcement learning according to claim 1, wherein the step of iteratively training the pre-trained language model using the second training sample to obtain a text error correction model specifically comprises:

importing the second training sample into the pre-trained language model to obtain a second pre-training text error correction result;

calculating an action value score of each training corpus segment in the second training sample based on the second pre-training text error correction result;

summing the action value scores of the training corpus segments in the second training sample to obtain a final action value score; and

iteratively training the pre-trained language model based on the final action value score to obtain the text error correction model.

9. The text error correction method based on reinforcement learning according to claim 1, wherein the first sample embedding vector comprises a text embedding vector, a position embedding vector, a pronunciation embedding vector and a glyph embedding vector, and the step of performing vector transformation on the first training sample to obtain the first sample embedding vector specifically comprises:

performing feature extraction on the first training sample to obtain text features, position features, pronunciation features and glyph features; and

vectorizing the text features, position features, pronunciation features and glyph features respectively to obtain the text embedding vector, the position embedding vector, the pronunciation embedding vector and the glyph embedding vector.
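A toy sketch of the four-way vectorization in claim 9 is given below. It assumes pinyin as the pronunciation feature and the radical as the glyph feature, and it uses seeded pseudo-random vectors as stand-ins for learned embedding tables. The lookup tables, the dimension `DIM`, and the function names `embed` and `embed_sample` are hypothetical and not taken from the application.

```python
import random

DIM = 4  # toy embedding size (assumption)
PINYIN = {"天": "tian", "气": "qi"}   # hypothetical pronunciation features
RADICAL = {"天": "大", "气": "气"}     # hypothetical glyph features


def embed(kind, feature):
    """Stand-in for a learned lookup table: a deterministic vector per feature."""
    rng = random.Random(f"{kind}:{feature}")
    return [rng.uniform(-1.0, 1.0) for _ in range(DIM)]


def embed_sample(text):
    """Return the four embedding vectors for every character of `text`."""
    rows = []
    for position, char in enumerate(text):
        rows.append({
            "text": embed("text", char),
            "position": embed("position", position),
            "pronunciation": embed("pronunciation", PINYIN.get(char, "<unk>")),
            "glyph": embed("glyph", RADICAL.get(char, "<unk>")),
        })
    return rows


rows = embed_sample("天天气")
```

The same character at two positions shares its text, pronunciation and glyph vectors and differs only in the position vector. How the four vectors are merged before entering the model (for example, by element-wise summation) is not stated in the claim and is left open here.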

10. A text error correction device based on reinforcement learning, characterized by comprising:

a text masking module, configured to collect a training corpus and perform pronunciation-similarity masking and glyph-similarity masking on the text in the training corpus according to a preset text masking ratio;

a sample construction module, configured to construct a first training sample from the masked training corpus;

a vector transformation module, configured to perform vector transformation on the first training sample to obtain a first sample embedding vector;

a pre-training module, configured to import the first sample embedding vector into a pre-trained language model and output a first pre-training text error correction result;

a sample adjustment module, configured to adjust the first training sample based on the first pre-training text error correction result to generate a second training sample;

an iterative training module, configured to iteratively train the pre-trained language model using the second training sample to obtain a text error correction model; and

a text error correction module, configured to receive a text error correction instruction, obtain a text to be corrected, import the text to be corrected into the text error correction model, and output a text error correction result.

11. A computer device, comprising a memory and a processor, wherein computer-readable instructions are stored in the memory, and the processor, when executing the computer-readable instructions, implements the steps of the text error correction method based on reinforcement learning according to any one of claims 1 to 9.

12. A computer-readable storage medium, characterized in that computer-readable instructions are stored on the computer-readable storage medium, and the computer-readable instructions, when executed by a processor, implement the steps of the text error correction method based on reinforcement learning according to any one of claims 1 to 9.