CN117421226A - Defect report reconstruction method and system based on large language model - Google Patents
Defect report reconstruction method and system based on large language model
- Publication number
- CN117421226A CN117421226A CN202311420321.2A CN202311420321A CN117421226A CN 117421226 A CN117421226 A CN 117421226A CN 202311420321 A CN202311420321 A CN 202311420321A CN 117421226 A CN117421226 A CN 117421226A
- Authority
- CN
- China
- Prior art keywords
- report
- statement
- defect
- defect report
- generative
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/36—Prevention of errors by analysis, debugging or testing of software
- G06F11/3668—Testing of software
- G06F11/3672—Test management
- G06F11/3692—Test management for test results analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/166—Editing, e.g. inserting or deleting
- G06F40/186—Templates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
- G06F40/247—Thesauruses; Synonyms
Abstract
The invention discloses a method and system for defect report reconstruction based on a generative large language model. The method includes the following steps: collecting and preprocessing a statement set and a report set; building a classification decision model from the statement set; using the classification decision model to evaluate the quality of the defect reports in the report set; and reconstructing the low-quality defect reports with the generative large language model, guided by an established prompt template, to output new defect reports. The system comprises a data collection module, a pre-evaluation data processing module, a defect report evaluation module, an input text creation module, and a defect report reconstruction module. Compared with the prior art, the invention is highly practical and highly accurate.
Description
Technical Field
The invention relates to the field of software maintenance, and in particular to a method and system for defect report reconstruction based on a generative large language model.
Background Art
A software defect is a problem in software artifacts, such as documents or programs, that affects the normal operation of the software; it is commonly called a bug. Defect reports are indispensable in the software maintenance process: they describe the observed defect behavior together with the steps to reproduce it. As one of the main work products of software testers, defect reports embody the value of software testing. A defect report can accurately describe and localize a defect, making it easier for developers to fix, and it also reflects the current quality status of the project or product, supporting overall progress tracking and quality control.
However, existing defect reports are mainly written and analyzed manually. Owing to factors such as the varying experience of defect reporters, the complexity of the software itself and its usage scenarios, and the limited functionality of defect tracking systems, the quality of the defect reports that defect fixers rely on is often uneven. This causes considerable trouble for software maintenance, with low accuracy and low efficiency, and relying on manual analysis adds further cost and overhead.
Existing measures for improving defect report quality fall into two categories: improving the defect reporting mechanism and system, and improving the content of the reports themselves. The present invention improves the content. Existing automatic approaches to improving the quality of defect reports mainly use similar defect reports to supplement the missing information in a report. For example, some work applies hybrid static-dynamic analysis to improve reports for Android applications or crowdsourced testing applications. The paper "CTRAS: Crowdsourced test report aggregation and summarization" uses redundant test reports to generate an enhanced defect report with richer content: it groups redundant defect reports by their description text and screenshots, selects the most informative report in each group as the master report, and then merges information from the remaining redundant reports that aids understanding and fixing the defect into the master report, yielding a higher-quality crowdsourced test report.
Other work, such as "Bug report enrichment with application of automated fixer recommendation", proposes adding extra text sentences to enrich report content. The sentences to be added are taken from problem descriptions in historical reports: the authors first compute six similarity features between each candidate sentence and the report to be enriched (text, topic, component/product, and so on), combine the six feature values with a weighted sum to obtain a final similarity score, and use the top-K highest-scoring sentences to expand the short report. However, because the same feature may appear in different project versions and suffer from different subtle problems, two defect reports with largely identical content may describe completely different defects. Moreover, relying on similar reports depends on the project's existing data; for a brand-new project or problem, the effect of such measures is greatly reduced, and the resulting defect reports are neither high quality nor very practical.
Summary of the Invention
Purpose of the invention: the purpose of the invention is to provide a method and system for defect report reconstruction based on a generative large language model that is highly accurate and highly practical.
Technical solution: the method for defect report reconstruction based on a generative large language model according to the present invention includes the following steps:
S1. Collect and preprocess an initial statement set to form a processed statement set, where the initial statement set includes a number of statements and a number of label types, and each statement corresponds to at least one label. Collect a report set, where the report set includes a number of defect reports, each of which includes an ID, a title, and a description text.
S2. Perform data augmentation on the processed statement set and merge it with the initial statement set to obtain an expanded statement set, which includes a label set. Preprocess each defect report into a statement list; statement lists correspond one-to-one with defect reports.
S3. Build a classification decision model from an attention-based pre-trained model and the expanded statement set. Input each statement list into the classification decision model; the model assigns labels from the label set to the statements and tallies them, evaluates the quality of the corresponding defect report, and divides the reports into high-quality and low-quality defect reports according to the evaluation result.
S4. Obtain a generative large language model and construct its input text, which consists of a prompt template and a report text. The prompt template is written manually and then optimized by the generative large language model; the report text is extracted from the description text of a low-quality defect report.
S5. Feed the input text into the generative large language model to reconstruct the low-quality defect report and output a new defect report.
Further, in step S1, the initial statement set is preprocessed into the processed statement set as follows: 1) delete duplicate statements and their corresponding labels; 2) remove illegal characters and emphasis characters from the statements while retaining the corresponding labels; 3) delete statements with fewer than 5 or more than 1000 characters, together with their corresponding labels.
Further, in step S2, each defect report is preprocessed into a statement list as follows: 1) delete non-text data from the defect report; 2) remove illegal characters and emphasis characters from the defect report; 3) call NLTK's sent_tokenize function to split the report's title and description text into statements of at most 1000 characters each, and collect them into a statement list.
Further, in step S2, the data augmentation includes random replacement, random insertion, and random position-swap operations.
Further, in step S3, building the classification decision model includes the following steps:
S31. Data preparation: use one part of the expanded statement set as the training set and the other part as the validation set.
S32. Build the initial model: obtain an attention-based pre-trained model, configure the task as multi-label classification, and set a loss function for the multi-label classification task, so that the model learns the knowledge in the expanded statement set and, based on that knowledge, assigns labels to the statements in the statement list of each defect report.
S33. Train the initial model to obtain the classification decision model: train the initial model on the training set and validate it on the validation set; stop training once the metric constraints are satisfied, yielding the classification decision model.
Further, the metrics include accuracy, precision, recall, and F1 score.
Further, in step S1 the labels include an empty label; in step S2 the label set includes the empty label, OB (observed behavior), EB (expected behavior), and S2R (steps to reproduce).
Further, in step S3, evaluating the quality of a defect report includes:
1) Receive the statement list corresponding to the defect report;
2) Assign labels to each statement in the statement list;
3) Tally the labels and determine whether the statement list contains OB, EB, and S2R simultaneously. If it does, evaluate the defect report as a high-quality report; if it does not, evaluate the defect report as a low-quality defect report.
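For illustration only (this is not part of the claims), the decision rule above can be sketched in a few lines of Python; the representation of per-statement labels as sets of strings is an assumption:

```python
# Sketch of the quality-evaluation rule: a report is high quality only if
# its statements jointly cover OB (observed behavior), EB (expected
# behavior), and S2R (steps to reproduce). Each statement's labels are
# assumed to be a set of strings; the empty set is the "empty label".

REQUIRED_LABELS = {"OB", "EB", "S2R"}

def evaluate_report(statement_labels):
    """statement_labels: list of label sets, one per statement in the list."""
    present = set().union(*statement_labels) if statement_labels else set()
    return "high" if REQUIRED_LABELS <= present else "low"
```

Note that a single statement may contribute more than one label, so the union over all statements is what matters, not any individual statement.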
Further, the method also includes a step of verifying and re-reconstructing the new defect report:
Preprocess the new defect report into a corresponding statement list and feed it to the classification decision model to evaluate its quality. If it is evaluated as a low-quality defect report, extract the description text of that low-quality defect report as the report text and repeat steps S4-S5 until the report is evaluated as a high-quality defect report.
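The verify-and-reconstruct step amounts to a feedback loop. A minimal sketch follows; `evaluate_quality` and `reconstruct_with_llm` are hypothetical stand-ins for the classification decision model and the generative large language model, and the retry cap is an added safeguard not stated in the claim:

```python
# Sketch of the verify-and-reconstruct loop. The two callables are
# hypothetical stand-ins: evaluate_quality plays the role of the
# classification decision model (steps S1-S3), reconstruct_with_llm the
# role of steps S4-S5. max_rounds bounds the loop so it always terminates.

def refine_report(report, evaluate_quality, reconstruct_with_llm, max_rounds=3):
    """Repeat reconstruction until the report is evaluated as high quality."""
    for _ in range(max_rounds):
        if evaluate_quality(report) == "high":
            return report
        report = reconstruct_with_llm(report)  # steps S4-S5
    return report
```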
The system for defect report reconstruction based on a generative large language model according to the present invention includes:
A data collection module, for collecting and preprocessing the initial statement set into the processed statement set, where the initial statement set includes a number of statements and label types and each statement corresponds to at least one label, and for collecting the report set, where each defect report includes an ID, a title, and a description text.
A pre-evaluation data processing module, for performing data augmentation on the processed statement set and merging it with the initial statement set into the expanded statement set, which includes the label set, and for preprocessing each defect report into a statement list corresponding one-to-one with the report.
A defect report evaluation module, for building the classification decision model from the attention-based pre-trained model and the expanded statement set, feeding each statement list into the model, assigning and tallying labels from the label set, evaluating the quality of the corresponding defect report, and dividing the reports into high-quality and low-quality defect reports according to the evaluation result.
An input text creation module, for obtaining the generative large language model and constructing its input text, which consists of the prompt template, written manually and optimized by the generative large language model, and the report text, extracted from the description text of a low-quality defect report.
A defect report reconstruction module, for feeding the input text into the generative large language model, reconstructing the low-quality defect report, and outputting a new defect report.
Beneficial effects: the invention has the following notable advantages. 1. High accuracy: an accurate classification decision model is trained from an attention-based pre-trained model and the augmented, expanded statement set; meanwhile, the generative large language model reconstructs the low-quality defect reports, with its vast internal knowledge and the prompt template guiding the generation of a high-quality new defect report. 2. Strong practicality: unlike traditional manual analysis or enriching the original low-quality report with similar content, the invention uses the massive domain knowledge inside the generative large language model to automatically write a high-quality defect report suitable for software maintenance, reducing the cost of manually interpreting low-quality reports while providing a high-quality data foundation for downstream automated software engineering tasks, with a wider range of practical applications and higher efficiency.
Brief Description of the Drawings
Figure 1 is the overall flow chart of the method of the present invention.
Figure 2 is the detailed flow chart for generating a new defect report.
Detailed Description of the Embodiments
The present invention is further explained below with reference to the accompanying drawings and specific embodiments.
As shown in Figure 1, the invention discloses a method for defect report reconstruction based on a generative large language model, including the following steps:
S1. Collect and preprocess an initial statement set to form a processed statement set, where the initial statement set includes a number of statements and a number of label types, and each statement corresponds to at least one label. Collect a report set, where the report set includes a number of defect reports, each of which includes an ID, a title, and a description text.
S2. Perform data augmentation on the processed statement set and merge it with the initial statement set to obtain an expanded statement set, which includes a label set. Preprocess each defect report into a statement list; statement lists correspond one-to-one with defect reports.
S3. Build a classification decision model from an attention-based pre-trained model and the expanded statement set. Input each statement list into the classification decision model; the model assigns labels from the label set to the statements and tallies them, evaluates the quality of the corresponding defect report, and divides the reports into high-quality and low-quality defect reports according to the evaluation result.
S4. Obtain a generative large language model and construct its input text, which consists of a prompt template and a report text. The prompt template is written manually and then optimized by the generative large language model; the report text is extracted from the description text of a low-quality defect report.
S5. Feed the input text into the generative large language model to reconstruct the low-quality defect report and output a new defect report.
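Mechanically, step S4 amounts to filling a prompt template with the extracted report text. A minimal sketch follows; the template wording below is an invented placeholder, since per the method the real template is written manually and then optimized by the generative large language model itself:

```python
# Assemble the LLM input text from a prompt template and a report text.
# The template string here is a hypothetical example, not the patent's
# actual optimized prompt.

PROMPT_TEMPLATE = (
    "You are a software maintenance assistant. Rewrite the following "
    "defect report so that it explicitly contains the observed behavior "
    "(OB), the expected behavior (EB), and the steps to reproduce (S2R).\n"
    "Defect report:\n{report_text}"
)

def build_input_text(report_text, template=PROMPT_TEMPLATE):
    """Combine the prompt template and the low-quality report's text."""
    return template.format(report_text=report_text)
```

The resulting string is what step S5 feeds to the generative large language model.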
The method is now described in detail. First, the initial statement set and the report set are collected from open-source communities and defect tracking platforms (such as JIRA, Bugzilla, and Mozilla) and from public, manually annotated datasets of defect report statements and statement labels.
In step S1, preprocessing the initial statement set into the processed statement set proceeds as follows: 1) delete duplicate statements and their corresponding labels; 2) remove illegal characters and emphasis characters from the statements (characters that cannot be UTF-8 encoded, emoji, and so on) while retaining the corresponding labels; 3) delete statements with fewer than 5 or more than 1000 characters, together with their corresponding labels.
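The three step-S1 rules can be sketched as follows; the filtering of "illegal" characters is simplified to a UTF-8 round trip, which is an assumption about what the embodiment means by characters that cannot be UTF-8 encoded:

```python
# Sketch of the step-S1 preprocessing: deduplicate, strip characters that
# do not survive a UTF-8 round trip, and drop statements shorter than 5 or
# longer than 1000 characters (together with their labels).

def clean_text(s):
    # Simplified character filter; the embodiment's extra rules for
    # emphasis characters and emoji would be applied here as well.
    return s.encode("utf-8", errors="ignore").decode("utf-8")

def preprocess_statements(pairs):
    """pairs: list of (statement, labels) tuples; returns the cleaned list."""
    seen, out = set(), []
    for stmt, labels in pairs:
        stmt = clean_text(stmt).strip()
        if stmt in seen or not (5 <= len(stmt) <= 1000):
            continue  # drop duplicates and out-of-range statements
        seen.add(stmt)
        out.append((stmt, labels))
    return out
```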
In step S2, preprocessing a defect report into a statement list proceeds as follows: 1) delete non-text data (such as hyperlinks and images) from the defect report; 2) remove illegal characters and emphasis characters; 3) call NLTK's sent_tokenize function to split the report's title and description text into statements of at most 1000 characters each and collect them into a statement list. NLTK (Natural Language Toolkit) is a widely used Python library for natural language processing that provides a general, easy-to-use toolset along with extensible, reusable modules and algorithms to meet the needs of different users. The sent_tokenize function performs sentence boundary detection and sentence segmentation by combining statistics with heuristic rules; at its core, it trains an algorithm to learn the sentence characteristics of a given language.
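To keep the sketch self-contained, a simple regex splitter stands in for NLTK's sent_tokenize below; the embodiment itself would call `nltk.tokenize.sent_tokenize` on the concatenated title and description instead:

```python
import re

# Simplified stand-in for NLTK's sent_tokenize: split on sentence-final
# punctuation followed by whitespace, then enforce the method's
# <=1000-character cap per statement.

def split_into_statements(title, description, max_len=1000):
    text = f"{title}. {description}"
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [s[:max_len] for s in sentences]
```

A real splitter handles abbreviations and decimal points, which is exactly what sent_tokenize's trained model adds over this regex.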
In step S2, the data augmentation includes random replacement, random insertion, and random position-swap operations, which mitigate class imbalance in the data and yield a better dataset. The specific process is as follows:
S21. Use the word_tokenize function in NLTK to tokenize each statement, turning every word in the statement into a token. word_tokenize performs word boundary detection and word segmentation by combining statistics with heuristic rules. Tokenization is also a form of de-identification, replacing plaintext data with pseudonymous tokens in one-to-one correspondence with it. Tokens then circulate efficiently in downstream application environments; because the correspondence between tokens and plaintext is one-to-one, tokens can replace the plaintext for transmission, exchange, storage, and use in most scenarios.
S22. The random replacement operation: randomly select n tokens in a statement, retrieve the synonym set corresponding to each selected token from the WordNet database, randomly pick one synonym from the set, and replace the corresponding token with it. Here n = 0.01 * token_num, where token_num is the number of tokens in the statement. WordNet is a large-scale lexical database of English that organizes English words and phrases by meaning and establishes various relations between word senses; synonyms are grouped into synsets.
The random insertion operation: randomly select n tokens in a statement, retrieve the synonym set corresponding to each selected token from the WordNet database, randomly pick one synonym from the set, and insert it at a randomly chosen position in the statement.
The random position-swap operation: randomly select two tokens in a statement and swap their positions.
Note that during data augmentation, the label of each newly generated statement is kept identical to that of the original statement sample. In this embodiment, in step S1 the labels include an empty label, and in step S2 the label set includes the empty label, OB, EB, and S2R.
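The three S22 operations can be sketched as below. Two assumptions are made for the sake of a self-contained example: a tiny hand-made synonym table stands in for the WordNet synsets, and n is floored at 1 (`max(1, round(0.01 * token_num))`) so that short statements are still modified; the method itself states only n = 0.01 * token_num:

```python
import random

# Sketch of the three augmentation operations. SYNONYMS is a hypothetical
# stand-in for WordNet synset lookups.

SYNONYMS = {"crash": ["fail"], "app": ["application"], "button": ["control"]}

def _n(tokens):
    return max(1, round(0.01 * len(tokens)))  # assumption: at least 1

def random_replace(tokens, rng):
    tokens = tokens[:]
    for i in rng.sample(range(len(tokens)), _n(tokens)):
        syns = SYNONYMS.get(tokens[i])
        if syns:
            tokens[i] = rng.choice(syns)  # swap token for a synonym
    return tokens

def random_insert(tokens, rng):
    tokens = tokens[:]
    for _ in range(_n(tokens)):
        syns = SYNONYMS.get(rng.choice(tokens))
        if syns:  # insert a synonym at a random position
            tokens.insert(rng.randrange(len(tokens) + 1), rng.choice(syns))
    return tokens

def random_swap(tokens, rng):
    tokens = tokens[:]
    i, j = rng.sample(range(len(tokens)), 2)
    tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens
```

Each operation returns a new token list; per the note above, the augmented statement keeps the original statement's labels.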
In step S3, building the classification decision model means modeling the process of assigning zero or more of the labels OB, EB, and S2R from the label set to the statements in a statement list, using the attention-based pre-trained model BERT and the expanded statement set, then training the classification decision model and verifying its effectiveness. This specifically includes the following steps:
S31. Data preparation: to prevent the trained classification decision model from overfitting, one part of the expanded statement set is used as the training set and the other part as the validation set. The expanded statement set is obtained by merging the augmented data with the initial statement set, which enlarges the dataset and helps the classification decision model learn more useful and comprehensive knowledge.
S32. Build the initial model: obtain the attention-based pre-trained model, configure the task as multi-label classification, and set the loss function of the multi-label classification task to BCEWithLogitsLoss, so that the model learns the knowledge in the expanded statement set and, based on that knowledge, assigns labels to the statements in each defect report's statement list. The formula for BCEWithLogitsLoss is:
L = -(1 / (N * C)) * Σ_{i=1..N} Σ_{c=1..C} [ y_{i,c} * log σ(x_{i,c}) + (1 - y_{i,c}) * log(1 - σ(x_{i,c})) ], where σ(x) = 1 / (1 + e^{-x})
where N is the number of samples, C is the number of labels, x_{i,c} is the model's raw output (logit) for label c of the i-th sample, and y_{i,c} is the true label of the i-th sample for label c, equal to 0 or 1.
The existing way to accomplish the multi-label classification task is to train three binary classifiers, one yes/no model for each of OB, EB, and S2R, and then concatenate the results of the three models. The consequence is that multiple models must be trained, and feeding multi-label samples into binary classifiers that assign a single label does not match the characteristics of the expanded statement set, which hurts model performance. Using the pre-trained BERT model for the multi-label classification of statements requires training only one model, and all labels in the label set are fully used during training, so the knowledge the model learns better matches the characteristics of the expanded statement set. Common loss functions for multi-label classification are binary cross-entropy (BCE) and BCEWithLogitsLoss; both compute a probability for every label simultaneously and allow one statement to belong to multiple labels. The BCEWithLogitsLoss used in this embodiment, however, computes the loss directly on the model's raw outputs without first applying the sigmoid function, which not only improves computational efficiency but also avoids the numerical instability the sigmoid function can introduce.
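The numerical-stability point can be made concrete with a pure-Python sketch of the loss (a deep-learning framework's built-in version would be used in practice):

```python
import math

# Numerically stable BCE-with-logits: instead of computing
# -[y*log(sigmoid(x)) + (1-y)*log(1-sigmoid(x))] directly, the equivalent
# log-sum-exp form  max(x, 0) - x*y + log(1 + exp(-|x|))  is used, which
# is why the loss works on raw logits without a separate sigmoid step.

def bce_with_logits(logits, targets):
    """logits, targets: N x C nested lists; target entries are 0 or 1.
    Returns the mean loss over all N*C label slots."""
    total, count = 0.0, 0
    for x_row, y_row in zip(logits, targets):
        for x, y in zip(x_row, y_row):
            total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
            count += 1
    return total / count
```

For large negative logits, the naive form would compute log(0) and overflow to -inf; the rewritten form never exponentiates a positive argument, so it stays finite.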
S33. Train the initial model to obtain the classification decision model: train the initial model on the training set and validate it on the validation set; stop training once the metric constraints are satisfied, yielding the classification decision model. The metrics include accuracy, precision, recall, and F1 score. Accuracy is the ratio of correctly predicted labels to the total number of labels. Precision is the ratio of samples correctly predicted as positive to all samples predicted as positive. Recall is the ratio of samples correctly predicted as positive to all samples that are actually positive; in many scenarios the model must not only predict accurately but also identify as many positive examples as possible, which requires evaluating recall, i.e., the ability to recognize positive samples. The F1 score is the harmonic mean of precision and recall; it accounts for both accuracy and recall and evaluates the model more comprehensively.
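The stop-when-constraints-are-met loop of step S33 can be sketched in a framework-agnostic way; `train_one_epoch` and `evaluate` are caller-supplied callables whose names are illustrative, not part of the patent:

```python
def train_until_constraints(train_one_epoch, evaluate, constraints,
                            max_epochs=50):
    """Train epoch by epoch and stop as soon as every validation
    metric meets its threshold (step S33's metric constraints).
    `constraints` maps metric name -> minimum acceptable value."""
    metrics = {}
    for epoch in range(1, max_epochs + 1):
        train_one_epoch()
        metrics = evaluate()  # e.g. {"accuracy": 0.91, "f1": 0.88}
        if all(metrics.get(name, 0.0) >= bound
               for name, bound in constraints.items()):
            return epoch, metrics
    return max_epochs, metrics

# Stubbed demo: validation F1 improves each epoch and crosses 0.9 at epoch 3.
scores = iter([{"f1": 0.50}, {"f1": 0.80}, {"f1": 0.95}])
epoch, final = train_until_constraints(lambda: None, lambda: next(scores),
                                       {"f1": 0.90})
print(epoch, final)  # 3 {'f1': 0.95}
```

In practice `evaluate` would run the BERT model over the validation set and compute the four metrics defined below.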
These evaluation metrics can measure both the model's performance on each individual label and its overall performance across multiple labels. The accuracy formula is as follows:

Accuracy = (TP + TN) / (TP + TN + FP + FN)
The precision formula is as follows:

Precision = TP / (TP + FP)
The recall formula is as follows:

Recall = TP / (TP + FN)
The F1 score formula is as follows:

F1 = 2 × Precision × Recall / (Precision + Recall)
where TP (true positives) is the number of samples predicted positive that are actually positive; TN (true negatives) is the number predicted negative that are actually negative; FP (false positives) is the number predicted positive that are actually negative; and FN (false negatives) is the number predicted negative that are actually positive.
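The four formulas above can be computed from the confusion counts with a small helper; this is a generic illustration, not code from the patent:

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall and F1 from the confusion
    counts defined above (true/false positives and negatives).
    Degenerate denominators are mapped to 0.0."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return accuracy, precision, recall, f1

# 100 validation statements: 8 true positives, 80 true negatives,
# 2 false positives, 10 false negatives.
acc, p, r, f1 = classification_metrics(tp=8, tn=80, fp=2, fn=10)
print(acc, p, r, f1)  # 0.88 0.8 0.444... 0.571...
```

For multi-label evaluation the same helper is applied per label, and the per-label counts can be pooled for an overall (micro-averaged) score.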
In step S3, the process of evaluating defect report quality includes:
1) receiving the statement list corresponding to the defect report;
2) assigning a label to each statement in the statement list;
3) counting the labels and determining whether the statement list contains OB, EB, and S2R simultaneously; if it does, the defect report is evaluated as a high-quality report; if not, it is evaluated as a low-quality defect report.
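The three-step quality gate above reduces to a set check over the assigned labels. A minimal sketch, assuming the classification decision model yields `(statement, labels)` pairs (the function and variable names are illustrative):

```python
def assess_report_quality(labeled_statements):
    """Given [(statement, {labels}), ...], a report is high quality
    only when OB, EB and S2R all appear somewhere in the list."""
    seen = set()
    for _statement, labels in labeled_statements:
        seen |= labels
    return "high" if {"OB", "EB", "S2R"} <= seen else "low"

report = [
    ("The app crashes on startup.", {"OB"}),
    ("It should open the main window.", {"EB"}),
    ("1. Install v2.1  2. Launch the app.", {"S2R"}),
]
print(assess_report_quality(report))      # high
print(assess_report_quality(report[:2]))  # low (no S2R statement)
```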
In step S4, the process of establishing the prompt template is as follows:
S41. Manually write an initial prompt template.
The prompt template includes role-assignment text, definitions and requirements for the labels (OB, EB, S2R), and a task description. The role-assignment text assigns a role to the generative large language model, guiding it to produce output more consistent with the knowledge that role possesses. The label definitions and requirements help the generative large language model understand the key information of the task. The task description lets the generative large language model understand the task and constrains the range of its output. In this embodiment, the role-assignment text is: "Your role is a senior software engineer with rich experience in the fields of software testing and software maintenance." The definitions and requirements of the labels (OB, EB, S2R) are: OB (Observed Behavior): relevant software behavior, actions, outputs, or results; uninformative sentences such as "the system does not work" are not considered OB. EB (Expected Behavior): a sentence containing phrases about what the software should, is hoped, or is expected to do, such as "should ...", "expect ...", or "hope ...", is considered EB; suggestions or recommendations for fixing the bug are not considered EB. S2R (Steps to Reproduce): a sentence that potentially contains a user action or operation is considered S2R; bare phrases such as "to reproduce", "steps to reproduce", or "follow these steps" are not considered S2R. In this embodiment, the task description is: "You should infer appropriate details from the context and complete the bug report with clear OB/EB/S2R. Where possible, you should improve the wording of existing OB, EB, and S2R statements to make them clearer. For S2R, you should give steps that are as clear, accurate, and complete as possible."
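Assembling the three-part template with a report text into one model input can be sketched as follows. The wording paraphrases the embodiment and the helper names are hypothetical, not part of the patent:

```python
# Three parts of the prompt template described in step S41.
ROLE = ("Your role is a senior software engineer with rich experience "
        "in software testing and software maintenance.")
LABEL_DEFS = (
    "OB (Observed Behavior): relevant software behavior, action, output "
    "or result. Uninformative sentences like 'the system does not work' "
    "are not OB.\n"
    "EB (Expected Behavior): sentences with phrases such as 'should ...', "
    "'expect ...', 'hope ...'. Suggestions for fixing the bug are not EB.\n"
    "S2R (Steps to Reproduce): sentences containing a user action or "
    "operation. Bare headers like 'steps to reproduce' are not S2R."
)
TASK = ("Infer appropriate details from the context and complete the bug "
        "report with clear OB/EB/S2R. Where possible, improve the wording "
        "of existing OB, EB and S2R statements.")

def build_input_text(report_text):
    """Concatenate the prompt template with the low-quality report's
    description text (step S4's input text = template + report text)."""
    return "\n\n".join([ROLE, LABEL_DEFS, TASK,
                        "Bug report:\n" + report_text])

prompt = build_input_text("App crashes. No idea why.")
```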
S42. Input the initial prompt template into the generative large language model for summarization and optimization, and output an optimized prompt template. Having the generative large language model summarize and optimize the manually written initial prompt template allows the model to understand the task, definitions, and requirements described in the template more comprehensively and accurately.
S43. Verify the optimized prompt template and, after confirming that it meets the task requirements, output it as the prompt template. Verification is performed manually.
In addition, the present invention also includes a step of verifying and reconstructing the new defect report:
Preprocess the new defect report output by step S5 to form the corresponding statement list and input it into the classification decision model to evaluate the quality of the new defect report. If it is evaluated as a low-quality defect report, extract the description text of the low-quality defect report as the report text and repeat steps S4-S5 until it is evaluated as a high-quality defect report.
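The repeat-until-high-quality loop can be sketched with caller-supplied callables standing in for the prompt builder, the generative model, and the quality assessor; all names here are illustrative:

```python
def reconstruct_until_high_quality(report_text, build_prompt, llm,
                                   assess, max_rounds=3):
    """Repeat steps S4-S5: rebuild the input text from the current
    report text, let the generative model rewrite the report,
    re-assess, and stop once the report is judged high quality
    (or after max_rounds as a safety bound)."""
    current = report_text
    for _ in range(max_rounds):
        if assess(current) == "high":
            return current, "high"
        current = llm(build_prompt(current))
    return current, assess(current)

# Stubbed model/assessor for demonstration: one rewrite suffices.
fixed, quality = reconstruct_until_high_quality(
    "App crashes.",
    build_prompt=lambda t: "REWRITE:\n" + t,
    llm=lambda p: "OB: app crashes. EB: should start. S2R: 1. launch app.",
    assess=lambda t: "high" if all(k in t for k in ("OB", "EB", "S2R"))
                     else "low",
)
print(quality)  # high
```

In the patented pipeline, `assess` would run the preprocessing and the classification decision model over the statement list rather than a string check.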
The invention also discloses a system for defect report reconstruction based on a generative large language model, the system comprising:
Data collection module: collects and preprocesses an initial statement set to form a processed statement set, where the initial statement set includes a number of statements and a number of label types, each statement corresponding to at least one label; collects a report set, where the report set includes a number of defect reports, each defect report including an ID, a title, and description text;
Pre-evaluation data processing module: performs data augmentation on the processed statement set and merges it with the initial statement set to obtain an expanded statement set, the expanded statement set including a label set; preprocesses each defect report to form a statement list, the statement lists corresponding one-to-one with the defect reports;
Defect report evaluation module: builds a classification decision model from an attention-based pre-trained model and the expanded statement set; inputs the statement list into the classification decision model, which assigns labels from the label set to the statement list and counts them, evaluates the quality of the defect report corresponding to the statement list, and classifies it as a high-quality or low-quality defect report according to the evaluation result;
Input text creation module: obtains a generative large language model and builds its input text, the input text including a prompt template and report text; the prompt template is manually written and then optimized by the generative large language model; the report text is extracted from the description text of a low-quality defect report;
Defect report reconstruction module: inputs the input text into the generative large language model, reconstructs the low-quality defect report, and outputs a new defect report.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311420321.2A CN117421226A (en) | 2023-10-27 | 2023-10-27 | Defect report reconstruction method and system based on large language model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117421226A true CN117421226A (en) | 2024-01-19 |
Family
ID=89528001
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311420321.2A Pending CN117421226A (en) | 2023-10-27 | 2023-10-27 | Defect report reconstruction method and system based on large language model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117421226A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118276826A (en) * | 2024-03-13 | 2024-07-02 | 天衣(北京)科技有限公司 | Conversational software function generation method and system based on large language model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||