KR101691083B1

KR101691083B1 - System and Method for Bug Fixing Developers Recommendation and Bug Severity Prediction based on Topic Model and Multi-Feature

Info

Publication number: KR101691083B1
Application number: KR1020150008201A
Authority: KR
Inventors: 이병정 (Byungjeong Lee); 장도 (Tao Zhang); 양근석 (Geunseok Yang)
Original assignee: 서울시립대학교 산학협력단 (University of Seoul Industry Cooperation Foundation)
Priority date: 2015-01-16
Filing date: 2015-01-16
Publication date: 2016-12-29
Also published as: KR20160088737A

Abstract

The present invention relates to a system and method for bug-fixing developer recommendation and bug severity prediction based on a topic model and multiple features, and more particularly to a system and method that predict the severity of a bug and recommend a bug-fixing developer so as to reduce software development time and cost.
To achieve this object, a system for bug-fixing developer recommendation and bug severity prediction based on a topic model and multiple features according to an embodiment of the present invention includes a first extraction unit, a selection unit, and a recommendation unit.

Description

{System and Method for Bug Fixing Developers Recommendation and Bug Severity Prediction based on Topic Model and Multi-Feature}

The present invention relates to a system and method for bug-fixing developer recommendation and bug severity prediction based on a topic model and multiple features, and more particularly to a system and method that predict the severity of a bug and recommend a bug-fixing developer so as to reduce software development time and cost.

The present invention was derived from research conducted as part of the Next-Generation Information Computing Technology Development Program of the Ministry of Science, ICT and Future Planning and the National Research Foundation of Korea [Project No. 2014050304; project title: Development of source technology for SW engineering techniques and tools for semantics-based continuous monitoring (semantics-based test support technology linked with continuous monitoring)].

While using software on computing devices such as PCs (personal computers), smartphones, and video game consoles, one often sees an error message appear or the device malfunction for no apparent reason. This can result from a simple device failure, but it is frequently caused by defects in the hardware or software itself, meaning that the programs making up that hardware or software contain incorrect code. A computer error or malfunction caused by such a program defect is called a bug.

Bugs occur more often in complex software, and each bug is assigned, together with its bug report, to a developer who is to fix it. A new bug is assigned to a developer by the triager of the bug repository.

Conventional large-scale software projects depend heavily on submitted bug reports to carry out bug-fixing activities. However, although many bug reports were submitted, there was no way to know how urgently each bug had to be handled, so the submitted bug reports could not be fixed effectively.

In addition, many bug reports were not assigned to an appropriate developer and frequently had to be reassigned, which increased the time and cost of fixing bugs and, in turn, software development costs.

Meanwhile, Korean Patent Publication No. 2012-0069388, "Automated system for static defect classification and reporting and method thereof," classifies bugs into classes by type based on data extracted from one or more bug detectors and assigns each a base score; cross-checks the bug's error precondition against the bug inspection data extracted from each bug detector to award bonus points; and finally adjusts the bug items in each class according to their scores. This makes it easy to assign the reported bugs that are likely to be true alarms and reduces the overall false alarm rate.

However, that prior art is merely a bug allocation technique for efficiently reviewing the large volume of result data produced by bug detectors and quickly finding bugs in the target software; it says nothing about efficiently assigning those bugs to developers so that they can be fixed.

That is, under the prior art many bug reports are still not assigned to an appropriate developer and must be reassigned, so the conventional problems of increased bug-fixing time and cost, and of increased software development cost, remain. The prior art also fails to fix submitted bug reports efficiently.

Korean Patent Publication No. 2012-0069388 (published June 28, 2012)

An object of the present invention is to provide a system and method for bug-fixing developer recommendation and bug severity prediction based on a topic model and multiple features, and more particularly a system and method that predict the severity of a bug and recommend a bug-fixing developer so as to reduce software development time and cost.

Another object of the present invention is to fix submitted bug reports efficiently by predicting their severity.

A further object of the present invention is to improve software quality and reduce development time and cost by recommending appropriate developers to fix the many bugs that arise in complex software.

To achieve these objects, a system for bug-fixing developer recommendation and bug severity prediction based on a topic model and multiple features according to an embodiment of the present invention includes a first extraction unit, a selection unit, and a recommendation unit.

The first extraction unit identifies, based on the modeled topics, the topic corresponding to a received new bug report and extracts past bug reports having the identified topic. The selection unit selects candidate developers using the extracted past bug reports in order to recommend a developer to fix the new bug report. The recommendation unit computes a recommendation rank for each selected candidate developer using that candidate's activity and experience information, and recommends a developer to fix the new bug report based on the computed ranks.

The first extraction unit may identify the topic using the frequency of topic terms appearing in the new bug report. The selection unit may select the candidate developers based on first developers, comprising the assignees and commenters extracted from the past bug reports, and second developers, extracted from bug reports that have the same features as the past bug reports. Here, the selection unit may extract the second developers using at least one feature among the product, component, priority, and severity contained in the past bug reports.

As the activity and experience information, the recommendation unit may use the number of comments and the number of commits, which indicate the candidate developer's activity in bug fixing, and the number of assignments and the number of attachments, which indicate the candidate developer's experience in bug fixing. The recommendation unit may judge a candidate developer's recommendation rank to be higher as the numbers of comments, commits, and assignments are higher and the number of attachments is lower.
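The ranking rule above gives only the direction of each factor. The following Python sketch, using the hypothetical names `DeveloperActivity`, `activity_score`, and `rank_candidates`, assumes an unweighted sum as the score; the actual weighting is not stated in this text.

```python
from dataclasses import dataclass


@dataclass
class DeveloperActivity:
    name: str
    comments: int     # comments posted on bug reports (activity)
    commits: int      # commits in the repository history log (activity)
    assignments: int  # bugs previously assigned to the developer (experience)
    attachments: int  # attachments such as patches or test cases (experience)


def activity_score(dev: DeveloperActivity) -> int:
    # Hypothetical scoring rule: only the direction of each factor is given,
    # so an unweighted sum is assumed (comments, commits and assignments
    # raise the score; attachments lower it).
    return dev.comments + dev.commits + dev.assignments - dev.attachments


def rank_candidates(candidates: list[DeveloperActivity]) -> list[str]:
    # Highest score first; ties are broken by name to keep the order stable.
    ordered = sorted(candidates, key=lambda d: (-activity_score(d), d.name))
    return [d.name for d in ordered]


if __name__ == "__main__":
    devs = [
        DeveloperActivity("bob", comments=2, commits=1, assignments=1, attachments=0),
        DeveloperActivity("alice", comments=10, commits=5, assignments=3, attachments=1),
        DeveloperActivity("carol", comments=10, commits=5, assignments=3, attachments=6),
    ]
    print(rank_candidates(devs))  # ['alice', 'carol', 'bob']
```

An unweighted sum is only the simplest rule consistent with the stated directions; per-factor weights or normalization by project size would be natural alternatives.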

The system of the present invention may further include a prediction unit that predicts the severity of the new bug report based on the past bug reports extracted by the first extraction unit.

The prediction unit may include a second extraction unit that extracts, from the past bug reports extracted by the first extraction unit, common bug reports that have the same features, and a calculation unit that calculates the text similarity between the extracted common bug reports and the new bug report. The prediction unit may predict the severity of the new bug report with the K-nearest neighbor (KNN) algorithm based on the calculated text similarity, and the calculation unit may calculate the text similarity using the new bug report represented as a vector and the Kullback-Leibler (KL) divergence.
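To make the similarity and voting steps concrete, here is a minimal Python sketch of KNN severity prediction with KL divergence used as the distance. The smoothing constant `alpha`, the default `k=3`, majority voting, the direction of the divergence (new report relative to each past report), and the function names are all assumptions; this text does not specify them. Each report becomes a smoothed term-probability vector over the shared vocabulary, and the K common reports with the smallest divergence from the new report vote on its severity.

```python
import math
from collections import Counter


def term_distribution(tokens: list[str], vocab: set[str], alpha: float = 0.01) -> dict[str, float]:
    # Additive smoothing gives every vocabulary term a nonzero probability,
    # which keeps the KL divergence finite.
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {term: (counts[term] + alpha) / total for term in vocab}


def kl_divergence(p: dict[str, float], q: dict[str, float]) -> float:
    # D_KL(P || Q) = sum over terms t of P(t) * log(P(t) / Q(t)); 0 means identical.
    return sum(p[t] * math.log(p[t] / q[t]) for t in p)


def predict_severity(new_tokens: list[str],
                     common_reports: list[tuple[list[str], str]],
                     k: int = 3) -> str:
    """common_reports holds (tokens, severity) pairs for the common bug reports."""
    if not common_reports:
        raise ValueError("no common bug reports to compare against")
    vocab = set(new_tokens)
    for tokens, _ in common_reports:
        vocab.update(tokens)
    p_new = term_distribution(new_tokens, vocab)
    # A smaller divergence from the new report means a more similar report.
    scored = sorted(
        (kl_divergence(p_new, term_distribution(tokens, vocab)), severity)
        for tokens, severity in common_reports
    )
    neighbours = [severity for _, severity in scored[:k]]
    # Majority vote among the K nearest neighbours.
    return Counter(neighbours).most_common(1)[0][0]
```

KL divergence is asymmetric, so computing it in the other direction, or using a symmetric variant such as Jensen-Shannon divergence, could give a different neighbour order.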

Meanwhile, a method for bug-fixing developer recommendation and bug severity prediction based on a topic model and multiple features according to an embodiment of the present invention may include: identifying, based on the modeled topics, the topic corresponding to a received new bug report and extracting past bug reports having the identified topic; selecting candidate developers using the extracted past bug reports in order to recommend a developer to fix the new bug report; and computing a recommendation rank for each selected candidate developer using that candidate's activity and experience information, and recommending a developer to fix the new bug report based on the computed ranks.

In addition, the step of extracting the past bug reports may identify the topic using the frequency of topic terms appearing in the new bug report, and the step of selecting the candidate developers may select them based on first developers, comprising the assignees and commenters extracted from the past bug reports, and second developers, extracted from bug reports that have the same features as the past bug reports. Here, the step of selecting the candidate developers may extract the second developers using at least one feature among the product, component, priority, and severity contained in the past bug reports.

In addition, the step of recommending the developer may use, as the activity and experience information, the number of comments and the number of commits, which indicate the candidate developer's activity in bug fixing, and the number of assignments and the number of attachments, which indicate the candidate developer's experience in bug fixing. The step of recommending the developer may judge a candidate developer's recommendation rank to be higher as the numbers of comments, commits, and assignments are higher and the number of attachments is lower.

In addition, the method of the present invention may further include predicting the severity of the new bug report based on the past bug reports extracted in the step of extracting the past bug reports.

Here, the step of predicting the severity may include extracting, from the past bug reports extracted in the step of extracting the past bug reports, common bug reports that have the same features, and calculating the text similarity between the extracted common bug reports and the new bug report. The step of predicting the severity may predict the severity of the new bug report with the K-nearest neighbor (KNN) algorithm based on the calculated text similarity, and the step of calculating the text similarity may calculate it using the new bug report represented as a vector and the Kullback-Leibler (KL) divergence.

The present invention has the effect of providing a system and method for bug-fixing developer recommendation and bug severity prediction based on a topic model and multiple features, and more particularly a system and method that predict the severity of a bug and recommend a bug-fixing developer so as to reduce software development time and cost.

The present invention has the effect of fixing submitted bug reports efficiently by predicting their severity.

The present invention has the effect of improving software quality and reducing development time and cost by recommending appropriate developers to fix the many bugs that arise in complex software.

By using the topic model of bug reports together with the multiple features of bug reports, namely product, component, priority, and severity, the present invention has the effect of recommending more appropriate bug-fixing developers and predicting the severity of bug reports.

FIG. 1 is a block diagram of a system for bug-fixing developer recommendation and bug severity prediction based on a topic model and multiple features according to an embodiment of the present invention.
FIG. 2 is a block diagram of a system for bug-fixing developer recommendation and bug severity prediction based on a topic model and multiple features according to a second embodiment of the present invention.
FIG. 3 shows the details of the Eclipse JDT bug report with bug ID 36465.
FIG. 4 shows a framework for bug-fixing developer recommendation according to an embodiment of the present invention.
FIG. 5 shows a workflow for predicting the severity of a bug report according to an embodiment of the present invention.
FIG. 6 is an operational flowchart of a method for bug-fixing developer recommendation and bug severity prediction based on a topic model and multiple features according to an embodiment of the present invention.

Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings. Detailed descriptions of well-known configurations or functions are omitted where they could obscure the subject matter of the present invention. Specific numerical values given in describing the embodiments are merely examples.

FIG. 1 is a block diagram of a system for bug-fixing developer recommendation and bug severity prediction based on a topic model and multiple features according to an embodiment of the present invention.

Referring to FIG. 1, the system 100 for bug-fixing developer recommendation and bug severity prediction based on a topic model and multiple features according to an embodiment of the present invention includes a first extraction unit 110, a selection unit 120, and a recommendation unit 130.

The first extraction unit 110 identifies, based on the modeled topics, the topic corresponding to a received new bug report and extracts past bug reports having the identified topic.

The first extraction unit 110 can identify the topic corresponding to the received new bug report using the frequency of the topic terms appearing in that report. A higher frequency means that the new bug report matches the topic more closely.

More specifically, to identify the topic corresponding to a received new bug report, the first extraction unit 110 may first model the topics. Before modeling, the first extraction unit 110 feeds the predefined fields of each bug report (for example, "summary" and "description") into CoreNLP, which returns word tokens. That is, the first extraction unit 110 passes each new bug report through CoreNLP to perform the preprocessing that yields word tokens.

The first extraction unit 110 then concatenates the word tokens of each bug report as its features and plugs these features into the Stanford Topic Modeling Toolbox (TMT) to generate the terms of each topic. The first extraction unit 110 uses these word tokens to identify the topic corresponding to a received new bug report.
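The preprocessing step can be approximated as follows. This Python sketch does not call CoreNLP or TMT; it uses a regular-expression tokenizer and a small hypothetical stop-word list as stand-ins, only to show how the `summary` and `description` fields become the word tokens that the topic model consumes.

```python
import re

# Hypothetical minimal stop-word list; CoreNLP itself is not used here.
STOP_WORDS = {"the", "a", "an", "is", "in", "of", "to", "and", "when", "on"}


def preprocess(report: dict[str, str]) -> list[str]:
    # Concatenate the predefined text fields used for topic modelling.
    text = " ".join(report.get(field, "") for field in ("summary", "description"))
    # Lower-case, then split on anything that is not a letter or digit.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    # Drop stop words so that only content-bearing tokens reach the topic model.
    return [t for t in tokens if t not in STOP_WORDS]
```

A real pipeline would normally add lemmatization, which is one of the things CoreNLP provides and this stand-in omits.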

Table 1 below shows the terms extracted by TMT for each Eclipse topic.

That is, the system 100 of the present invention may store data such as that in Table 1 as the modeled topics. Referring to Table 1, Topic-1 may include word tokens such as "time", "set", "run", and "execution", obtained by preprocessing the predefined fields of bug reports; Topic-2 may include word tokens such as "context", "text", and "help"; and Topic-3 may include word tokens such as "import", "export", and "jar".

Accordingly, when a new bug report arrives, the first extraction unit 110 identifies and selects, among the stored modeled topics, the topic that matches the new bug report.

When selecting the topic that matches the received new bug report, the first extraction unit 110 can compute the frequency of each topic's terms in the new bug report; the report is regarded as belonging to the topic whose terms occur most frequently. For example, if, after preprocessing, Topic-2 words such as "context", "text", and "help" appear in the new bug report more often than Topic-3 words such as "import" and "export", the new bug report can be judged to belong to Topic-2.
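The frequency-based topic matching described above can be sketched in Python as follows. `identify_topic` is a hypothetical name, and the topic term sets contain only the example words of Table 1 quoted in the text, not the full term lists. Each topic's score is the total number of occurrences of its terms among the new report's tokens.

```python
from collections import Counter


def identify_topic(report_tokens: list[str], topic_terms: dict[str, set[str]]) -> str:
    """topic_terms maps each modeled topic to its terms (for example, as produced by TMT)."""
    counts = Counter(report_tokens)
    # Total frequency, in the new report, of the terms belonging to each topic.
    scores = {topic: sum(counts[term] for term in terms) for topic, terms in topic_terms.items()}
    # The report is assigned to the topic whose terms occur most often;
    # on a tie, max() keeps the topic that appears first in topic_terms.
    return max(scores, key=scores.get)


# Example terms quoted from Table 1 (partial lists).
TOPICS = {
    "Topic-1": {"time", "set", "run", "execution"},
    "Topic-2": {"context", "text", "help"},
    "Topic-3": {"import", "export", "jar"},
}
```

With the tokens `["help", "text", "context", "menu", "export"]`, Topic-2 scores 3 and Topic-3 scores 1, so the report is placed in Topic-2, matching the example in the text.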

Then, in order to recommend an appropriate developer to fix the given new bug report and to predict its severity, the first extraction unit 110 extracts the past bug reports that belong to the same topic as the new bug report. For example, if the received new bug report belongs to Topic-2, the first extraction unit 110 extracts all past bug reports belonging to Topic-2. To this end, the present invention may store the past bug reports corresponding to each modeled topic.
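Because past bug reports are stored per modeled topic, extracting them reduces to a lookup. A minimal Python sketch, assuming past reports are kept as hypothetical `(bug_id, topic)` pairs; `index_by_topic` and `reports_for_topic` are illustrative names.

```python
from collections import defaultdict


def index_by_topic(past_reports: list[tuple[int, str]]) -> dict[str, list[int]]:
    """past_reports: (bug_id, topic) pairs for reports already assigned to a topic."""
    index: dict[str, list[int]] = defaultdict(list)
    for bug_id, topic in past_reports:
        index[topic].append(bug_id)
    return dict(index)


def reports_for_topic(index: dict[str, list[int]], topic: str) -> list[int]:
    # Every stored past bug report that shares the new report's topic
    # (an empty list if no past report has that topic).
    return list(index.get(topic, []))
```

Building the index once and reusing it keeps each new report's lookup cheap, which matters because extraction runs for every incoming report.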

To recommend an appropriate developer to fix a given new bug report and to predict its severity, the present invention uses not only the topic model of bug reports described above but also the multiple features (multi-feature) of bug reports.

The concept of the multi-feature used in the present invention is described below.

A bug report is free-form text content, and bug reports generally describe defects that appear in source code files. A bug report consists of predefined fields such as "Bug ID", "Title", and "Component".

FIG. 3 shows the details of the Eclipse JDT bug report with bug ID 36465.

Referring to FIG. 3, the bug report with bug ID 36465 was submitted by Jason Sholl; its current status is "CLOSED" and "FIXED"; its Eclipse product is classified as "JDT"; and its component is "Core". In addition, the bug report was assigned to a developer named Philippe Mulet to be fixed.

Further down the bug report, although not shown in the figure, details are provided such as the bug description, attachments (for example, patches and test cases), and commenters (people who commented on the report). Note that commit information does not appear in the bug report.

Commits are in fact stored in the history log of the bug repository: when developers change lines of code, they must submit a message declaring the code change. Using this feature, the present invention can better verify the experience of the developers who fix bugs.

Bug reports also contain important meta-fields such as "Product", "Component", "Priority", and "Severity". The present invention adopts these meta-fields as the multi-feature.

Within the multi-feature, "Product" is the top-level category under which customer requirements are developed, such as "JDT" and "PDE" in Eclipse. "Component" is a sub-category belonging to a specific product, such as "Debug" and "UI". "Priority" and "Severity" are scale factors indicating how urgent a given new bug report is.

"Priority" indicates the priority for fixing a bug and has five levels, P1 to P5, where P1 is the highest priority and P5 the lowest. For example, bug report 36465 in FIG. 3 has the highest priority, P1, meaning that it should be fixed as soon as possible.

"Severity" indicates how serious a bug is. Conventionally, severity has seven levels: Blocker, Critical, Major, Normal, Minor, Trivial, and Enhancement. An Enhancement is a request for new functionality rather than a true bug, but the present invention nevertheless includes it in severity prediction.

Of these seven levels, Blocker, Critical, and Major denote serious problems in a program, such as crashes, loss of data, and memory leaks, while Normal, Minor, Trivial, and Enhancement denote minor errors, cosmetic problems, and minor loss of function.

In particular, Blocker denotes the type of bug with the highest severity and Trivial the type with the lowest. For example, bug report 36465 in FIG. 3 is marked "Blocker", indicating that it is among the most serious errors.
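The two ordinal fields can be encoded directly. This Python sketch uses hypothetical names (`Severity`, `SEVERE_LEVELS`, `is_severe`, `priority_rank`); the severe/non-severe split follows the grouping in the text, while the position of Enhancement on the scale is an assumption.

```python
from enum import IntEnum


class Severity(IntEnum):
    # Ordered from least to most severe. The text names Trivial as the lowest
    # severity of a real bug; placing Enhancement (a feature request) below it
    # is an assumption made only so the scale is totally ordered.
    ENHANCEMENT = 0
    TRIVIAL = 1
    MINOR = 2
    NORMAL = 3
    MAJOR = 4
    CRITICAL = 5
    BLOCKER = 6


# Levels that indicate crashes, loss of data, memory leaks and similar problems.
SEVERE_LEVELS = {Severity.BLOCKER, Severity.CRITICAL, Severity.MAJOR}


def is_severe(level: Severity) -> bool:
    return level in SEVERE_LEVELS


def priority_rank(priority: str) -> int:
    # "P1" is the highest priority and "P5" the lowest; smaller means fix sooner.
    if priority not in {"P1", "P2", "P3", "P4", "P5"}:
        raise ValueError(f"unknown priority: {priority!r}")
    return int(priority[1])
```

For bug report 36465 (Blocker, P1), `is_severe(Severity.BLOCKER)` is true and `priority_rank("P1")` is 1, the most urgent combination on both scales.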

By using the topic model and the multi-feature, the present invention provides a technique for recommending an appropriate developer to fix a new bug and for predicting the severity of the bug.

The selection unit 120 selects candidate developers using the extracted past bug reports in order to recommend an appropriate developer to fix the received new bug report. The candidate developers can be selected from the intersection of the set of developers obtained with the topic model and the set of developers obtained with the feature information.

The selection unit 120 can select the candidate developers based on first developers (that is, the set of developers extracted using the topic model), comprising the assignees and commenters extracted from the past bug reports, and second developers (that is, the set of developers extracted using the multi-feature), extracted from bug reports that have the same features as the past bug reports.

Here, the selection unit 120 may extract the second developers using at least one item of multi-feature information among the Product, Component, Priority, and Severity contained in the past bug reports.

More specifically, once the first extracting unit 110 extracts the past bug reports that have the same topic as the new bug report, the selection unit 120 extracts from them every developer who took part in fixing those reports (these are the first developers, who may include assignees and commenters). It also extracts the developers who took part in fixing bug reports that share features with the extracted past bug reports (these are the second developers; the features here are the multiple features Product, Component, Priority, and Severity).

After extracting the first and second developers, the selection unit 120 may select their intersection as the candidate developers. The list of candidate developers for fixing the new bug may be generated by Equation 1 below, which represents the intersection process for candidate developers.

In Equation 1, D denotes the set of candidate developers who contributed to fixing bug reports that have the same topic, the same product, the same component, the same priority, and the same severity. That is, the selection unit 120 selects candidate developers using not only the developers extracted from past bug reports that share the new bug report's topic, but also the developers extracted from other bug reports matched on multi-feature information.
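Because Equation 1 itself is not reproduced here, the following Python sketch shows one plausible reading of the intersection process under stated assumptions: each bug report is a dictionary with hypothetical keys (topic, product, component, priority, severity, assignee, commenters), and one developer set is built per criterion (topic and each feature) and then intersected. It is an illustration, not the patented implementation.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical in-memory bug report record; the key names are assumptions.
BugReport = Dict[str, object]


def participants(report: BugReport) -> Set[str]:
    """Assignee plus commenters of one report (the people who took part in fixing it)."""
    people = set(report.get("commenters", []))
    if report.get("assignee"):
        people.add(report["assignee"])
    return people


def candidate_developers(
    new_report: BugReport,
    history: Iterable[BugReport],
    features: Tuple[str, ...] = ("product", "component", "priority", "severity"),
) -> Set[str]:
    history = list(history)
    # First developers: participants of past reports that share the new report's topic.
    candidates = set().union(
        *(participants(r) for r in history if r["topic"] == new_report["topic"])
    )
    # Second developers: one set per feature, each intersected into the result (set D).
    for f in features:
        feature_devs = set().union(
            *(participants(r) for r in history if r.get(f) == new_report.get(f))
        )
        candidates &= feature_devs
    return candidates
```

Intersecting per-criterion developer sets keeps only developers who have worked on reports matching the new report on every criterion, which narrows the pool before ranking.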

Meanwhile, the recommendation unit 130 computes a recommendation rank (i.e., a ranking) for each selected candidate developer using the candidate's activity experience information (e.g., the numbers of comments, commits, assignments, and attachments), and recommends a developer to fix the new bug report based on the computed ranks.

As the activity experience information of the selected candidate developers, the recommendation unit 130 may use the numbers of comments and commits, which represent the candidate's activity in bug fixing, and the numbers of assignments and attachments, which represent the candidate's experience in bug fixing.

Here, the recommendation unit 130 may rank a candidate developer higher the larger the numbers of comments, commits, and assignments are and the smaller the number of attachments is. This is explained in detail as follows.

To recommend a suitable developer for fixing a new bug, the recommendation unit 130 may consider the candidate developers' activity information (e.g., comment and commit activity) in the social network. That is, to analyze the candidate developers' behavior (or activity), the recommendation unit 130 may first collect their activities by parsing XML files from repositories such as Eclipse, Mozilla, and NetBeans. Here, parsing means converting data stored in one format into data of a desired format.
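As an illustration of the collection step, the sketch below parses an XML export with Python's standard `xml.etree.ElementTree` and tallies per-developer activity counts. The element names (`bug`, `assigned_to`, `long_desc/who`, `attachment`, `activity/who`) are loosely modeled on Bugzilla-style exports but are treated here as hypothetical; real repository exports may differ.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict


def collect_activity(xml_text: str) -> dict:
    """Return {developer: {"assignments": .., "attachments": .., "comments": .., "commits": ..}}."""
    root = ET.fromstring(xml_text)
    stats = defaultdict(Counter)
    for bug in root.iter("bug"):
        assignee = bug.findtext("assigned_to")
        if assignee:
            stats[assignee]["assignments"] += 1
            # Attachments are credited to the report's assignee, following the
            # Dev_attachments definition (attachments in reports assigned to Dev).
            stats[assignee]["attachments"] += len(bug.findall("attachment"))
        for desc in bug.findall("long_desc"):  # one element per comment (assumed)
            who = desc.findtext("who")
            if who:
                stats[who]["comments"] += 1
        for act in bug.findall("activity"):  # history changes, i.e. "commits" (assumed)
            who = act.findtext("who")
            if who:
                stats[who]["commits"] += 1
    return {dev: dict(counts) for dev, counts in stats.items()}
```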

Furthermore, to improve the performance of the developer recommendation, the recommendation unit 130 of the present invention may consider not only the numbers of comments and commits but also the numbers of assignments and attachments.

The recommendation unit 130 may then compute the candidate developers' recommendation ranks (i.e., rankings) through Equation 2, which is based on their activity experience information (e.g., the numbers of comments, commits, assignments, and attachments). Equation 2 gives the calculation of a candidate developer's recommendation rank (i.e., ranking score).

Here, Dev_assignments denotes the number of bugs assigned to developer Dev for fixing, and Dev_attachments denotes the number of attachments (e.g., uploaded patch files) in the bug reports assigned to Dev. Dev_commit denotes the number of commits made by Dev (e.g., changes to a bug report's history), and Dev_comment denotes the number of comments posted by Dev. n denotes the number of topics.

Referring to Equation 2, to distinguish developers' bug-fixing experience more clearly, the present invention considers the numbers of assignments and attachments in addition to the numbers of comments and commits posted by candidate developers in the social network.

From an empirical-analysis standpoint, a higher number of assignments can be taken to indicate more developer experience. Likewise, the number of commits can be used as evidence of a developer's experience.

Meanwhile, Hooimeijer and Weimer ("Modeling bug report quality," Proc. of IEEE/ACM International Conference on Automated Software Engineering 2007, pp. 34-43) showed that the number of comments has a positive coefficient, because more comments mean that the assigned bug has received more user attention and is therefore more likely to be fixed. They also found that the number of attachments has a negative coefficient, because the presence of attachments implies an increased cost of fixing the bug.

Accordingly, to compute the candidate developers' recommendation ranks, the recommendation unit 130 of the present invention places Dev_assignments, Dev_commit, and Dev_comment in the numerator of Equation 2 and Dev_attachments in the denominator. That is, the recommendation unit 130 may rank a candidate developer higher the larger the numbers of comments, commits, and assignments are and the smaller the number of attachments is. By computing each candidate developer's DRScore, the recommendation unit 130 can compute the candidate's recommendation rank (i.e., ranking) and recommend a suitable developer to fix the new bug report based on that rank.
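The exact form of Equation 2 is not reproduced here, so the following sketch only mirrors the stated structure: assignments, commits, and comments increase the score and attachments decrease it. Summing the numerator terms and adding 1 to the denominator (to avoid division by zero) are assumptions, and the topic count n is omitted because its role in the equation cannot be recovered from the text.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Activity:
    assignments: int
    commits: int
    comments: int
    attachments: int


def dr_score(a: Activity) -> float:
    # Numerator grows with assignments, commits and comments; the denominator grows
    # with attachments. The sum and the +1 smoothing are assumptions, not Equation 2.
    return (a.assignments + a.commits + a.comments) / (a.attachments + 1)


def rank(candidates: Dict[str, Activity]) -> List[Tuple[str, float]]:
    """Candidate developers sorted by descending score (highest recommendation first)."""
    scored = [(dev, dr_score(act)) for dev, act in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

For example, a developer with 5 assignments, 3 commits, 10 comments, and 1 attachment scores (5 + 3 + 10) / 2 = 9.0 under these assumptions.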

FIG. 4 is a diagram illustrating a framework for bug-fixing developer recommendation according to an embodiment of the present invention.

Based on the details above, the framework for bug-fixing developer recommendation according to an embodiment of the present invention is briefly described below.

Referring to FIG. 4, to recommend a suitable developer for fixing a received new bug report, in Step 1 the system 100 of the present invention first uses the first extracting unit 110 to identify the topic corresponding to the new bug report based on the modeled topics, and extracts the past bug reports that have the identified topic.

Then, also in Step 1, the selection unit 120 selects candidate developers using the extracted past bug reports. The candidate developers may be selected based on the first developers (developers obtained using the topic model), comprising the assignees and commenters extracted from the past bug reports, and the second developers (developers obtained using the multiple features), extracted from bug reports that share features with the past bug reports. This is described in detail above.

In Step 2, to compute the candidate developers' recommendation ranks, the recommendation unit 130 may extract each candidate developer's activity experience information: the numbers of comments and commits, which represent the candidate's activity in bug fixing, and the numbers of assignments and attachments, which represent the candidate's experience in bug fixing.

Finally, in Step 3, the recommendation unit 130 computes the candidate developers' recommendation ranks (i.e., rankings) based on the activity experience information extracted in Step 2, and may recommend a developer to fix the new bug report based on the computed ranks. This is described in detail above.

Meanwhile, FIG. 2 is a block diagram of a bug-fixing developer recommendation and bug severity prediction system based on a topic model and multiple features according to a second embodiment of the present invention.

Referring to FIG. 2, the bug-fixing developer recommendation and bug severity prediction system 100 based on a topic model and multiple features according to the second embodiment of the present invention may include the first extracting unit 110 and a prediction unit 140, and the prediction unit 140 may include a second extracting unit 141 and a calculation unit 142. The role of each component is briefly described below.

The first extracting unit 110 has been described in detail above, so its description is omitted here. The prediction unit 140 predicts the severity of the new bug report based on the past bug reports with the identified topic extracted by the first extracting unit 110.

The second extracting unit 141 extracts common bug reports that share features from the past bug reports with the identified topic extracted by the first extracting unit. That is, based on those past bug reports, the second extracting unit 141 may group bug reports with the same feature into sets and extract the intersection of these sets as the common bug reports.

The calculation unit 142 calculates the text similarity between the extracted common bug reports and the new bug report. Here, the calculation unit 142 may calculate the text similarity using the new bug report represented as a vector (i.e., the smoothed UM) and the Kullback-Leibler (KL) divergence.

The prediction unit 140 may then predict the severity of the new bug report with a K-nearest neighbor (KNN) algorithm, based on the text similarity calculated by the calculation unit 142.

This is explained in more detail as follows. In the embodiment described with reference to FIG. 1, the multiple features "Product", "Component", "Priority", and "Severity" were used to recommend a suitable developer for fixing a new bug report. In the following embodiment, to predict the severity of a new bug report, the multiple features "Product", "Component", and "Priority" are used, excluding "Severity".

To predict the severity of a received bug report, the system 100 of the present invention may filter bug reports using the multiple features "Product", "Component", and "Priority" and thereby generate potential bug reports. The system 100 then calculates the similarity between the new bug report and the past bug reports through KL divergence, and applies these similarity values to KNN to predict the severity of the new bug report. This is described in more detail with reference to FIG. 5.

FIG. 5 is a diagram illustrating a workflow for predicting the severity of a bug report according to an embodiment of the present invention.

Referring to FIG. 5, to predict the severity of a received new bug report, the system 100 of the present invention may use the topic model and multiple features, as in the embodiment described with reference to FIG. 1.

First, as shown in Step 1 of FIG. 5, the first extracting unit 110 identifies the topic to which the received new bug report belongs and extracts the past bug reports that contain the identified topic, i.e., past bug reports with the same topic as the new bug report.

Also in Step 1, the second extracting unit 141 extracts common bug reports that share features from the extracted past bug reports. Based on the past bug reports extracted by the first extracting unit 110, the second extracting unit 141 groups bug reports with the same value of a feature (e.g., product, component, priority) into one set. The intersection of the sets formed for the individual features gives the common bug reports. Equation 3 below represents the intersection process for the bug report sets.

That is, referring to Equation 3, based on the past bug reports that have the topic of the new bug report, the second extracting unit 141 may form a set of bug reports with the same "product" feature, a set of bug reports with the same "component" feature, and a set of bug reports with the same "priority" feature. The second extracting unit 141 may then obtain the filtered bug reports B (i.e., the finally extracted bug reports) from the intersection of these sets.
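The filtering of Equation 3 can be sketched as follows, assuming each past bug report with the new report's topic is a dictionary with a hypothetical unique `id` key plus product, component, and priority fields. One set of report ids is built per feature, and their intersection yields the filtered reports B.

```python
from typing import Dict, List, Sequence


def filter_reports(
    new_report: Dict[str, object],
    topic_reports: List[Dict[str, object]],
    features: Sequence[str] = ("product", "component", "priority"),
) -> List[Dict[str, object]]:
    # One set of report ids per feature: reports whose value matches the new report.
    id_sets = [
        {r["id"] for r in topic_reports if r.get(f) == new_report.get(f)}
        for f in features
    ]
    # Intersection of the per-feature sets gives the common (filtered) reports B.
    common = set.intersection(*id_sets) if id_sets else set()
    return [r for r in topic_reports if r["id"] in common]
```

Severity is deliberately not among the filter features here, since it is the label to be predicted.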

Next, in Step 2, the calculation unit 142 calculates the text similarity between the new bug report and the filtered bug reports B obtained by the second extracting unit 141, using the smoothed UM and the Kullback-Leibler (KL) divergence.

The smoothed UM and the KL divergence are described in detail below.

The smoothed UM (smoothed unigram model) is a unigram model that has been smoothed with a collection model, which measures the probability of a given word across the entire document collection, so that no entry in a document's probability vector becomes zero. Using the smoothed UM, the present invention can calculate the text similarity between the filtered bug reports B and the new bug report.

The unigram model of each document can be represented in a vector space as a vector composed of two or more random variables (i.e., a probability vector).

However, if a word does not occur in a document, its probability in that document's probability vector becomes zero, which can cause problems in probabilistic formulas. To solve this problem, the original unigram model is smoothed using a collection model that measures the probability of a given word across the entire collection of documents.

Equation 4 below represents the smoothed UM of the k-th document, and its second part represents the collection model. In other words, the second part of Equation 4 is the collection-model term that the smoothed UM uses to keep a word's probability from becoming zero when the word does not occur in a bug report at all.

In Equation 4, ω denotes a term. The equation also uses the weight vector of report k and the number of words in report k, and ω_k(n) denotes the occurrence frequency of the n-th term in report k. K denotes the total number of reports, and μ denotes the weight that balances the two parts of Equation 4.

Meanwhile, the calculation unit 142 can calculate the text similarity between the filtered bug reports B and the new bug report using the Kullback-Leibler (KL) divergence.

KL divergence is a function that measures the difference between two probability distributions; here it is used to calculate the similarity between the new bug report and the filtered bug reports B (i.e., past bug reports). A higher KL divergence means a lower similarity.

Equation 5 below gives the similarity calculation using KL divergence.

In Equation 5, one term is the probability that the term ω appears in the query q (the new bug report), and the other is the probability that the term ω appears in bug report k.
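The sketch below uses the standard definition KL(q‖k) = Σ_ω p_q(ω)·log(p_q(ω)/p_k(ω)), with the query distribution first. The mapping from divergence to a similarity score, 1/(1 + KL), is an assumption chosen only because it decreases as divergence grows, matching the statement that higher divergence means lower similarity.

```python
import math
from typing import Dict


def kl_divergence(p_q: Dict[str, float], p_k: Dict[str, float]) -> float:
    """KL(q || k) = sum_w p_q(w) * log(p_q(w) / p_k(w)).

    Terms with p_q(w) == 0 contribute nothing. p_k(w) must be > 0 wherever
    p_q(w) > 0, which the smoothed unigram model is meant to guarantee.
    """
    return sum(pq * math.log(pq / p_k[w]) for w, pq in p_q.items() if pq > 0)


def similarity(p_q: Dict[str, float], p_k: Dict[str, float]) -> float:
    # Monotone decreasing in divergence; this particular mapping is an assumption.
    return 1.0 / (1.0 + kl_divergence(p_q, p_k))
```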

Accordingly, the calculation unit 142 can calculate the text similarity between the new bug report and the filtered bug reports using the smoothed UM and the KL divergence. That is, the system 100 of the present invention can measure the text similarity between bug reports using the KL divergence of Equation 5.

Finally, in Step 3, the prediction unit 140 takes these text similarity values and uses KNN, run in the R language, to predict the severity of the new bug report as one of Blocker, Critical, Major, Normal, Minor, Trivial, or Enhancement.

That is, when a new bug arrives, the system 100 of the present invention uses KNN to retrieve the top-k most similar past bug reports that share the topic and multiple features of the given new bug report, and predicts the severity of the new bug by considering the severity labels of these K nearest neighbors.

KNN is a non-parametric lazy learning algorithm that finds the top K objects most similar to a given instance (an instance being an individual element of a set).

More specifically, KNN determines which group a data point belongs to by its distance to labeled data: it finds the K labeled data points closest to the newly added point and assigns the new point the label that occurs most often among those K points.
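The source runs KNN in R; the following Python stand-in only illustrates the majority-vote step described above. It assumes the filtered past reports have already been scored as (KL divergence, severity label) pairs, so the nearest neighbors are those with the smallest divergence. Ties are broken toward the label encountered first among the nearest neighbors, because `Counter.most_common` orders equal counts by first occurrence.

```python
from collections import Counter
from typing import List, Tuple


def knn_severity(scored: List[Tuple[float, str]], k: int = 3) -> str:
    """Predict a severity label from (divergence, label) pairs of past reports."""
    # Smallest KL divergence = most similar past report.
    nearest = sorted(scored, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```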

In this way, the prediction unit 140 can readily predict the severity of the new bug by analyzing the severity states of the bug reports with KNN.

Based on the details above, the operation flow of the present invention is briefly described below.

FIG. 6 is an operational flowchart of a bug-fixing developer recommendation and bug severity prediction method based on a topic model and multiple features according to an embodiment of the present invention.

Referring to FIG. 6, in the bug-fixing developer recommendation and bug severity prediction method based on a topic model and multiple features according to an embodiment of the present invention, the first extracting unit 110 of the system 100 first identifies the topic corresponding to the received new bug report based on the modeled topics, and extracts the past bug reports that have the identified topic (S610).

Here, the first extracting unit 110 can identify the topic corresponding to the received new bug report using the frequency of topic terms appearing in that report; the higher the frequency, the better the new bug report matches the topic. This is described in detail above.
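A minimal sketch of this frequency-based topic assignment follows, assuming the topic model has already produced a set of representative terms per topic (the `topic_terms` mapping is hypothetical) and that the new report is tokenized. Each topic is scored by how often its terms occur in the report, and the highest-scoring topic is returned.

```python
from collections import Counter
from typing import Dict, List, Set


def identify_topic(report_tokens: List[str], topic_terms: Dict[int, Set[str]]) -> int:
    """Return the topic whose terms occur most often in the report."""
    counts = Counter(report_tokens)
    # Score = total frequency of the topic's terms in the new bug report.
    scores = {topic: sum(counts[w] for w in terms) for topic, terms in topic_terms.items()}
    return max(scores, key=scores.get)
```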

Next, to recommend a suitable developer for fixing the new bug report, the selection unit 120 of the system 100 selects candidate developers using the extracted past bug reports (S620).

Here, the candidate developers can be selected from the intersection of the set of developers obtained with the topic model and the set of developers obtained with the feature information.

More specifically, the selection unit 120 may select the candidate developers based on the first developers (i.e., the set of developers extracted using the topic model), comprising the assignees and commenters extracted from the past bug reports, and the second developers (i.e., the set of developers extracted using the multiple features), extracted from bug reports that share features with the past bug reports. Here, the selection unit 120 may extract the second developers using at least one item of the multi-feature information included in the past bug reports: Product, Component, Priority, and Severity. This is described in detail above.

Next, the recommendation unit 130 of the system 100 computes the recommendation ranks (i.e., rankings) of the selected candidate developers using their activity experience information (e.g., the numbers of comments, commits, assignments, and attachments), and recommends a developer to fix the new bug report based on the computed ranks (S630).

Here, as the activity experience information of the selected candidate developers, the recommendation unit 130 may use the numbers of comments and commits, which represent the candidate's activity in bug fixing, and the numbers of assignments and attachments, which represent the candidate's experience in bug fixing. The recommendation unit 130 may rank a candidate developer higher the larger the numbers of comments, commits, and assignments are and the smaller the number of attachments is. This is described in detail above.

Next, using the prediction unit 140, the system 100 of the present invention may predict the severity of the new bug report based on the past bug reports with the identified topic extracted by the first extracting unit 110 (S640).

In step S640, to predict the severity of the new bug report, the second extracting unit 141 in the prediction unit 140 may first extract common bug reports that share features from the past bug reports with the identified topic extracted by the first extracting unit. The second extracting unit 141 may group bug reports with the same feature into sets based on those past bug reports and extract the intersection of these sets as the common bug reports. This is described in detail above.

Also in step S640, the calculation unit 142 in the prediction unit 140 may calculate the text similarity between the extracted common bug reports and the new bug report. The calculation unit 142 may compute this similarity using the new bug report represented as a vector (i.e., a smoothed unigram model, UM) and the Kullback-Leibler (KL) divergence. As this has been described in detail above, reference is made to that description.
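The following is a sketch of comparing smoothed unigram models with KL divergence. The smoothing method is not stated in this section, so Jelinek-Mercer interpolation with a background model built from all reports, and the weight `lam = 0.8`, are assumptions. The background must be built from every report being compared so that no smoothed probability is zero. A smaller divergence means the two reports are more similar.

```python
import math
from collections import Counter
from typing import Dict, List


def background_model(docs: List[List[str]]) -> Dict[str, float]:
    """Collection-wide word probabilities over all given token lists."""
    counts = Counter(w for d in docs for w in d)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def unigram_model(tokens: List[str], background: Dict[str, float],
                  lam: float = 0.8) -> Dict[str, float]:
    """Jelinek-Mercer smoothed unigram model (smoothing choice is an assumption)."""
    counts = Counter(tokens)
    total = len(tokens)
    model = {}
    for w, p_bg in background.items():
        p_ml = counts[w] / total if total else 0.0
        # Mix the document's own estimate with the background so every p(w) > 0.
        model[w] = lam * p_ml + (1 - lam) * p_bg
    return model


def kl_divergence(p: Dict[str, float], q: Dict[str, float]) -> float:
    """D_KL(p || q) over a shared vocabulary; 0.0 means identical models."""
    return sum(pw * math.log(pw / q[w]) for w, pw in p.items() if pw > 0)


if __name__ == "__main__":
    new = "crash on startup".split()
    past_a = "crash at startup".split()
    past_b = "typo in docs".split()
    bg = background_model([new, past_a, past_b])
    p = unigram_model(new, bg)
    print(kl_divergence(p, unigram_model(past_a, bg)))
    print(kl_divergence(p, unigram_model(past_b, bg)))
```

With these toy reports, the report about a startup crash gives a smaller divergence than the documentation typo. That is the ordering the similarity step relies on.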

Finally, in step S640, the prediction unit 140 may predict the severity of the new bug report with the K-nearest neighbor (KNN) algorithm, using the text similarity calculated by the calculation unit 142. As this has been described in detail above, reference is made to that description.
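The KNN step can be sketched as a majority vote among the k common bug reports whose KL divergence from the new report is smallest. This section gives neither the value of k nor a tie-breaking rule, so the default `k = 3` and the tie-break in favor of the closest neighbor are assumptions. The function name is hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def knn_severity(neighbors: List[Tuple[float, str]], k: int = 3) -> str:
    """Majority vote over the k past reports with the smallest divergence.

    neighbors: (kl_divergence_to_new_report, severity_label) pairs.
    """
    if not neighbors:
        raise ValueError("no common bug reports to vote with")
    # Smaller KL divergence means more similar, so sort ascending.
    nearest = sorted(neighbors, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    # most_common keeps first-encountered order on ties, so a tied vote
    # goes to the label of the closest neighbor.
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    neighbors = [(0.9, "major"), (0.1, "minor"), (0.3, "major"),
                 (0.2, "minor"), (2.0, "critical")]
    print(knn_severity(neighbors))
```

With k = 3, the three nearest reports have divergences 0.1, 0.2, and 0.3. Two of them are "minor", so "minor" is predicted.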

Accordingly, the topic model and multi-feature based bug-fixing developer recommendation and bug severity prediction system and method of the present invention use both the topic model of bug reports and multiple bug report features, namely product, component, priority, and severity. As a result, they can recommend more suitable bug-fixing developers and predict the severity of bug reports.

The topic model and multi-feature based bug-fixing developer recommendation and bug severity prediction method according to an embodiment of the present invention may be implemented as program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and constructed for the present invention, or they may be known to and usable by those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include machine code, such as that produced by a compiler, and high-level language code that a computer can execute using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the present invention, and vice versa.

As described above, the present invention has been explained with reference to specific matters such as particular components, together with limited embodiments and drawings. These are provided only to aid a general understanding of the invention. The present invention is not limited to the above embodiments, and those of ordinary skill in the art to which the invention pertains may make various modifications and changes based on this description.

Accordingly, the spirit of the present invention should not be limited to the described embodiments. The scope of the invention includes not only the claims set forth below but also everything that is equal or equivalent to those claims.

100: topic model and multi-feature based bug-fixing developer recommendation and bug severity prediction system
110: first extracting unit    120: selecting unit
130: recommendation unit    140: prediction unit
141: second extracting unit    142: calculation unit

Claims

A topic model and multi-feature based bug-fixing developer recommendation and bug severity prediction system, comprising:
a first extracting unit configured to identify, based on modeled topics, a received new bug report and its corresponding topic, and to extract past bug reports having the identified topic;
a selecting unit configured to select candidate developers using the extracted past bug reports in order to recommend a developer to fix the new bug report;
a recommendation unit configured to calculate recommendation ranks of the selected candidate developers using their activity experience information and to recommend a developer to fix the new bug report based on the calculated recommendation ranks; and
a prediction unit configured to predict the severity of the new bug report based on the past bug reports extracted by the first extracting unit,
wherein the prediction unit comprises:
a second extracting unit configured to extract common bug reports having the same characteristics from the past bug reports extracted by the first extracting unit; and
a calculation unit configured to calculate text similarity between the extracted common bug reports and the new bug report,
and wherein the prediction unit predicts the severity of the new bug report using a K-nearest neighbor algorithm based on the calculated text similarity.

The topic model and multi-feature based bug-fixing developer recommendation and bug severity prediction system according to claim 1,
wherein the first extracting unit
identifies the topic using the frequency of topic terms appearing in the new bug report.

The topic model and multi-feature based bug-fixing developer recommendation and bug severity prediction system according to claim 1,
wherein the selecting unit
selects, as the candidate developers, first developers, including the assignees and committers extracted from the past bug reports, and second developers extracted from bug reports having the same characteristics as the past bug reports.

The topic model and multi-feature based bug-fixing developer recommendation and bug severity prediction system according to claim 3,
wherein the selecting unit
extracts the second developers using at least one item of characteristic information among the product, component, priority, and severity included in the past bug reports.

The topic model and multi-feature based bug-fixing developer recommendation and bug severity prediction system according to claim 1,
wherein the recommendation unit
uses, as the activity experience information, the numbers of comments and commits, which indicate the candidate developer's activity in participating in bug fixing, and the numbers of assignments and attachments, which indicate the candidate developer's experience in participating in bug fixing.

The topic model and multi-feature based bug-fixing developer recommendation and bug severity prediction system according to claim 5,
wherein the recommendation unit
determines that the recommendation rank of a candidate developer is higher as the numbers of comments, commits, and assignments are higher and the number of attachments is lower.

delete

The topic model and multi-feature based bug-fixing developer recommendation and bug severity prediction system according to claim 1,
wherein the calculation unit
calculates the text similarity using the new bug report represented as a vector and the Kullback-Leibler divergence.

A topic model and multi-feature based bug-fixing developer recommendation and bug severity prediction method, comprising:
identifying, based on modeled topics, a received new bug report and its corresponding topic, and extracting past bug reports having the identified topic;
selecting candidate developers using the extracted past bug reports in order to recommend a developer to fix the new bug report;
calculating recommendation ranks of the selected candidate developers using their activity experience information and recommending a developer to fix the new bug report based on the calculated recommendation ranks; and
predicting the severity of the new bug report based on the past bug reports extracted in the step of extracting the past bug reports,
wherein the step of predicting the severity comprises:
extracting common bug reports having the same characteristics from the past bug reports extracted in the step of extracting the past bug reports; and
calculating text similarity between the extracted common bug reports and the new bug report,
and wherein the severity of the new bug report is predicted using a K-nearest neighbor algorithm based on the calculated text similarity.

The topic model and multi-feature based bug-fixing developer recommendation and bug severity prediction method according to claim 10,
wherein the step of extracting the past bug reports
identifies the topic using the frequency of topic terms appearing in the new bug report.

The topic model and multi-feature based bug-fixing developer recommendation and bug severity prediction method according to claim 10,
wherein the step of selecting the candidate developers
selects, as the candidate developers, first developers, including the assignees and committers extracted from the past bug reports, and second developers extracted from bug reports having the same characteristics as the past bug reports.

The topic model and multi-feature based bug-fixing developer recommendation and bug severity prediction method according to claim 12,
wherein the step of selecting the candidate developers
extracts the second developers using at least one item of characteristic information among the product, component, priority, and severity included in the past bug reports.

The topic model and multi-feature based bug-fixing developer recommendation and bug severity prediction method according to claim 10,
wherein the step of recommending the developer
uses, as the activity experience information, the numbers of comments and commits, which indicate the candidate developer's activity in participating in bug fixing, and the numbers of assignments and attachments, which indicate the candidate developer's experience in participating in bug fixing.

The topic model and multi-feature based bug-fixing developer recommendation and bug severity prediction method according to claim 14,
wherein the step of recommending the developer
determines that the recommendation rank of a candidate developer is higher as the numbers of comments, commits, and assignments are higher and the number of attachments is lower.

delete

The topic model and multi-feature based bug-fixing developer recommendation and bug severity prediction method according to claim 10,
wherein the step of calculating the text similarity
calculates the text similarity using the new bug report represented as a vector and the Kullback-Leibler divergence.

A computer-readable recording medium having recorded thereon a program for executing the method according to any one of claims 10 to 15 and 18.