A kind of API recommended method based on object classification and adaptive subgraph match
Technical field
The invention belongs to the code completion fields in field of software engineering, especially software development, and in particular to
The recommendation completion field of API (Application Programming Interface, application programming interface).
Background technique
With the continuous development of computer technology and the continuous variation of the application demand of people, the life of software upgrading iteration
Cycle time is ordered, the development efficiency for improving developer is extremely urgent.Present many mainstream Integrated Development Environment
It is integrated with code completion system in (Intergrated Development Environments, be abbreviated as IDEs), is solved
Key problem be API recommend.API recommender system be one can be in real time according to the current exploitation of developer environment above
For the system of next API approach that its recommendation may call.Exploit person can be reduced to a great extent in this recommender system
The time of member's manual screening and mnemonic learning API approach can also avoid out to a certain extent to improve development efficiency
Hair personnel introduce program Bug because of spelling words mistake.
The working principle of API approach recommendation tool is: first carrying out to the API approach use pattern in existing Open Source Code
Study, establishes model, will developer current exploitation context and the model that has built up when developer develops
It combines, intelligently generates next API sequence that one group of developer may call, and be presented to developer.Recommending
API sequence in, it is contemplated that the API of calling sequence is more forward, represents that proposed algorithm effect is better, thus the work effect of developer
Rate also improves more.
In Integrated Development Environment (such as Eclipse), developer can be needed to call by initial API recommender system
All API approaches called are presented to developer, in this theory according to initial ascending sort in class belonging to API approach
This API approach recommended models are referred to as Eclipse Model in bright book.When Eclipse Model at work, if exploitation
When method in person's SWT frame to be called, Eclipse Model can generate according to first letter female an ascending sort, length 168
API approach list recommend developer, this recommended method in practical application scene, can't reduce developer memory
And the workload of screening API approach, also without the development efficiency of substantially raising developer.Therewith, numerous studies personnel
Trial learns project source code, therefrom extracts API approach use pattern string, reuses the models such as statistical learning and built
Mould study, recommends for API approach, has regularity as Abram Hindle et al. proposes code also like natural language,
N-gram language model is introduced into (Abram Hindle, Earl T.Barr, Zhendong Su, Mark in API recommendation
Gabel, Premkumar Devanbu.On the naturalness of software.2012 34th
International Conference on Software Engineering (ICSE), 2012:837-847), and propose one
Kind of the n-gram model (being abbreviated as NGSE) based on corpus, their methods for being proposed of the results show, compared with existing
API approach recommended models recommend have a degree of raising in accuracy rate in API approach.
In the proposed NGSE model of Abram Hindle et al., by carrying out simple text-processing operation life to source code
It at corpus, reuses n-gram model and is learnt, recommend for API approach later.Only to source code text in NGSE
Part carries out plain text processing, generates corpus to be learned, this code process mode can lose the semanteme of source code itself
Information so that the model for finally learning to obtain is unable to the semantic information of representative code itself, therefore can recommend API approach
The recommendation accuracy rate of model has a certain impact.Based on this, the present invention proposes a kind of based on object classification and adaptive subgraph
The API recommended method matched, first can to source code carry out semantic analysis, therefrom extract can expression code semantic information API
Method use pattern string is trained generates picture library GDB on this basis, for program to be recommended, according to method to be recommended
The code building subgraph SubGraph (being abbreviated as SG) above of position, in order to maximize search result, using beta pruning method to obtaining
Subgraph SG carry out cut operator, obtain subgraph set SGs.For each subgraph in set SGs, each subgraph is first calculated
Probability is then searched for matching in picture library GDB using the subgraph and obtains recommendation results and the corresponding probability of each method, will be sub
Graph search matches obtained method probability and combines with subgraph probability, generates the final corresponding probability of each method, passes through
Probability is ranked up from high to low, obtains the API approach sequence that should recommend at the L of final position.
Summary of the invention
The technical problems to be solved by the present invention are: although the current existing class largely encapsulated and method make for developer
To reduce its development amount, but for the developer less for a development Experience, select suitable API approach
And learn API approach can spend its a large amount of time using document, so that increase developer enters gate threshold;If deposited
In a kind of method, intelligence can be carried out according to the current exploitation code context of developer and accurately recommend the side that next can be called
Method can then greatly improve the development efficiency of developer, reduce the learning time of developer, can also reduce the introduction door of developer
Sill.
The technical solution of the present invention is as follows: a kind of API recommended method based on object classification and adaptive subgraph match, first
Some projects source code can be downloaded from GitHub, carried out syntactic analysis from method rank using abstract syntax tree, obtained several
API use pattern string;For each of API pattern string API approach, it is grouped according to the class belonging to it, thus
Obtain the API approach use pattern string of each class in class libraries;It is established according to the API approach pattern string for belonging to single class extracted
Picture library GDB (Graph DataBase) then can when needing to carry out API approach recommendation at the L of position in developer's development process
From the API approach string for belonging to a class libraries with method to be recommended from the L of position is extracted in the environment above of position L, subgraph SG is established
(SubGraph), adaptive subgraph match algorithm (the Pruning-Based Adaptive Subgraph based on beta pruning is reused
Matching Algorithm) most preferably possible matching result is searched in picture library GDB, so that recommended method list is generated, presentation
To developer.
A kind of API recommended method based on object classification and adaptive subgraph match according to claim 1, it is special
Sign is to comprise the steps of:
1) according to the source code file downloaded from GitHub, abstract syntax tree AST (Abstract is established
Syntax Tree), the API approach invocation pattern string of method rank is obtained by syntactic analysis;
2) it is grouped, obtains for each of API approach invocation pattern string call method, the class libraries according to belonging to it
API approach use pattern string into class libraries in each class;
3) the API pattern string of each class libraries according to obtained in step 2) carries out statistical learning, establishes picture library GDB;
4) when developer needs to carry out API approach recommendation at the L of position, according to described in step 1) and step 2)
Method extracted from the code above from the L of position and belong to mutually similar API approach with the API approach of position to be recommended and use
Pattern string, spanning subgraph SG;
5) according to described by Pruning-based ASMA model, first using beta pruning method to generate in step 4) to
Beta pruning is carried out with subgraph SG, subgraph set SGs is obtained, for any subgraph in subgraph set, first calculates its probability P rob1,
It reuses and scans for matching in the picture library GDB that the subgraph is established in the step 3), the result recommended and corresponding general
Rate Prob2, Prob1 and Prob2 are combined, and obtain probability corresponding to each result;
6) the recommendation API approach and the corresponding probability of each method that can be obtained by step 5), by probability from up to
It is low successively to sort, final API approach sequence is obtained, developer is presented to.
A kind of API recommended method based on object classification and adaptive subgraph match according to claim 2,
It is characterized in that the API approach use pattern string that will be arrived in step 1), the class libraries according to belonging to wherein API approach are grouped, obtain
API approach use pattern string in class libraries in each class.It has main steps that: project source code is first downloaded from GitHub, it is right
In each of these source code file, abstract syntax tree (Abstract Syntax Tree is abbreviated AST) is established respectively, into
The syntactic analysis of row method rank obtains the API approach use pattern string of method rank, according in API approach use pattern string
Each API approach belonging to class be grouped, obtain the API approach use pattern string for belonging to each class.It is final to use
The sorted API approach use pattern string that above-mentioned steps obtain carries out statistical learning, establishes picture library GDB.
A kind of API recommended method based on object classification and adaptive subgraph match according to claim 2, it is special
Sign is adaptive subgraph match algorithm Pruning-Based ASMA (the Pruning-Based Adaptive based on beta pruning
Subgraph Matching Algorithm).In developer's development process, need to carry out API approach recommendation at the L of position
When, first can be using with above-mentioned steps 2) identical method, it establishes abstract syntax tree and carries out syntactic analysis, it is above to obtain position L
In have API Calls method string, then take steps 3) described in method be filtered, obtain and class libraries one belonged at the L of position
The API approach string of cause, establishes subgraph, and Pruning-Based ASMA automatically can carry out cut operator to current subgraph SG, obtain
To several subgraph set SGs<SubGraph1, SubGraph2, SubGraph3 ... ... to be matched>, in subgraph set
Each subgraph (by taking SubGraph1 as an example), first calculate the corresponding probability P rob1 of subgraph SubGraph1, reuse
SubGraph1 searches for best match figure in picture library GDB, so that possible recommendation results<Res1 is obtained, Res2 ... ...>, with
And corresponding probability is<Prob2_1, Prob2_2 ... ...>, then by the probability of recommendation results with to be used to search for matched subgraph general
Rate combines, and obtains consequently recommended result<Res1, and Res2 ... ...>, and its corresponding probability is<Prob1*Prob2_1,
Prob1*Prob2_2 ... ... >.In this way, the obtained result of remaining subgraph and probability in subgraph set are successively calculated,
Probability is ranked up according to from high to low sequence, final recommendation results list can be obtained.
In API recommended technology, there are two the Key technique problems that need to solve: one is that how to extract can be with maximum journey
Degree represents the API use pattern string of reaction code semantic information, another is the API that how will be extracted in previous step
Method use pattern string carries out reasonable modeling study generation pattern string library can be according to when needing to carry out method recommendation
Some method use pattern strings match the most similar code string of semantic information in slave pattern string library, so as to recommend most
Suitable API approach.The API recommended method based on object classification and adaptive subgraph match that the invention proposes a kind of, first
Semantic analysis will be carried out to source code, and extract API approach pattern string therein, according to belonging to each API approach in pattern string
Class be grouped, establish picture library GDB.When developer needs API approach to recommend, using API mode similar to the above
String abstracting method, the API approach for belonging to a class with the API approach to be recommended of recommended location is extracted from source code above to be made
With pattern string, subgraph SG is established, in order to enable search matching result maximizes, using Pruning-Based ASMA algorithm to working as
Preceding subgraph carries out reasonable beta pruning, obtains subgraph set SGs.For each subgraph in set SGs, it is general that each subgraph is first calculated
Rate is then searched for matching in picture library GDB using the subgraph and obtains recommendation results and the corresponding probability of each method, by subgraph
Search matches obtained method probability and combines with subgraph probability, generates the final corresponding probability of each method, pass through by
Probability is ranked up from high to low, obtains the API approach sequence that should recommend at the L of final position.The results show that using in the present invention
A kind of API recommended method based on object classification and adaptive subgraph match of proposition, compared to existing API recommended method
(such as Eclipse Model, NGSE) recommends accuracy rate to obtain a degree of promotion, thus also reduces to a certain extent
Developer learns the time of the API approach provided in existing class libraries, improves developer's working efficiency.
The invention is characterized in that
Project source code file is downloaded from GitHub, establishes abstract syntax tree, is carried out syntactic analysis from method rank, is obtained
To the use pattern string of mixing class libraries API approach, it is grouped, is obtained according to the affiliated class libraries of API Calls method each in pattern string
Into class libraries in each class API approach invocation pattern;Picture library GDB is established according to sorted API use pattern string;Work as exploitation
During staff development, when needing to carry out API recommendation, using ibid identical source code processing method, from position L's to be recommended
The API approach use pattern string for belonging to identical class libraries is extracted in environment above, spanning subgraph reuses the base proposed in the present invention
Subgraph is operated in the adaptive subgraph match algorithm Pruning-Based ASMA algorithm of beta pruning, obtains several sub-collective drawings
Conjunction<SubGraph1, SubGraph2, SubGraph3 ... ...>, using each subgraph, scans for, pushed away in the GDB of library
The probability of the method and each method recommended obtains the final probability of each method to be recommended in conjunction with the probability of search subgraph.
After searching for matching to all subgraphs, all results are ranked up according to probability from high to low, the final side API is obtained
Method recommends sequence.
Present invention firstly provides the API pattern string abstracting method based on object classification and the adaptive sons based on beta pruning
Figure matching algorithm.Syntactic analysis is carried out to source code file and obtains API use pattern string, is divided according to the affiliated class libraries of method
Group establishes picture library;When needing to recommend, current corresponding subgraph above is generated, adaptively subgraph is carried out in conjunction with beta pruning method
Operation, carries out matching search using the subgraph after beta pruning in picture library GDB, according to the probability of each result, obtains final push away
Recommend result sequence.
Detailed description of the invention
Fig. 1 is a kind of stream of API recommended method based on object classification and adaptive subgraph match of the embodiment of the present invention
Cheng Tu.
Fig. 2 is the extraction process and result of the API use pattern string based on object classification in example.
Fig. 3 is the extraction result of the API use pattern string in example in the environment above of code to be recommended.
Fig. 4 (a) is the PASMA Model proposed in the present invention and Eclipse Model, Graph Model in data set
API on JGit recommends accuracy rate.
Fig. 4 (b) is the PASMA Model proposed in the present invention and Eclipse Model, Graph Model in data set
API on log4j recommends accuracy rate.
Specific embodiment
Traditional n-gram language model carries out simple text maninulation to source code file, carries out generating corpus progress
API approach is recommended, although although this method recommends accuracy rate to have a certain upgrade existing API approach, to source generation
The semantic information missing that code file is segmented, the text maninulations such as stop words is gone to will lead to code.And API approach recommendation tool
Whether the corpus information that fine or not a part depends on extracting can reflect the intention of developer, i.e., whether the semantic information of code
Adequately excavated;Whether another part then depends on learning model can be correctly by the code with close semantic information
String matching.A kind of API recommended method based on object classification and adaptive subgraph match is proposed in the present invention.For source code text
Part carries out semantic analysis, obtains API approach use pattern string therein, then according to class libraries belonging to each method in pattern string into
Row grouping, obtains the API approach use pattern string of each class libraries, establishes picture library GDB, first when needing to carry out API approach recommendation
It first will use same method and extract the API approach use pattern string for belonging to same class libraries from code above, spanning subgraph,
Using beta pruning method, several subgraph set are obtained, successively according to subgraph each in subgraph set, search is matched in picture library, is obtained
It is combined with the probability of corresponding subgraph, obtains the probability of final each recommended method by the probability of each recommended method, according to
Probability sorts from high to low, generates and recommends API approach sequence.
Method detailed step proposed in the present invention is as follows:
1) project source code is collected: GitHub is the maximum social programming in the current whole world and code trustship website, the present invention
In all experimental project source codes be to download to obtain from GitHub.In order to obtain the good project of coding style as much as possible
Source code, the present invention in using the Star score of each project as the measurement index of a project quality height, from GitHub or more
1000 project source code before load Star number ranking;
2) project source code file process: filtering out source code file from the source code of each project, establishes abstract language
Method tree is analyzed from method rank, and extraction obtains the API approach use pattern string in each method;For the obtained side API
Each of method use pattern string API approach, the class libraries according to belonging to it are grouped, and are obtained in class libraries in each class
API approach use pattern string;
3) statistical model is established: the API approach use pattern string of each class in the class libraries according to obtained in step 2), statistics
The probability of each node and the probability of each edge are obtained, picture library GDB (Graph DataBase) is thus established;
4) it generates subgraph to be matched: during developer's exploitation code, needing to carry out API approach at the L of position to push away
When recommending, the present invention can carry out syntactic analysis to current code file first using the identical method with step 2 first, establish and take out
As syntax tree (Abstract Syntax Tree), obtain the API use pattern string in environment above, then therefrom extract with to
Recommended method belongs to the API use pattern string of identical class libraries, generates band Matching sub-image SubGraph;
5) API recommends: proposing a kind of adaptive subgraph match algorithm Pruning-Based based on beta pruning in the present invention
ASMA(Pruning-Based Adaptive Subgraph Matching Algorithm).Pruning-Based ASMA is calculated
Method can carry out cut operator for subgraph obtained in step 4), obtain several subgraph set SGs < SubGraph1 to be matched,
SubGraph2, SubGraph3 ... ... >, for each subgraph (by taking SubGraph1 as an example), first accounting operator figure
SubGraph1<API1, API2, API3 ... ...>probability P rob1, reuse the picture library that SubGraph1 is established in the step 3)
GDB scans for matching, and obtains possible recommendation results<Res1, and Res2 ... ...>, corresponding probability is respectively<Prob2_1,
Prob2_2 ... ...>, then consequently recommended result<Res1, Res2 ... ...>, corresponding probability is<Prob1*Prob2_1,
Prob1*Prob2_2 ... ...>, to avoid result underflow, each probability is carried out to take log operation, then corresponding result is<log
(Prob1)+log (Prob2_1), log (Prob1)+log (Prob2_2) ... ... >;
With subgraph SubGraph1<API1, API2, API3>for, demonstrate the calculating process of subgraph probability P rob1:
Prob1 (API1, API2, API3)=Pr (API1) * Pr (API2 | API1) * Pr (API3 | API1, API2)
According to the above process, the probability of each possible recommendation results can be obtained, be ranked up according to probability from high to low,
It obtains final method and recommends sequence, be presented to developer;
6) recommendation results analysis and evaluation: for the accuracy rate of the API recommended method proposed in the verifying present invention, researcher
Can by each time recommendations API result sequence and desired API compare and analyze, calculating TopAccuracy@k value, be used for
The recommendation effect of balancing method;
Top-k accuracy rate (TopAccuracy@k) indicates that API recommended models are accurate when k API approach before recommendation
Rate.Value is bigger, indicates that API recommended models precision is better, calculation formula is as follows:
In above-mentioned formula, TopAccuracy@k be in recommendation results sequence in preceding k result comprising expection API approach
Accuracy rate.hit(APIk) value whether represent in the API sequence recommended in preceding k API approach comprising expected API approach,
If comprising hit (APIk) value be 1, be otherwise 0;
Illustrate specific implementation process of the invention, but side proposed in the present invention below by a simulation example demonstration
The not limited to this example of the method institute scope of application.
The process flow in the present invention about source code file is illustrated in Figure of description in Fig. 2.It is given for one
Source code file Demo.java, establish abstract syntax tree, from method rank carry out syntactic analysis, if obtaining dry-mixing class libraries
API approach use pattern string.In order to enable the API use pattern string extracted is reported as precisely as possible, the present invention proposes that one kind is based on
The method of object classification, the i.e. class libraries according to each API approach are grouped, and are extracted and are obtained the API in final each class
In code sample shown in Fig. 2, three API approaches that class A can be obtained are handled by above-mentioned steps for the use pattern string of method
Use pattern string, respectively<A.ma1 (), A.ma2 (), A.ma4 ()>,<A.ma2 (), A.ma3 (), A.ma1 (), A.ma2
(), A.ma5>,<A.ma1 (), A.ma3 ()>, three API approach use pattern strings of class B, respectively<B.mb1 (), B.mb4
()>,<B.mb3 (), B.mb3 (), B.mb3>,<B.mb3 (), B.mb1 ()>;Belong to as obtained in above-mentioned steps same
All API approach use pattern strings of class, count each method using probability and each edge (wherein A.ma1 (),
A.ma2 () representation method A.ma1 () arrives the side of method A.ma2 ()) probability, establish Weighted Directed Graph, in this example for
Class A eventually generates 3 Weighted Directed Graphs, constitutes picture library GDB.
It the time for learning method in class libraries the purpose of the present invention is reducing developer, reduces into gate threshold, in exploit person
When member carries out development task, it can recommend current most probable that can call for developer intelligently according to existing code above
Method, to improve the development efficiency of developer.Fig. 3 shows the generation that one section of developer is writing in Figure of description
Code, horizontal line part indicate the position that tool will be needed to recommend API approach, and the method proposed in the present invention at this time will use identical
Code process method extract API approach use pattern string in environment above, at this point, the method for position to be recommended belongs in class A
Method, therefore carry out subgraph match using the API approach use pattern string for belonging to class A in environment above, in this example, above
The API approach use pattern string for belonging to class A in environment is<A.ma3 (), A.ma1 (), A.ma2 ()>.At this time in order to improve
Recommend accuracy rate, proposes a kind of adaptive subgraph match algorithm Pruning-Based ASMA based on beta pruning in the present invention.
Next the process flow that will be described in detail Pruning-Based ASMA algorithm carries out beta pruning first against current subgraph, obtains
To subgraph set SGs={ SG1, SG2, S63 }=<A.ma3 (), A.ma1 (), A.ma2 ()>,<A.ma1 (), A.ma2 ()
>,<A.ma2 ()>};For each subgraph in subgraph set, subgraph probability P rob1 is calculated, reuses SG1 in picture library GDB
Matching result is searched for, and then obtains corresponding results set and the corresponding probability P rob2 of each result, is finally calculated every
The corresponding probability of a result Res is Prob, and its calculation formula is Prob (Res)=Prob1 (SG) * Prob2 (Res)).
In this example with subgraph SG1 (<A.ma3 (), A.ma1 (), A.ma2 ()>) for, demonstrate its probability calculation and
The probability calculation process of recommendation results:
In picture library GDB, subgraph<A.ma3 (), A.ma1 (), A.ma2 ()>there was only unique a line, pointing method
A.ma5 (), therefore the probability P rob (A.ma5) of Prob2 (A.ma5)=1, ultimate method A.ma5 ()=Prob1 (SG1) *
Prob2 (A.ma5)=(1/15) * 1=1/15.The probability calculation of the calculation method of remaining subgraph probability and corresponding recommendation results
Method is same as above, and the results are shown in Table 1.
By result in table 1 it is found that the probability being finally calculated according to the API approach of recommendation is ranked up from high to low,
Obtained API approach sequence be<A.ma5 (), A.ma4 (), A.ma3 ()>.If current developer needs method to be used lucky
For A.ma5 (), then TopAccuracy@1=100%, indicates that first result is developer institute in the result sequence recommended
The result needed.
The recommended method of 1 example of table test code and corresponding probability
The API recommended method proposed in Eclipse Model, NGSE and the present invention is shown in table 2 and table 3
It is accurate that preceding Top10 of the Pruning-Based ASMA (being abbreviated as PASMA Model) on data set JGit and log4j recommends
Rate.By taking the Top1 accuracy rate of PASMA Model model in table as an example, the meaning that value 0.4810 represents is in all recommendation
As a result in, first recommendation results is the accounting of required method.
Each model of table 2 API on data set JGit recommends accuracy rate
|
Top1 |
Top2 |
Top3 |
Top4 |
Top5 |
Top10 |
PASMA Model |
0.4810 |
0.6493 |
0.6988 |
0.7257 |
0.7416 |
0.7731 |
NGSE |
0.2626 |
0.3398 |
0.3820 |
0.4136 |
0.4356 |
0.5112 |
Eclipse Model |
0.1975 |
0.2364 |
0.2720 |
0.2979 |
0.3188 |
0.5135 |
Each model of table 3 API on data set JGit recommends accuracy rate
|
Top1 |
Top2 |
Top3 |
Top4 |
Top5 |
Top10 |
PASMA Model |
0.4849 |
0.6745 |
0.7419 |
0.7745 |
0.7884 |
0.8111 |
NGSE |
0.3259 |
0.3988 |
0.4346 |
0.4570 |
0.4831 |
0.5571 |
Eclipse Model |
0.1545 |
0.1870 |
0.2159 |
0.2620 |
0.2717 |
0.4082 |
4 (a) and 4 (b) graphically visually present Eclipse Model, NGSE and PASMA in attached drawing
Preceding Top10 of the Model on data set JGit and log4j recommends accuracy rate.
In conclusion the present invention provides a kind of API recommended method based on object classification and adaptive subgraph match, phase
For existing API recommended method, this method is built by the way that API approach use pattern string to be grouped according to the affiliated class of method
The vertical picture library GDB for belonging to each class in class libraries.When needing the carry out method recommendation at the L of position, from the L of position on generation above
In code, extracts and belong to mutually similar API approach use pattern string with location method to be recommended, obtain subgraph SG, in order to enable
It searches for matching result to maximize, the present invention proposes a kind of method (Pruning-Based of adaptive subgraph match based on beta pruning
Adaptive Subgraph Matching Algorithm), i.e., the subgraph SG that above-mentioned steps generate is carried out using beta pruning method
Cut operator, spanning subgraph set SGs are first calculated each subgraph probability, then make for each subgraph in set SGs
Matching is searched in picture library GDB with the subgraph and obtains recommendation results and the corresponding probability of each method, subgraph search is matched
Obtained method probability is combined with subgraph probability, generates the final corresponding probability of each method, by by probability by height
It is ranked up to low, obtains the API approach sequence that should recommend at the L of final position.The side proposed in the experimental result display present invention
For method in terms of recommending accuracy rate, the method by method initial sequence sequence than carrying in Integrated Development Environment has very great Cheng
Promotion on degree, and than the API recommended method of the current existing n-gram language model based on corpus, recommending precision
On also have a degree of promotion.