CN111176993A - Code static detection method based on abstract syntax tree - Google Patents
- Publication number
- CN111176993A
- Authority
- CN
- China
- Prior art keywords
- rule
- class
- constructing
- code
- abstract syntax
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/36—Preventing errors by testing or debugging software
- G06F11/3604—Software analysis for verifying properties of programs
- G06F11/3616—Software analysis for verifying properties of programs using software metrics
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Computer Hardware Design (AREA)
- Quality & Reliability (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Debugging And Monitoring (AREA)
- Devices For Executing Special Programs (AREA)
Abstract
The invention discloses a code static detection method based on an abstract syntax tree, which constructs rule extension templates for different language environments; constructing a rule base based on base class rules of the rule extension template; adopting a registration list mechanism to carry out rule registration; nesting a rule base into an open source platform; and performing code scanning and analysis based on the abstract syntax tree. The invention supports the customization of scanning rules of mainstream development languages, can be quickly integrated into open source software, scans codes by an automatic means, finds unsafe, ambiguous and fuzzy codes in a program, reduces defects and problems in the development and design process of software or a system, and ensures the quality of the software.
Description
Technical Field
The invention relates to a software development and test technology, in particular to a code static detection method based on an abstract syntax tree.
Background
As software design technology evolves and the scale of software and systems grows, the complexity of software products keeps increasing and software security problems are increasingly exposed, so software testing is essential to ensuring software quality. At present, static code testing mainly uses a defect-oriented method: defects are first summarized at the code level of the software and abstracted into corresponding defect patterns, the values of the relevant expressions in the program are then approximately calculated, and finally the calculation results are applied to defect detection. Static testing scans for code problems without running the code and can effectively discover 30%-70% of the defects in logic design and coding. In this field, companies at home and abroad have developed a series of typical code detection tools, such as Klocwork, Fortify, Checkmarx CxSuite, and CodeSecure. Klocwork is very effective at detecting defects and security vulnerabilities in C/C++ embedded software, but its detection capability for languages other than C/C++ is weak and it is inconvenient to extend; Fortify is currently the most widely used static source code detection tool in the world and supports the most languages, but it is inconvenient to use and insufficiently open; Checkmarx CxSuite scans and analyzes security vulnerabilities and weaknesses in source code by lexical analysis, but its language support is incomplete; CodeSecure finds source code with information security problems by syntax parsing and provides patching suggestions, but it also supports few languages.
In summary, although some relatively efficient static code detection tools exist at present, they still have the following two problems: (1) the tools are not open enough and cannot be customized and extended as needed; (2) the languages supported for scanning are not comprehensive, so the tools lack generality.
Disclosure of Invention
The invention aims to provide a code static detection method based on an abstract syntax tree.
The technical solution for realizing the purpose of the invention is as follows: a code static detection method based on an abstract syntax tree comprises the following steps:
step 1, constructing rule extension templates for different language environments;
step 2, constructing a rule base based on the base class rules of the rule extension template;
step 3, adopting a registration list mechanism to perform rule registration;
step 4, nesting the rule base into an open source platform;
step 5, scanning code based on the abstract syntax tree to obtain the erroneous code that does not conform to the constructed rules.
Compared with the prior art, the invention has the following notable advantages: 1) the judgment standard for static code detection can be customized, adapting to the code quality scanning requirements of different languages and different scenarios; 2) the coupling between the static scanning rules and the scanning tool is reduced, so that the rules can be integrated into open source tools as plug-ins; 3) the degree of automation is high and the range of application is wide: the detection process needs neither manual intervention nor development environment support, and the detection result is obtained quickly.
Drawings
FIG. 1 is a diagram of a static detection architecture for code based on abstract syntax trees.
FIG. 2 is a rule customization extension template development flow diagram.
FIG. 3 is a rule customization development flow diagram.
FIG. 4 is a schematic diagram of a customized rule registration.
FIG. 5 is a flowchart of code scanning based on an abstract syntax tree.
Detailed Description
The invention is further illustrated by the following examples in conjunction with the accompanying drawings.
The method supports the customization of scanning rules for all current mainstream development languages, can be quickly integrated into open source software, scans code by automated means, finds unsafe, ambiguous, and unclear code in a program, reduces defects and problems in the development and design of software or systems, and ensures software quality. The method comprises the following steps:
Step 1: rule extension templates are built for different language environments to form a logic pipeline based on the abstract syntax tree. The process of building the rule extension templates is shown in FIG. 2.
Step 1.1: build the running environments of the different languages and import their dependency packages;
Step 1.2: customize the base classes for the different programming languages;
Step 1.3: describe the customized rules in a standardized way and construct a description extension interface;
Taking the Java programming language as an example, a rule is divided into a left part and a right part. The left part of the custom rule is described in a standardized way according to the Java programming language base classes, as shown in Table 1:
Table 1 Java rule left part specification description table
The right part of the custom rule is described in an extended way, as shown in Table 2:
Table 2 Java rule right part specification description table
Step 1.4: create unit test cases according to the customized rules and compile them with the build instruction to ensure that the test cases run normally and that a complete running environment is constructed. Alternatively, the project can be compiled into a jar package and imported into a third-party tool; running the third-party integration tool then tests whether the jar package loads successfully;
Step 1.5: release the base class interfaces and the specification.
Step 2: the rule base is constructed based on the base class rules. The rule customization development flow is shown in FIG. 3.
Step 2.1: for the five types of base classes above, five types of rule resolvers are constructed: interface scanClass (classTree tree), interface scanVariable (variableTree tree), interface scanMethold (metaldTree tree), interface scanBlock (blockTree tree), and interface scanExpression (expressTree tree).
Step 2.2: for the five rule resolvers, five rule analysis class libraries are constructed: class scanClass, class scanVariable, class scanMethold, class scanBlock, and class scanExpression.
Step 2.3: for the five rule class libraries, five rule analysis methods are constructed: scanClass (classTree tree), scanVariable (variableTree tree), scanMethold (metaldTree tree), scanBlock (blockTree tree), and scanExpression (expressTree tree).
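The five scan hooks of steps 2.1-2.3 can be illustrated with a minimal sketch. It is written in Python against the standard `ast` module rather than a Java parser, and the names `BaseRule`, `scan_class`, `scan_variable`, `scan_method`, `scan_block`, `scan_expression`, and `ClassNameRule` are hypothetical stand-ins for the patent's scanClass, scanVariable, scanMethold, scanBlock, and scanExpression, not the patent's actual interfaces.

```python
import ast


class BaseRule(ast.NodeVisitor):
    """Hypothetical base class: one overridable hook per node category."""

    def __init__(self):
        self.issues = []  # (line, message) pairs collected during a scan

    # Default hooks report nothing; custom rules override the ones they need.
    def scan_class(self, node): pass
    def scan_variable(self, node): pass
    def scan_method(self, node): pass
    def scan_block(self, statements): pass
    def scan_expression(self, node): pass

    def report(self, node, message):
        self.issues.append((node.lineno, message))

    # Dispatch from Python AST node types to the five hooks.
    def visit_ClassDef(self, node):
        self.scan_class(node)
        self.generic_visit(node)

    def visit_FunctionDef(self, node):
        self.scan_method(node)
        self.scan_block(node.body)  # the body is a list of statements
        self.generic_visit(node)

    def visit_Assign(self, node):
        self.scan_variable(node)
        self.generic_visit(node)

    def visit_Expr(self, node):
        self.scan_expression(node)
        self.generic_visit(node)


class ClassNameRule(BaseRule):
    """Example custom rule: class names should start with an upper-case letter."""

    def scan_class(self, node):
        if not node.name[0].isupper():
            self.report(node, f"class '{node.name}' should be capitalized")


rule = ClassNameRule()
rule.visit(ast.parse("class good_name:\n    pass\n\nclass Fine:\n    pass\n"))
print(rule.issues)
```

A custom rule only overrides the hook it cares about, which is one plausible reading of extending the rules "through method rewriting" in the base classes.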
Step 2.4: register the rule detection basis, which comprises the rule content, rule type, rule scope, rule key attribute, and rule level, for example:
Table 3 Registration rule detection basis table
Step 2.5: construct a rule stage detection queue, dividing a whole section of code into six stages: member initialization, class construction, constructor, callback function, exception throwing, and result feedback.
Step 2.6: construct a component detection queue. Each stage obtained by the analysis is further divided to obtain component rule units composed jointly of parameters and annotations, so that the code of a given stage is represented as smaller components.
Step 2.7: construct a unit detection queue. The components are further divided into rule units such as independent individuals, relations, and member variables, and the corresponding rule class instances are generated. In practice, each rule class is assigned at generation time to a queue according to the analysis stage it belongs to, so that during analysis the analyzer only needs to call the rules of the corresponding stage.
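The idea of distributing rules into per-stage queues, so that the analyzer calls only the rules of the current stage, can be sketched as follows. The stage identifiers, the `StageQueues` class, and the rule names are illustrative assumptions; the patent does not specify a data structure.

```python
from collections import defaultdict

# Hypothetical identifiers for the six stages listed in step 2.5.
STAGES = (
    "member_initialization",
    "class_construction",
    "constructor",
    "callback",
    "exception_throwing",
    "result_feedback",
)


class StageQueues:
    """Keeps one queue of rules per analysis stage."""

    def __init__(self):
        self._queues = defaultdict(list)

    def add(self, stage, rule):
        # A rule is placed in the queue of the stage it was generated for.
        if stage not in STAGES:
            raise ValueError(f"unknown stage: {stage}")
        self._queues[stage].append(rule)

    def rules_for(self, stage):
        # The analyzer asks only for the rules of the stage it is analyzing.
        return list(self._queues.get(stage, []))


queues = StageQueues()
queues.add("constructor", "no-overridable-call-in-constructor")
queues.add("exception_throwing", "no-empty-catch")
queues.add("exception_throwing", "no-generic-throw")
print(queues.rules_for("exception_throwing"))
```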
Step 2.8: construct a complex unit detection queue. When the unit queues obtained by dividing a component have a contextual association, the associated rules are merged to generate a single rule class instance. The singleton design pattern is used: the instance-creation function can be called multiple times without creating multiple instances, and the related parameters are added to the instance created on the first call, which makes it convenient to detect a group of rules at once.
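A minimal sketch of the single-instance behaviour in step 2.8: repeated calls to the creation function return the same object and add their parameters to it. The class `CompositeRule`, its `create` method, and the rule names are hypothetical and only illustrate the pattern.

```python
class CompositeRule:
    """Hypothetical merged rule: one shared instance collects associated rules."""

    _instance = None

    def __new__(cls):
        # Create the instance on the first call only; reuse it afterwards.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance.members = []
        return cls._instance

    @classmethod
    def create(cls, *rule_names):
        # Later calls add their parameters to the instance created first.
        instance = cls()
        for name in rule_names:
            if name not in instance.members:
                instance.members.append(name)
        return instance


first = CompositeRule.create("open-without-close")
second = CompositeRule.create("close-without-open", "open-without-close")
print(first is second)
print(second.members)
```

Because both calls return the same object, a scanner holding either reference sees the whole group of associated rules.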
Step 2.9: because rule units cannot be used directly to execute code detection, the rule class instances generated in steps 2.7-2.8 are converted into rule files in XML format, and error detection comments are added to the XML rule files to construct the rule base.
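Step 2.9 can be sketched with Python's standard `xml.etree.ElementTree`. The element and attribute names (`rule`, `name`, `stage`, `level`, `message`) and the helper `rule_to_xml` are assumptions made for illustration; the patent does not publish its XML schema.

```python
import xml.etree.ElementTree as ET


def rule_to_xml(name, stage, level, message):
    """Serialize one rule instance to an XML string with an error comment."""
    root = ET.Element("rule", {"name": name, "stage": stage, "level": level})
    # The error detection comment travels with the rule file.
    root.append(ET.Comment(f" error: {message} "))
    ET.SubElement(root, "message").text = message
    return ET.tostring(root, encoding="unicode")


xml_text = rule_to_xml(
    "ClassNameRule",
    "class_construction",
    "major",
    "class names should be capitalized",
)
print(xml_text)
```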
Step 2.10: encapsulate each rule as a whole to constrain the rule structure and the rule file.
To describe the rules in the rule base accurately, each rule is encapsulated as a whole according to an encapsulation paradigm, and the information that satisfies the registration rule detection basis is used as the rule input. The encapsulation paradigm is shown in Table 3:
Table 3 Encapsulation paradigm table
Query keyword | Attribute group (structure) | Description information (structure) | Rule file name |
The description information comprises a scope, a key attribute, and a detection level; the attribute group comprises a stage name, a component name, a unit name, a list of complex unit association relations, and the like; the query keyword corresponds to a rule file index, which comprises the file's error annotation information and keyword information.
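The four-field encapsulation paradigm can be modelled as a small record type. This is a sketch under assumptions: the class names (`AttributeGroup`, `Description`, `RuleRecord`), field names, and sample values are invented for illustration, and only the grouping of fields follows the text above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AttributeGroup:
    stage: str
    component: str
    unit: str
    related_units: List[str] = field(default_factory=list)


@dataclass
class Description:
    scope: str
    key_attribute: str
    level: str


@dataclass
class RuleRecord:
    query_key: str
    attributes: AttributeGroup
    description: Description
    rule_file: str


# The query keyword acts as the index into the rule files.
index: Dict[str, RuleRecord] = {}


def add_record(record: RuleRecord) -> None:
    index[record.query_key] = record


add_record(
    RuleRecord(
        query_key="class-name",
        attributes=AttributeGroup("class_construction", "declaration", "identifier"),
        description=Description("class", "name", "major"),
        rule_file="class_name_rule.xml",
    )
)
print(index["class-name"].rule_file)
```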
Step 3: rule registration is performed using a registration list mechanism. A schematic diagram of customized rule registration is shown in FIG. 4.
Step 3.1: connecting to the rule registration interface;
Step 3.2: calling the rule registration method;
Step 3.3: configuring the dependency package file path;
Step 3.4: passing in the path of the rule;
Step 3.5: passing in the user-defined rule object;
Step 3.6: adding the rule to the registration list;
Step 3.7: extracting the keyword annotation list of the rule file and constructing the Issues list;
Step 3.8: forming a page example file.
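The registration list mechanism of step 3 might look like the following sketch. `RuleRegistry`, its `register` and `issues_list` methods, and the sample paths and keywords are hypothetical; they do not reproduce the registration interface of any particular open source platform.

```python
class RuleRegistry:
    """Hypothetical registration list: maps a rule key to its path and object."""

    def __init__(self, dependency_path):
        self.dependency_path = dependency_path  # step 3.3
        self._rules = {}

    def register(self, key, rule_path, rule_object, keywords=()):
        # Steps 3.4-3.6: pass in the path and the rule object, then record them.
        if key in self._rules:
            raise KeyError(f"rule already registered: {key}")
        self._rules[key] = {
            "path": rule_path,
            "rule": rule_object,
            "keywords": tuple(keywords),
        }

    def issues_list(self):
        # Step 3.7: collect the keyword annotations of all registered rules.
        return sorted({kw for entry in self._rules.values() for kw in entry["keywords"]})


registry = RuleRegistry(dependency_path="lib/")
registry.register("class-name", "rules/class_name_rule.xml", object(), ["naming"])
registry.register("empty-catch", "rules/empty_catch_rule.xml", object(), ["exception", "naming"])
print(registry.issues_list())
```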
Step 4: the rules are built and compiled, and the rule models are embedded into the open source platform;
Step 5: rule scanning and analysis. The flow of code scanning based on the abstract syntax tree is shown in FIG. 5.
Step 5.1: scan the code, extract the agreed type as the key node (root), and mark it as the start symbol;
Step 5.2: record each leaf node and label each inner node with a non-terminal symbol. If A is the non-terminal label of an inner node and X1, X2, …, Xn are the labels of all its child nodes arranged from left to right, then A → X1X2…Xn is a production, where each of X1, X2, …, Xn is a terminal or non-terminal symbol;
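The relation between inner nodes and productions of the form A → X1 X2 … Xn can be illustrated on a Python syntax tree. This is only an analogy: the sketch uses Python's `ast` node type names as labels, and the helper `productions` is an assumption, not part of the patented method.

```python
import ast


def productions(tree):
    """List one 'A -> X1 X2 ... Xn' line for every node that has children."""
    result = []
    for node in ast.walk(tree):
        # Children are visited left to right, as in the production.
        children = [type(child).__name__ for child in ast.iter_child_nodes(node)]
        if children:
            result.append(f"{type(node).__name__} -> {' '.join(children)}")
    return result


for line in productions(ast.parse("x = 1")):
    print(line)
```

Nodes without children, such as the literal `1`, play the role of leaves and therefore contribute no production of their own.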
Step 5.5: lexical analysis. The source program is decomposed into a string of word symbols (tokens), forming a symbol table for syntax analysis. A lexical analysis program is written with a lexical analysis tool and a source program file is generated after compiling; executing the lexical analysis program decomposes the source program file into symbol strings.
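As a small illustration of decomposing a source program into word symbols, the sketch below uses Python's standard `tokenize` module on a line of Python code. The helper name `lex` is an assumption, and a Java scanner would use its own lexer, so this shows only the general idea.

```python
import io
import tokenize

# Layout-only tokens that carry no word symbol of their own.
SKIPPED = {
    tokenize.NEWLINE,
    tokenize.NL,
    tokenize.INDENT,
    tokenize.DEDENT,
    tokenize.COMMENT,
    tokenize.ENDMARKER,
}


def lex(source):
    """Return (token kind, text) pairs for the given source string."""
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in SKIPPED:
            continue
        tokens.append((tokenize.tok_name[tok.type], tok.string))
    return tokens


print(lex("total = price * 2\n"))
```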
Step 5.6: syntax analysis. A syntax analysis program is written with a syntax analysis tool and a source program file is generated after compiling. The syntax analysis program takes the word symbol stream as input and judges whether the symbol string can be generated by the grammar, producing a syntax tree.
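Judging whether an input can be generated by the grammar, and producing a syntax tree when it can, is sketched below with Python's built-in parser. The helper `parse_or_none` is hypothetical, and the grammar here is Python's own rather than the grammar of a scanned Java program.

```python
import ast


def parse_or_none(source):
    """Return the syntax tree if the grammar accepts the source, else None."""
    try:
        return ast.parse(source)
    except SyntaxError:
        return None


accepted = parse_or_none("x = (1 + 2)")
rejected = parse_or_none("x = (1 +")
print(type(accepted).__name__, rejected)
```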
Step 5.7: semantic analysis. By processing names and operators, the syntax tree is converted into a standard form that includes objects and a symbol table representing type information, forming a complete syntax tree that contains all the information of the program structure.
Step 5.8: fault detection. Apart from the root node, all nodes of the syntax tree are child nodes, and they are divided into two types: inner nodes and leaf nodes. An inner node has both a parent node and child nodes, whereas a leaf node has only a parent node. Each type of node is represented by a corresponding symbol; different data types define corresponding attributes and operations, and the rule action, generated result, expected output, and external interfaces are given. By traversing the program syntax tree, the built-in actions are executed and the static detection of software faults is completed.
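Detecting faults by traversing the syntax tree can be sketched with `ast.NodeVisitor`. The two checks (a bare `except` clause and an equality comparison with `None`) and the class `FaultDetector` are illustrative assumptions chosen for a Python example; they are not rules taken from the patent.

```python
import ast


class FaultDetector(ast.NodeVisitor):
    """Walks the syntax tree and records (line, message) for each fault found."""

    def __init__(self):
        self.faults = []

    def visit_ExceptHandler(self, node):
        # An except clause without an exception type catches everything.
        if node.type is None:
            self.faults.append((node.lineno, "bare except"))
        self.generic_visit(node)

    def visit_Compare(self, node):
        # Flag '== None' and '!= None'; identity tests are the usual idiom.
        for op, right in zip(node.ops, node.comparators):
            is_equality = isinstance(op, (ast.Eq, ast.NotEq))
            is_none = isinstance(right, ast.Constant) and right.value is None
            if is_equality and is_none:
                self.faults.append((node.lineno, "equality comparison with None"))
        self.generic_visit(node)


SOURCE = """\
try:
    value = load()
except:
    value = None
if value == None:
    print("missing")
"""

detector = FaultDetector()
detector.visit(ast.parse(SOURCE))
print(detector.faults)
```

The source is only parsed, never executed, so the undefined `load()` call does not matter; this mirrors the point that static detection needs no running environment for the scanned code.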
The invention is compatible with third-party tools, completes the static detection of a concrete project using the abstract syntax tree principle, and produces test feedback results. To make it easy to keep adding and improving rules, the invention provides an open extension interface and a base class for developing each type of rule in a source program; rules can be customized and extended in the base class through method overriding. The invention also provides a one-click deployment function, which makes it easy to connect third-party tools, build a continuous integration and delivery tool chain, and achieve software quality inspection and rapid iteration.
Claims (7)
1. A code static detection method based on an abstract syntax tree is characterized by comprising the following steps:
step 1, establishing rule extension templates for different language environments;
step 2, constructing a rule base based on the base class rules of the rule extension template;
step 3, adopting a registration list mechanism to perform rule registration;
step 4, nesting the rule base into an open source platform;
step 5, scanning code based on the abstract syntax tree to obtain the erroneous code that does not conform to the constructed rules.
2. The method for detecting static state of code based on abstract syntax tree as claimed in claim 1, wherein in step 1, the concrete method for constructing the rule extension template is:
step 1.1: building different language operation environments, and importing different language operation environment dependency packages;
step 1.2: customizing different programming language base classes;
step 1.3: carrying out standardized description on the customized rule and constructing a description expansion interface;
step 1.4: creating a unit test case according to the customized rule, compiling through a build instruction to ensure that the test case normally runs and a complete running environment is constructed, or compiling the project to form a jar package, importing the jar package into a third-party tool, and testing whether the jar package is successfully loaded or not by running a third-party integration tool;
step 1.5: and releasing the base class interface and the specification.
3. The abstract syntax tree-based code static detection method as claimed in claim 2, wherein in step 1.3, the rule is divided into two parts, namely a left part and a right part, for the java programming language, and the left part of the custom rule is normalized and described according to the java programming language base class, as shown in table 1:
Table 1 Java rule left part specification description table
The right part of the custom rule is described in an extended way, as shown in Table 2:
Table 2 Java rule right part specification description table
4. The method for statically detecting codes based on abstract syntax trees as claimed in claim 1, wherein in step 2, the concrete method for constructing the rule base is:
step 2.1: for the five types of base classes above, five types of rule resolvers are constructed: interface scanClass (classTree tree), interface scanVariable (variableTree tree), interface scanMethold (metaldTree tree), interface scanBlock (blockTree tree), and interface scanExpression (expressTree tree);
step 2.2: for the five rule resolvers, five rule analysis class libraries are constructed: class scanClass, class scanVariable, class scanMethold, class scanBlock, and class scanExpression;
step 2.3: for the five rule class libraries, five rule analysis methods are constructed: scanClass (classTree tree), scanVariable (variableTree tree), scanMethold (metaldTree tree), scanBlock (blockTree tree), and scanExpression (expressTree tree);
step 2.4: the registration rule detection basis comprises rule content, rule type, rule scope, rule key attribute and rule level;
step 2.5: constructing a rule stage detection queue, and dividing a whole section of code into six stages: member initialization, class construction, constructor, callback function, exception throwing, and result feedback;
step 2.6: constructing a component detection queue, and continuously dividing the component detection queue for the stage obtained by analysis to obtain a component rule unit consisting of parameters and annotations;
step 2.7: constructing a unit detection queue, continuously dividing components to form independent rule units such as individuals, relations and member variables, and generating corresponding rule class instances;
step 2.8: constructing a complex unit detection queue, and merging associated rules to generate a rule class instance when context association exists in a unit queue divided by a component unit;
step 2.9: converting the rule class instances generated in the steps 2.7-2.8 into rule files in an xml format, adding error detection comments to the xml rule files, and constructing a rule base;
step 2.10: and integrally packaging each rule to constrain the rule structure and the rule file.
5. The method according to claim 4, wherein in step 2.10, each rule is encapsulated integrally according to an encapsulation paradigm, and the information quantity meeting the basis of the registration rule is input as the rule, and the encapsulation paradigm is shown in table 3:
TABLE 3 encapsulation paradigm table
The description information comprises a scope, a key attribute and a detection level; the attribute group comprises a stage name, a component name, a unit name and a complex unit incidence relation list; and querying a rule file index corresponding to the keyword, wherein the rule file index comprises file error annotation information and keyword information.
6. The method for detecting static state of code based on abstract syntax tree as claimed in claim 1, wherein in step 3, the specific method for rule registration is:
step 3.1: connecting a rule registration interface;
step 3.2: calling a rule registration method;
step 3.3: configuring a dependent package file path;
step 3.4: transmitting the path of the rule;
step 3.5: a user-defined rule object is transmitted;
step 3.6: adding into a registration list;
step 3.7: extracting the keyword annotation list of the rule file and constructing the Issues list;
step 3.8: forming a page example file.
7. The method for detecting static state of code based on abstract syntax tree as claimed in claim 1, wherein in step 5, the specific method for scanning and analyzing code is:
step 5.1: scanning a code extraction appointed type as a key node, and marking the key node as a starting symbol;
step 5.2: recording each leaf node, and marking the inner node of each leaf node by using a non-terminal character;
step 5.3: and performing lexical analysis, syntactic analysis, semantic analysis and fault detection.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911349236.5A CN111176993A (en) | 2019-12-24 | 2019-12-24 | Code static detection method based on abstract syntax tree |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111176993A true CN111176993A (en) | 2020-05-19 |
Family
ID=70648865
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911349236.5A Pending CN111176993A (en) | 2019-12-24 | 2019-12-24 | Code static detection method based on abstract syntax tree |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111176993A (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101286132A (en) * | 2008-06-02 | 2008-10-15 | 北京邮电大学 | Test method and system based on software defect mode |
CN102339252A (en) * | 2011-07-25 | 2012-02-01 | 大连理工大学 | Static state detecting system based on XML (Extensive Makeup Language) middle model and defect mode matching |
CN103914374A (en) * | 2012-12-31 | 2014-07-09 | 梁彬 | Program slicing and frequent pattern extraction based code defect detection method and device |
US20190361787A1 (en) * | 2013-03-14 | 2019-11-28 | Whitehat Security, Inc. | Techniques for traversing representations of source code |
CN109857630A (en) * | 2017-11-30 | 2019-06-07 | 阿里巴巴集团控股有限公司 | Code detection method, system and equipment |
CN110147235A (en) * | 2019-03-29 | 2019-08-20 | 中国科学院信息工程研究所 | Semantic comparison method and device between a kind of source code and binary code |
Non-Patent Citations (1)
Title |
---|
牛婷芝: "一种Java源代码安全分析系统的设计与实现", 《中国优秀硕士学位论文全文数据库信息科技辑》 * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112989731A (en) * | 2021-03-22 | 2021-06-18 | 湖南大学 | Method and system for obtaining integrated circuit modeling based on abstract syntax tree |
CN112989731B (en) * | 2021-03-22 | 2023-10-13 | 湖南大学 | Integrated circuit modeling acquisition method and system based on abstract syntax tree |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20200519 |