US9710243B2 - Parser that uses a reflection technique to build a program semantic tree - Google Patents
- Publication number
- US9710243B2 (application US14/074,444)
- Authority
- US
- United States
- Prior art keywords
- token
- parser
- semantic tree
- tokens
- programming language
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active, expires
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformation of program code
- G06F8/41—Compilation
- G06F8/42—Syntactic analysis
- G06F8/427—Parsing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/70—Software maintenance or management
- G06F8/71—Version control; Configuration management
Definitions
- FIG. 1 shows a parser analyzing source code to produce program semantic trees in accordance with an embodiment.
- FIG. 2 illustrates token types in accordance with an embodiment.
- FIG. 3 shows top level parser flow in accordance with an embodiment.
- FIG. 4 shows logic flow for a token sequence parser in accordance with an embodiment.
- FIG. 5 shows a tree structure generated from terminal tokens in accordance with an embodiment.
- FIG. 6 shows logic flow for a token list parser in accordance with an embodiment.
- FIG. 7 shows logic flow for a token chooser parser in accordance with an embodiment.
- FIG. 8 shows logic flow for a precedence chooser in accordance with an embodiment.
- FIG. 9 illustrates creating, saving and restoring program semantic trees in accordance with another embodiment.
- FIG. 1 illustrates the process where a parser 12 within a computing system 10 parses source code 11 to produce program semantic trees 13.
- Semantic analyzer 14 can be used to perform semantic analysis of program semantic trees 13.
- Analysis tools 15 can be used to perform further analysis and processing of program semantic trees 13 built by parser 12.
- Analysis tools 15 can be within computing system 10 or another computing system.
- Parser 12 performs token sequence parsing on source code 11 after source code 11 is represented as a sequence of tokens in PST 13.
- FIG. 2 illustrates the token types.
- Every element is an abstract token 101.
- A terminal token 105 can be, as illustrated by block 107, a comment, an identifier, a keyword, a literal, a number, a picture, punctuation, or some other lowest level token that does not contain other abstract tokens.
- a token list 104 is one or more of the specified abstract tokens. The number can be zero or more if the token list is marked as optional. For a token chooser 103 , exactly one sub-element is present. Precedence chooser 106 handles arithmetic operator precedence rules.
- Token sequence 102 consists of a sequence of abstract tokens, each of which may be optional (marked with @OPT). In a token sequence, the elements must appear in the specified order, and all of the elements must be present unless marked optional. An unparsed token 108 variation of token sequence 102 is used to report that a small section of the source code was skipped over.
- FIG. 3 shows top level parser flow for parser 12.
- In a block 21, the current position is saved in case the parse fails.
- A check is made for the new highest position in the file holding the source code.
- A token specific parser is called to parse the token in the highest position in the file.
- The token specific parser is one of terminal token parser 24, token chooser parser 25, precedence chooser parser 26 and token sequence parser 27. Token lists are processed inside token sequence parser 27.
- In a block 28, if the parse failed, the position saved in block 21 is restored.
- FIG. 4 shows logic flow for token sequence parser 27.
- Block 31 is repeated for each subtoken defined in the token sequence. When there are no more subtokens, success is reached in a block 32 and token sequence parser 27 returns.
- In a block 33, a check determines whether the subtoken is a token list. If not, the top level parser, shown in FIG. 3, is recursively called. If the top level parser returns successfully, control is returned to block 31. If the top level parser returns unsuccessfully, in a block 36, a parsing failure is recorded and token sequence parser 27 returns.
- If in block 33 the check determines the subtoken is a token list, in a block 35 the token list parser is called. If the token list parser returns successfully, control is returned to block 31. If the token list parser returns unsuccessfully, in block 36, a parsing failure is recorded and token sequence parser 27 returns.
- FIG. 6 shows logic flow for a token list parser.
- The item list is initialized as empty.
- In block 52, the top level parser is called recursively to handle the next item in the token list. If the top level parser is successful, in a block 53, the result from the top level parser is added to the items in the list and control is returned to block 52. If in block 52 the top level parser is not successful, in a block 54, a check is made as to whether the item list is still empty. If so, in a block 55 a parsing failure is recorded and the token list parser returns. If in block 54 the check shows the item list is not empty, in a block 56 a parsing success is recorded and the token list parser returns.
- FIG. 7 shows logic flow for token chooser parser 103.
- In a block 61, each subclass defined in the token chooser is considered in order. Where there are more subclasses, in a block 62, the top level parser is recursively called. If the top level parser returns with a successful parse, in a block 63, token chooser parser 103 returns a success.
- If the top level parser returns with a failure, control is returned to block 61 and the next subclass is considered. If in block 61 there are no more subclasses to be considered, in a block 64, each subclass defined in the token chooser is considered in order. Where there are more subclasses, in a block 65, the top level parser is recursively called. If the top level parser returns with a successful parse, in a block 66, token chooser parser 103 returns a success.
- Table 7 below provides an example from COBOL of tokens handled by token chooser parser 103 . There is a list of one or more COBOL_DataSection's and the program allows them in any order.
- FIG. 8 shows logic flow for precedence chooser 106.
- The precedence chooser solves the problem of left side recursion, which must be solved to properly parse programming syntax involving operators, such as mathematical operators.
- The problem of left side recursion occurs because some patterns must expand on the left side of the operator. It is recursive in that the same pattern is used for the expansion.
- The precedence chooser is a solution to this problem. It is a class derived from the token chooser, which it extends to include two different lists of choices. One of these lists contains the “primary choices”, which do not involve left-side recursion, and the other list holds those patterns that do.
- In a block 71, an attempt is made to match any of the primary choices. If there is not a match, in a block 72, precedence chooser 106 returns with an indication of no match.
- In a block 73, the match so far is recorded.
- In a block 74, the recorded match is used as the first part of a left-side recursive pattern.
- In a block 75, an attempt is made to match the rest of the left-side recursive pattern.
- If there is no match, precedence chooser 106 returns with the match so far. If in block 75 there is a match, in a block 77 the longer match is recorded as the match so far and control is returned to block 74.
- Table 8 presents an example of the member fields and data types of object-oriented classes setting out precedence rules for a Delphi program, where Delphi_Multiplicative_Expression has already been declared to have a higher precedence than Delphi_Additive_Expression.
- FIG. 9 is a flow diagram that illustrates a typical parsing scenario. The flow goes from original parsing 82 to program semantic tree 81 to saving and restoring 83.
- A project settings file 85 contains information about where each source file is, how to decide what programming language it is in, and so on.
- One source file 86 at a time is parsed, generating a program semantic tree 81 from source file 86, project settings 85 and empty program semantic tree 84.
- Program semantic tree 81 can be saved, for example, to an XML file 87, or compiled into a file 88 containing a program written in a programming language such as Java. Reporting programs read program semantic trees 81 to determine overall parsing progress.
- The PST can also be manipulated by other analysis tools. Many analysis steps, such as resolving inter-module dependencies, cannot complete until all the source files have been parsed.
- Source files can be pre-edited, for example, to remove errors.
- Source files arrive with errors because, for example, they may be obsolete, rarely used, or in the middle of updates, or for any of a host of other reasons.
- The pre-editing is performed on the in-memory version of the source file, allowing the original version of the source file to be left intact for the entire parsing process.
- For a section of source code that cannot be parsed, the program semantic tree provides a special unparsed token 108. It is processed identically to token sequence 102 except that it is reported separately. It collects all characters until parsing can recover and resume. Typically this will be until the end of the line, or until a special character like a semicolon (;) is reached. Parsing can still be considered successful, as long as the rest of the source file has been parsed so that the program semantic tree (PST) is complete. After parsing is complete, a report can be generated that indicates the unparsed elements in the source code. The report can include the troublesome snippets of code that were not parsed.
- A PST in a format such as that shown in Table 3 can be used to generate a traditional grammar in a format such as that shown in Table 2, to interface with other program analysis tools. None of the semantic information in a PST can be stored in a traditional grammar, which only contains syntactic information.
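The precedence chooser loop described for FIG. 8 (match a primary choice, record it, then repeatedly try to extend it on the right) can be sketched as follows. This is a minimal illustration, not the patented implementation: the class names `PrecedenceChooserSketch`, `Cursor`, `Num` and `BinOp` are hypothetical, the only primary choice is an integer, and only a single precedence level (`+` and `-`) is handled.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the FIG. 8 loop; all names here are assumptions.
class PrecedenceChooserSketch {
    interface Expr {}

    static final class Num implements Expr {
        final int value;
        Num(int value) { this.value = value; }
        @Override public String toString() { return Integer.toString(value); }
    }

    static final class BinOp implements Expr {
        final Expr left; final String op; final Expr right;
        BinOp(Expr left, String op, Expr right) { this.left = left; this.op = op; this.right = right; }
        @Override public String toString() { return "(" + left + " " + op + " " + right + ")"; }
    }

    // Simple cursor over pre-split tokens.
    static final class Cursor {
        final List<String> tokens; int pos = 0;
        Cursor(String... tokens) { this.tokens = Arrays.asList(tokens); }
        boolean atEnd() { return pos >= tokens.size(); }
    }

    // "Primary choice": a pattern with no left-side recursion. Returns null on no match.
    static Expr parsePrimary(Cursor in) {
        if (!in.atEnd() && in.tokens.get(in.pos).matches("\\d+")) {
            return new Num(Integer.parseInt(in.tokens.get(in.pos++)));
        }
        return null;
    }

    static Expr parse(Cursor in) {
        Expr matchSoFar = parsePrimary(in);      // try the primary choices
        if (matchSoFar == null) return null;     // no match at all
        while (true) {
            int saved = in.pos;                  // remember where the match so far ends
            // Try to match the rest of the left-recursive pattern: operator, then primary.
            if (!in.atEnd() && (in.tokens.get(in.pos).equals("+") || in.tokens.get(in.pos).equals("-"))) {
                String op = in.tokens.get(in.pos++);
                Expr right = parsePrimary(in);
                if (right != null) {
                    matchSoFar = new BinOp(matchSoFar, op, right);  // longer match becomes the match so far
                    continue;
                }
            }
            in.pos = saved;                      // could not extend: return the match so far
            return matchSoFar;
        }
    }

    public static void main(String[] args) {
        System.out.println(parse(new Cursor("1", "-", "2", "-", "3")));
    }
}
```

Because each extension wraps the previous match as its left operand, the result is left-associative without the parser ever recursing on its own left side.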
Landscapes
- Engineering & Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computer Security & Cryptography (AREA)
- Devices For Executing Special Programs (AREA)
Abstract
A grammar of a first programming language is represented in member fields and data types of object-oriented classes of a second programming language as an empty program semantic tree. A parser builds a new program semantic tree that represents source code written in the first programming language. The new program semantic tree is built by a reflection technique in which the member fields and data types of the object-oriented classes of the second programming language as set out in the empty program semantic tree are modified during the building of the new program semantic tree.
Description
American and European businesses have billions of lines of production software that are written in legacy computer languages like COBOL, RPG, PL/I, Fortran and Natural. These businesses are highly motivated to modernize their software, but the process is often either extremely expensive or extremely low quality. The available tools are often not optimized for complex software systems that can have tens of millions of lines of code. The first step in an application modernization project is parsing and analyzing all the existing software.
When parsing, a parser analyzes a string of symbols within source code in accordance with the rules of a language within a grammar. On the basis of the analysis, the parser produces, for example, abstract syntax trees (AST). Based on the information within the abstract syntax trees, a semantic analyzer creates a database that includes data flow (typically in the form of symbol tables) and control flow information (indicating, for example, who calls whom). An analysis tool can be used to traverse the abstract syntax trees looking for specific named entities. The analysis tools depend on the names of entities as listed in the grammar. If someone changes any name listed in the grammar, it can cause a problem for the analysis tool searching in the grammar for that old name for that entity.
Unfortunately, grammars often are changed to take into account variations in hardware, operating systems and business-specific conventions. To avoid problems, the analysis tools need to keep apprised of these changes. If changes in the grammar are not properly communicated and taken into account in operation of the analysis tools, this can raise serious difficulties for correct analysis of the original source programs. There is ample opportunity for analysis tools to get out of sync with a grammar when many changes are made to the grammar by many different people. For this reason, in general, having only a few people maintain a grammar, a parser and associated analysis tools can help to decrease the possibility of a loss of synchronization between the grammar and the analysis tools. However, when only a few people maintain a grammar, a parser and associated analysis tools, this makes it difficult to scale up to millions of lines of source code.
Computer programming languages can be parsed without a traditional grammar, by exploiting a programming technique called reflection. In the reflection programming technique, a computer program examines and modifies the structure and behavior of an object at runtime.
Using reflection, a parser can build program semantic trees (PSTs) where the semantics of legacy programming language can be captured, in addition to just the syntax as in traditional abstract syntax trees (ASTs).
In order to accomplish this, the member fields and data types of object-oriented classes are used to represent the grammar of a programming language, such as COBOL or Fortran or even a modern language like C#. The object-oriented classes are, for example, represented in a modern programming language like Java.
For example, the PERFORM verb in COBOL can be expressed as a Java class, and the elements within the PERFORM statement will contain references to the paragraphs and variables used.
Once the grammar of a programming language is represented in the member fields and data types of object-oriented classes, the reflection programming technique is used to make the object-oriented classes work as a grammar used for parsing.
For example, additional semantic information (such as symbol tables and control flow information) can be represented in the same object-oriented classes.
To illustrate how representing the grammar of a programming language in the member fields and data types of object-oriented classes is different than how grammar is represented in a traditional grammar, consider the following example.
Table 1 below sets out an example of COBOL programming code.
TABLE 1 | |
000160 | READ-SHARED-LOCK. |
000170 | READ SHARED WITH LOCK. |
000180 | IF WS-STATUS = “00” |
000190 | GO TO READ-SHARED-EXIT. |
000200 | IF WS-STAT1 = “2” OR “3” OR “4” |
000210 | |
000220 | PERFORM READ-ERROR. |
000230 | IF RECORD-LOCKED |
COMMIT | |
PERFORM LOCK-USERS-REC THRU LOCK-REC-EXIT | |
WS-COUNT TIMES | |
ADD 1 TO WS-COUNT | |
IF WS-COUNT > 25 | |
MOVE 1 TO WS-COUNT | |
END-IF | |
000250 | GO TO READ-SHARED-LOCK |
ELSE | |
MOVE W02-SHARED | |
TO WS-FILE | |
MOVE WS-SHARED | |
TO WS-KEY | |
000240 | PERFORM LOCKED-RECORD WS-COUNT TIMES |
ADD 1 TO WS-COUNT | |
IF WS-COUNT > 20 | |
MOVE 1 TO WS-COUNT | |
END-IF | |
000250 | GO TO READ-SHARED-LOCK. |
000290 | MOVE 2 TO WS-F-ERROR. |
000300 | PERFORM READ-ERROR. |
000320 | READ-SHARED-EXIT. |
000330 | EXIT. |
Note the “PERFORM” statements in lines 000220 and 000240 of the COBOL code set out in table 1. In a traditional grammar, a (greatly simplified) PERFORM verb in COBOL might be expressed as set out in table 2 below:
TABLE 2
cPerform := “PERFORM” cParagraph [(“THROUGH” | “THRU”) cParagraph] [cPerfTimes];
cPerfTimes := cExpression “TIMES”;
cParagraph := cIdentifier;
cExpression := cIdentifier | cNumber;
cIdentifier := cLetter (cLetter | cDigit | “-”)*;
cNumber := cDigit cDigit*;
cLetter := “A”..“Z”;
cDigit := “0”..“9”;
In the expression set out in table 2, a vertical bar “|” represents a logical “OR” and an asterisk “*” indicates zero or more occurrences.
The PERFORM verb in the member fields and data types of object-oriented classes of an object oriented language, such as Java, might be expressed (again greatly simplified) as set out in table 3 below:
TABLE 3
class COBOL_Perform extends COBOL_AbstractStatement {
    COBOL_Keyword PERFORM;
    COBOL_Paragraph startPara;
    @OPT COBOL_PerformThrough through;
    @OPT COBOL_PerformTimes times;
    class COBOL_PerformThrough extends TokenSequence {
        COBOL_KeywordList THRU = new COBOL_KeywordList("THRU", "THROUGH");
        COBOL_Paragraph endPara;
    }
    class COBOL_PerformTimes extends TokenSequence {
        COBOL_Expression number;
        COBOL_Keyword TIMES;
    }
}
In Table 3 and elsewhere herein, @OPT indicates an optional token or element. The representation of the PERFORM verb shown in Table 3 serves two distinct purposes. First, it can be considered a template for defining the language (such as the PERFORM verb in COBOL), describing all the different ways the language can be used. Second, it can be populated with values as a result of the parsing process. That is, as discussed above, the output representation from the parsing process is a program semantic tree (PST) representation instead of an abstract syntax tree (AST).
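The "template" role can be illustrated with Java reflection. The sketch below is only an assumption about how such a class might be read: the `OPT` annotation, the placeholder token classes and `GrammarTemplateSketch` are hypothetical stand-ins for the classes of Table 3, whose definitions are not given in the text. Note also that the Java API does not guarantee the order in which `getDeclaredFields()` returns fields, although common JVMs return declaration order.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical marker annotation standing in for the @OPT of Table 3.
// RUNTIME retention is needed so reflection can see it.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface OPT {}

// Hypothetical placeholder token types (assumptions, not from the source).
class Keyword {}
class Paragraph {}
class PerformThrough {}
class PerformTimes {}

// Mirrors the shape of COBOL_Perform in Table 3.
class PerformTemplate {
    Keyword PERFORM;
    Paragraph startPara;
    @OPT PerformThrough through;
    @OPT PerformTimes times;
}

class GrammarTemplateSketch {
    // Reads a class as a grammar template: field name -> type name,
    // with a trailing "?" when the field is marked optional.
    static Map<String, String> describe(Class<?> tokenClass) {
        Map<String, String> elements = new LinkedHashMap<>();
        for (Field f : tokenClass.getDeclaredFields()) {
            if (f.isSynthetic()) continue;  // skip compiler- or tool-generated fields
            String type = f.getType().getSimpleName();
            boolean optional = f.isAnnotationPresent(OPT.class);
            elements.put(f.getName(), optional ? type + "?" : type);
        }
        return elements;
    }

    public static void main(String[] args) {
        describe(PerformTemplate.class)
            .forEach((name, type) -> System.out.println(name + " : " + type));
    }
}
```

The same `Field` objects can later be used with `Field.set` to store parsed values in an instance, which corresponds to the second purpose described above.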
Using a PST as the output of a parsing process, rather than an AST, has several advantages. A PST is scalable. Dependencies are caught automatically. That is, when using member fields and data types of object-oriented classes of an object oriented language, any change in definition will be detected immediately. If someone were to change the name of an element to a new name not within the current member fields and data types of any object oriented class, all references to the new name would be marked as invalid until they were changed accordingly in the current member fields and data types.
Further, in PST output all the information for the source code is directly referenced by objects within the PST. For example, within the PST, COBOL_Paragraph (see Table 3) has a direct reference to that paragraph, including all of its statements, line numbers, references, etc. This greatly simplifies tool writing. Much of the work in connecting references to definitions can be done as part of the parse process. By contrast, in the AST version (see Table 2), a cParagraph is just an identifier with a name. There is no further information attached to it. If one writes a tool to analyze or transform a COBOL program, it is necessary to search the rest of the AST to find out what is in that other paragraph.
Additionally, using PSTs allows the use of modern programming language methodologies. Because the PST itself is represented in a modern programming language like Java or C#, the methodologies of that language can be used in the program definition. Annotation, for example, is used for better output formatting, and is also used for external language documentation.
Also, use of PSTs allows for abstraction. When using PSTs, the components common to all variations of a particular programming language can be placed into an abstract language definition. For example, there are many major variations of languages like Report Program Generator (RPG). The File specification has a similar meaning across each of them, so only minor syntactic variations need to be included in each version of RPG. By contrast, a traditional grammar is typically tailored for just one version of one programming language.
Use of object oriented language in PSTs also allows for taking advantage of inheritance. Frequently, there are variations on a computer programming language. With a traditional grammar, the whole grammar gets copied and edited for each variation. In a PST, which allows for program inheritance, only the local changes need to be considered and the rest can be inherited from the main program.
Use of object oriented language in PSTs also allows for encapsulation. Some computer languages, such as HTML used for web pages, often include other languages inside of them, such as Javascript or PHP. In a traditional grammar, these are normally combined into a monolithic grammar covering all sub-languages. With program encapsulation, the main program (e.g., HTML) can simply reference the other program (e.g., Javascript or PHP).
Use of object oriented language in PSTs means the full power of the programming language (e.g., Java) is available for representing complicated issues. An example of a complicated issue where logic is helpful to assist the parsing process to build the correct hierarchy is in managing the data division level numbers in COBOL. By contrast, it can be difficult to mix procedural logic with a grammar that is declarative.
During the parsing process, reflection is used to populate member fields and data types of object-oriented classes of an object oriented language based on the source code. There is no separate grammar (other than the member fields and data types of object-oriented classes), and there is no AST. The result is a robust representation of the original source code.
Various modern computer languages can be used to generate program semantic trees. For example, IBM Assembler, Fortran, PL/I, RPG, Java, Visual Basic, Delphi, DOS, SQL, and many more programming languages have been parsed and analyzed using this technique.
A token list 104 is one or more of the specified abstract tokens. The number can be zero or more if the token list is marked as optional. For a token chooser 103, exactly one sub-element is present. Precedence chooser 106 handles arithmetic operator precedence rules.
In a block 28, if the parse failed, the position saved in block 21 is restored.
In a block 33, a check determines whether the subtoken is a token list. If not, the top level parser, shown in FIG. 3, is recursively called. If the top level parser returns successfully, control is returned to block 31. If the top level parser returns unsuccessfully, in a block 36, a parsing failure is recorded and token sequence parser 27 returns.
If in block 33 the check determines the subtoken is a token list, in a block 35 the token list parser is called. If the token list parser returns successfully, control is returned to block 31. If the token list parser returns unsuccessfully, in block 36, a parsing failure is recorded and token sequence parser 27 returns.
As pointed out above, for a token sequence, the elements must appear in the specified order, and all of the elements must be present unless marked optional.
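The token sequence rule (fields in order, all required unless optional) and the save-and-restore step of the top level parser can be sketched with reflection as below. This is a simplified illustration under stated assumptions: `Cursor`, `Keyword`, `Identifier`, `Parser`, `Perform` and `PerformTimes` are hypothetical classes written for this sketch, input is split on whitespace, and the code relies on `getDeclaredFields()` returning fields in declaration order, which common JVMs do but the Java API does not guarantee.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;

// Hypothetical stand-in for the @OPT marker used in the tables.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface OPT {}

// Hypothetical cursor over whitespace-separated words.
class Cursor {
    final String[] words; int pos = 0;
    Cursor(String text) { words = text.trim().split("\\s+"); }
}

abstract class AbstractToken {
    abstract boolean parse(Cursor in);
}

class Keyword extends AbstractToken {          // terminal: one fixed word
    final String word;
    Keyword(String word) { this.word = word; }
    boolean parse(Cursor in) {
        if (in.pos < in.words.length && in.words[in.pos].equalsIgnoreCase(word)) { in.pos++; return true; }
        return false;
    }
}

class Identifier extends AbstractToken {       // terminal: any single word
    String name;
    boolean parse(Cursor in) {
        if (in.pos >= in.words.length) return false;
        name = in.words[in.pos++];
        return true;
    }
}

class Parser {
    // Top level flow: save the position, try the token, restore on failure.
    static boolean parse(AbstractToken token, Cursor in) {
        int saved = in.pos;
        if (token.parse(in)) return true;
        in.pos = saved;
        return false;
    }
}

class TokenSequence extends AbstractToken {
    // Visits each token-typed field and populates it from the input.
    boolean parse(Cursor in) {
        try {
            for (Field f : getClass().getDeclaredFields()) {
                if (f.isSynthetic() || !AbstractToken.class.isAssignableFrom(f.getType())) continue;
                f.setAccessible(true);
                AbstractToken sub = (AbstractToken) f.get(this);
                if (sub == null) {             // no template instance: create one reflectively
                    Constructor<?> ctor = f.getType().getDeclaredConstructor();
                    ctor.setAccessible(true);
                    sub = (AbstractToken) ctor.newInstance();
                }
                if (Parser.parse(sub, in)) f.set(this, sub);
                else if (f.isAnnotationPresent(OPT.class)) f.set(this, null);  // optional: leave empty
                else return false;             // required element missing
            }
            return true;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}

class PerformTimes extends TokenSequence {
    Identifier count;                          // simplified: the count is read as a plain word
    Keyword TIMES = new Keyword("TIMES");
}

class Perform extends TokenSequence {
    Keyword PERFORM = new Keyword("PERFORM");
    Identifier startPara;
    @OPT PerformTimes times;
}

class TokenSequenceDemo {
    public static void main(String[] args) {
        Perform p = new Perform();
        boolean ok = Parser.parse(p, new Cursor("PERFORM LOCKED-RECORD WS-COUNT TIMES"));
        System.out.println(ok + " " + p.startPara.name + " " + p.times.count.name);
    }
}
```

After a successful parse, the `Perform` object itself is the populated tree node, so no separate tree-building step is needed in this sketch.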
Table 4 below sets out an example from a Javascript program, where there are five required elements and one optional element at the end.
TABLE 4
public class Javascript_IfStatement extends TokenSequence {
    public Javascript_Keyword IF = new Javascript_Keyword("if");
    public Javascript_Punctuation leftParen = new Javascript_Punctuation('(');
    public Javascript_Expression condition;
    public Javascript_Punctuation rightParen = new Javascript_Punctuation(')');
    public Javascript_Statement thenStatement;
    public @OPT Javascript_IfElseClause elseClause;
}
As discussed above, a terminal token 105 is a lowest level token that does not contain other abstract tokens. In the source code, terminal tokens typically include comments, identifiers, keywords, pictures, punctuation, and literals. Each terminal token type has many variations. Literals, for example, may be delineated in source code using single quotes ('), double quotes ("), or both. Literals may allow an escape character before quotes (\") or two quotes to mean just one quote. Literals also may be allowed to span line boundaries. Numbers may use different notation to represent hexadecimal numbers, floating point numbers, and so on.
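As one illustration of such a variation, the sketch below parses a quoted literal in which two consecutive quote characters stand for a single quote. It is an assumption-laden example, not code from the source: `LiteralSketch` and its `Literal` result type are hypothetical, the quote character is passed in by the caller, and this version deliberately rejects literals that span lines.

```java
// Hypothetical terminal-token sketch for quoted literals with doubled-quote escapes.
class LiteralSketch {
    static final class Literal {
        final String value;  // literal contents with doubled quotes collapsed
        final int end;       // index just past the closing quote
        Literal(String value, int end) { this.value = value; this.end = end; }
    }

    // Returns the literal starting at 'start', or null if none can be parsed there.
    static Literal parse(String src, int start, char quote) {
        if (start >= src.length() || src.charAt(start) != quote) return null;
        StringBuilder value = new StringBuilder();
        int i = start + 1;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (c == quote) {
                // Two quotes in a row mean one quote character inside the literal.
                if (i + 1 < src.length() && src.charAt(i + 1) == quote) {
                    value.append(quote);
                    i += 2;
                    continue;
                }
                return new Literal(value.toString(), i + 1);  // closing quote found
            }
            if (c == '\n') return null;  // this sketch does not allow multi-line literals
            value.append(c);
            i++;
        }
        return null;  // unterminated literal
    }

    public static void main(String[] args) {
        Literal lit = parse("'IT''S' X", 0, '\'');
        System.out.println(lit.value + " ends at " + lit.end);
    }
}
```

Because the rule lives in ordinary code, a language variant that uses a backslash escape or permits line-spanning literals could be handled by a different method or subclass rather than by rewriting a declarative grammar rule.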
Identifiers and keywords can also be difficult to accurately represent as abstract tokens. In some languages both identifiers and keywords are case sensitive. In other languages, neither identifiers nor keywords are case sensitive. Some languages have reserved keywords that cannot be used as identifiers.
In a traditional grammar, terminal tokens are often difficult to express due to column constraints, end-of-line issues, and so on. However, when the grammar of a programming language is represented in the member fields and data types of object-oriented classes, the full power of a programming language is available for parsing, along with contextual information.
For example, consider the COBOL code set out in Table 5:
TABLE 5
01 PTR-ITEMS.
   02 UNSTR-PTR PIC 99.
      88 END-OF- VALUE 61.
      88 END-OF- VALUE 36.
   02 STR-PTR PIC 99.
   02 NAME-END PIC 99.
When the full power of a programming language is available for parsing the code in Table 5, the structure in the code in Table 5 can be preserved in a program semantic tree, as shown in FIG. 5 . That is, in FIG. 5 , a block 41, a block 42, a block 43, a block 44, a block 45, a block 46 and a block 47 are arranged to reflect both the content and structure of the code set out in Table 5.
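One way such structure could be recovered is by nesting each entry under the nearest preceding entry with a smaller level number. The sketch below is an assumption for illustration, not the method shown in FIG. 5: `LevelTreeSketch` and `Node` are hypothetical names, the condition names `COND-A` and `COND-B` are placeholders, and the special COBOL levels 66 and 77 are ignored.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class LevelTreeSketch {
    static class Node {
        final int level;
        final String text;
        final List<Node> children = new ArrayList<>();
        Node(int level, String text) { this.level = level; this.text = text; }
    }

    // Each entry is "<level> <rest>" with level >= 1. An entry becomes a child
    // of the nearest preceding entry whose level number is smaller.
    static Node build(List<String> entries) {
        Node root = new Node(0, "ROOT");
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        for (String e : entries) {
            String[] parts = e.trim().split("\\s+", 2);
            int level = Integer.parseInt(parts[0]);
            Node n = new Node(level, parts.length > 1 ? parts[1] : "");
            while (stack.peek().level >= level) stack.pop();  // climb back up
            stack.peek().children.add(n);
            stack.push(n);
        }
        return root;
    }

    static void print(Node n, String indent) {
        System.out.println(indent + n.level + " " + n.text);
        for (Node c : n.children) print(c, indent + "  ");
    }

    public static void main(String[] args) {
        print(build(List.of(
                "01 PTR-ITEMS.",
                "02 UNSTR-PTR PIC 99.",
                "88 COND-A VALUE 61.",   // placeholder condition name
                "88 COND-B VALUE 36.",   // placeholder condition name
                "02 STR-PTR PIC 99.",
                "02 NAME-END PIC 99.")), "");
    }
}
```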
Token Lists are used for sequences of one or more of any other token. They are often marked as optional, with @OPT, to indicate that there may be zero or more. They are "greedy" in the sense that they will try to match as many elements as possible. Table 6 below provides a sample from the DOS command prompt program. It includes two Token Lists; the first is optional but the second is not.
TABLE 6
public class CMD_For_Statement extends TokenSequence {
    public CMD_Keyword FOR = new CMD_Keyword("FOR");
    public @OPT TokenList<CMD_For_Option> opts;
    public CMD_Percent_Variable var;
    public CMD_Keyword IN = new CMD_Keyword("IN");
    public TokenList<CMD_For_Argument> args;
    public CMD_Keyword DO = new CMD_Keyword("DO");
    public CMD_Statement stmt;
}
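The greedy behavior of a token list can be sketched as a loop that keeps matching elements until one fails. This is an illustrative assumption rather than the patent's code: `TokenListSketch.parseList` is a hypothetical helper, and a `Predicate` stands in for the recursive call to the top level parser for each element.

```java
import java.util.List;
import java.util.function.Predicate;

class TokenListSketch {
    // Greedily matches elements starting at pos. Returns the position after the
    // last matched element, or -1 when nothing matched and the list is required.
    static int parseList(List<String> tokens, int pos,
                         Predicate<String> element, boolean optional) {
        int start = pos;
        while (pos < tokens.size() && element.test(tokens.get(pos))) {
            pos++;                               // take as many elements as possible
        }
        if (pos == start && !optional) {
            return -1;                           // a required list needs at least one
        }
        return pos;
    }

    public static void main(String[] args) {
        // Hypothetical FOR options: any token that starts with '/'.
        Predicate<String> option = t -> t.startsWith("/");
        System.out.println(parseList(List.of("/L", "/R", "%i"), 0, option, true));
    }
}
```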
If in block 62, the top level parser returns with a failure, control is returned to block 61 and the next subclass is considered. If in block 61, there are no more subclasses to be considered, in a block 64, each subclass defined in the token chooser is considered in order. Where there are more subclasses, in a block 65, the top level parser is recursively called. If the top level parser returns with a successful parse, in a block 66, token chooser parser 103 returns a success.
If in block 65, the top level parser returns with a failure, control is returned to block 64 and the next subclass is considered. If in block 64, there are no more subclasses to be considered, in a block 67 a parse failure is returned.
Table 7 below provides an example from COBOL of tokens handled by token chooser parser 103. There is a list of one or more COBOL_DataSection elements and the program allows them in any order.
TABLE 7
public class COBOL_DataDivision extends TokenSequence {
    public COBOL_Keyword DATA = new COBOL_Keyword("DATA");
    public COBOL_Keyword DIVISION = new COBOL_Keyword("DIVISION");
    public COBOL_Punctuation dot = new COBOL_Punctuation('.');
    public TokenList<COBOL_DataSection> sections;
}
public class COBOL_DataSection extends TokenChooser {
    public COBOL_FileSection fileSection;
    public COBOL_WorkingStorageSection workingStorageSection;
    public COBOL_ScreenSection screenSection;
    public COBOL_LinkageSection linkageSection;
    public COBOL_ReportSection reportSection;
}
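The try-each-choice-in-order behavior of a token chooser can be sketched as follows. This is a simplified, single-pass illustration under stated assumptions: `ChooserSketch`, `Alternative` and `keywords` are hypothetical names, and each alternative is a plain function rather than a recursive call to the top level parser.

```java
import java.util.List;

class ChooserSketch {
    // An alternative returns the position after its match, or -1 on failure.
    interface Alternative {
        int parse(List<String> tokens, int pos);
    }

    // Tries each alternative in order from the same starting position and
    // returns the result of the first one that succeeds, or -1 if none does.
    static int choose(List<Alternative> alternatives, List<String> tokens, int pos) {
        for (Alternative a : alternatives) {
            int next = a.parse(tokens, pos);
            if (next >= 0) return next;          // first success wins
        }
        return -1;                               // no alternative matched
    }

    // Hypothetical alternative that matches a fixed run of keywords.
    static Alternative keywords(String... words) {
        return (tokens, pos) -> {
            int p = pos;
            for (String w : words) {
                if (p >= tokens.size() || !tokens.get(p).equalsIgnoreCase(w)) return -1;
                p++;
            }
            return p;
        };
    }

    public static void main(String[] args) {
        List<Alternative> sections = List.of(
                keywords("FILE", "SECTION"),
                keywords("WORKING-STORAGE", "SECTION"),
                keywords("LINKAGE", "SECTION"));
        System.out.println(choose(sections, List.of("LINKAGE", "SECTION", "."), 0));
    }
}
```

Since every alternative starts from the same position, a failed choice leaves nothing to undo before the next one is tried.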
For example, in the mathematical statement “x-y-z”, the correct order of operations can be described as “(x-y)-z” where the text inside the parentheses is matched by the same pattern that matches the full statement. This creates a problem because the pattern must first be matched on the left side of the operator, and then used again to match a longer pattern.
The “Precedence Chooser” is a solution to this problem. It is a class derived from Token Chooser which extends it to include two different lists of choices. One of these lists contains the “primary choices”, which do not involve left-side-recursion, and the other list of choices holds those patterns that do.
In a block 71, an attempt is made to match any of the primary choices. If there is not a match, in a block 72, precedence chooser 106 returns with an indication of no match.
If in block 71, there is a match, in a block 73 the match so far is recorded. In a block 74, the recorded match is used as the first part of a left-side recursive pattern. In a block 75, an attempt is made to match the rest of the left-side recursive pattern. When there are no more matches, in a block 76, precedence chooser 106 returns with the match so far. If in block 75 there is a match, in a block 77 the longer match is recorded as the match so far and control is returned to block 74.
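The loop described in blocks 71 through 77 can be sketched for the “x-y-z” example. This is a minimal illustration, not the patented precedence chooser: `LeftRecursionSketch` is a hypothetical class, only `+` and `-` are treated as operators, and precedence levels are left out.

```java
import java.util.List;

class LeftRecursionSketch {
    private final List<String> tokens;
    private int pos = 0;

    LeftRecursionSketch(List<String> tokens) { this.tokens = tokens; }

    private static boolean isOperator(String t) {
        return t.equals("+") || t.equals("-");
    }

    // Primary choice: a single operand, with no left-side recursion involved.
    private String primary() {
        if (pos < tokens.size() && !isOperator(tokens.get(pos))) {
            return tokens.get(pos++);
        }
        return null;
    }

    // Returns the fully parenthesized expression, or null if no primary matches.
    String parse() {
        String soFar = primary();                    // block 71: try primary choices
        if (soFar == null) return null;              // block 72: no match
        while (true) {                               // blocks 73-74: reuse match so far
            int save = pos;
            if (pos < tokens.size() && isOperator(tokens.get(pos))) {
                String op = tokens.get(pos++);
                String right = primary();            // block 75: match the rest
                if (right != null) {
                    soFar = "(" + soFar + op + right + ")";   // block 77: longer match
                    continue;
                }
            }
            pos = save;                              // undo a partial match
            return soFar;                            // block 76: return match so far
        }
    }

    public static void main(String[] args) {
        // prints ((x-y)-z)
        System.out.println(new LeftRecursionSketch(List.of("x", "-", "y", "-", "z")).parse());
    }
}
```

Each pass through the loop wraps the match so far as the left operand of a longer match, which yields left-associative grouping without a recursive descent into the left side.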
Table 8 presents an example of the member fields and data types of object-oriented classes setting out precedence rules for a Delphi program, where Delphi_Multiplicative_Expression has already been declared to have a higher precedence than Delphi_Additive_Expression.
TABLE 8
public class Delphi_Additive_Expression extends TokenSequence {
    public Delphi_Expression addend1 = new Delphi_Expression(
        AllowedPrecedence.ATLEAST, Delphi_Additive_Expression.class);
    public Delphi_Additive_Operator addOp;
    public Delphi_Expression addend2 = new Delphi_Expression(
        AllowedPrecedence.HIGHER, Delphi_Additive_Expression.class);
}
public class Delphi_Additive_Operator extends TokenChooser {
    public Delphi_Punctuation plus = new Delphi_Punctuation('+');
    public Delphi_Punctuation minus = new Delphi_Punctuation('-');
    public Delphi_Keyword OR = new Delphi_Keyword("Or");
    public Delphi_Keyword XOR = new Delphi_Keyword("Xor");
}
public class Delphi_Multiplicative_Expression extends TokenSequence {
    public Delphi_Expression factor1 = new Delphi_Expression(
        AllowedPrecedence.ATLEAST, Delphi_Multiplicative_Expression.class);
    public Delphi_Multiplicative_Operator multOp;
    public Delphi_Expression factor2 = new Delphi_Expression(
        AllowedPrecedence.HIGHER, Delphi_Multiplicative_Expression.class);
}
public class Delphi_Multiplicative_Operator extends TokenChooser {
    public Delphi_Punctuation times = new Delphi_Punctuation('*');
    public Delphi_Punctuation divide = new Delphi_Punctuation('/');
    public Delphi_Keyword DIV = new Delphi_Keyword("Div");
    public Delphi_Keyword MOD = new Delphi_Keyword("Mod");
    public Delphi_Keyword AND = new Delphi_Keyword("And");
}
With the Precedence Chooser, it is much easier to write the grammatical rules for expressions, and the resulting PST is very compact, without unnecessary intermediate levels.
The PST can also be manipulated by other analysis tools. Many analysis steps, such as those involving inter-module dependencies, cannot complete until all the source files have been parsed.
When starting the parsing process, source files can be pre-edited, for example, to remove errors from source files. Source files arrive with errors, for example, because source files may be obsolete, may be rarely used, may be in the middle of updates, or any of a host of other reasons.
The pre-editing is performed on the in-memory version of the source file allowing the original version of source files to be left intact for the entire parsing process.
Occasionally there are unparsable elements. This can result because grammars and programs are rarely perfect. Both evolve over time. It is often useful to defer difficult parsing issues, or to skip over particularly difficult lines of source code. The program semantic tree provides a special unparsed token 108 for this purpose. It is processed identically to token sequence 102 except it is reported separately. It will collect all characters until parsing can recover and resume. Typically this will be until the end of the line, or until a special character like a semicolon (;) is reached. Parsing can still be considered successful, as long as the rest of the source file has been parsed so that a program semantic tree (PST) is complete. After parsing is complete, a report can be generated that indicates the unparsed elements in source code. The report can include the troublesome snippets of code that were not parsed.
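The recovery rule for an unparsed token can be sketched as a scan to the next resynchronization point. The helper below is a hypothetical illustration (`UnparsedSketch.collectUnparsed` is not a name from the source), and it assumes the two recovery points mentioned above: a semicolon or the end of the line.

```java
class UnparsedSketch {
    // Collects characters from pos up to and including the next ';',
    // or up to (but not including) the end of the line, whichever comes first.
    static String collectUnparsed(String src, int pos) {
        int i = pos;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (c == '\n') break;       // stop before the line break
            i++;
            if (c == ';') break;        // the semicolon belongs to the unparsed text
        }
        return src.substring(pos, i);
    }

    public static void main(String[] args) {
        String src = "a = @@@; b = 1;\n";
        // The snippet returned here could be listed in the unparsed-elements report.
        System.out.println(collectUnparsed(src, 4));
    }
}
```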
A PST in a format such as that shown in Table 3 can be used to generate a traditional grammar in a format such as that shown in Table 2, to interface with other program analysis tools. None of the semantic information in a PST can be stored in a traditional grammar, which only contains syntactic information.
When modernizing a legacy application, it is important to know how frequently each of the language elements is used. Those that are used frequently should be transformed using automation, while those with just a few instances are candidates for manual transformation. Given particular project settings, such as those stored in project settings file 85, a report can be generated that shows frequency counts for all elements in an empty program semantic tree, such as empty program semantic tree 84.
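A frequency count of this kind could be gathered by walking a populated tree with reflection and tallying the class of each node. The following is a sketch under stated assumptions, not the reporting tool described above: `FrequencySketch` and its nested token classes are hypothetical, and only fields declared directly on each class are visited.

```java
import java.lang.reflect.Field;
import java.util.Map;
import java.util.TreeMap;

class FrequencySketch {
    // Hypothetical token classes standing in for grammar elements.
    static class Token {}
    static class Keyword extends Token {}
    static class Expression extends Token {}
    static class IfStatement extends Token {
        public Keyword IF = new Keyword();
        public Expression condition = new Expression();
        public Token thenStatement;      // may be null when nothing was parsed
    }

    // Adds one to the count for this node's class, then visits every
    // non-null Token-typed field declared on that class.
    static void count(Token t, Map<String, Integer> counts) {
        if (t == null) return;
        counts.merge(t.getClass().getSimpleName(), 1, Integer::sum);
        for (Field f : t.getClass().getDeclaredFields()) {
            if (!Token.class.isAssignableFrom(f.getType())) continue;
            try {
                f.setAccessible(true);
                count((Token) f.get(t), counts);
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
    }

    public static void main(String[] args) {
        IfStatement outer = new IfStatement();
        outer.thenStatement = new IfStatement();   // a nested if statement
        Map<String, Integer> counts = new TreeMap<>();
        count(outer, counts);
        System.out.println(counts);
    }
}
```

Elements with high counts in such a report would be the candidates for automated transformation, and rare ones for manual handling.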
The foregoing discussion discloses and describes merely exemplary methods and implementations. As will be understood by those familiar with the art, the disclosed subject matter may be embodied in other specific forms without departing from the spirit or characteristics thereof. Accordingly, the present disclosure is intended to be illustrative, but not limiting, of the scope of the invention, which is set forth in the following claims.
Claims (12)
1. A computer implemented parsing method, comprising:
representing a grammar of a first programming language in member fields and data types of object-oriented classes of a second programming language as an empty program semantic tree; and,
building a new program semantic tree that represents source code written in the first programming language, the new program semantic tree being built by a reflection technique in which the member fields and data types of the object-oriented classes of the second programming language as set out in the empty program semantic tree are modified during the building of the new program semantic tree, wherein building the new program semantic tree includes utilizing one or more of:
a top level parsing routine to call token specific parsers,
a parser to handle tokens in a token sequence, and/or
a precedence chooser parser to parse programming syntax involving mathematical operators.
2. A computer implemented parsing method as in claim 1 wherein building the new program semantic tree includes:
converting the source code into abstract tokens, the abstract tokens including:
token sequences;
token choosers;
precedence tokens;
unparsed tokens;
token lists; and
terminal tokens.
3. A computer implemented parsing method as in claim 1 wherein the token sequence parser calls a token list parser when there is a token list.
4. A computer implemented parsing method as in claim 3 wherein the token list parser recursively calls the top level parser for each token in the token list.
5. A computing device comprising:
hardware for running computer programs;
memory, for storing computer programs and data;
a grammar of a first programming language represented in member fields and data types of object-oriented classes of a second programming language as an empty program semantic tree, the grammar being stored in the memory; and,
a parser, run on the hardware, that builds a new program semantic tree that represents source code written in the first programming language, the new program semantic tree being built by a reflection technique in which the member fields and data types of the object-oriented classes of the second programming language as set out in the empty program semantic tree are modified during the building of the new program semantic tree, wherein the parser includes one or more of:
a top level parsing routine to call token specific parsers,
a parser to handle tokens in a token sequence, and/or
a precedence chooser parser to parse programming syntax involving mathematical operators.
6. A computing device as in claim 5 wherein the parsing program converts the source code into abstract tokens, the abstract tokens including:
token sequences;
token choosers;
precedence tokens;
unparsed tokens;
token lists; and
terminal tokens.
7. A computing device as in claim 5 wherein the token specific parsers include a token list parser called by the token sequence parser when there is a token list.
8. A computing device as in claim 7 wherein the token list parser recursively calls the top level parser for each token in the token list.
9. Non-transient storage media that stores software which when run on a computer performs a computer implemented parsing method, comprising:
representing a grammar of a first programming language in member fields and data types of object-oriented classes of a second programming language as an empty program semantic tree; and,
building a new program semantic tree that represents source code written in the first programming language, the new program semantic tree being built by a reflection technique in which the member fields and data types of the object-oriented classes of the second programming language as set out in the empty program semantic tree are modified during the building of the new program semantic tree, wherein building the new program semantic tree includes utilizing one or more of:
a top level parsing routine to call token specific parsers,
a parser to handle tokens in a token sequence, and/or
a precedence chooser parser to parse programming syntax involving mathematical operators.
10. Non-transient storage media stores software as in claim 9 wherein building the new program semantic tree includes:
converting the source code into abstract tokens, the abstract tokens including:
token sequences;
token choosers;
precedence tokens;
unparsed tokens;
token lists; and
terminal tokens.
11. Non-transient storage media stores software as in claim 9 wherein the token sequence parser calls a token list parser when there is a token list.
12. Non-transient storage media stores software as in claim 11 wherein the token list parser recursively calls the top level parser for each token in the token list.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/074,444 US9710243B2 (en) | 2013-11-07 | 2013-11-07 | Parser that uses a reflection technique to build a program semantic tree |
Publications (2)
Publication Number | Publication Date |
---|---|
US20150128114A1 US20150128114A1 (en) | 2015-05-07 |
US9710243B2 true US9710243B2 (en) | 2017-07-18 |
Family
ID=53008044
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US14/074,444 Active 2034-08-24 US9710243B2 (en) | 2013-11-07 | 2013-11-07 | Parser that uses a reflection technique to build a program semantic tree |
Country Status (1)
Country | Link |
---|---|
US (1) | US9710243B2 (en) |
Patent Citations (68)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5493678A (en) * | 1988-09-26 | 1996-02-20 | International Business Machines Corporation | Method in a structure editor |
US5408665A (en) | 1993-04-30 | 1995-04-18 | Borland International, Inc. | System and methods for linking compiled code with extended dictionary support |
US5812853A (en) * | 1994-04-11 | 1998-09-22 | Lucent Technologies Inc. | Method and apparatus for parsing source code using prefix analysis |
US5857212A (en) | 1995-07-06 | 1999-01-05 | Sun Microsystems, Inc. | System and method for horizontal alignment of tokens in a structural representation program editor |
JPH10154079A (en) | 1996-09-30 | 1998-06-09 | Hitachi Software Eng Co Ltd | Program conversion device and storage medium |
US20050005266A1 (en) * | 1997-05-01 | 2005-01-06 | Datig William E. | Method of and apparatus for realizing synthetic knowledge processes in devices for useful applications |
US20070219933A1 (en) * | 1997-05-01 | 2007-09-20 | Datig William E | Method of and apparatus for realizing synthetic knowledge processes in devices for useful applications |
US20060206860A1 (en) * | 1999-05-17 | 2006-09-14 | Invensys Systems, Inc. | Process control configuration system with connection validation and configuration |
US20050246685A1 (en) * | 2000-12-30 | 2005-11-03 | Braddock Daniel M Jr | Object oriented ADN and method of converting a non-object oriented computer language to an object oriented computer language |
US20030084424A1 (en) * | 2001-07-26 | 2003-05-01 | Reddy Sreedhar Sannareddy | Pattern-based comparison and merging of model versions |
US20030101195A1 (en) | 2001-08-14 | 2003-05-29 | Christian Linhart | Symbol repository |
US20030196195A1 (en) * | 2002-04-15 | 2003-10-16 | International Business Machines Corporation | Parsing technique to respect textual language syntax and dialects dynamically |
US7302383B2 (en) * | 2002-09-12 | 2007-11-27 | Luis Calixto Valles | Apparatus and methods for developing conversational applications |
US20040194072A1 (en) | 2003-03-25 | 2004-09-30 | Venter Barend H. | Multi-language compilation |
US7219338B2 (en) | 2003-03-25 | 2007-05-15 | Microsoft Corporation | Multi-language compilation |
US20040225999A1 (en) * | 2003-05-06 | 2004-11-11 | Andrew Nuss | Grammer for regular expressions |
US20050015753A1 (en) * | 2003-07-18 | 2005-01-20 | Erik Meijer | Virtual method protection |
US20050050525A1 (en) * | 2003-08-25 | 2005-03-03 | Chittar Rajendra S. | System and method of universal programming language conversion |
US7526755B2 (en) | 2003-10-08 | 2009-04-28 | Microsoft Corporation | Plug-in pre- and postconditions for static program analysis |
US20060005174A1 (en) * | 2004-07-01 | 2006-01-05 | International Business Machines Corporation | Defining hierarchical structures with markup languages and reflection |
US20060009962A1 (en) * | 2004-07-09 | 2006-01-12 | Microsoft Corporation | Code conversion using parse trees |
US20060031820A1 (en) * | 2004-08-09 | 2006-02-09 | Aizhong Li | Method for program transformation and apparatus for COBOL to Java program transformation |
US20060047691A1 (en) | 2004-08-31 | 2006-03-02 | Microsoft Corporation | Creating a document index from a flex- and Yacc-generated named entity recognizer |
US20060117307A1 (en) * | 2004-11-24 | 2006-06-01 | Ramot At Tel-Aviv University Ltd. | XML parser |
US20060143597A1 (en) * | 2004-12-29 | 2006-06-29 | Eyal Alaluf | Method and a software product for adapting a .NET framework compliant reflection mechanism to a java environment |
US8302085B2 (en) * | 2005-02-16 | 2012-10-30 | University College Cork—National University of Ireland | Method for developing software code and estimating processor execution time |
US20070011669A1 (en) * | 2005-07-06 | 2007-01-11 | International Business Machines Corporation | Software migration |
US20070022414A1 (en) * | 2005-07-25 | 2007-01-25 | Hercules Software, Llc | Direct execution virtual machine |
US20070044066A1 (en) * | 2005-08-19 | 2007-02-22 | Microsoft Corporation | Embedded multi-language programming |
US20070050760A1 (en) * | 2005-08-30 | 2007-03-01 | Erxiang Liu | Generation of application specific xml parsers using jar files with package paths that match the xml xpaths |
US20070113221A1 (en) * | 2005-08-30 | 2007-05-17 | Erxiang Liu | XML compiler that generates an application specific XML parser at runtime and consumes multiple schemas |
US20070226708A1 (en) * | 2006-03-24 | 2007-09-27 | International Business Machines Corporation | Source-to-source transformation for language dialects |
US8015554B2 (en) * | 2006-03-24 | 2011-09-06 | International Business Machines Corporation | Source-to-source transformation for language dialects |
US7774746B2 (en) * | 2006-04-19 | 2010-08-10 | Apple, Inc. | Generating a format translator |
US20070300212A1 (en) * | 2006-06-26 | 2007-12-27 | Kersters Christian J | Modifying a File Written in a Formal Language |
US8166462B2 (en) * | 2006-09-07 | 2012-04-24 | Oracle America, Inc. | Method and apparatus for sorting and displaying costs in a data space profiler |
US20080091409A1 (en) * | 2006-10-16 | 2008-04-17 | Microsoft Corporation | Customizable mathematic expression parser and evaluator |
US8176475B2 (en) * | 2006-10-31 | 2012-05-08 | Oracle America, Inc. | Method and apparatus for identifying instructions associated with execution events in a data space profiler |
US8219512B2 (en) * | 2006-12-22 | 2012-07-10 | Avaya Inc. | Higher order logic applied to expert systems for alarm analysis, filtering, correlation and root causes which converts a specification proof into a program language |
US8027946B1 (en) * | 2006-12-22 | 2011-09-27 | Avaya Inc. | Higher order logic applied to expert systems for alarm analysis, filtering, correlation and root cause |
US20080189683A1 (en) * | 2007-02-02 | 2008-08-07 | Microsoft Corporation | Direct Access of Language Metadata |
US20080201355A1 (en) | 2007-02-16 | 2008-08-21 | Microsoft Corporation | Easily queriable software repositories |
US8132156B2 (en) * | 2007-06-14 | 2012-03-06 | Red Hat, Inc. | Methods and systems for testing tool with comparative testing |
US8181167B2 (en) | 2008-01-09 | 2012-05-15 | Kan Zhao | Method and system for presenting and analyzing software source code through intermediate representation |
US20090222799A1 (en) * | 2008-02-29 | 2009-09-03 | Neil Stewart | System representation and handling techniques |
US8479178B2 (en) * | 2008-06-27 | 2013-07-02 | Microsoft Corporation | Compiler in a managed application context |
US8453126B1 (en) | 2008-07-30 | 2013-05-28 | Dulles Research LLC | System and method for converting base SAS runtime macro language scripts to JAVA target language |
US20100162204A1 (en) * | 2008-12-22 | 2010-06-24 | International Business Machines Corporation | Method and system for automatically adding generic change log to legacy application |
US8060857B2 (en) * | 2009-01-31 | 2011-11-15 | Ted J. Biggerstaff | Automated partitioning of a computation for parallel or other high capability architecture |
US20120144376A1 (en) * | 2009-06-02 | 2012-06-07 | Vector Fabrics B.V. | Embedded system development |
US20120191446A1 (en) * | 2009-07-15 | 2012-07-26 | Proviciel - Mlstate | System and method for creating a parser generator and associated computer program |
US20110138373A1 (en) * | 2009-12-08 | 2011-06-09 | American National Laboratories, Inc. | Method and apparatus for globally optimizing instruction code |
US20110161940A1 (en) | 2009-12-28 | 2011-06-30 | Frank Brunswig | Multi-language support for service adaptation |
US20110167088A1 (en) * | 2010-01-07 | 2011-07-07 | Microsoft Corporation | Efficient immutable syntax representation with incremental change |
US8924924B2 (en) * | 2010-03-29 | 2014-12-30 | Microsoft Corporation | Representing the structure of a data format using a class-based representation |
US20110283270A1 (en) * | 2010-05-11 | 2011-11-17 | Albrecht Gass | Systems and methods for analyzing changes in application code from a previous instance of the application code |
US8898627B2 (en) * | 2010-05-11 | 2014-11-25 | Smartshift Gmbh | Systems and methods for applying rules to transform objects of an application |
US20110283269A1 (en) * | 2010-05-11 | 2011-11-17 | Albrecht Gass | Systems and methods for applying rules to transform objects of an application |
US8739150B2 (en) * | 2010-05-28 | 2014-05-27 | Smartshift Gmbh | Systems and methods for dynamically replacing code objects via conditional pattern templates |
US20110296391A1 (en) * | 2010-05-28 | 2011-12-01 | Albrecht Gass | Systems and Methods for Dynamically Replacing Code Objects Via Conditional Pattern Templates |
US9519465B2 (en) * | 2010-10-28 | 2016-12-13 | Innowake Gmbh | Method and system for generating code |
US9182962B2 (en) * | 2010-12-09 | 2015-11-10 | Todd Bradley KNEISEL | Method for translating a cobol source program into readable and maintainable program code in an object oriented second programming language |
US20120323863A1 (en) * | 2011-06-16 | 2012-12-20 | Microsoft Corporation | Semantic reflection storage and automatic reconciliation of hierarchical messages |
US20130124545A1 (en) | 2011-11-15 | 2013-05-16 | Business Objects Software Limited | System and method implementing a text analysis repository |
US20130159976A1 (en) * | 2011-12-16 | 2013-06-20 | Microsoft Corporation | Abstract syntax tree transformation |
US20130174131A1 (en) * | 2012-01-04 | 2013-07-04 | International Business Machines Corporation | Code converting method, program, and system |
US20130326204A1 (en) * | 2012-05-31 | 2013-12-05 | New York University | Configuration-Preserving Preprocessor and Configuration-Preserving Parser |
US20140149970A1 (en) * | 2012-11-29 | 2014-05-29 | International Business Machines Corporation | Optimising a compilation parser for parsing computer program code in arbitrary applications |
Non-Patent Citations (4)
Title |
---|
A Three-Phase Approach to Efficiently Transform C# into KDM - Christian Wulf, Sören Frey, and Wilhelm Hasselbring - Software Engineering Group, University of Kiel, Germany - Aug. 24, 2012. * |
Analysis and Code Model Extraction for C/C++ Source Code - Christian Wagner, Tiziana Margaria and Hans-Georg Pagendarm - German-Dutch Wind Tunnels, Göttingen, Germany and Chair of Service and Software Engineering, University of Potsdam, Germany - 2009 14th IEEE International Conference on Engineering of Complex Computer Systems. * |
Case study: Re-engineering C++ component models via automatic program transformation - Robert L. Akers, Ira D. Baxter, Michael Mehlich - Semantic Designs Inc., USA; Brian J. Ellis, Kenn R. Luecke - The Boeing Company, USA - R.L. Akers et al. / Information and Software Technology 49 (2007) 275-291. * |
Program Transformation with Stratego/XT: Rules, Strategies, Tools, and Systems in Stratego/XT 0.9 - Eelco Visser - Institute of Information and Computing Sciences, Utrecht University, Utrecht, The Netherlands - C. Lengauer et al. (Eds.): Domain-Specific Program Generation, LNCS 3016, pp. 216-238, 2004. * |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180300313A1 (en) * | 2017-04-05 | 2018-10-18 | Voicebox Technologies Corporation | System and method for generating a multi-lingual and multi-intent capable semantic parser based on automatically generated operators and user-designated utterances relating to the operators |
US10579738B2 (en) * | 2017-04-05 | 2020-03-03 | Voicebox Technologies Corporation | System and method for generating a multi-lingual and multi-intent capable semantic parser based on automatically generated operators and user-designated utterances relating to the operators |
US11182565B2 (en) * | 2018-02-23 | 2021-11-23 | Samsung Electronics Co., Ltd. | Method to learn personalized intents |
US11314940B2 (en) | 2018-05-22 | 2022-04-26 | Samsung Electronics Co., Ltd. | Cross domain personalized vocabulary learning in intelligent assistants |
CN109753283A (en) * | 2018-12-29 | 2019-05-14 | 北京辰安科技股份有限公司 | Handle the authority control method and device of front end page |
US11068244B2 (en) * | 2019-10-01 | 2021-07-20 | Salesforce.Com, Inc. | Optimized transpilation |
US20210208857A1 (en) * | 2020-01-08 | 2021-07-08 | Fujitsu Limited | Parsability of code snippets |
US11119740B2 (en) * | 2020-01-08 | 2021-09-14 | Fujitsu Limited | Parsability of code snippets |
US20220357934A1 (en) * | 2021-05-05 | 2022-11-10 | Michael Ling | Methods, devices, and media for two-pass source code transformation |
US20230289523A1 (en) * | 2022-03-11 | 2023-09-14 | Microsoft Technology Licensing, Llc | Language-agnostic computer program repair engine generator |
US12141560B2 (en) | 2022-11-15 | 2024-11-12 | Bank Of America Corporation | Hybrid-feedback driven transpiler system |
Also Published As
Publication number | Publication date |
---|---|
US20150128114A1 (en) | 2015-05-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9710243B2 (en) | Parser that uses a reflection technique to build a program semantic tree | |
CN106919434B (en) | Code generation method and device | |
US8286132B2 (en) | Comparing and merging structured documents syntactically and semantically | |
US7958493B2 (en) | Type inference system and method | |
US10042637B1 (en) | Computerized software development environment including customized presentation of source code | |
US9122540B2 (en) | Transformation of computer programs and eliminating errors | |
Burke et al. | A practical method for LR and LL syntactic error diagnosis and recovery | |
JPS6375835A (en) | Apparatus for generating intended code, program, list and design document | |
US20060212859A1 (en) | System and method for generating XML-based language parser and writer | |
US9304893B1 (en) | Integrated software development and test case management system | |
US11294665B1 (en) | Computerized software version control with a software database and a human database | |
CN108595334B (en) | Method and device for calculating dynamic slices of Java program and readable storage medium | |
US9311077B2 (en) | Identification of code changes using language syntax and changeset data | |
Uhl et al. | An attribute grammar for the semantic analysis of Ada | |
Kuramitsu | Nez: practical open grammar language | |
US9495638B2 (en) | Scalable, rule-based processing | |
US9436664B2 (en) | Performing multiple scope based search and replace within a document | |
CN113254023B (en) | Object reading method and device and electronic equipment | |
US20040003373A1 (en) | Token-oriented representation of program code with support for textual editing thereof | |
Cameron | Rex: Xml shallow parsing with regular expressions | |
JP2879099B1 (en) | Abstract syntax tree processing method, computer readable recording medium recording abstract syntax tree processing program, computer readable recording medium recording abstract syntax tree data, and abstract syntax tree processing device | |
KR20090011974A (en) | Method for extracting the target files of compilation | |
JP2010067103A (en) | Error information output device, error information output method and error output program for program | |
US8819645B2 (en) | Application analysis device | |
KR102614967B1 (en) | Automation system and method for extracting intermediate representation based semantics of javascript |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: EAGLE LEGACY MODERNIZATION, LLC., TEXAS Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:O'HARA, STEVEN ARTHUR;WILKINSON, JEFFREY ALLEN;SIGNING DATES FROM 20131120 TO 20131204;REEL/FRAME:031726/0750 |
|
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |
|
MAFP | Maintenance fee payment |
Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YR, SMALL ENTITY (ORIGINAL EVENT CODE: M2551); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY Year of fee payment: 4 |