WO2011015222A1 - System and method for creating a parser generator and associated computer program - Google Patents

System and method for creating a parser generator and associated computer program

Info

Publication number
WO2011015222A1
WO2011015222A1 PCT/EP2009/059115 EP2009059115W WO2011015222A1 WO 2011015222 A1 WO2011015222 A1 WO 2011015222A1 EP 2009059115 W EP2009059115 W EP 2009059115W WO 2011015222 A1 WO2011015222 A1 WO 2011015222A1
Authority
WO
WIPO (PCT)
Prior art keywords
parser
grammar
module
parsing
semantic
Prior art date
Application number
PCT/EP2009/059115
Other languages
French (fr)
Inventor
Henri Binsztok
Adam Koprowski
Original Assignee
Proviciel - Mlstate
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Proviciel - Mlstate
Priority to PCT/EP2009/059115 (WO2011015222A1)
Priority to EP09780676A (EP2454661A1)
Priority to US13/384,326 (US20120191446A1)
Publication of WO2011015222A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/43 Checking; Contextual analysis
    • G06F8/436 Semantic checking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/30 Creation or generation of source code
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/30 Creation or generation of source code
    • G06F8/37 Compiler construction; Parser generation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/42 Syntactic analysis
    • G06F8/427 Parsing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/43 Checking; Contextual analysis

Definitions

  • the present invention relates to the parsing problem in computer science and electronics. More specifically, the invention relates to methods of generating formally-verified parsers from simple grammar description files.
  • Parsing consists of taking a text, recognizing whether it is correct with respect to the description of the language used to write the text, given by means of a grammar and, if it is, pulling it apart with respect to the structure of the given grammar.
  • Parsing is used extensively in a variety of computer science and electronics fields including compilation, network security, data storage, etc.
  • a source code is first parsed then compiled and assembled into an executable. Bugs and anomalies in executables can result in important loss of time, money, data and sometimes lives. Extensive testing is not considered sufficient in critical applications.
  • a second example is dedicated to network security. A message arriving at a network node is parsed and depending on the results of said parsing it is either transmitted or blocked. Said network node in effect works as a kind of "digital diode". XML signatures and XML encryption are growingly used to secure transactions, in particular across mobile networks.
  • a third example applies to data.
  • Stored content is parsed in order to retrieve data of interest.
  • Database queries expressed in query language e.g. SQL also need to be parsed before data are accessed.
  • a fourth example is dedicated to data interpretation. Each web page code is parsed in order to be displayed in a web browser.
  • DSL Domain Specific Languages
  • the parsing process is usually broken up into two steps:
  • a syntax analysis where the sequence of tokens is analysed and the parse tree is built, representing the structural decomposition of the input text with respect to the grammar.
  • Parsing is a crucial step in any interpreter/compiler, where the source code of the program needs to be parsed before being interpreted/transformed into the target language. But it is also an important step in many other programs performing any kind of data manipulation.
  • Parsing technology is a well-studied and well-understood problem in computer science or electronics component design.
  • the typical approach to parsing is to specify the input language using context-free grammars and to use a parser generator.
  • Parser generators are programs that:
  • grammar g belongs to some sub-class of context-free grammars supported by the parser generator, then it automatically constructs a source code for a parser of g, in some programming language of choice, L.
  • CFG context-free grammar
  • V is a single nonterminal symbol
  • w is a string of terminals and/or nonterminals (possibly empty).
  • An object of this invention is to provide a parser generator that will be capable of performing both the lexical analysis and the syntax analysis in a uniform way and that additionally will be correct by construction, i.e., the generated parser will come with total correctness guarantees, as if the generated parser were subject to formal verification using a theorem proving technology.
  • the process of generation of a parser is equivalent to that sketched in the preceding section.
  • the parser generator will be a single executable, functionally equivalent to the traditional parser generator and no use of a theorem prover will be involved at all; and yet the generated parser will be provably correct by construction, allowing its use in critical systems, requiring strong correctness guarantees.
  • the invention provides a solution that does not have the drawbacks of the prior art. Indeed, the invention concerns a system for building a parser, characterized in that it comprises:
  • a grammar input module for inputting in said parser a grammar expressed in a given formalism
  • semantic action module defining a parsing result depending on at least some expression of said grammar, said semantic action module ensuring that all semantic actions of said grammar are terminating
  • a checking module for checking that a given grammar belongs to a predetermined class of grammars for which a translation to a correct, terminating parser is feasible
  • a proof assistant module for developing said parser with said formalism module and said semantic action module.
  • Parsing is also an important first step in security software such as firewalls or antivirus.
  • said formalism module forbids recursion in said grammar.
  • said grammar is a context-free grammar
  • said grammar is a parsing expression grammar
  • the invention also concerns a method for building a formally verified parser generator.
  • said method comprises:
  • a step of obtaining a formally correct parser for Q
  • a step of obtaining a termination checker for semantic actions in Q
  • a step of obtaining a parser generator that will read a description of some grammar G from a text file using said certified parser and, after checking that the grammar belongs to a class for which parser generation is feasible, it will generate a code of the parser in Q.
  • the invention also concerns a computer program product downloadable from a communications network and/or stored on a computer-readable medium and/or executable by a microprocessor.
  • such a computer program product comprises program code instructions for the execution of the building method as described.
  • Figure 1 is a block diagram illustrating the building blocks of a certified parser interpreter of an embodiment of the invention
  • Figure 2 is a block diagram illustrating the Building blocks of a certified parser generator in one embodiment of the invention
  • the invention relates to a system for building a parser.
  • a system for building a parser comprises: - a grammar input module for inputting in said parser generator a grammar expressed in a given formalism; - a checking module for formally verifying that a given grammar belongs to a predetermined class of grammars for which a translation to a correct, terminating parser is feasible; - a checking module for formally verifying that a grammar expressed in said formalism is well-formed; - a semantic action module defining a parsing result depending on semantic actions embedded in said grammar, said semantic action module ensuring in a formal way that all semantic actions of said grammar are terminating; and - a formal module generating a parser with total correctness guarantees, using said modules to verify that the grammar is well-formed, belongs to a certain class of feasible, terminating grammars and that all its semantic actions are terminating.
  • the system and method of the invention allow a user to build a parser generator whose generated parsers are formally checked and verified. This means that, by using the invention, there is no need to formally verify each generated parser, as in the prior art techniques. This is a significant feature of the invention because it ensures that, when effectively used, the parser will always lead to a formally checked and verified program (after compilation).
  • PA Proof Assistant
  • a parsing expression grammar is a type of analytic formal grammar that describes a formal language in terms of a set of rules for recognizing strings in the language.
  • a parsing expression grammar essentially represents a recursive descent parser in a pure schematic form that expresses only syntax and is independent of the way an actual parser might be implemented or what it might be used for. Parsing expression grammars look similar to regular expressions or context-free grammars (CFG) in Backus-Naur form (BNF) notation, but have a different interpretation.
  • CFG context-free grammars
  • PEGs Unlike CFGs, PEGs cannot be ambiguous; if a string parses, it has exactly one valid parse tree, so PEGs are particularly well adapted for computer program languages.
  • the parser generator of the invention is based on the formalism of parsing expression grammars (PEGs) instead of context-free grammars. Below we briefly summarize this formalism for the purposes of the disclosure. Let us fix a finite set of non-terminals, (sometimes we will also refer to
  • Definition 1 Let define the set of parsing expressions, ⁇ , over non-terminals V
  • the any-character expression [ ⁇ ] consumes arbitrary character and succeeds; it fails on empty input.
  • a terminal a checks the first character of the input string; if it is equal to a then it is consumed and parsing succeeds, if it is different than a or the input string is empty then parsing fails.
  • Parsing of a non-terminal A amounts to parsing the expression associated with A, i.e., P .
  • Parsing the sequence expression amounts to parsing e on the
  • Parsing the choice expression first parses e on the input string
  • Parsing the zero-or-more repetition expression e* tries to parse e; if that fails then parsing of e* succeeds without consuming any input; if it succeeds then we proceed with parsing e* on the remaining input.
  • Example 3 As an example let us present a very simple grammar for mathematical expressions with 5 non-terminals and the following productions:
  • parsing expressions as introduced previously can be used for specifying which strings belong to the grammar under consideration.
  • the role of a parser is not merely to recognize whether an input is correct or not but also, given a correct input, to compute its representation in one form or another.
  • grammar expressions with semantic values which are a representation of the result of parsing this expression on (some) input and by extending grammar with semantic actions, which are functions used to produce and manipulate the semantic values.
  • semantic value associated with an expression will be its parse tree so that parsing a correct input will give a parse tree of this input.
  • the inventors had the idea to replace the simple type of parsing expressions ⁇ with a family of types ⁇ ⁇ , where the index ⁇ is the type of semantic values associated with an expression.
  • the inventors also define default semantic actions for all types of expressions and, to allow altering those defaults, they introduce a new construction to convert semantic values.
  • the inventors use the following types:
  • Type is a universe of types.
  • True is the singleton type with a single value I.
  • char is a type of machine characters. It corresponds to the type of terminals which in concrete parsers generated will always be instantiated by char.
  • list α is a type of lists of elements of α, for any type α,
  • ⁇ * ⁇ is a type of pairs of elements with for any types
  • An empty expression e has a semantic value of type /.
  • a sequence has semantic values of type ⁇ * ⁇ where ⁇ (resp. ⁇ ) is
  • a prioritized choice has a semantic value of type ⁇ where
  • semantic values of both e. and e are required to have type ⁇ .
  • a repetition expression has a semantic value of type list α, where α is the type of semantic values of e.
  • a not-predicate has a semantic value of type /.
  • the inventors add a new expression / ' which takes an expression e
  • is the set of extended parsing expressions, where the index ⁇ is the type of semantic values of an expression.
  • V. L is the set of non-terminals
  • ⁇ Type is the function giving type of semantic values for
  • EPEG extended parsing expression grammar
  • E is a (non-strict) subexpression relation on parsing expressions.
  • the inventors define three groups of properties over parsing expressions:
  • parsing expression can succeed without consuming any input
  • parsing expression can fail.
  • Example 6 Let us extend the grammar from Example 4.3 with semantic actions.
  • Example 9 After defining appropriate notations and coercions, the transcription of Example 6 in Coq could look as follows:
  • Example 10 We present an alternative version of the grammar from Example 9, where the semantic actions are used to build an abstract syntax tree
  • TRX is a parser generator that on top of the functionality offered by traditional parser generators will provide total correctness guarantees for all generated parsers.
  • the target language of our parser generator is Q. It will be mainly interested in functional programming languages, but most of the ideas presented below can be used for an arbitrary target language Q.
  • PARSER(PEG) as well as a parser for Q, PARSER (Q) , as the productions in the grammar will be expressed as a source code in Q.
  • One way to obtain those parsers is by developing certified interpreters for them using the approach described in
  • the next step is to write a parser generator in Coq.
  • the process of generating a recursive descent parser for a PEG is relatively straightforward.
  • the basic idea is that the set of productions of G is mapped one-to-one to a mutually recursive set of parse functions in Q. Parsing every PEG operand consists of turning operational semantics rules of Annex B2 into an executable code.
  • Example 11 In this example we illustrate a possible content of the library LIB (Q) , where Q is again taken to be OCaml.
  • Q is again taken to be OCaml.
  • Such a library could consist of the following functions taken from the standard library of OCaml:
  • Example 12 In this example we will present a PEG grammar PEG (G) , equivalent to that from Example 6 but rendered as an ASCII file to be processed by the parser generator.
  • PEG PEG
  • OCaml the target language Q
  • semantic actions of the grammar are expressed as pieces of code in OCaml, where recursion is not allowed.
  • Example 13 We present an alternative version of the grammar from Example 12, where the semantic actions are used to build an abstract syntax tree
  • the parser generator will reject the grammar if it is syntactically incorrect or incorrect with respect to Definition 5 (for instance if it contains references to undefined non-terminals). It will also reject the input if the grammar G is not well-formed, i.e., it is left-recursive. This last check is also performed, though its correctness cannot be guaranteed, by some of the existing parser generators based on the PEG formalism.
  • the parser generator will also reject the grammar if the semantic actions contain recursion and hence may be potentially non-terminating (we will only allow calls to a predefined library of recursive functions with some basic combinators for basic data-types, to improve expressivity of acceptable semantic actions). This is the only difference with using TRX compared to other unverified parser generators, which typically do not try to ensure termination of the generated code (in fact they often do not even check whether semantic actions are syntactically correct and just copy it verbatim to the generated parser).
  • TRX will produce a parser for G expressed as a source code in Q, pretty much as any other parser generator would do.
  • the parser generated by TRX is formally proved to be totally correct, i.e., the parser is terminating and correct with respect to the grammar G and the semantics of PEGs.
  • TRX After ensuring that the grammar is correct it is transformed to a recursive descent parser in Q. In TRX this step will be accompanied by a proof that this transformation produces a terminating parser, which is correct with respect to the grammar G and the semantics of PEGs (Annex B2). Finally, TRX will be developed using dependent type programming in the proof assistant Coq and then the executable TRX will be extracted from this development using Coq's extraction mechanism.
  • PA a proof assistant used to develop a formally verified parser interpreter/generator. Examples include: Coq, HOL4, HOL Lite, Isabelle, PVS, ....
  • - FPG a formalism for expressing grammars used by the parser interpreter/generator. Examples include: context-free grammars (CFGs) and parsing expression grammars (PEGs).
  • CFGs context-free grammars
  • PEGs parsing expression grammars
  • All the three embodiments use the approach of specifying and developing a parser interpreter/generator in the PA and then extracting a parser interpreter/generator with total correctness guarantees using the extraction mechanism of the PA, hence the parser interpreter/generator is obtained as a source code in a language supported by the extraction capabilities of the PA.
  • This embodiment describes a way to obtain a formally verified parser interpreter, with the following properties:
  • Semantic actions are used to specify a parsing result.
  • the grammar and its semantic actions need to be specified in the specification language of the PA.
  • the interpreter is totally correct due to (A4) and (A5).
  • This embodiment described a way to obtain a formally verified parser generator, with the following properties:
  • the target language of the generator is any language Q. Semantic actions (in Q) are used to specify a parsing result.
  • parser interpreter with parsing traces This embodiment described a way to obtain a formally verified parser interpreter, with the following properties:
  • Parsing tags, a simple extension to the parsing grammar formalism, are used to annotate the parts of the grammar that should be collected during parsing to form a parse trace (i.e., a simple parse tree in a predefined XML-like format).
  • the grammar and its parsing tags can be specified in a simple text file.

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Devices For Executing Special Programs (AREA)

Abstract

The invention relates to a system for building a parser. According to the invention, such a system comprises: - a grammar input module for inputting in said parser generator a grammar expressed in a given formalism; - a checking module for formally verifying that a given grammar belongs to a predetermined class of grammars for which a translation to a correct, terminating parser is feasible; - a checking module for formally verifying that a grammar expressed in said formalism is well-formed; - a semantic action module defining a parsing result depending on semantic actions embedded in said grammar, said semantic action module ensuring in a formal way that all semantic actions of said grammar are terminating; and - a formal module generating a parser with total correctness guarantees, using said modules to verify that the grammar is well-formed, belongs to a certain class of feasible, terminating grammars and that all its semantic actions are terminating.

Description

System and method for creating a parser generator and associated computer program.
1 FIELD OF THE INVENTION
The present invention relates to the parsing problem in computer science and electronics. More specifically, the invention relates to methods of generating formally-verified parsers from simple grammar description files.
Parsing consists of taking a text, recognizing whether it is correct with respect to the description of the language used to write the text, given by means of a grammar and, if it is, pulling it apart with respect to the structure of the given grammar.
Parsing is used extensively in a variety of computer science and electronics fields including compilation, network security, data storage, etc.
As a first example, during compilation, source code is first parsed, then compiled and assembled into an executable. Bugs and anomalies in executables can result in significant losses of time, money, data and sometimes lives. Extensive testing is not considered sufficient in critical applications.
A second example is dedicated to network security. A message arriving at a network node is parsed and, depending on the results of said parsing, it is either transmitted or blocked. Said network node in effect works as a kind of "digital diode". XML signatures and XML encryption are increasingly used to secure transactions, in particular across mobile networks.
A third example applies to data. Stored content is parsed in order to retrieve data of interest. Database queries expressed in query language e.g. SQL also need to be parsed before data are accessed.
A fourth example is dedicated to data interpretation. Each web page code is parsed in order to be displayed in a web browser.
A fifth example concerns Domain Specific Languages (DSLs). DSLs are programming or specification languages dedicated to a particular domain or solution technique, e.g. insurance, finance, construction, combat simulation. Every time a new DSL is created to simplify programming in a given technical field, a new parser needs to be created at the same time.
Other applications of parsers exist and are not detailed here (cryptography, compression, ...).
2 BACKGROUND
The parsing process is usually broken up into two steps:
- a lexical analysis, where the input text is decomposed into individual tokens; and
- a syntax analysis, where the sequence of tokens is analysed and the parse tree is built, representing the structural decomposition of the input text with respect to the grammar.
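By way of illustration only, the two steps above can be sketched in OCaml (the language this document later takes as the target language Q). The token type, the toy grammar of sums and every function name below are assumptions made for this sketch; they do not come from the disclosure.

```ocaml
(* Step 1: lexical analysis, turning characters into tokens. *)
type token = Num of int | Plus

let is_digit c = c >= '0' && c <= '9'

let tokenize (s : string) : token list option =
  let n = String.length s in
  let rec go i acc =
    if i >= n then Some (List.rev acc)
    else
      match s.[i] with
      | ' ' -> go (i + 1) acc
      | '+' -> go (i + 1) (Plus :: acc)
      | c when is_digit c ->
        (* Read the longest run of digits starting at position i. *)
        let j = ref i in
        while !j < n && is_digit s.[!j] do incr j done;
        go !j (Num (int_of_string (String.sub s i (!j - i))) :: acc)
      | _ -> None (* unknown character: lexical error *)
  in
  go 0 []

(* Step 2: syntax analysis, building a parse tree from the tokens.
   Toy grammar: a number followed by any count of Plus-number pairs. *)
type tree = Leaf of int | Add of tree * tree

let parse (tokens : token list) : tree option =
  let rec rest left = function
    | [] -> Some left
    | Plus :: Num k :: tl -> rest (Add (left, Leaf k)) tl
    | _ -> None (* syntax error *)
  in
  match tokens with
  | Num k :: tl -> rest (Leaf k) tl
  | _ -> None

(* Both steps chained: text in, parse tree (or failure) out. *)
let parse_string (s : string) : tree option =
  match tokenize s with
  | Some tokens -> parse tokens
  | None -> None
```

Under these assumptions, `parse_string "1 + 2"` yields a tree with two leaves, while an input such as `"1 +"` passes the lexical step but is rejected by the syntax step.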
Parsing is a crucial step in any interpreter/compiler, where the source code of the program needs to be parsed before being interpreted/transformed into the target language. But it is also an important step in many other programs performing any kind of data manipulation.
Parsing technology is a well-studied and well-understood problem in computer science or electronics component design. The typical approach to parsing is to specify the input language using context-free grammars and to use a parser generator. Parser generators are programs that:
- take a description of a (context-free) grammar g from a file (in some format);
- if grammar g belongs to some sub-class of context-free grammars supported by the parser generator, automatically construct the source code of a parser for g, in some programming language of choice, L;
- the resulting parser for g can then be used in another program.
Indeed, parsers are usually used within some programs and rarely on their own; the source code obtained in the previous step makes it easy to use the generated parser for g within some program written in L.
In formal language theory, a context-free grammar (CFG) is a grammar in which every production rule is of the form:
V → w
where V is a single nonterminal symbol, and w is a string of terminals and/or nonterminals (possibly empty).
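As a minimal sketch of this definition, a production of the form V → w can be represented as a record with one non-terminal on the left and a possibly empty list of symbols on the right. The type names, the toy grammar and the helper `undefined_nonterminals` are hypothetical and serve only to illustrate the form of the rules; the helper mirrors the kind of sanity check a parser generator may run before accepting a grammar.

```ocaml
(* A symbol is either a terminal character or a named non-terminal. *)
type symbol = T of char | N of string

(* A production V -> w: a single non-terminal on the left-hand side and a
   possibly empty string of symbols on the right-hand side. *)
type production = { lhs : string; rhs : symbol list }

(* Toy grammar for balanced parentheses, with two productions for S:
   an opening parenthesis, S, a closing parenthesis, S; or the empty string. *)
let balanced : production list =
  [ { lhs = "S"; rhs = [ T '('; N "S"; T ')'; N "S" ] };
    { lhs = "S"; rhs = [] } ]

(* Non-terminals that occur on some right-hand side but have no production. *)
let undefined_nonterminals (g : production list) : symbol list =
  let defined = List.map (fun p -> p.lhs) g in
  let missing sym =
    match sym with
    | N v -> not (List.mem v defined)
    | T _ -> false
  in
  List.concat (List.map (fun p -> List.filter missing p.rhs) g)
```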
The problem with these background techniques is that the correctness of the generated parser is not formally proven, i.e. it cannot be proved that the generated parser will work correctly. Thus, it is not possible to prove that a parser obtained by prior art techniques is correct, that it will not misinterpret its input data (for example, a program written by a programmer), and consequently that it will not lead to errors in parsing, compiling or executing the resulting programs.
3 SUMMARY OF THE INVENTION
An object of this invention is to provide a parser generator that will be capable of performing both the lexical analysis and the syntax analysis in a uniform way and that additionally will be correct by construction, i.e., the generated parser will come with total correctness guarantees, as if the generated parser were subject to formal verification using a theorem proving technology.
It is a further object of this invention to make this process completely transparent to the end-user of the parser generator. That means that, from the point of view of the user, the process of generating a parser is equivalent to that sketched in the preceding section. In particular, the parser generator will be a single executable, functionally equivalent to a traditional parser generator, and no use of a theorem prover will be involved at all; and yet the generated parser will be provably correct by construction, allowing its use in critical systems requiring strong correctness guarantees.
The invention provides a solution that does not have the drawbacks of the prior art. Indeed, the invention concerns a system for building a parser, characterized in that it comprises:
a grammar input module for inputting in said parser a grammar expressed in a given formalism;
a formalism module for expressing grammars used by said parser generator, said formalism module proving that said grammar G is well-formed;
a semantic action module defining a parsing result depending on at least some expression of said grammar, said semantic action module ensuring that all semantic actions of said grammar are terminating;
a checking module for checking that a given grammar belongs to a predetermined class of grammars for which a translation to a correct, terminating parser is feasible;
and, if checking module concludes that said grammar belongs to said
a proof assistant module for developing said parser with said formalism module and said semantic action module.
Thus, in the fields of technology presented above, employing parsers constructed using the system of the invention results in better quality thanks to increased security. Security issues in software and electronic components clearly lead to technical problems that affect the physical world. Formal proof methods can be used to check the conformity of a program with its specification. Certified compilers have also been described, see e.g. CompCert (Xavier Leroy), but the correctness of the first step, parsing, is not formally proven.
Having a proven parser is therefore essential for network security. Parsing is also an important first step in security software such as firewalls or antivirus.
Having a proven parser brings guarantees on the ability to retrieve data and the quality of the parser of the invention impacts on that of the displayed page.
According to one particular characteristic of the invention, said formalism module forbids recursion in said grammar.
According to one particular characteristic of the invention, said grammar is a context-free grammar.
According to one particular characteristic of the invention, said grammar is a parsing expression grammar.
The invention also concerns a method for building a formally verified parser generator.
According to the invention, said method comprises:
A step of formalizing an expression of a grammar G and its semantics;
A step of checking that said grammar G belongs to a predetermined class of grammars for which a translation to a correct, terminating parser is feasible;
A step of defining a target language Q of said parser generator and its formal semantics;
A step of obtaining a library of basic datatypes of Q and functions over them and proving that they are all terminating;
A step of obtaining a formally correct parser for Q;
A step of obtaining a formally correct parser for a grammar in FPG format, said grammar including semantic actions in Q;
A step of obtaining a termination checker for semantic actions in Q;
A step of obtaining a parser generator that will read a description of some grammar G from a text file using said certified parser and that, after checking that the grammar belongs to a class for which parser generation is feasible, will generate the code of the parser in Q;
A step of obtaining, from a proving module, a proof that the code generated in Q is correct with respect to the given grammar G, the semantics of parsing grammars and the formal semantics of Q;
A step of obtaining, from a proving module, a proof that the code generated in Q will always terminate.
In another embodiment, the invention also concerns a computer program product downloadable from a communications network and/or stored on a computer-readable medium and/or executable by a microprocessor.
According to the invention, in another embodiment, such a computer program product comprises program code instructions for the execution of the building method as described.
4 BRIEF DESCRIPTION OF THE DRAWINGS
Other features and advantages of the invention shall appear more clearly from the following description of a preferred embodiment, given by way of a simple, illustrative and non-exhaustive example, and from the appended drawings, of which:
Figure 1 is a block diagram illustrating the building blocks of a certified parser interpreter of an embodiment of the invention;
Figure 2 is a block diagram illustrating the building blocks of a certified parser generator in one embodiment of the invention.
5 DETAILED DESCRIPTION OF THE INVENTION
5.1 General principles of the invention
The invention relates to a system for building a parser. According to the invention, such a system comprises of:- a grammar input module for inputting in said parser generator a grammar expressed in a given formalism;- a checking module for formally verifying that a given grammar belongs to a predetermined class of grammars for which a translation to a correct, terminating parser is feasible;- a checking module for formally verifying that a grammar expressed in the said formalism is well-formed;- a semantic action module defining a parsing result depending on semantic actions embedded in said grammar, said semantic action module ensuring in a formal way that all semantic actions of said grammar are terminating and- a formal module generating a parser with total correctness guarantees, using said modules to verify that the grammar is well-formed, belongs to a certain class of feasible, terminating grammars and all its semantic actions are terminating. The system and method of the invention allows a user to build a parser generator which generates some parsers which are formally checked and verified. This means that, by using the invention, there's no need to formally verify a generated parser like in the prior art techniques. This is a great feature of the invention because it ensures that, when effectively used, the parser will always lead to a formally checked and verified program (after compilation).
Such a result is achieved, in at least one embodiment of the invention, firstly by extending the grammar with semantic actions, as described above, and secondly by proving the termination of the grammar, in particular by addressing the problem of left-recursive grammars. These rules are then introduced in a Proof Assistant (PA) to develop a formally verified parser interpreter/generator.
The proofs of the theorems and lemmas presented below aim to show that the embodiments of the invention can technically be realized, provided that the input grammar meets certain fixed requirements.
For the purposes of the proofs below, the invention uses, in at least one embodiment, Parsing Expression Grammars (PEGs). A parsing expression grammar, or PEG, is a type of analytic formal grammar that describes a formal language in terms of a set of rules for recognizing strings in the language. A parsing expression grammar essentially represents a recursive descent parser in a pure schematic form that expresses only syntax and is independent of the way an actual parser might be implemented or what it might be used for. Parsing expression grammars look similar to regular expressions or context-free grammars (CFGs) in Backus-Naur form (BNF) notation, but have a different interpretation.
Unlike CFGs, PEGs cannot be ambiguous; if a string parses, it has exactly one valid parse tree, so PEGs are particularly well adapted for computer program languages.
The parser generator of the invention is based on the formalism of parsing expression grammars (PEGs) rather than on context-free grammars. Below we briefly summarize this formalism for the purposes of the disclosure. Let us fix a finite set of non-terminals [Figure imgf000009_0003] (sometimes we will also refer to them as productions) and a finite set of terminal symbols [Figure imgf000009_0005]. We will denote the elements of [Figure imgf000009_0004] by p, q and the elements of [Figure imgf000009_0002].
By string, S, we mean a list of terminal symbols and we will be using a list notation where
Figure imgf000009_0006
denotes an element x followed by a list xs and [] denotes an empty list. We will use the notation |x| to denote the length of a string/list x.
Definition 1: Let us define the set of parsing expressions, Δ, over non-terminals V
Figure imgf000009_0001
The informal semantics of parsing expressions is as follows:
- The empty expression ε always succeeds without consuming any input.
- The any-character expression [·] consumes an arbitrary character and succeeds; it fails on empty input.
- A terminal a checks the first character of the input string; if it is equal to a then it is consumed and parsing succeeds; if it is different from a or the input string is empty then parsing fails.
- Parsing of a non-terminal A amounts to parsing the expression associated with A (Figure imgf000010_0009).
- Parsing the sequence expression e₁; e₂ amounts to parsing e₁ on the input string. If that fails then parsing of e₁; e₂ fails; otherwise it is the result of parsing e₂ on the remaining input.
- Parsing the choice expression e₁ / e₂ first parses e₁ on the input string and if that succeeds then this is the final result. Otherwise it is the result of parsing e₂ on the initial input string.
- Parsing the zero-or-more repetition expression e* tries to parse e; if that fails then parsing of e* succeeds without consuming any input; if it succeeds then we proceed with parsing e* on the remaining input.
- Parsing the not-predicate expression !e parses e on the input string; if that fails then !e succeeds without consuming any input; otherwise !e fails.
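The informal semantics above can be sketched as a small recognizer. This is only an illustration in Python, not the formal semantics of annex A: the tagged-tuple encoding of expressions and the names `parse` and `Expr` are assumptions introduced here, and the progress guard in the repetition case is an addition that keeps the sketch from looping on grammars that are not well-formed.

```python
from typing import Dict, Optional, Tuple

# Hypothetical encoding of parsing expressions as tagged tuples:
#   ("empty",) ("any",) ("term", c) ("nt", name)
#   ("seq", e1, e2) ("choice", e1, e2) ("star", e) ("not", e)
Expr = Tuple


def parse(g: Dict[str, Expr], e: Expr, s: str) -> Optional[str]:
    """Return the unconsumed suffix on success, or None on failure."""
    tag = e[0]
    if tag == "empty":                      # always succeeds, consumes nothing
        return s
    if tag == "any":                        # consumes one arbitrary character
        return s[1:] if s else None
    if tag == "term":                       # consumes one specific character
        return s[1:] if s and s[0] == e[1] else None
    if tag == "nt":                         # parse the associated expression
        return parse(g, g[e[1]], s)
    if tag == "seq":                        # e1, then e2 on the remaining input
        rest = parse(g, e[1], s)
        return None if rest is None else parse(g, e[2], rest)
    if tag == "choice":                     # e1, else e2 on the initial input
        rest = parse(g, e[1], s)
        return rest if rest is not None else parse(g, e[2], s)
    if tag == "star":                       # repeat e until it fails
        rest = parse(g, e[1], s)
        # Guard (not part of the semantics): stop if e consumed nothing.
        if rest is None or len(rest) == len(s):
            return s
        return parse(g, e, rest)
    if tag == "not":                        # succeeds iff e fails; consumes nothing
        return s if parse(g, e[1], s) is None else None
    raise ValueError(f"unknown expression tag: {tag}")


# A <- 'a' A / empty   (matches any run of 'a' characters)
GRAMMAR = {"A": ("choice", ("seq", ("term", "a"), ("nt", "A")), ("empty",))}
```

With this grammar, `parse(GRAMMAR, ("nt", "A"), "aab")` consumes the two leading `a` characters and returns the suffix `"b"`.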
The formal description is as follows. The parsing of an expression e∈Δ on a string s∈S yields a result r∈R, denoted by [Figure imgf000010_0001], where the set of results R is defined inductively as [Figure imgf000010_0003]:
- ⊥, indicating that parsing failed,
- [Figure imgf000010_0002] for s∈S, indicating that parsing was successful and that the suffix that remains to be parsed is s.
The formal semantics of parsing expressions is presented in annex A, which is fully included in the present disclosure.
Figure imgf000011_0001
Example 3 As an example let us present a very simple grammar for mathematical expressions with 5 non-terminals and the following productions:
Figure imgf000011_0002
The formalism of PEGs described above is used as the input grammar in at least one embodiment of the invention. While this formalism is not itself part of the invention, it is important for the disclosure because it helps the person skilled in the art to understand the work that follows.
5.2 Description of some embodiments
In the present section a system/method for creating the parser generator of the invention is presented. Firstly, a way to extend PEGs with semantic actions is presented. Secondly, termination of PEGs is demonstrated on the basis of some hypotheses. Then the use of such a grammar (extended and proved terminating) is shown in an interpreter and in a parser generator.
5.2.1 Extending PEGs with Semantic Actions
The parsing expressions, as introduced previously can be used for specifying which strings belong to the grammar under consideration. However the role of a parser is not merely to recognize whether an input is correct or not but also, given a correct input, to compute its representation in one form or another.
This is typically done by extending grammar expressions with semantic values, which are a representation of the result of parsing this expression on (some) input and by extending grammar with semantic actions, which are functions used to produce and manipulate the semantic values. Typically a semantic value associated with an expression will be its parse tree so that parsing a correct input will give a parse tree of this input. In order to deal with this extension the inventors had the idea to replace the simple type of parsing expressions Δ with a family of types Δ α , where the index α is the type of semantic values associated with an expression.
The inventors also define default semantic actions for all types of expressions and, to allow altering those defaults, they introduce a new construct for converting semantic values.
The inventors use the following types:
- Type is a universe of types.
- True is the singleton type with a single value I.
- char is a type of machine characters. It corresponds to the type of terminals [Figure imgf000012_0001], which in the concrete parsers generated will always be instantiated by char.
- list α is a type of lists of elements of α, for any type α.
- α*β is a type of pairs of elements [Figure imgf000012_0009] with [Figure imgf000012_0008], for any types α, β.
We now briefly describe how the inventors extend the parsing expressions from Definition 1 to incorporate semantic values.
- An empty expression ε has a semantic value of type True (its single value I).
- An any-character expression [·] and a terminal expression (Figure imgf000012_0005) both have a semantic value of type char.
- For non-terminals the inventors use a function (Figure imgf000012_0002), mapping each non-terminal to an element of Type, which gives the types of the semantic values of all productions.
- A sequence e₁; e₂ has semantic values of type α*β, where α (resp. β) is the type of semantic values of e₁ (resp. e₂).
- A prioritized choice e₁ / e₂ has semantic values of type α, where the semantic values of both e₁ and e₂ are required to have type α.
- A repetition expression e* has a semantic value of type list α, where α is the type of semantic values of e.
- A not-predicate !e has a semantic value of type True.
- The inventors add a new expression e [→] f, which takes an expression e with semantic values of type α and a function f : α → β and gives an expression with semantic values of type β (obtained by applying f to the semantic value of e).
This leads to the following formal definition.
Definition 4: Δ is the set of extended parsing expressions, where the index α is the type of semantic values of an expression. We define it by induction in Annex B, where:
- T is the set of terminals,
V. L is the set of non-terminals,
and →Type is the function giving type of semantic values for
Figure imgf000013_0002
every non-terminal.
The definition of an extended parsing expression grammar (EPEG) is as expected (compare with Definition 2):
Definition 5: An extended parsing expression grammar (EPEG), G, is a tuple:
Figure imgf000013_0001
5.2.2 Proving Termination for PEGs
Left-recursive PEGs (with direct or mutual left-recursion) lead to non-terminating parsers. In this section we present a way to establish whether a PEG is well-formed, where well-formedness implies completeness of the grammar.
Let us fix a PEG g. We define the expression set of g as:
Figure imgf000014_0002
where E is a (non-strict) subexpression relation on parsing expressions.
The inventors define three groups of properties over parsing expressions:
- "0": the parsing expression can succeed without consuming any input,
- ">0": the parsing expression can succeed after consuming some input,
- "⊥": the parsing expression can fail.
We will write e ∈ P₀ to indicate that the expression e has property "0" (similarly for P>₀ and P⊥). The inventors have defined inference rules for deriving those properties in Annex C.
One then starts with empty sets of properties and applies those inference rules until reaching a fixpoint. The existence of the fixpoint is ensured by the fact that the property sets are extended monotonically and are bounded by the finite set E(G). We summarize the semantics of those properties in the lemma below:
Lemma 6: The semantics of the property sets P₀, P>₀ and P⊥ is summarized as follows:
Figure imgf000014_0001
and if parsing e on s fails, i.e., (e, s) yields ⊥, then e ∈ P⊥.
Using the semantics of those properties of parsing expressions we can perform the well-formedness analysis for G. We introduce a set of well-formed expressions WF and again iterate from an empty set by using the derivation rules from Annex D until reaching a fixpoint.
We say that G is well-formed if E(G) = WF. We have the following result:
Theorem 7: If G is well-formed then it is complete.
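The two fixpoint computations described above can be sketched as follows. Since the inference rules of Annexes C and D are only available as figures, the rules coded here are an assumption modeled on the usual well-formedness rules for PEGs; the tuple encoding and the names `expression_set`, `properties` and `is_well_formed` are likewise hypothetical.

```python
from typing import Dict, Set, Tuple

# Hypothetical encoding: ("empty",) ("any",) ("term", c) ("nt", A)
#                        ("seq", e1, e2) ("choice", e1, e2) ("star", e) ("not", e)
Expr = Tuple
Grammar = Dict[str, Expr]


def expression_set(g: Grammar) -> Set[Expr]:
    """E(G): every production reference, right-hand side and subexpression."""
    seen: Set[Expr] = set()

    def visit(e: Expr) -> None:
        if e not in seen:
            seen.add(e)
            for child in e[1:]:
                if isinstance(child, tuple):
                    visit(child)

    for name, rhs in g.items():
        visit(("nt", name))
        visit(rhs)
    return seen


def properties(g: Grammar):
    """Iterate the (assumed) inference rules from empty sets up to a fixpoint."""
    p0, pgt, pbot = set(), set(), set()

    def derive(e: Expr):
        tag = e[0]
        if tag == "empty":
            return True, False, False
        if tag in ("any", "term"):
            return False, True, True
        if tag == "nt":
            r = g[e[1]]
            return r in p0, r in pgt, r in pbot
        if tag == "seq":
            a, b = e[1], e[2]
            a_ok, b_ok = a in p0 or a in pgt, b in p0 or b in pgt
            return (a in p0 and b in p0,
                    (a in pgt and b_ok) or (a in p0 and b in pgt),
                    a in pbot or (a_ok and b in pbot))
        if tag == "choice":
            a, b = e[1], e[2]
            return (a in p0 or (a in pbot and b in p0),
                    a in pgt or (a in pbot and b in pgt),
                    a in pbot and b in pbot)
        if tag == "star":
            return e[1] in pbot, e[1] in pgt, False
        if tag == "not":
            return e[1] in pbot, False, e[1] in p0 or e[1] in pgt
        raise ValueError(f"unknown expression tag: {tag}")

    changed = True
    while changed:                       # sets only grow and are bounded by E(G)
        changed = False
        for e in expression_set(g):
            for holds, target in zip(derive(e), (p0, pgt, pbot)):
                if holds and e not in target:
                    target.add(e)
                    changed = True
    return p0, pgt, pbot


def is_well_formed(g: Grammar) -> bool:
    """G is well-formed if the fixpoint WF covers the whole expression set E(G)."""
    exprs = expression_set(g)
    p0, _, pbot = properties(g)
    wf: Set[Expr] = set()

    def ok(e: Expr) -> bool:
        tag = e[0]
        if tag in ("empty", "any", "term"):
            return True
        if tag == "nt":
            return g[e[1]] in wf
        if tag == "seq":
            return e[1] in wf and (e[1] not in p0 or e[2] in wf)
        if tag == "choice":
            return e[1] in wf and (e[1] not in pbot or e[2] in wf)
        if tag == "star":
            return e[1] in wf and e[1] not in p0
        return e[1] in wf                # "not"

    changed = True
    while changed:
        changed = False
        for e in exprs - wf:
            if ok(e):
                wf.add(e)
                changed = True
    return wf == exprs
```

Under these assumed rules a left-recursive production such as A ← A 'a' / 'a' never enters WF, because the sequence waits for A and A waits for the sequence, so the grammar is rejected.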
We conclude this section with an example:
Example 6 Let us extend the grammar from Example 3 with semantic actions.
Figure imgf000014_0003
The grammar expresses mathematical expressions and we attach semantic actions evaluating those expressions, hence obtaining a very simple calculator. It often happens that we want to ignore the semantic value attached to an expression. This can be accomplished by coercing this value to I.
Figure imgf000015_0002
Figure imgf000015_0001
This grammar will associate, as expected, the semantic value 36 with the string "(1+2)*(3*4)". Of course, in practice, instead of evaluating the expression we would usually write semantic actions to build a parse tree of the expression for later processing.
5.2.3 Interpretation of PEGs
This section presents a method and system to obtain a certified parser interpreter using the formalism of PEGs presented in the previous sections. The schema of our approach is presented in Figure 1.
One way to obtain such a parser interpreter is to formally develop it in Coq and then extract certified code from this development. In order to do that, one first needs to develop a formalization of PEGs (Sections 5.1 and 5.2.1) along with their semantics PEG-SEM (annex B2), and a procedure for checking their well-formedness PEG-WF (Section 5.2.2, Annex D).
Then one needs to develop a generic interpreter for parsing input with an arbitrary, but well-formed, grammar, PEG-INT. Such an interpreting function along with the proof that it respects the semantics of PEGs can be developed rather easily as it is essentially just a straightforward realization of the semantics presented in annex B2. The only difficulty is the problem of termination which is addressed below.
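A generic interpreter of this kind, extended with semantic values, can be sketched as follows. This is an untyped Python illustration, not the Coq development: the tuple encoding, the `("act", e, f)` form standing for e [→] f, the name `run`, and the calculator grammar (written here from the description of Example 6, whose figure is not reproduced) are all assumptions.

```python
from functools import reduce
from typing import Any, Dict, Optional, Tuple

I = ()  # stand-in for the single value I of type True
Expr = Tuple
Result = Optional[Tuple[Any, str]]  # None = failure, else (semantic value, rest)


def run(g: Dict[str, Expr], e: Expr, s: str) -> Result:
    tag = e[0]
    if tag == "empty":
        return (I, s)
    if tag == "any":
        return (s[0], s[1:]) if s else None
    if tag == "term":
        return (s[0], s[1:]) if s and s[0] == e[1] else None
    if tag == "nt":
        return run(g, g[e[1]], s)
    if tag == "seq":                      # default action: build a pair
        r1 = run(g, e[1], s)
        if r1 is None:
            return None
        r2 = run(g, e[2], r1[1])
        return None if r2 is None else ((r1[0], r2[0]), r2[1])
    if tag == "choice":
        r1 = run(g, e[1], s)
        return r1 if r1 is not None else run(g, e[2], s)
    if tag == "star":                     # default action: build a list
        values = []
        while True:
            r = run(g, e[1], s)
            if r is None or len(r[1]) == len(s):  # guard against no progress
                return (values, s)
            values.append(r[0])
            s = r[1]
    if tag == "not":
        return (I, s) if run(g, e[1], s) is None else None
    if tag == "act":                      # coercion e [->] f
        r = run(g, e[1], s)
        return None if r is None else (e[2](r[0]), r[1])
    raise ValueError(f"unknown expression tag: {tag}")


def t(c: str) -> Expr:
    return ("term", c)


def seq(*es: Expr) -> Expr:               # left-nested: seq(a, b, c) = ((a; b); c)
    return reduce(lambda a, b: ("seq", a, b), es)


def alt(*es: Expr) -> Expr:
    return reduce(lambda a, b: ("choice", a, b), es)


DIGIT = alt(*[t(c) for c in "0123456789"])
CALC = {
    "number": ("act", seq(DIGIT, ("star", DIGIT)),
               lambda v: int(v[0] + "".join(v[1]))),
    "atom": alt(("nt", "number"),
                ("act", seq(t("("), ("nt", "expr"), t(")")), lambda v: v[0][1])),
    "factor": alt(("act", seq(("nt", "atom"), t("*"), ("nt", "factor")),
                   lambda v: v[0][0] * v[1]),
                  ("nt", "atom")),
    "expr": alt(("act", seq(("nt", "factor"), t("+"), ("nt", "expr")),
                 lambda v: v[0][0] + v[1]),
                ("nt", "factor")),
}
```

Calling `run(CALC, ("nt", "expr"), "(1+2)*(3*4)")` returns the semantic value 36 together with an empty remaining input. Unlike in Coq, nothing here guarantees that the semantic actions terminate; that guarantee comes from the setting described in the text, not from this sketch.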
In the approach of the invention to develop a certified interpreter, the inventors assume that the grammar G in question (Definition 4) is expressed in Coq, PEG (G). That means that all semantic actions e [→] f used in the grammar are terminating, as all Coq functions are total.
That leaves the inventors with proving that the process of parsing itself will terminate but for that the inventors use the (previously proved) fact that the grammar is well-formed and the analysis of Section 5.2.2, in particular Theorem 7.
Having all those components in place we are ready to extract from Coq a PEG interpreter specialized to grammar G. As a result we obtain the source code of the parser for G in one of the languages supported by Coq's extraction mechanism (OCaml, Haskell and Scheme at the time of this writing).
It is important to note that the total correctness of the parser interpreter of the invention relies on two conditions: (a) the grammar G is well-formed and (b) all its semantic actions are terminating. These are key features of the invention.
In the approach of this embodiment of the invention those conditions are verified within Coq before extracting an interpreter for G, so that it is certain that those conditions are satisfied. In principle, a generic parser interpreter for PEGs can also be extracted from the development. Then the grammar G, instead of being developed in Coq, could be provided from within the language used for extraction. However, if one of the conditions (a) or (b) is then not met, the resulting parser may not be terminating.
The main shortcoming of this approach is that in order to obtain a parser for G one needs to write the PEG for G, including its semantic actions, in Coq (unless we resort to the approach sketched in the preceding paragraph, but then we cannot guarantee total correctness). That means that the use of our parser interpreter requires expertise in Coq, making it much less accessible than traditional parser generators. We will show how to overcome this shortcoming in the following section.
We conclude this section with an example:
Example 9 After defining appropriate notations and coercions, the transcription of Example 6 in Coq could look as follows:
Figure imgf000017_0001
Figure imgf000018_0001
Example 10 We present an alternative version of the grammar from Example 9, where the semantic actions are used to build an abstract syntax tree
Figure imgf000018_0002
Program Definition digListToNat (ds : list char) : nat := ...
Figure imgf000018_0003
5.2.4 Parser Generator for PEGs
In this section a method and system of developing TRX is presented: TRX is a parser generator that on top of the functionality offered by traditional parser generators will provide total correctness guarantees for all generated parsers.
That makes it especially suitable for use in all types of critical software, where such strong correctness is called for. But as the use of this generator gives safety guarantees at no additional effort, it can be a very attractive alternative to traditional parser generators in essentially all applications. The schema of the approach of the inventors is presented in Figure 2.
The target language of our parser generator is Q. We are mainly interested in functional programming languages, but most of the ideas presented below can be used for an arbitrary target language Q.
A number of things change compared with the approach from the previous section (interpreter). To begin with, instead of extracting from Coq an interpreter for a particular PEG G, the aim is to extract a parser generator, PGEN, that will take as its input a description of a grammar G, PEG (G), and will produce a parser for G as source code in Q, PARSER (G).
In order to achieve that we need a parser for PEGs themselves, PARSER (PEG), as well as a parser for Q, PARSER (Q), as the productions in the grammar will be expressed as source code in Q. One way to obtain those parsers is by developing certified interpreters for them using the approach described in Section 5.2.3.
The treatment of termination also changes. The grammar G now comes from an external file without any guarantees, so after parsing it with the certified parser PARSER (PEG), its well-formedness needs to be checked. The inventors do this as before with the component PEG-WF, but now it will not be invoked in Coq but will become part of the extracted code, comprising the parser generator PGEN.
But the real difficulty lies in the fact that to establish termination of the produced parsers it is not only necessary to know that the grammar is well-formed, but also that all semantic actions used within it are terminating, which involves termination analysis of Q programs, WF (Q). In the approach of Section 5.2.3 termination of semantic actions came for free as they were expressed in Coq (all Coq functions are total). One way to tackle this problem is to formally develop a termination checker for Q (necessarily incomplete, as the termination problem is undecidable for any Turing-complete language).
This is difficult and the inventors opt for an easier approach. They choose a language [Figure imgf000020_0001], which is a subset of Q designed in such a way that all [Figure imgf000020_0003] programs are terminating (which is, of course, proved in Coq). For instance, for an ML-style pure functional programming language one can obtain this restricted language by forbidding recursion (which is the only source of non-termination). Now we only allow semantic actions to be expressed in [Figure imgf000020_0002].
This is quite a restriction, but the role of semantic actions in a grammar is to construct a parse tree of the input, which often involves little more than choosing parts of the parse trace and enclosing them in appropriate algebraic datatypes. To ease this restriction somewhat we develop a very simple "standard library for parsing" [Figure imgf000020_0005], comprising basic datatypes (lists, trees, ...) and basic operations on them (map, fold, ...), which we prove terminating in Coq. Now we can allow semantic actions written in [Figure imgf000020_0004] but making use of this library, and we are still able to prove termination of generated parsers.
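The idea of accepting only semantic actions written in a restricted language plus a fixed library can be illustrated with a small syntactic checker. The sketch below uses Python expressions as a stand-in for Q, which is purely an assumption for illustration; the whitelist `LIB`, the set of allowed syntax nodes and the name `action_is_accepted` are hypothetical, and nothing here is formally proved.

```python
import ast

# Hypothetical stand-in for the library of functions proved terminating.
LIB = {"len", "int", "sum"}

# Only these syntax forms are accepted: no lambdas, loops, definitions,
# attribute access or comprehensions, so an action cannot define recursion.
ALLOWED = (ast.Expression, ast.Call, ast.Name, ast.Load, ast.Constant,
           ast.BinOp, ast.Add, ast.Sub, ast.Mult, ast.Tuple, ast.List)


def action_is_accepted(src: str) -> bool:
    """Accept an action only if it is a restricted expression calling LIB."""
    try:
        tree = ast.parse(src, mode="eval")
    except SyntaxError:
        return False                      # syntactically incorrect action
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED):
            return False                  # construct outside the restricted language
        if isinstance(node, ast.Call):
            # Calls are allowed only to named library functions.
            if not (isinstance(node.func, ast.Name) and node.func.id in LIB):
                return False
    return True
```

For example, `action_is_accepted("x1 * x3")` holds, whereas an action calling an unknown function such as `fact(n - 1) * n`, or one using a lambda, is rejected. The check is deliberately conservative: it rejects many terminating actions in exchange for a very simple acceptance criterion.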
The next step is to write a parser generator in Coq. The process of generating a recursive descent parser for a PEG is relatively straightforward. The basic idea is that the set of productions of G is mapped one-to-one to a mutually recursive set of parse functions in Q. Parsing every PEG operand consists of turning operational semantics rules of Annex B2 into an executable code.
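The one-to-one mapping from productions to mutually recursive parse functions can be sketched as a small source-code generator. Here Python plays the role of Q, and the generated functions only recognize input (no semantic actions); the tuple encoding and the names `gen`, `generate` and `parse_<name>` are assumptions, and non-terminal names are assumed to be valid identifiers.

```python
import itertools
from typing import Dict, Iterator, List, Tuple

Expr = Tuple  # ("empty",) ("any",) ("term", c) ("nt", A) ("seq", ..) ("choice", ..) ("star", e) ("not", e)


def indent(lines: List[str]) -> List[str]:
    return ["    " + line for line in lines]


def gen(e: Expr, src: str, dst: str, fresh: Iterator[str]) -> List[str]:
    """Emit statements setting variable dst to the result of parsing e on src
    (the remaining input on success, None on failure)."""
    tag = e[0]
    if tag == "empty":
        return [f"{dst} = {src}"]
    if tag == "any":
        return [f"{dst} = {src}[1:] if {src} else None"]
    if tag == "term":
        return [f"{dst} = {src}[1:] if {src}[:1] == {e[1]!r} else None"]
    if tag == "nt":
        return [f"{dst} = parse_{e[1]}({src})"]
    if tag == "seq":
        tmp = next(fresh)
        return (gen(e[1], src, tmp, fresh)
                + [f"{dst} = None", f"if {tmp} is not None:"]
                + indent(gen(e[2], tmp, dst, fresh)))
    if tag == "choice":
        return (gen(e[1], src, dst, fresh)
                + [f"if {dst} is None:"]
                + indent(gen(e[2], src, dst, fresh)))
    if tag == "star":
        tmp = next(fresh)
        body = (gen(e[1], dst, tmp, fresh)
                + [f"if {tmp} is None or len({tmp}) == len({dst}):",
                   "    break",
                   f"{dst} = {tmp}"])
        return [f"{dst} = {src}", "while True:"] + indent(body)
    if tag == "not":
        tmp = next(fresh)
        return gen(e[1], src, tmp, fresh) + [f"{dst} = {src} if {tmp} is None else None"]
    raise ValueError(f"unknown expression tag: {tag}")


def generate(g: Dict[str, Expr]) -> str:
    """Return source code with one parse function per production of g."""
    fresh = (f"t{i}" for i in itertools.count())
    out: List[str] = []
    for name, rhs in g.items():
        out += [f"def parse_{name}(s):"]
        out += indent(gen(rhs, "s", "r", fresh) + ["return r"])
        out += [""]
    return "\n".join(out)


# A <- 'a' A / empty
SOURCE = generate({"A": ("choice", ("seq", ("term", "a"), ("nt", "A")), ("empty",))})
```

The generated text can be written to a file or, for a quick check, loaded with `exec(SOURCE, namespace)`, after which `namespace["parse_A"]("aab")` returns `"b"`. The sketch produces code only; proving that the output is correct and terminating is the separate step described next in the text.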
Now we need to prove total correctness for such generated parsers. All the reasoning will be performed with the formal semantics of Q, SEM (Q). This semantics together with the semantics of PEGs, PEG-SEM, will be used to prove that generated parsers are correct. As for their termination, termination analysis of Q, WF (Q), will be used to ensure termination of semantic actions and combined with well-formedness analysis for PEG grammars, PEG-WF.
We will now present a few examples.
Example 11 In this example we illustrate a possible content of the library LIB (Q), where Q is again taken to be OCaml. Such a library could consist of the following functions taken from the standard library of OCaml:
Module Pervasives:
Figure imgf000021_0001
All of those functions would need to be proven terminating in Coq.
Example 12 In this example we will present a PEG grammar PEG (G) , equivalent to that from Example 6 but rendered as an ASCII file to be processed by the parser generator. We again take OCaml as the target language Q, so semantic actions of the grammar are expressed as pieces of code in OCaml, where recursion is not allowed.
Figure imgf000021_0002
We use the { . . . } annotation for semantic actions in place of the PEG operator e [→ ] f.
Example 13 We present an alternative version of the grammar from Example 12, where the semantic actions are used to build an abstract syntax tree
(AST) of mathematical expressions, instead of evaluating them.
Figure imgf000021_0003
Figure imgf000022_0001
Let us conclude this section with a summary of the differences between TRX and any other (unverified) parser generator from two points of view: that of the TRX end user and that of the TRX developer.
5.2.4.1 TRX from the point of view of the end user
From the point of view of the user of our formally verified parser generator TRX, the process of generating a parser will essentially be indistinguishable from the same process with any other (unverified) tool, and will consist of the following steps:
- Writing a text file with a PEG grammar G, including its semantic actions as code in Q.
- Running our parser generator to generate a parser for G expressed as source code in Q.
- The parser generator will reject the grammar if it is syntactically incorrect or incorrect with respect to Definition 5 (for instance if it contains references to undefined non-terminals). It will also reject the input if the grammar G is not well-formed, i.e., it is left-recursive. This last check is also performed, though its correctness cannot be guaranteed, by some of the existing parser generators based on the PEG formalism.
- The parser generator will also reject the grammar if the semantic actions contain recursion and hence may be potentially non-terminating (we will only allow calls to a predefined library of recursive functions with some basic combinators for basic datatypes, to improve the expressivity of acceptable semantic actions). This is the only difference between using TRX and using other unverified parser generators, which typically do not try to ensure termination of the generated code (in fact they often do not even check whether semantic actions are syntactically correct and just copy them verbatim to the generated parser).
- If no errors are discovered, TRX will produce a parser for G expressed as source code in Q, much as any other parser generator would. The difference is that the parser generated by TRX is formally proved to be totally correct, i.e., the parser is terminating and correct with respect to the grammar G and the semantics of PEGs.
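One of the rejection checks listed above, references to undefined non-terminals, can be sketched in a few lines. The tuple encoding of expressions and the name `undefined_nonterminals` are assumptions made for this illustration; TRX itself is not described at this level of detail.

```python
from typing import Dict, Set, Tuple

Expr = Tuple  # ("nt", name), ("term", c), ("seq", e1, e2), ("choice", e1, e2), ...


def undefined_nonterminals(g: Dict[str, Expr]) -> Set[str]:
    """Return the names of non-terminals that are referenced but have no production."""
    missing: Set[str] = set()

    def visit(e: Expr) -> None:
        if e[0] == "nt":
            if e[1] not in g:
                missing.add(e[1])
            return
        for child in e[1:]:
            if isinstance(child, tuple):   # skip characters and other payloads
                visit(child)

    for rhs in g.values():
        visit(rhs)
    return missing
```

A grammar would be rejected whenever this set is non-empty, before any well-formedness analysis is attempted.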
5.2.4.2 TRX from the point of view of developing a parser generator
In contrast to the previous section, developing TRX involves substantially more effort than developing an uncertified parser generator. The steps leading to generating a certified parser are as follows:
- Reading and parsing a text file with a PEG G including semantic actions expressed in Q. The difference here is that TRX will use certified parsers for parsing PEGs and the code in Q.
- Checking that the grammar G is well-formed. This check is often performed by other PEG-based parser generators, but in the case of TRX this procedure will be formally proved correct in Coq.
- Checking that the semantic actions are terminating, by disallowing recursive calls. Calls to a predefined library of (recursive) functions are allowed. This step is completely missing in typical parser generators. It is necessary in TRX to ensure termination of the generated parser and hence its total correctness. This step will be formally proved correct in Coq.
- After ensuring that the grammar is correct, it is transformed into a recursive descent parser in Q. In TRX this step will be accompanied by a proof that this transformation produces a terminating parser, which is correct with respect to the grammar G and the semantics of PEGs (Annex B2).
Finally, TRX will be developed using dependently typed programming in the proof assistant Coq, and then the executable TRX will be extracted from this development using Coq's extraction mechanism.
5.3 Summary of three embodiments of the invention.
In this section we briefly summarize the three embodiments of this invention. The following notions are used:
- PA: a proof assistant used to develop a formally verified parser interpreter/generator. Examples include: Coq, HOL4, HOL Light, Isabelle, PVS, ....
- FPG: a formalism for expressing grammars used by the parser interpreter/generator. Examples include: context-free grammars (CFGs) and parsing expression grammars (PEGs).
- Q: the target language of the parser generator.
- G: the grammar in FPG format which we want to interpret (parser interpreter) or for which we want to generate a parser as source code in Q (parser generator).
All the three embodiments use the approach of specifying and developing a parser interpreter/generator in the PA and then extracting a parser interpreter/generator with total correctness guarantees using the extraction mechanism of the PA, hence the parser interpreter/generator is obtained as a source code in a language supported by the extraction capabilities of the PA.
5.3.1 First embodiment: parser interpreter with semantic actions
This embodiment describes a way to obtain a formally verified parser interpreter, with the following properties:
- Semantic actions are used to specify a parsing result.
- The grammar and its semantic actions need to be specified in the specification language of the PA.
This embodiment consists of:
1. Defining FPG and its formal semantics.
2. Developing a procedure for checking that a given grammar G belongs to a certain class of grammars for which parsing is feasible.
3. Developing a parser interpreter that will take a grammar G with semantic actions, both specified in the PA, and, after checking that G belongs to a class for which parsing is feasible (A2), will interpret the grammar, generating a parse tree by invoking the semantic actions embedded in G.
4. Proving that the parser interpreter (A3) is correct with respect to the semantics of FPG (A1) and the grammar G with its semantic actions.
5. Proving that the parser interpreter (A3) will always terminate. This reasoning will use some properties of G (A2), which ensure termination of its parsing.
6. Extracting a certified parser interpreter based on the development (A3). The interpreter is totally correct due to (A4) and (A5).
5.3.2 Second embodiment: parser generator with semantic actions
This embodiment describes a way to obtain a formally verified parser generator, with the following properties:
- The target language of the generator is any language Q.
- Semantic actions (in Q) are used to specify a parsing result.
- The grammar and its semantic actions can be specified in a simple text file.
This embodiment consists of:
1. Defining FPG and its semantics.
2. Developing a procedure for checking that a given grammar G belongs to a certain class of grammars for which a translation to a correct, terminating parser is feasible.
3. Defining Q and its formal semantics.
4. Developing a library of basic datatypes of Q and functions over them and proving that they are all terminating.
5. Developing a formally correct parser for Q (B3) (bootstrapping).
6. Developing a formally correct parser for a grammar in FPG format (B1). The grammar will include semantic actions in Q, which will be parsed with (B5).
7. Developing a termination checker for semantic actions in Q (B3).
8. Developing a parser generator that will read a description of some grammar G from a text file using the certified parser (B6) and, after checking that the grammar belongs to a class for which parser generation is feasible (B2), will generate the code of the parser in Q.
9. Proving that the code generated in (B8) is correct with respect to the given grammar G, the semantics of parsing grammars (B1) and the formal semantics of Q (B3).
10. Proving that the code generated in (B8) will always terminate. This reasoning will use the termination checker for semantic actions (B7) and some properties of the grammar G (B2), which ensure termination of its parser.
11. Extracting a certified parser generator based on the development (B8). Every parser generated with this parser generator is totally correct due to (B9) and (B10).
5.3.3 Third embodiment: parser interpreter with parsing traces
This embodiment describes a way to obtain a formally verified parser interpreter, with the following properties:
- Parsing tags, a simple extension to the parsing grammar formalism, are used to annotate the parts of the grammar that should be collected during parsing to form a parse trace (i.e., a simple parse tree in a predefined XML-like format).
- The grammar and its parsing tags can be specified in a simple text file.
This embodiment consists of:
1. Defining FPG extended with parsing tags indicating the information that should be collected in the parsing trace. Defining formal semantics for such extended FPGs.
2. Developing a procedure for checking that a given grammar G belongs to a certain class of grammars for which parsing is feasible.
3. Developing a formally correct parser for a grammar in FPG format (C1). The grammar will include tags that indicate the information (and its structure) that should be retained in the parse tree.
4. Developing a parser interpreter that will read a description of some grammar G from a text file using the certified parser (C3) and, after checking that the grammar belongs to a class for which parsing is feasible (C2), will interpret the grammar, generating a parse tree according to the parse tags embedded in the grammar (C1).
5. Proving that the parser interpreter (C4) is correct with respect to the given grammar G (with its parsing tags) and the semantics of parsing grammars (C1) (including the semantics of parsing tags).
6. Proving that the parser interpreter (C4) will always terminate. This reasoning will use some properties of the grammar G (C2), which ensure termination of its parsing.
7. Extracting a certified parser interpreter based on the development (C4). The interpreter is totally correct due to (C5) and (C6).
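The notion of a parse trace collected from parsing tags can be illustrated as follows. The disclosure does not spell out the semantics of tags, so this sketch makes its own assumptions: a tag is encoded as `("tag", name, e)`, untagged input leaves no trace, and a tag with no nested tags records the text it matched. The names `trace` and `Expr` are hypothetical.

```python
from typing import Dict, Optional, Tuple
from xml.sax.saxutils import escape

Expr = Tuple  # PEG operators as tagged tuples, plus ("tag", name, e)


def trace(g: Dict[str, Expr], e: Expr, s: str) -> Optional[Tuple[str, str]]:
    """Return (trace, remaining input), or None if parsing fails."""
    kind = e[0]
    if kind == "empty":
        return ("", s)
    if kind == "any":
        return ("", s[1:]) if s else None
    if kind == "term":
        return ("", s[1:]) if s[:1] == e[1] else None
    if kind == "nt":
        return trace(g, g[e[1]], s)
    if kind == "seq":
        r1 = trace(g, e[1], s)
        if r1 is None:
            return None
        r2 = trace(g, e[2], r1[1])
        return None if r2 is None else (r1[0] + r2[0], r2[1])
    if kind == "choice":
        r1 = trace(g, e[1], s)
        return r1 if r1 is not None else trace(g, e[2], s)
    if kind == "star":
        collected = ""
        while True:
            r = trace(g, e[1], s)
            if r is None or len(r[1]) == len(s):   # guard against no progress
                return (collected, s)
            collected += r[0]
            s = r[1]
    if kind == "not":
        return ("", s) if trace(g, e[1], s) is None else None
    if kind == "tag":
        r = trace(g, e[2], s)
        if r is None:
            return None
        consumed = s[:len(s) - len(r[1])]
        inner = r[0] if r[0] else escape(consumed)  # leaf tags keep the matched text
        return (f"<{e[1]}>{inner}</{e[1]}>", r[1])
    raise ValueError(f"unknown expression kind: {kind}")


# kv <- <key> 'k'* </key> '=' <val> '1'* </val>
KV = {"kv": ("seq", ("tag", "key", ("star", ("term", "k"))),
             ("seq", ("term", "="), ("tag", "val", ("star", ("term", "1")))))}
```

On the input `kk=11`, `trace(KV, ("nt", "kv"), "kk=11")` yields the trace `<key>kk</key><val>11</val>` and an empty remaining input.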
The following annexes are fully included in the specification.
Figure imgf000028_0001
Figure imgf000029_0001
Figure imgf000030_0001
Figure imgf000031_0001
Figure imgf000032_0001

Claims

1. System for building a parser characterized in that it comprises:
a grammar input module for inputting in said parser a grammar expressed in a given formalism;
- a formalism module for expressing grammars used by said parser generator, said formalism module proving that said grammar G is well- formed;
a semantic action module defining a parsing result depending on at least some expression of said grammar, said semantic action module ensuring that all semantic actions of said grammar are terminating
a checking module for checking that a given grammar belongs to a predetermined class of grammars for which a translation to a correct, terminating parser is feasible;
and, if checking module concludes that said grammar belongs to said
- a proof assistant module for developing said parser with said formalism module and said semantic action module
2. System for building a parser, depending on claim 1, characterized in that said formalism module forbids recursion in said grammar.
3. System for building a parser, depending on claim 1, characterized in that said grammar is a context-free grammar;
4. System for building a parser, depending on claim 1, characterized in that said grammar is a parsing expression grammar;
5. Method for building a formally verified parser generator, characterized in that it comprises:
A step of formalizing an expression of a grammar G and its semantics;
- A step of checking that said grammar G belongs to a predetermined class of grammars for which a translation to a correct, terminating parser is feasible;
A step of defining a target language Q of said parser generator and its formal semantics;
- A step of obtaining a library of basic datatypes of Q and functions over them and proving that they are all terminating;
A step of obtaining a formally correct parser for Q;
- A step of obtaining a formally correct parser for a grammar in FPG format, said grammar including semantic actions in Q;
- A step of obtaining a termination checker for semantic actions in Q;
- A step of obtaining a parser generator that reads a description of some grammar G from a text file using said certified parser and, after checking that the grammar belongs to a class for which parser generation is feasible, generates the code of the parser in Q;
- A step of obtaining, from a proving module, a proof that the code generated in Q is correct with respect to the given grammar G, the semantics of parsing grammars and the formal semantics of Q;
- A step of obtaining, from a proving module, a proof that the code generated in Q will always terminate.
6. Computer program product downloadable from a communications network and/or stored on a computer-readable medium and/or executable by a microprocessor, characterized in that it comprises program code instructions for the execution of the building method according to claim 5 when it is executed on a computer.
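
As an informal illustration of the checking module of claim 1 and the checking step of claim 5, the following Python sketch tests whether a parsing expression grammar falls in a class for which a simple recursive interpreter terminates. It is only a sketch under stated assumptions, not the method of the application: the expression forms (Lit, Ref, Seq, Alt, Star), the function names and the two example grammars are hypothetical, the check is a conservative syntactic test (no left recursion, no repetition over an expression that may match the empty string), semantic actions are omitted, and nothing here is formally proved.

```python
from dataclasses import dataclass

# Hypothetical PEG expression forms; all names here are illustrative.
@dataclass(frozen=True)
class Lit:   # literal string (may be empty)
    text: str

@dataclass(frozen=True)
class Ref:   # reference to a nonterminal
    name: str

@dataclass(frozen=True)
class Seq:   # left, then right
    left: object
    right: object

@dataclass(frozen=True)
class Alt:   # prioritized choice: left, else right
    left: object
    right: object

@dataclass(frozen=True)
class Star:  # zero or more repetitions of body
    body: object

def make_nullable_test(grammar):
    """Return a predicate: may this expression succeed without consuming input?"""
    nullable = set()  # nullable nonterminals, computed as a least fixpoint
    def may_be_empty(e):
        if isinstance(e, Lit): return e.text == ""
        if isinstance(e, Ref): return e.name in nullable
        if isinstance(e, Seq): return may_be_empty(e.left) and may_be_empty(e.right)
        if isinstance(e, Alt): return may_be_empty(e.left) or may_be_empty(e.right)
        return True  # Star may always match zero times
    changed = True
    while changed:
        changed = False
        for name, e in grammar.items():
            if name not in nullable and may_be_empty(e):
                nullable.add(name)
                changed = True
    return may_be_empty

def left_calls(e, may_be_empty):
    """Nonterminals that e may invoke before any input has been consumed."""
    if isinstance(e, Lit): return set()
    if isinstance(e, Ref): return {e.name}
    if isinstance(e, Star): return left_calls(e.body, may_be_empty)
    calls = left_calls(e.left, may_be_empty)
    if isinstance(e, Alt) or may_be_empty(e.left):
        calls = calls | left_calls(e.right, may_be_empty)
    return calls

def has_nullable_star(e, may_be_empty):
    """True if some repetition has a body that may match the empty string."""
    if isinstance(e, Star):
        return may_be_empty(e.body) or has_nullable_star(e.body, may_be_empty)
    if isinstance(e, (Seq, Alt)):
        return (has_nullable_star(e.left, may_be_empty)
                or has_nullable_star(e.right, may_be_empty))
    return False

def is_well_formed(grammar):
    """Conservative check: no left recursion and no repetition of a nullable body."""
    may_be_empty = make_nullable_test(grammar)
    direct = {n: left_calls(e, may_be_empty) for n, e in grammar.items()}
    for start in grammar:
        seen, todo = set(), list(direct[start])
        while todo:
            n = todo.pop()
            if n == start or n not in grammar:
                return False  # left recursion, or undefined name in left position
            if n not in seen:
                seen.add(n)
                todo.extend(direct[n])
    return not any(has_nullable_star(e, may_be_empty) for e in grammar.values())

def parse(grammar, e, s, i=0):
    """Return the position reached after matching e at s[i:], or None on failure."""
    if isinstance(e, Lit):
        return i + len(e.text) if s.startswith(e.text, i) else None
    if isinstance(e, Ref):
        return parse(grammar, grammar[e.name], s, i)
    if isinstance(e, Seq):
        j = parse(grammar, e.left, s, i)
        return None if j is None else parse(grammar, e.right, s, j)
    if isinstance(e, Alt):
        j = parse(grammar, e.left, s, i)
        return j if j is not None else parse(grammar, e.right, s, i)
    while True:  # Star: the check above ensures each iteration consumes input
        j = parse(grammar, e.body, s, i)
        if j is None:
            return i
        i = j

# S <- "(" S ")" S / ""   (balanced parentheses; accepted by the check)
balanced = {"S": Alt(Seq(Lit("("), Seq(Ref("S"), Seq(Lit(")"), Ref("S")))), Lit(""))}
# E <- E "+" "n" / "n"    (left-recursive; rejected by the check)
left_recursive = {"E": Alt(Seq(Ref("E"), Seq(Lit("+"), Lit("n"))), Lit("n"))}
```

In this sketch, parse would only be called on a grammar for which is_well_formed returns True, mirroring the claim's condition that the parser is developed only if the checking module accepts the grammar. Calling parse on left_recursive would recurse without bound, which is the behaviour the check is meant to rule out.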

Priority Applications (3)

Application Number Priority Date Filing Date Title
PCT/EP2009/059115 WO2011015222A1 (en) 2009-07-15 2009-07-15 System and method for creating a parser generator and associated computer program
EP09780676A EP2454661A1 (en) 2009-07-15 2009-07-15 System and method for creating a parser generator and associated computer program
US13/384,326 US20120191446A1 (en) 2009-07-15 2009-07-15 System and method for creating a parser generator and associated computer program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/EP2009/059115 WO2011015222A1 (en) 2009-07-15 2009-07-15 System and method for creating a parser generator and associated computer program

Publications (1)

Publication Number Publication Date
WO2011015222A1 (en) 2011-02-10

Family ID=41395775

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2009/059115 WO2011015222A1 (en) 2009-07-15 2009-07-15 System and method for creating a parser generator and associated computer program

Country Status (3)

Country Link
US (1) US20120191446A1 (en)
EP (1) EP2454661A1 (en)
WO (1) WO2011015222A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106663094A (en) * 2014-07-11 2017-05-10 洛林·G·克雷默三世 Method and system for linear generalized LL recognition and context-aware parsing
US10664655B2 (en) 2014-07-11 2020-05-26 Loring G. Craymer, III Method and system for linear generalized LL recognition and context-aware parsing

Families Citing this family (79)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7962495B2 (en) 2006-11-20 2011-06-14 Palantir Technologies, Inc. Creating data in a data store using a dynamic ontology
US8688749B1 (en) 2011-03-31 2014-04-01 Palantir Technologies, Inc. Cross-ontology multi-master replication
US8515912B2 (en) 2010-07-15 2013-08-20 Palantir Technologies, Inc. Sharing and deconflicting data changes in a multimaster database system
US8554719B2 (en) 2007-10-18 2013-10-08 Palantir Technologies, Inc. Resolving database entity information
FR2934388A1 (en) * 2008-07-25 2010-01-29 Proviciel Mlstate METHOD FOR CREATING COMPUTER PROGRAM
US10747952B2 (en) 2008-09-15 2020-08-18 Palantir Technologies, Inc. Automatic creation and server push of multiple distinct drafts
US8364642B1 (en) 2010-07-07 2013-01-29 Palantir Technologies, Inc. Managing disconnected investigations
US8732574B2 (en) 2011-08-25 2014-05-20 Palantir Technologies, Inc. System and method for parameterizing documents for automatic workflow generation
US8782004B2 (en) 2012-01-23 2014-07-15 Palantir Technologies, Inc. Cross-ACL multi-master replication
US9348677B2 (en) 2012-10-22 2016-05-24 Palantir Technologies Inc. System and method for batch evaluation programs
US9081975B2 (en) 2012-10-22 2015-07-14 Palantir Technologies, Inc. Sharing information between nexuses that use different classification schemes for information access control
US9501761B2 (en) 2012-11-05 2016-11-22 Palantir Technologies, Inc. System and method for sharing investigation results
US10140664B2 (en) 2013-03-14 2018-11-27 Palantir Technologies Inc. Resolving similar entities from a transaction database
US8909656B2 (en) 2013-03-15 2014-12-09 Palantir Technologies Inc. Filter chains with associated multipath views for exploring large data sets
US9740369B2 (en) 2013-03-15 2017-08-22 Palantir Technologies Inc. Systems and methods for providing a tagging interface for external content
US8868486B2 (en) 2013-03-15 2014-10-21 Palantir Technologies Inc. Time-sensitive cube
US8855999B1 (en) * 2013-03-15 2014-10-07 Palantir Technologies Inc. Method and system for generating a parser and parsing complex data
US9898167B2 (en) 2013-03-15 2018-02-20 Palantir Technologies Inc. Systems and methods for providing a tagging interface for external content
US8930897B2 (en) 2013-03-15 2015-01-06 Palantir Technologies Inc. Data integration tool
US8924388B2 (en) 2013-03-15 2014-12-30 Palantir Technologies Inc. Computer-implemented systems and methods for comparing and associating objects
US8903717B2 (en) * 2013-03-15 2014-12-02 Palantir Technologies Inc. Method and system for generating a parser and parsing complex data
US8886601B1 (en) 2013-06-20 2014-11-11 Palantir Technologies, Inc. System and method for incrementally replicating investigative analysis data
US9710360B2 (en) 2013-06-27 2017-07-18 Nxp Usa, Inc. Optimizing error parsing in an integrated development environment
US8601326B1 (en) 2013-07-05 2013-12-03 Palantir Technologies, Inc. Data quality monitors
US9223773B2 (en) 2013-08-08 2015-12-29 Palatir Technologies Inc. Template system for custom document generation
US8938686B1 (en) 2013-10-03 2015-01-20 Palantir Technologies Inc. Systems and methods for analyzing performance of an entity
US9710243B2 (en) * 2013-11-07 2017-07-18 Eagle Legacy Modernization, LLC Parser that uses a reflection technique to build a program semantic tree
US9105000B1 (en) 2013-12-10 2015-08-11 Palantir Technologies Inc. Aggregating data from a plurality of data sources
US10579647B1 (en) 2013-12-16 2020-03-03 Palantir Technologies Inc. Methods and systems for analyzing entity performance
US9009827B1 (en) 2014-02-20 2015-04-14 Palantir Technologies Inc. Security sharing system
US8935201B1 (en) 2014-03-18 2015-01-13 Palantir Technologies Inc. Determining and extracting changed data from a data source
US9836580B2 (en) 2014-03-21 2017-12-05 Palantir Technologies Inc. Provider portal
US10572496B1 (en) 2014-07-03 2020-02-25 Palantir Technologies Inc. Distributed workflow system and database with access controls for city resiliency
US9229952B1 (en) 2014-11-05 2016-01-05 Palantir Technologies, Inc. History preserving data pipeline system and method
US9361075B2 (en) * 2014-11-12 2016-06-07 International Business Machines Corporation Contraction aware parsing system for domain-specific languages
US9483546B2 (en) 2014-12-15 2016-11-01 Palantir Technologies Inc. System and method for associating related records to common entities across multiple lists
US11302426B1 (en) 2015-01-02 2022-04-12 Palantir Technologies Inc. Unified data interface and system
US10803106B1 (en) 2015-02-24 2020-10-13 Palantir Technologies Inc. System with methodology for dynamic modular ontology
US9727560B2 (en) 2015-02-25 2017-08-08 Palantir Technologies Inc. Systems and methods for organizing and identifying documents via hierarchies and dimensions of tags
US10103953B1 (en) 2015-05-12 2018-10-16 Palantir Technologies Inc. Methods and systems for analyzing entity performance
US10198465B2 (en) 2015-05-14 2019-02-05 Deephaven Data Labs Llc Computer data system current row position query language construct and array processing query language constructs
US10628834B1 (en) 2015-06-16 2020-04-21 Palantir Technologies Inc. Fraud lead detection system for efficiently processing database-stored data and automatically generating natural language explanatory information of system results for display in interactive user interfaces
US9418337B1 (en) 2015-07-21 2016-08-16 Palantir Technologies Inc. Systems and models for data analytics
US9392008B1 (en) 2015-07-23 2016-07-12 Palantir Technologies Inc. Systems and methods for identifying information related to payment card breaches
US9996595B2 (en) 2015-08-03 2018-06-12 Palantir Technologies, Inc. Providing full data provenance visualization for versioned datasets
US10127289B2 (en) 2015-08-19 2018-11-13 Palantir Technologies Inc. Systems and methods for automatic clustering and canonical designation of related data in various data structures
US10853378B1 (en) 2015-08-25 2020-12-01 Palantir Technologies Inc. Electronic note management via a connected entity graph
US9984428B2 (en) 2015-09-04 2018-05-29 Palantir Technologies Inc. Systems and methods for structuring data from unstructured electronic data files
US9576015B1 (en) 2015-09-09 2017-02-21 Palantir Technologies, Inc. Domain-specific language for dataset transformations
US9760556B1 (en) 2015-12-11 2017-09-12 Palantir Technologies Inc. Systems and methods for annotating and linking electronic documents
US9514414B1 (en) 2015-12-11 2016-12-06 Palantir Technologies Inc. Systems and methods for identifying and categorizing electronic documents through machine learning
US10248722B2 (en) 2016-02-22 2019-04-02 Palantir Technologies Inc. Multi-language support for dynamic ontology
US10698938B2 (en) 2016-03-18 2020-06-30 Palantir Technologies Inc. Systems and methods for organizing and identifying documents via hierarchies and dimensions of tags
CN107229616B (en) * 2016-03-25 2020-10-16 阿里巴巴集团控股有限公司 Language identification method, device and system
US10007674B2 (en) 2016-06-13 2018-06-26 Palantir Technologies Inc. Data revision control in large-scale data analytic systems
US11106692B1 (en) 2016-08-04 2021-08-31 Palantir Technologies Inc. Data record resolution and correlation system
US10133588B1 (en) 2016-10-20 2018-11-20 Palantir Technologies Inc. Transforming instructions for collaborative updates
US10102229B2 (en) 2016-11-09 2018-10-16 Palantir Technologies Inc. Validating data integrations using a secondary data store
US9946777B1 (en) 2016-12-19 2018-04-17 Palantir Technologies Inc. Systems and methods for facilitating data transformation
US9922108B1 (en) 2017-01-05 2018-03-20 Palantir Technologies Inc. Systems and methods for facilitating data transformation
US11074277B1 (en) 2017-05-01 2021-07-27 Palantir Technologies Inc. Secure resolution of canonical entities
US10956406B2 (en) 2017-06-12 2021-03-23 Palantir Technologies Inc. Propagated deletion of database records and derived data
US10481881B2 (en) * 2017-06-22 2019-11-19 Archeo Futurus, Inc. Mapping a computer code to wires and gates
US9996328B1 (en) * 2017-06-22 2018-06-12 Archeo Futurus, Inc. Compiling and optimizing a computer code by minimizing a number of states in a finite machine corresponding to the computer code
US10691729B2 (en) 2017-07-07 2020-06-23 Palantir Technologies Inc. Systems and methods for providing an object platform for a relational database
US10198469B1 (en) 2017-08-24 2019-02-05 Deephaven Data Labs Llc Computer data system data source refreshing using an update propagation graph having a merged join listener
US10956508B2 (en) 2017-11-10 2021-03-23 Palantir Technologies Inc. Systems and methods for creating and managing a data integration workspace containing automatically updated data models
US10235533B1 (en) 2017-12-01 2019-03-19 Palantir Technologies Inc. Multi-user access controls in electronic simultaneously editable document editor
US11061874B1 (en) 2017-12-14 2021-07-13 Palantir Technologies Inc. Systems and methods for resolving entity data across various data structures
US10838987B1 (en) 2017-12-20 2020-11-17 Palantir Technologies Inc. Adaptive and transparent entity screening
US10754822B1 (en) 2018-04-18 2020-08-25 Palantir Technologies Inc. Systems and methods for ontology migration
US11461355B1 (en) 2018-05-15 2022-10-04 Palantir Technologies Inc. Ontological mapping of data
US11061542B1 (en) 2018-06-01 2021-07-13 Palantir Technologies Inc. Systems and methods for determining and displaying optimal associations of data items
US10795909B1 (en) 2018-06-14 2020-10-06 Palantir Technologies Inc. Minimized and collapsed resource dependency path
EP3633468B1 (en) * 2018-10-04 2021-12-22 Technische Universität München Distributed automated synthesis of correct-by-construction controllers
US10713016B1 (en) * 2020-05-04 2020-07-14 Loyalty Juggernaut, Inc Method of implementing rules on visual language using visual blocks
US11360748B2 (en) * 2020-05-04 2022-06-14 Loyalty Juggernaut, Inc. System and method of collectively tracking a behavior of a member across one or more dimensions
CN116820564B (en) * 2023-07-06 2024-04-02 四川大学 Unified form semanticalization method of program language
CN116974573B (en) * 2023-07-10 2024-09-20 中国人民解放军陆军工程大学 Compiling method for application program of fully distributed intelligent building system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2001098942A2 (en) * 2000-06-19 2001-12-27 Lernout & Hauspie Speech Products N.V. Package driven parsing using structure function grammar
US6606625B1 (en) * 1999-06-03 2003-08-12 University Of Southern California Wrapper induction by hierarchical data analysis

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4686623A (en) * 1985-06-07 1987-08-11 International Business Machines Corporation Parser-based attribute analysis
JP4451435B2 (en) * 2006-12-06 2010-04-14 本田技研工業株式会社 Language understanding device, language understanding method, and computer program
US8027946B1 (en) * 2006-12-22 2011-09-27 Avaya Inc. Higher order logic applied to expert systems for alarm analysis, filtering, correlation and root cause
US8868479B2 (en) * 2007-09-28 2014-10-21 Telogis, Inc. Natural language parsers to normalize addresses for geocoding
US8863101B2 (en) * 2008-12-10 2014-10-14 International Business Machines Corporation Compiler generator

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106663094A (en) * 2014-07-11 2017-05-10 洛林·G·克雷默三世 Method and system for linear generalized LL recognition and context-aware parsing
CN106663094B (en) * 2014-07-11 2020-03-27 洛林·G·克雷默三世 Method and system for linear generalized LL recognition and context-aware parsing
US10664655B2 (en) 2014-07-11 2020-05-26 Loring G. Craymer, III Method and system for linear generalized LL recognition and context-aware parsing

Also Published As

Publication number Publication date
EP2454661A1 (en) 2012-05-23
US20120191446A1 (en) 2012-07-26

Similar Documents

Publication Publication Date Title
EP2454661A1 (en) System and method for creating a parser generator and associated computer program
Ford Packet parsing: a practical linear-time algorithm with backtracking
US8843907B2 (en) Compiler with error handling
AU2012203071B2 (en) Computer-implemented method, system and computer program product for displaying a user interface component
US11481201B2 (en) Integrated development environment for developing and compiling query language schemas for application program interfaces
Omar et al. Safely composable type-specific languages
Sestoft Programming language concepts
Duregård et al. Embedded parser generators
Blaudeau et al. A verified packrat parser interpreter for parsing expression grammars
Warth et al. Modular semantic actions
Laurent et al. Taming context-sensitive languages with principled stateful parsing
Jia et al. A derivative-based parser generator for visibly Pushdown grammars
Dinkelaker et al. Incremental concrete syntax for embedded languages with support for separate compilation
Fisher et al. The next 700 data description languages
Zaytsev Recovery, convergence and documentation of languages
AT&T
Stump et al. Strong functional pearl: Harper’s regular-expression matcher in Cedille
De Santo et al. A Coq Mechanization of JavaScript Regular Expression Semantics
Wu et al. Component-based LR parsing
Winter et al. Generative programming techniques for Java library migration
Rau et al. A verified Earley parser
M. Cardoso et al. Type-based Termination Analysis for Parsing Expression Grammars
Hermann et al. Solving the FIXML2Code-case Study with HenshinTGG.
Bodin Certified semantics and analysis of JavaScript
Arnoldus An illumination of the template enigma: software code generation with templates

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 09780676; Country of ref document: EP; Kind code of ref document: A1
WWE WIPO information: entry into national phase. Ref document number: 2009780676; Country of ref document: EP
NENP Non-entry into the national phase. Ref country code: DE
WWE WIPO information: entry into national phase. Ref document number: 13384326; Country of ref document: US