[lexer] keyword protected quotation token for arbitrary text #9733

gares · 2019-03-10T21:11:22Z

This PR implements a feature I could use in elpi. Today the only token that can carry arbitrary text is STRING, which suffers from escaping shortcomings. This PR implements a class of tokens that can carry arbitrary text, so that third-party languages can be easily embedded in .v files.

From the last commit:
One can now register a quotation using a grammar rule with
QUOTATION "name:". In that case name: becomes a keyword and the token is
generated for name: followed by an identifier or a parenthesized
text. E.g.:

  constr:x
  string:[....]
  ltac:(....)
  ltac:{....}

The delimiter is made of one or more occurrences of the same parenthesis,
e.g. ((.....)) or [[[[....]]]]. The idea is that if the text happens to
contain the closing delimiter, one can make the delimiter longer and avoid
confusion (no escaping). E.g.:

string:[[ .. ']' .. ]]

Nesting the delimiter is allowed, e.g. ((..((...))..)) is OK.

The text inside the quotation is returned as a string (including the
parentheses), so that a third party parser can take care of it.

Keywords don't need to end in ':'.
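To make the delimiter rules concrete, here is a minimal OCaml sketch of how the text following a quotation keyword might be scanned. It assumes the three bracket kinds shown in the examples are the only openers and uses a simplified identifier rule; `quotation_body`, `scan_delimited` and the other helpers are hypothetical names, not the CLexer implementation.

```ocaml
(* Minimal sketch of the quotation-body rules described above.
   [quotation_body s i] looks at the text right after a quotation
   keyword such as "ltac:" and returns the quoted text, delimiters
   included, or None. Helper names are hypothetical; this is not the
   actual CLexer code. *)

(* Assumption: only these three bracket kinds open a delimited quotation. *)
let closing_of = function
  | '(' -> Some ')'
  | '[' -> Some ']'
  | '{' -> Some '}'
  | _ -> None

(* Number of consecutive copies of [c] starting at index [i]. *)
let run s i c =
  let n = String.length s in
  let rec go j = if j < n && s.[j] = c then go (j + 1) else j - i in
  go i

(* True when at least [k] copies of [c] start at index [i]. *)
let repeats s i c k = run s i c >= k

(* Delimited case: the delimiter is [k] copies of the same opening
   character; [k] copies of it open a nested level, [k] copies of the
   matching closing character close one level. No escaping. *)
let scan_delimited s start =
  let len = String.length s in
  if start >= len then None
  else
    match closing_of s.[start] with
    | None -> None
    | Some close ->
      let op = s.[start] in
      let k = run s start op in
      let rec scan i depth =
        if i >= len then None (* unterminated quotation *)
        else if repeats s i close k then
          if depth = 1 then Some (String.sub s start (i + k - start))
          else scan (i + k) (depth - 1)
        else if repeats s i op k then scan (i + k) (depth + 1)
        else scan (i + 1) depth
      in
      scan (start + k) 1

(* Identifier case, with a simplified notion of identifier character. *)
let is_ident_char c =
  (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
  || (c >= '0' && c <= '9') || c = '_' || c = '\''

let scan_ident s start =
  let len = String.length s in
  let rec go i = if i < len && is_ident_char s.[i] then go (i + 1) else i in
  let stop = go start in
  if stop = start then None else Some (String.sub s start (stop - start))

let quotation_body s start =
  match scan_delimited s start with
  | Some _ as r -> r
  | None -> scan_ident s start
```

For instance, `quotation_body "ltac:(idtac)" 5` yields `Some "(idtac)"`, and `string:[[ .. ']' .. ]]` is handled because a single `]` never matches the two-character closing delimiter.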

ppedrot · 2019-03-11T08:14:36Z

Note that this is probably going to conflict with #8764, it might be worthwhile to factor out the common parts.

gares · 2019-03-11T08:32:49Z

I'm fine taking the conflicts, but #8764 seems to have stalled, I'll ping.

gares · 2019-03-11T08:33:18Z

Wrt the changes, they are quite orthogonal as far as I can tell.

proux01 · 2019-03-11T09:07:50Z

> Wrt the changes, they are quite orthogonal as far as I can tell.

Indeed, but Pierre-Marie is right that some conflicts may arise (cf. 4bc83c8).

gares · 2019-03-11T11:39:27Z

Sure, I was just offering to give you precedence: my change is smaller and hence easier to rebase. Of course, this only works for me if you push your PR ahead, hence my ping.

gares · 2019-03-11T16:48:34Z

OK, I've tested it in elpi and it works for me. I've added the needs: documentation label because I don't know where to put the documentation (the commit message/PR description already contains quite a lot of it).

gares · 2019-03-11T16:51:59Z

@ppedrot BTW some of this code and/or the delimiting convention could be used in coqpp too. I had to write ... -> { ... '\x7b' ... } somewhere in elpi in a grammar extend, since '}' would confuse coqpp by terminating the ML block too early. The convention proposed here would have let me just write ... -> {{ ... '}' ... }}.

gares · 2019-03-15T11:22:44Z

ping @coq/parsing-maintainers (and @coq/doc-maintainers to tell me where I should put the doc of the QUOTATION token, especially the non-escaping rules).

Zimmi48 · 2019-03-15T12:26:13Z

This should probably go in the “Syntax extension” chapter (https://coq.inria.fr/refman/user-extensions/syntax-extensions.html) if it impacts the users directly. If it only concerns the plugin writers, then it should go in the non-existing plugin writing documentation chapter 😈. Until then, you can just create a new Markdown file in dev/doc.

Zimmi48 · 2019-03-15T12:34:57Z

The index in https://github.com/coq/coq/tree/master/dev#miscellaneous-information-about-the-code-devdoc should be updated if you create a new file in dev/doc.

gares · 2019-03-18T14:13:26Z

It does not concern users, unless they use an extension language that uses the feature.
So I'll write the comment in CLexer or Tok for now.

gares · 2019-03-18T14:19:43Z

ping @coq/parsing-maintainers for the review

gramlib/plexing.mli

ejgallego · 2019-03-21T01:02:28Z

parsing/cLexer.mli

 (** This should be functional but it is not due to the interface *)
-val add_keyword : string -> unit
+val add_keyword : ?quotation:starts_quotation -> string -> unit


Maybe it would be clearer to have two functions, add_keyword and add_quotation?

I thought about that, but in the current model a quotation is always initiated by a keyword, so the API would look like:

  val add_keyword : string -> unit
  val add_keyword_for_quotation : string -> unit

that is pretty much what we have in the patch...
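For comparison, both API shapes can be sketched in OCaml on top of a single keyword table. This is a hypothetical illustration: the table, the `starts_quotation` constructor names and the helper functions are assumptions, not the actual `CLexer` code.

```ocaml
(* Hypothetical sketch, not the actual CLexer code: one keyword table
   whose entries record whether the keyword also starts a quotation. *)
type starts_quotation = NoQuotation | Quotation

let keywords : (string, starts_quotation) Hashtbl.t = Hashtbl.create 17

(* Single entry point with an optional argument, as in the patch. *)
let add_keyword ?(quotation = NoQuotation) kw =
  Hashtbl.replace keywords kw quotation

(* The two-function variant suggested in review is a thin wrapper. *)
let add_keyword_for_quotation kw = add_keyword ~quotation:Quotation kw

let is_keyword kw = Hashtbl.mem keywords kw

let starts_quotation kw =
  match Hashtbl.find_opt keywords kw with
  | Some Quotation -> true
  | Some NoQuotation | None -> false
```

The wrapper shows why the two designs are nearly equivalent: a quotation is just a keyword carrying one extra flag.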

ejgallego

I am not enough of an expert in this code; I'll let @ppedrot or @herbelin handle this.

I saw nothing wrong.

Zimmi48

Documentation is good, I only have a minor comment.

parsing/cLexer.mli

gares · 2019-03-21T17:26:56Z

This still lacks an assignee. @ppedrot can you take it?

gares · 2019-03-26T09:58:05Z

ping

gares · 2019-03-28T10:11:00Z

ping @ppedrot (time for 8.10 is running out)

ppedrot · 2019-03-30T19:35:32Z

@gares Can you rebase? I'll merge it after.

Tokens had a double role:
- the output of the lexer
- the items of grammar entries, especially terminals

Now tokens are only the output of the lexer, and this paves the way for using a richer data type, e.g. one including Loc.t.

Patterns, as in Plexing.pattern, only represent patterns (for tokens) and now have a bit more structure (e.g. the wildcard is represented as None, not as "", while a regular pattern for "x" is Some "x").
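A hypothetical sketch of the token/pattern split in OCaml follows. The token constructors and the `string * string option` pattern shape are simplified assumptions for illustration, not the real `Tok` or `Plexing` definitions; the point is that the wildcard becomes `None` instead of the empty string.

```ocaml
(* Hypothetical, simplified token and pattern types illustrating the
   split described in the commit message; not the actual Tok/Plexing
   definitions. *)
type token =
  | KEYWORD of string
  | IDENT of string
  | QUOTATION of string * string (* starting keyword, quoted text *)

(* A pattern is a token class plus an optional payload:
   None is the wildcard (previously encoded as ""). *)
type pattern = string * string option

let class_and_payload = function
  | KEYWORD s -> ("KEYWORD", s)
  | IDENT s -> ("IDENT", s)
  | QUOTATION (kw, _) -> ("QUOTATION", kw)

let matches ((cls, payload) : pattern) (tok : token) =
  let cls', v = class_and_payload tok in
  cls = cls' && (match payload with None -> true | Some p -> p = v)
```

With this encoding a pattern like `("IDENT", None)` matches any identifier, while `("IDENT", Some "x")` matches only `x`, and no token payload can be confused with the wildcard.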

One can now register a quotation using a grammar rule with QUOTATION("name:"). "name:" becomes a keyword and the token is generated for name: followed by an identifier or a parenthesized text. E.g.:

  constr:x
  string:[....]
  ltac:(....)
  ltac:{....}

The delimiter is made of one or more occurrences of the same parenthesis, e.g. ((.....)) or [[[[....]]]]. The idea is that if the text happens to contain the closing delimiter, one can make the delimiter longer and avoid confusion (no escaping). E.g.:

  string:[[ .. ']' .. ]]

Nesting the delimiter is allowed, e.g. ((..((...))..)) is OK.

The text inside the quotation is returned as a string (including the parentheses), so that a third-party parser can take care of it.

Keywords don't need to end in ':'.

gares · 2019-03-31T17:21:45Z

I did rebase the overlay; the error seems to come from another PR.

Zimmi48 · 2019-03-31T18:11:33Z

The error seems very related to this PR to me:

File "./src/Rewriter.v", line 1347, characters -3739--3739:
Warning: Not interpreting "*)" as the end of current non-terminated comment
because it occurs in a non-terminated string of the comment.
[comment-terminator-in-string,parsing]
File "./src/Rewriter.v", line 3585, characters -1686-0:
Error: Syntax Error: Lexer: Unterminated string

gares · 2019-03-31T19:17:38Z

Ah OK, the Ltac2 one is fixed now.
The Fiat one does seem related; I'll look into it tomorrow. It is still weird that it passed CI a week ago...

proux01 · 2019-03-31T20:10:29Z

Even weirder, I just rebased #9815 over this PR and CI passes (with just a simple overlay for ltac2).

gares · 2019-03-31T20:24:16Z

Restarted, and now it is green... I guess they pushed a bad commit

gares changed the title ~~Quotations~~ [lexer] keywrod protected quotation token for arbitrary text Mar 10, 2019

JasonGross changed the title ~~[lexer] keywrod protected quotation token for arbitrary text~~ [lexer] keyword protected quotation token for arbitrary text Mar 10, 2019

gares force-pushed the quotations branch from f902a49 to 6ee4c60 Compare March 11, 2019 16:46

gares marked this pull request as ready for review March 11, 2019 16:47

gares requested review from ejgallego, mattam82 and a team as code owners March 11, 2019 16:47

gares added the needs: documentation Documentation was not added or updated. label Mar 11, 2019

gares added this to the 8.10+beta1 milestone Mar 11, 2019

gares force-pushed the quotations branch from 6ee4c60 to 05f1e93 Compare March 13, 2019 10:10

gares force-pushed the quotations branch from 05f1e93 to 7b50952 Compare March 18, 2019 14:19

gares removed the needs: documentation Documentation was not added or updated. label Mar 18, 2019

gares force-pushed the quotations branch from 7b50952 to b00a2f4 Compare March 20, 2019 08:01

gares added kind: enhancement Enhancement to an existing user-facing feature, tactic, etc. needs: review labels Mar 20, 2019

ejgallego reviewed Mar 21, 2019

gramlib/plexing.mli

ejgallego reviewed Mar 21, 2019

ejgallego requested review from herbelin and ppedrot March 21, 2019 01:05

gares force-pushed the quotations branch from b00a2f4 to 36d10d2 Compare March 21, 2019 12:04

Zimmi48 reviewed Mar 21, 2019

parsing/cLexer.mli (outdated)

gares force-pushed the quotations branch from 36d10d2 to 3a0acaa Compare March 21, 2019 17:25

ppedrot self-assigned this Mar 21, 2019

proux01 mentioned this pull request Mar 22, 2019

Multiple payload types in tokens #9815

Merged

gares mentioned this pull request Mar 25, 2019

[parser] initialization based on Loc.t rather than Loc.source #9830

Merged

ppedrot approved these changes Mar 30, 2019

gares added 5 commits March 31, 2019 14:33

[dune] typo

ed99643

overlay for ltac2

55e89d7

documentation

f832476

gares force-pushed the quotations branch from 3a0acaa to f832476 Compare March 31, 2019 12:37

ppedrot removed the needs: review label Mar 31, 2019

ppedrot merged commit f832476 into rocq-prover:master Mar 31, 2019

ppedrot added a commit that referenced this pull request Mar 31, 2019

Merge PR #9733: [lexer] keyword protected quotation token for arbitrary text

Ack-by: Zimmi48
Ack-by: ejgallego
Ack-by: gares
Reviewed-by: ppedrot

5dd3c18

gares mentioned this pull request Jun 17, 2019

[ide] chop sentences taking into account QUOTATION token #10394

Merged

pi8027 mentioned this pull request Aug 8, 2019

Coq PG doesn't recognize the new quotation mechanism of coq/coq#9733 ProofGeneral/PG#437

Open
