8000 [lexer] keyword protected quotation token for arbitrary text by gares · Pull Request #9733 · rocq-prover/rocq · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

[lexer] keyword protected quotation token for arbitrary text #9733

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Mar 31, 2019

Conversation

gares
Copy link
Member
@gares gares commented Mar 10, 2019

This PR implements a feature I could use in elpi. Today the only token that can carry arbitrary text is STRING that suffers from escaping shortcomings. This PR implements a class of tokens that can carry arbitrary text, so that 3rd party languages can be easily embedded in .v files.

From the last commit:
One can now register a quotation using a grammar rule with
QUOTATION "name:". In that case name: becomes a keyword and the token is
generated for name: followed by a an identifier or a parenthesized
text. Eg

  constr:x
  string:[....]
  ltac:(....)
  ltac:{....}

The delimiter is made of 1 or more occurrences of the same parenthesis,
eg ((.....)) or [[[[....]]]]. The idea being that if the text happens to
contain the closing delimiter, one can make the delimiter longer and avoid
confusion (no escaping). Eg

string:[[ .. ']' .. ]]

Nesting the delimiter is allowed, eg ((..((...))..)) is OK.

The text inside the quotation is returned as a string (including the
parentheses), so that a third party parser can take care of it.

Keywords don't need to end in :.

@gares gares changed the title Quotations [lexer] keywrod protected quotation token for arbitrary text Mar 10, 2019
@JasonGross JasonGross changed the title [lexer] keywrod protected quotation token for arbitrary text [lexer] keyword protected quotation token for arbitrary text Mar 10, 2019
@ppedrot
Copy link
Member
ppedrot commented Mar 11, 2019

Note that this is probably going to conflict with #8764, it might be worthwhile to factor out the common parts.

@gares
Copy link
Member Author
gares commented Mar 11, 2019

I'm fine taking the conflicts, but #8764 seems to have stalled, I'll ping.

@gares
Copy link
Member Author
gares commented Mar 11, 2019

Wrt the changes, they are quite orthogonal as far as I can tell.

@proux01
Copy link
Contributor
proux01 commented Mar 11, 2019 8000

Wrt the changes, they are quite orthogonal as far as I can tell.

Indeed, but Pierre Marie is right in that some conflicts may arise (c.f. 4bc83c8).

@gares
Copy link
Member Author
gares commented Mar 11, 2019

Sure. I was just offering to give you precedence. My change is smaller and it is hence easier to rebase. Of course this works for me only if you push your PR ahead, hence my ping.

@gares gares marked this pull request as ready for review March 11, 2019 16:47
@gares gares requested review from ejgallego, mattam82 and a team as code owners March 11, 2019 16:47
@gares gares added the needs: documentation Documentation was not added or updated. label Mar 11, 2019
@gares gares added this to the 8.10+beta1 milestone Mar 11, 2019
@gares
Copy link
Member Author
gares commented Mar 11, 2019

OK, I've tested it in elpi and it works for me. I've added the needs:documentation because I don't know where to put it (the commit message/PR header contains quite some doc already).

@gares
Copy link
Member Author
gares commented Mar 11, 2019

@ppedrot BTW some of this code and/or the delimiting convention could be used in coqpp too. I had to write ... -> { ... '\x7b' ... } somewhere in elpi in a grammar extend, since '}' would confuse coqpp by terminating the ML block too early. The convention proposed here would have let me just write ... -> {{ ... '}' ... }}.

@gares
Copy link
Member Author
gares commented Mar 15, 2019

ping @coq/parsing-maintainers (and @coq/doc-maintainers to tell me where I should put the doc of the QUOTATION token, especially the non-escaping rules).

@Zimmi48
Copy link
Member
Zimmi48 commented Mar 15, 2019

This should probably go in the “Syntax extension” chapter (https://coq.inria.fr/refman/user-extensions/syntax-extensions.html) if it impacts the users directly. If it only concerns the plugin writers, then it should go in the non-existing plugin writing documentation chapter 😈. Until then, you can just create a new Markdown file in dev/doc.

@Zimmi48
Copy link
Member
Zimmi48 commented Mar 15, 2019

The index in https://github.com/coq/coq/tree/master/dev#miscellaneous-information-about-the-code-devdoc should be updated if you create a new file in dev/doc.

@gares
Copy link
Member Author
gares commented Mar 18, 2019

It does not concern users, unless they use an extension language that uses the feature.
So I'll write the comment in CLexer or Tok for now.

@gares gares removed the needs: documentation Documentation was not added or updated. label Mar 18, 2019
@gares
Copy link
Member Author
gares commented Mar 18, 2019

ping @coq/parsing-maintainers for the review

@gares gares added kind: enhancement Enhancement to an existing user-facing feature, tactic, etc. needs: review labels Mar 20, 2019
(** This should be functional but it is not due to the interface *)
val add_keyword : string -> unit
val add_keyword : ?quotation:starts_quotation -> string -> unit
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it would be clearer to have two functions ? add_keyword and add_quotation ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about that, but in the current model a quotation is always initiated by a keyword. So the API would look like

val add_keyword : string -> unit
val add_keyword_for_quotation : string -> unit

that is pretty much what we have in the patch...

Copy link
Member
@ejgallego ejgallego left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not expert enough in this code, I let @ppedrot or @herbelin handle this.

I saw nothing wrong.

Copy link
Member
@Zimmi48 Zimmi48 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Documentation is good, I only have a minor comment.

@gares
Copy link
Member Author
gares commented Mar 21, 2019

This till lacks an assignee. @ppedrot can you take it?

@gares
Copy link
Member Author
gares commented Mar 26, 2019

ping

@gares
Copy link
Member Author
gares commented Mar 28, 2019

ping @ppedrot (time for 8.10 is running out)

@ppedrot
Copy link
Member
ppedrot commented Mar 30, 2019

@gares Can you rebase? I'll merge it after.

gares added 5 commits March 31, 2019 14:33
Tokens were having a double role:
- the output of the lexer
- the items of grammar entries, especially terminals

Now tokens are the output of the lexer, and this paves the way for
using a richer data type, eg including Loc.t

Patterns, as in Plexing.pattern, only represent patterns (for tokens)
and now have a bit more structure (eg the wildcard is represented
as None, not as "", while a regular pattern for "x" as Some "x")
One can now register a quotation using a grammar rule with
QUOTATION("name:"). "name:" becomes a keyword and the token is
generated for name: followed by a an identifier or a parenthesized
text. Eg

  constr:x
  string:[....]
  ltac:(....)
  ltac:{....}

The delimiter is made of 1 or more occurrences of the same parenthesis,
eg ((.....)) or [[[[....]]]]. The idea being that if the text happens to
contain the closing delimiter, one can make the delimiter longer and avoid
confusion (no escaping). Eg

  string:[[ .. ']' .. ]]


Nesting the delimiter is allowed, eg ((..((...))..)) is OK.

The text inside the quotation is returned as a string (including the
parentheses), so that a third party parser can take care of it.

Keywords don't need to end in ':'.
@gares
Copy link
Member Author
gares commented Mar 31, 2019

I did rebase the overlay, the error seems from another pr

@Zimmi48
Copy link
Member
Zimmi48 commented Mar 31, 2019

The error seems very related to this PR to me:

File "./src/Rewriter.v", line 1347, characters -3739--3739:
Warning: Not interpreting "*)" as the end of current non-terminated comment
because it occurs in a non-terminated string of the comment.
[comment-terminator-in-string,parsing]
File "./src/Rewriter.v", line 3585, characters -1686-0:
Error: Syntax Error: Lexer: Unterminated string

@gares
Copy link
Member Author
gares commented Mar 31, 2019

Ah ok, the Ltac2 one is fixed now.
The one of Fiat seems indeed related, I look into it tomorrow. It is still weird that it passed CI a week ago...

@proux01
Copy link
Contributor
proux01 commented Mar 31, 2019

Even weirder, I just rebased #9815 over this PR and CI passes (with just a simple overlay for ltac2).

@gares
Copy link
Member Author
gares commented Mar 31, 2019

Restarted, and now it is green... I guess they pushed a bad commit

@ppedrot ppedrot merged commit f832476 into rocq-prover:master Mar 31, 2019
ppedrot added a commit that referenced this pull request Mar 31, 2019
…ry text

Ack-by: Zimmi48
Ack-by: ejgallego
Ack-by: gares
Reviewed-by: ppedrot
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind: enhancement Enhancement to an existing user-facing feature, tactic, etc.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants
0